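like memmove, but for registers. It handles cases where, for example, the register allocator has placed a vector in registers r20-40 and later needs to move it to r10-30. The two ranges overlap, so the copy has to run in the right order or it overwrites source registers before they have been read. Copying through a temporary location isn't a good solution: there may not be enough spare registers, and going through memory with load/store is much slower.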
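
To make the memmove analogy concrete, here is a minimal sketch that models the register file as a Python list. The function name `move_registers` and the 64-entry register file are made up for illustration; this is not how any particular instruction or allocator implements it, only the direction-picking idea that memmove uses.

```python
def move_registers(regs, src, dst, count):
    """Move `count` consecutive registers from regs[src:] to regs[dst:], in place.

    Picks the copy direction the way memmove does, so an overlapping source
    register is never overwritten before it has been read. No temporary
    registers and no memory round trip are needed.
    """
    if dst == src or count == 0:
        return
    if dst < src:
        # Destination starts below the source: copy lowest register first.
        indices = range(count)
    else:
        # Destination starts above the source: copy highest register first.
        indices = range(count - 1, -1, -1)
    for i in indices:
        regs[dst + i] = regs[src + i]


# Hypothetical register file r0..r63; each register starts out holding its own index.
regs = list(range(64))

# The example from the text: the vector lives in r20-r40 and must end up in r10-r30.
move_registers(regs, src=20, dst=10, count=21)
```

After the call, r10-r30 hold the values that were originally in r20-r40. Copying in the opposite direction for this case would still work only by accident of the layout; for a move upward (say r10-30 to r20-40) an ascending copy would clobber r20-r30 before they were read, which is why the direction is chosen from the relative position of the two ranges.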