at the L0 cache/buffer level, and in the address-match matrix, a massive array of XOR gates (comparators) is needed, which is a huge power drain. however, back at the vector-issue phase, information is known across multiple units: there is a very high probability that related element addresses will not have changed (especially on element-strided LD/STs) and, furthermore, this is very easy to detect (see https://groups.google.com/d/msg/comp.arch/cbGAlcCjiZE/IDhmQPS6AAAJ). therefore, as an enhancement, when Vector LD/STs are involved, pass that additional information down to the address-match matrix and the L0 cache/buffer so that redundant comparisons can be skipped, which saves huge amounts of power.
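
a minimal behavioural sketch of the idea, in Python rather than HDL. it assumes that "will not have changed" refers to the upper address bits (the part above the line offset), and it assumes a 64-byte L0 line; neither is stated in the text above. the names `line_tag`, `same_line_hints` and `full_compares_needed` are hypothetical, chosen only for illustration. real hardware would more likely derive the hint at issue time from the stride addition itself instead of comparing tags as done here.

```python
LINE_BITS = 6  # assumption: 64-byte L0 line, so the low 6 bits are the offset


def line_tag(addr, line_bits=LINE_BITS):
    """Upper address bits: the part the address-match comparators check."""
    return addr >> line_bits


def same_line_hints(base, stride, count, line_bits=LINE_BITS):
    """For each element of a strided vector LD/ST, report whether its upper
    address bits equal those of the previous element.

    The first element has no predecessor, so its hint is always False.
    """
    hints = []
    prev_tag = None
    for i in range(count):
        tag = line_tag(base + i * stride, line_bits)
        hints.append(prev_tag is not None and tag == prev_tag)
        prev_tag = tag
    return hints


def full_compares_needed(hints):
    """Elements without a hint still need the full upper-bits comparison."""
    return sum(1 for h in hints if not h)


if __name__ == "__main__":
    # eight 8-byte elements starting on a line boundary: all in one line
    hints = same_line_hints(0x1000, 8, 8)
    print(hints, full_compares_needed(hints))
```

with the assumed 64-byte line, eight 8-byte elements starting at 0x1000 all share one set of upper bits, so only the first element needs the full comparison; the remaining seven can reuse its result.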