Shadows need to be cast across instructions to preserve instruction order. However, this is quite involved and needs two refcounts: one for registers, to protect in-order writes to the regfile, and another for the MemRefs. If CSR Dependency Management is used, a third may be needed there too.
Detailed writeup here
The proposed idea has a problem: the counter compares are effectively CAM compares, needing multiple AND gates to check that the high bits are nonzero.
Doing this for every single FU is just too expensive.
An alternative idea is to preallocate, at instruction issue time, the register write port to which the instruction will commit its result.
This would be done by having a separate set of shadows for the write ports.
This unfortunately has the side effect of reducing parallelism: instructions may complete well ahead of one another, and an instruction that was allowed to commit via another port would clearly make forward progress that is otherwise missed.
Hopefully operand forwarding will mitigate this somewhat.
There must be better alternatives.
Perhaps, even if the allocations are not to ports per se, they could at least identify "candidates for selection to commit to any port".
Or perhaps it is not a problem after all. Instruction order is to be preserved: if one shadow bank is not ready, allowing another to commit would mean committing out of order.
Needs more thought.
Documenting alternative idea
There are inter-bank shadows and intra-bank shadows.
The intra-bank shadows are the stripes; an inter-bank shadow is dropped only when the last-numbered intra-bank shadow is dropped.
Another possible algorithm:
* round robin insertion into shadow stripes
* round robin examination of shadow stripe retirement.
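A minimal Python sketch of round-robin insertion and round-robin retirement examination, assuming one instruction per stripe and a boolean "shadow held" flag per stripe (the class and method names are hypothetical, not from any existing codebase). Because the retirement pointer walks the stripes in the same order as the insertion pointer, an instruction whose shadow drops early still waits for older stripes. When the number of stripes equals the number of write ports, the retired stripe index could double as the write port number.

```python
class ShadowStripes:
    """Hypothetical model of N shadow stripes, filled and retired round robin."""

    def __init__(self, n_stripes):
        self.n = n_stripes
        self.shadowed = [False] * n_stripes  # True while a stripe's shadow is held
        self.insn = [None] * n_stripes       # instruction id occupying each stripe
        self.ins_idx = 0                     # round-robin insertion pointer
        self.ret_idx = 0                     # round-robin retirement pointer

    def insert(self, insn_id):
        # Insertion always goes into the next stripe in round-robin order.
        assert self.insn[self.ins_idx] is None, "stripe still occupied"
        self.insn[self.ins_idx] = insn_id
        self.shadowed[self.ins_idx] = True
        self.ins_idx = (self.ins_idx + 1) % self.n

    def drop_shadow(self, stripe):
        # Shadows may be dropped in any order.
        self.shadowed[stripe] = False

    def retire_one(self):
        """Examine only the stripe at the retirement pointer and retire it if
        its shadow has been dropped. Returns (stripe, insn_id) or None."""
        s = self.ret_idx
        if self.insn[s] is None or self.shadowed[s]:
            return None
        insn_id = self.insn[s]
        self.insn[s] = None
        self.ret_idx = (s + 1) % self.n
        return (s, insn_id)
```

For example, with three instructions inserted into four stripes, dropping the shadow on stripe 1 first retires nothing; only once stripe 0's shadow is also dropped do stripes 0 and 1 retire, in that order.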
Because examination is round robin, the retirement index (i.e. which shadow stripe has no shadows left) may correspond directly to the write port to the regfile.
This in turn would mean that where previously it would have been necessary to have a recursive multi priority picker, it might be possible to have simpler straight unary pickers.
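To illustrate the difference, here is a sketch assuming that a "unary picker" means a single-output priority picker (the one-hot of the lowest requesting bit) and that a multi priority picker chains several such picks; both function names are hypothetical. Each extra pick in the chain depends on the result of the previous one, which is where the recursive cost comes from.

```python
from typing import List


def unary_pick(requests: int) -> int:
    """Single priority pick: one-hot of the lowest set bit (0 if no requests)."""
    return requests & -requests


def multi_pick(requests: int, k: int) -> List[int]:
    """Chained (recursive) priority pick of up to k requests.

    Each stage masks out the previous winner, so stage i cannot
    resolve until stage i-1 has.
    """
    picks = []
    for _ in range(k):
        p = unary_pick(requests)
        picks.append(p)
        requests &= ~p  # remove this winner before the next stage
    return picks
```

If each shadow stripe maps to exactly one write port, each port would only need the single `unary_pick` stage rather than its position in a `multi_pick` chain.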
The round robin retirement selector will need to be capable of recognising multiple simultaneous retirements.
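    bitsmod[i] = retiring[(i + idx) mod 4]   # for i in 0..3
    if bitsmod[0] and ~bitsmod[1]:
        idx_inc = 1
    elif bitsmod[0] and bitsmod[1] and ~bitsmod[2]:
        idx_inc = 2
    elif bitsmod[0] and bitsmod[1] and bitsmod[2] and ~bitsmod[3]:
        idx_inc = 3
    elif bitsmod[0] and bitsmod[1] and bitsmod[2] and bitsmod[3]:
        idx_inc = 4   # to be modulo 4
    else:
        idx_inc = 0   # bitsmod[0] not retiring: no commit this cycle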
A nonzero test on idx_inc determines whether the commit goes ahead.
The actual increment applied to the round-robin index is idx_inc mod 4.
In this way, out of 4 possible commits, the order is maintained, and up to 4 simultaneous commits can be detected in one cycle.
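The retirement selector can be expressed as a runnable Python sketch. It generalises the 4-way if/elif chain to a count of consecutive retiring stripes starting at the round-robin pointer (n=4 reproduces the case described here); the function name and the (idx_inc, new_idx) return convention are assumptions for illustration.

```python
def round_robin_retire(retiring, idx, n=4):
    """Return (idx_inc, new_idx) for one cycle of round-robin retirement.

    retiring[j] is True when stripe j has no shadows left. Only the
    contiguous run of retiring stripes starting at the pointer idx may
    commit, so instruction order is preserved.
    """
    # Rotate so that bitsmod[0] is the stripe at the round-robin pointer.
    bitsmod = [retiring[(i + idx) % n] for i in range(n)]
    idx_inc = 0
    for bit in bitsmod:  # count leading ones from the pointer
        if not bit:
            break
        idx_inc += 1
    # idx_inc == 0 means no commit; idx_inc == n wraps back to the same index.
    return idx_inc, (idx + idx_inc) % n
```

Stripes that are retiring but sit behind a non-retiring stripe (such as stripe 3 in `[True, True, False, True]` with the pointer at 0) are held back until the gap clears.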
The multi priority picker may still be needed, because we still do not know which FU is writable. Unless... the shadow stripes only present one candidate per port?
This might actually work!
An augmentation of this concept covers the case where the number of instructions issued does not equal the number of write ports.
The round-robin write port index to be inserted into each stripe can be taken modulo the number of write ports.
For example, with 6 stripes and 4 write ports, if 2 instructions are issued, the target write port numbers are 0 into stripe 0 and 1 into stripe 1.
On the next clock, 4 are issued: 2 into stripe 2, 3 into stripe 3, 0 into stripe 4 and 1 into stripe 5.
And so on.
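A minimal sketch of the issue-side allocation, assuming the stripe pointer and the write port pointer are independent round-robin counters (modulo the number of stripes and the number of write ports respectively); the class name is hypothetical. It reproduces the 6-stripe, 4-port example.

```python
class PortAllocator:
    """Hypothetical issue-side allocator: stripes and write ports are each
    handed out round robin, with independent moduli."""

    def __init__(self, n_stripes, n_ports):
        self.n_stripes = n_stripes
        self.n_ports = n_ports
        self.stripe = 0  # next stripe to fill
        self.port = 0    # next write port number to record in that stripe

    def issue(self, n_insns):
        """Return a list of (write_port, stripe) pairs for this clock's issues."""
        out = []
        for _ in range(n_insns):
            out.append((self.port, self.stripe))
            self.port = (self.port + 1) % self.n_ports
            self.stripe = (self.stripe + 1) % self.n_stripes
        return out
```

On a third clock, stripe 0 would be paired with port 2, so once the stripe count differs from the port count, the port number has to be stored in each stripe entry rather than derived from the stripe index.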