* DONE: power_enum Function converts to unary values (1/2/4/8...)
* DONE: CSV files "Function" column sets multiple bits
* TODO: ALU / Logical / etc. (as appropriate) each cover *MULTIPLE* operations
(switch statements), with logic duplicated across pipelines.
this allows operations to run in parallel and gives more places for
operations to wait in line.
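a minimal sketch of the two DONE items, in plain Python. everything here is an assumption made for illustration: the `Function` members and their bit values, the `|` separator in the CSV cell, and the helper names `parse_function`, `load_table` and `units_for` are hypothetical and are not taken from the real power_enum or CSV files.

```python
import csv
import io
from enum import IntFlag


class Function(IntFlag):
    # hypothetical one-bit-per-unit values; the real power_enum may differ
    NONE = 0
    ALU = 1 << 0
    LOGICAL = 1 << 1
    LDST = 1 << 2
    MUL = 1 << 3


# hypothetical CSV excerpt: "|" separates the units an opcode may run on
CSV_TEXT = "\n".join([
    "opcode,Function",
    "add,ALU",
    "and,ALU|LOGICAL",
    "xor,ALU|LOGICAL",
    "mulld,MUL",
])


def parse_function(field):
    """OR together every unit named in one CSV cell."""
    mask = Function.NONE
    for name in field.split("|"):
        mask |= Function[name.strip()]
    return mask


def load_table(text):
    """Map opcode name -> Function bitmask."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["opcode"]: parse_function(row["Function"]) for row in reader}


def units_for(table, opcode):
    """List the names of the units whose bit is set for this opcode."""
    mask = table[opcode]
    return [f.name for f in Function if f.value and mask & f]


TABLE = load_table(CSV_TEXT)
print(units_for(TABLE, "and"))
```

because each unit has its own bit, an opcode that may run in two pipelines just has two bits set, and a pipeline can test "is my bit set?" with a single AND.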
On 5/14/20 10:43 AM, Luke Kenneth Casson Leighton wrote:
> btw, you'll like this: because we are doing a parallel processor,
> different Function Units *can* actually cover multiple tasks. let's
> not go into that right now, however if we convert power_enum Function
> into a unary field (1<<0, 1<<1, 1<<2) then we can, in the CSV files,
> set multiple bits.
> a good example would be AND and OR and XOR: those are prime candidates
> to put into *both* the Logical *and* Arithmetic pipelines. recorded
> it for now:
So this is sort of like what Intel has on their processors. More complicated instructions can only execute on one port, but simple and frequently used ones can run on several.
(In reply to Michael Nolan from comment #1)
> On 5/14/20 10:43 AM, Luke Kenneth Casson Leighton wrote:
> > a good example would be AND and OR and XOR: those are prime candidates
> > to put into *both* the Logical *and* Arithmetic pipelines. recorded
> > it for now:
> > https://bugs.libre-soc.org/show_bug.cgi?id=310
> So this is sort of like what Intel has on their processors. More complicated
> instructions can only execute on one port, but simple and frequently used
> ones can run on several.
yees, exaaactly :)
the Function Unit tracking latches (an array of input latches plus
corresponding output latches) are there not just to be the front-ends to
multiple parallel copies of simple operations: they're also there in case
the complicated ones start backing up.
an in-order processor is forced to stall if the complicated pipelines take
too long. even in microwatt, you can see "if operation == MUL or DIV, stall".
by contrast, we can continue shoving simple operations out into *multiple*
Function Units. some of those, the ones that do not depend on the output of
the complicated instructions, can actually begin execution immediately.
however for the ones that *do* depend on the complicated ones we *continue
to shove them into Function Unit buffers/latches*.
the more FU latches available, the more we can "run ahead".
only when there is no FU latch available, *only then* does execution stall.
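the "run ahead until the latches run out" behaviour can be sketched with a toy model. this is an illustration under stated assumptions, not the actual scoreboard: one operation is issued per cycle, an operation holds one latch from issue until its result is ready, and the function name `simulate`, the `(latency, deps)` tuples and the latencies are all hypothetical.

```python
def simulate(ops, num_latches):
    """ops: list of (latency, deps), deps being indices of earlier ops.

    Returns (finish, stalls): the cycle at which each result is ready,
    and the number of cycles in which issue stalled for lack of a latch.
    """
    finish = {}     # op index -> cycle at which its result is ready
    occupied = []   # finish cycles of the ops currently holding a latch
    cycle = 0
    stalls = 0
    for idx, (latency, deps) in enumerate(ops):
        # release latches whose operation has completed by now
        occupied = [t for t in occupied if t > cycle]
        # only when no latch is free does issue stall
        while len(occupied) >= num_latches:
            cycle += 1
            stalls += 1
            occupied = [t for t in occupied if t > cycle]
        # the op sits in its latch until every dependency is ready
        start = max([cycle] + [finish[d] for d in deps])
        finish[idx] = start + latency
        occupied.append(finish[idx])
        cycle += 1  # one issue per cycle
    return finish, stalls


# op 0 is a slow "complicated" op; ops 1 and 3 depend on it, 2 and 4 do not
OPS = [(8, []), (1, [0]), (1, []), (1, [0]), (1, [])]

finish4, stalls4 = simulate(OPS, num_latches=4)
finish2, stalls2 = simulate(OPS, num_latches=2)
print(stalls4, stalls2)
```

with four latches the independent ops (2 and 4) finish long before the slow op does and issue never stalls; with two latches the slow op and its first dependent fill both latches, so op 2 has to wait at issue.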
the problem comes if an FU Group has multiple functions, some of which can
cause blockage and some of which cannot. if that FU Group gets blocked and entirely
full (all FU register latches "waiting"), the next operation *will* cause
an execution stall.
to solve that: all you do is... make it possible for *another* FU Group
to do the exact same operation.
for simple operations this is perfectly fine.
this is why in an early iteration of LD/ST Comp Unit, it was capable of
performing ADD *and* it was capable of doing LD/ST. however POWER9 ADD
is a little more complex than just "ADD", so it wasn't appropriate to
add the carry-in/carry-out/ov/so to LD/ST Comp Unit just for that.
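the "let *another* FU Group do the exact same operation" fix can be sketched as follows. the `FUGroup` class, the `issue` helper, the group names "A" and "B" and the operation names are hypothetical, chosen only to show the idea: a group that is full of waiting operations rejects the next one, unless a second group also lists that operation.

```python
from dataclasses import dataclass


@dataclass
class FUGroup:
    name: str
    functions: frozenset  # operations this group is able to perform
    latches: int          # number of FU latches in the group
    waiting: int = 0      # latches currently occupied

    def has_room(self):
        return self.waiting < self.latches


def issue(groups, op):
    """Return the name of the group that accepted op, or None for a stall."""
    for group in groups:
        if op in group.functions and group.has_room():
            group.waiting += 1
            return group.name
    return None


def make_a():
    # group A mixes a blocking op ("div") with a simple one ("and")
    return FUGroup("A", frozenset({"div", "and"}), latches=2)


def make_b():
    # group B is a second home for the simple op
    return FUGroup("B", frozenset({"add", "and"}), latches=2)


single = [make_a()]
dual = [make_a(), make_b()]

# two divs fill group A in both configurations
for groups in (single, dual):
    issue(groups, "div")
    issue(groups, "div")

print(issue(single, "and"))  # no group has room: execution stalls
print(issue(dual, "and"))    # group B picks it up instead
```

the second group costs duplicated logic, which is why this only makes sense for simple operations.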
a good one to try here is to add logical AND/OR/XOR to the
*ALU* pipeline. this should be very simple, however it is
the unit tests (and formal correctness proof) that will
need the most alteration.
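as a rough software model of what that change looks like (not the actual pipeline code), here is a switch-style ALU that also accepts AND/OR/XOR, plus the kind of extra checks the tests would need. `alu_pipeline`, `check_identities` and the string op names are hypothetical, operands are assumed to be 64-bit, and carry-in/carry-out/ov/so are deliberately left out.

```python
MASK64 = (1 << 64) - 1


def alu_pipeline(op, a, b):
    """Toy model of one pipeline covering several operations."""
    if op == "add":
        return (a + b) & MASK64
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "xor":
        return a ^ b
    raise ValueError(f"unsupported op: {op}")


def check_identities(samples):
    """Cross-check the new logical ops against each other and against add."""
    for a, b in samples:
        and_ = alu_pipeline("and", a, b)
        or_ = alu_pipeline("or", a, b)
        xor_ = alu_pipeline("xor", a, b)
        # xor is "or, minus the bits that are set in both"
        assert xor_ == or_ & ~and_ & MASK64
        # a + b == (a xor b) + 2 * (a and b), modulo 2**64
        assert alu_pipeline("add", a, b) == (xor_ + 2 * and_) & MASK64
    return True


SAMPLES = [(0, 0), (0b1100, 0b1010), (MASK64, 1), (MASK64, MASK64)]
check_identities(SAMPLES)
```

the identities are used so that the checks are not simply the implementation restated.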
to be added here: