* DONE: power_enum Function converts to unary values (1/2/4/8...)
* DONE: CSV files "Function" column sets multiple bits
* TODO: ALU / Logical / etc. (as appropriate) covers *MULTIPLE* operations
  (switch statements), with duplicated logic. this gives parallel operations
  and more opportunities for operations to wait in line.
On 5/14/20 10:43 AM, Luke Kenneth Casson Leighton wrote:
> btw, you'll like this: because we are doing a parallel processor,
> different Function Units *can* actually cover multiple tasks. let's
> not go into that right now, however if we convert power_enum Function
> into an unary field (1<<0, 1<<1, 1<<2) then we can, in the CSV files,
> set multiple bits.
>
> a good example would be AND and OR and XOR: those are prime candidates
> to put into *both* the Logical *and* Arithmetic pipelines. recorded
> it for now:
> https://bugs.libre-soc.org/show_bug.cgi?id=310

So this is sort of like what intel has on their processors. More complicated
instructions can only execute on one port, but simple and frequently used
ones can run on several.
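To make the "unary field plus multiple bits in the CSV" idea concrete, here is a minimal sketch. It is not the actual power_enum code: the `Function` members shown, the `|` separator in the CSV column, the column names and the three sample rows are all assumptions made up for illustration.

```python
import csv
import io
from enum import IntFlag


class Function(IntFlag):
    # Hypothetical subset of units. Each gets a single bit (unary / one-hot)
    # so that several units can be OR-ed together into one field.
    NONE = 0
    ALU = 1 << 0
    LDST = 1 << 1
    LOGICAL = 1 << 2


def parse_function(field):
    """Turn a CSV field such as 'ALU|LOGICAL' into a combined bitmask.

    The '|' separator is an assumption of this sketch.
    """
    result = Function.NONE
    for name in field.split("|"):
        name = name.strip()
        if name:
            result |= Function[name]
    return result


# Made-up sample rows: 'and' is allowed on both the ALU and Logical units.
CSV_TEXT = """opcode,Function
add,ALU
and,ALU|LOGICAL
ld,LDST
"""


def load_table(text):
    """Map each opcode to the set of Function Units allowed to run it."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["opcode"]: parse_function(row["Function"]) for row in reader}


table = load_table(CSV_TEXT)


def can_execute(unit, opcode):
    """True if the given unit's bit is set in the opcode's Function field."""
    return bool(table[opcode] & unit)
```

With one bit per unit, "which units may run this opcode" becomes a single AND against the unit's own bit, which is what lets a simple operation be listed against more than one pipeline.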
(In reply to Michael Nolan from comment #1)
> On 5/14/20 10:43 AM, Luke Kenneth Casson Leighton wrote:
> > a good example would be AND and OR and XOR: those are prime candidates
> > to put into *both* the Logical *and* Arithmetic pipelines. recorded
> > it for now:
> > https://bugs.libre-soc.org/show_bug.cgi?id=310
>
> So this is sort of like what intel has on their processors. More complicated
> instructions can only execute on one port, but simple and frequently used
> ones can run on several.

yees, exaaactly :)

the Function Unit tracking (an array of input latches plus corresponding
output latches) is there not just to be the front-end to multiple parallel
copies of simple operations: it is also there in case the complicated ones
start backing up.

an in-order processor is forced to stall if the complicated pipelines take
too long. even in microwatt, you can see "if operation == MUL or DIV, stall".

we, by contrast, can continue shoving out simple operations into *multiple*
Function Units. some of those, the ones that do not depend on the output of
the complicated instructions, can actually begin execution immediately.
the ones that *do* depend on the complicated ones, however, we *continue to
shove into Function Unit buffers/latches*. the more FU latches available,
the more we can "run ahead". only when there is no FU latch available,
*only then* does execution stall.

the problem comes if an FU Group has multiple functions, some of which can
cause blockage and some of which do not. if that FU Group gets blocked and
entirely full (all FU register latches "waiting"), the next operation *will*
cause an execution stall.

to solve that, all you do is... make it possible for *another* FU Group to
do the exact same operation. for simple operations this is perfectly fine.

this is why an early iteration of the LD/ST Comp Unit was capable of
performing ADD as well as LD/ST. however POWER9 ADD is a little more complex
than just "ADD", so it wasn't appropriate to add the
carry-in/carry-out/ov/so to LD/ST Comp Unit just for that.
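The stall condition described above can be illustrated with a toy model. This is only a sketch under heavy assumptions: `FUGroup`, `issue` and `demo` are hypothetical names, the operation names and latch counts are invented, and the model ignores dependency tracking and completion entirely. It shows only the rule "stall only when no capable FU Group has a free latch".

```python
class FUGroup:
    """A pipeline front-ended by a fixed number of Function Unit latches."""

    def __init__(self, name, ops, n_latches):
        self.name = name
        self.ops = set(ops)        # operations this group can perform
        self.n_latches = n_latches
        self.waiting = []          # operations currently held in latches

    def can_accept(self, op):
        # The group must support the op AND have a free latch.
        return op in self.ops and len(self.waiting) < self.n_latches

    def accept(self, op):
        self.waiting.append(op)


def issue(op, groups):
    """Place op in the first capable group with a free latch.

    Returns the group's name, or None if issue has to stall.
    """
    for group in groups:
        if group.can_accept(op):
            group.accept(op)
            return group.name
    return None


def demo(alu_ops):
    """Fill the Logical group with slow ops, then try to issue a simple AND."""
    logical = FUGroup("logical", {"and", "or", "xor", "popcount"}, n_latches=2)
    alu = FUGroup("alu", alu_ops, n_latches=2)
    groups = [logical, alu]
    issue("popcount", groups)   # takes the first logical latch
    issue("popcount", groups)   # takes the second: the group is now full
    return issue("and", groups)


stalled = demo({"add"})             # only Logical can do AND
rerouted = demo({"add", "and"})     # ALU can do AND as well
```

In the first call the AND has nowhere to go once the Logical group is full, so issue stalls; in the second the same AND lands in a free ALU latch instead.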
a good one to try here is to add logical AND/OR/XOR to the *ALU* pipeline.
this should be very simple, however it is the unit tests (and formal
correctness proof) that will need the most alteration.

these:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/logical/main_stage.py;hb=HEAD#l49

to be added here:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/alu/main_stage.py;hb=HEAD#l58
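As a rough picture of what "duplicated logic" means for the tests, here is a plain-Python behavioural sketch. It is not the nmigen code in either main_stage.py: the function names, the string opcodes and the simplified 64-bit add with carry are assumptions, and everything else the real stages do is left out.

```python
MASK64 = (1 << 64) - 1


def logical_stage(op, a, b):
    """Hypothetical model of the Logical pipeline's basic operations."""
    if op == "and":
        return a & b
    if op == "or":
        return a | b
    if op == "xor":
        return a ^ b
    raise ValueError("unsupported logical op: %s" % op)


def alu_stage(op, a, b, carry_in=0):
    """Hypothetical model of the ALU pipeline, returning (result, carry_out).

    The and/or/xor cases are deliberately written out again rather than
    calling logical_stage, mirroring logic duplicated across two pipelines.
    """
    if op == "add":
        total = a + b + carry_in
        return total & MASK64, total >> 64
    if op == "and":
        return a & b, 0
    if op == "or":
        return a | b, 0
    if op == "xor":
        return a ^ b, 0
    raise ValueError("unsupported ALU op: %s" % op)


def stages_agree(a, b):
    """Check that both pipelines give the same answer for the shared ops."""
    return all(alu_stage(op, a, b)[0] == logical_stage(op, a, b)
               for op in ("and", "or", "xor"))
```

A check like `stages_agree` is the kind of extra test duplication calls for: once the same operation lives in two places, the unit tests (and the formal proof) have to cover both and confirm they match.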