Bug 310 - Function Units to cover multiple tasks
Summary: Function Units to cover multiple tasks
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: PC Linux
Importance: Lowest enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on: 356
Blocks:
Reported: 2020-05-14 15:37 BST by Luke Kenneth Casson Leighton
Modified: 2020-05-29 14:42 BST
CC List: 2 users

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Luke Kenneth Casson Leighton 2020-05-14 15:37:09 BST
* DONE: power_enum Function converts to unary values (1/2/4/8...)
* DONE: CSV files "Function" column sets multiple bits
* TODO: ALU / Logical / etc. (as appropriate) covers *MULTIPLE* operations
  (switch statements), with duplicated logic.

this gives parallel execution of simple operations, and more Function
Unit latches for operations to wait in before issue has to stall.
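
A minimal sketch of the first two bullet points, for illustration only. It assumes a cut-down `Function` enum and a "|"-separated CSV column; the member names, bit positions, CSV layout and the helper names (`parse_function`, `load_table`, `units_for`) are all hypothetical, not the actual power_enum or decoder CSV contents.

```python
import csv
import io
from enum import IntFlag


class Function(IntFlag):
    # Hypothetical subset: one bit per Function Unit type (unary encoding).
    NONE = 0
    ALU = 1 << 0
    LOGICAL = 1 << 1
    LDST = 1 << 2
    MUL = 1 << 3


# Assumed CSV layout: "|" separates the units able to run an opcode.
CSV_TEXT = (
    "opcode,Function\n"
    "add,ALU\n"
    "and,ALU|LOGICAL\n"
    "xor,ALU|LOGICAL\n"
    "ld,LDST\n"
)


def parse_function(field):
    """OR together every unit named in one CSV 'Function' cell."""
    mask = Function.NONE
    for name in field.split("|"):
        mask |= Function[name.strip()]
    return mask


def load_table(text):
    """Map opcode name -> bitmask of Function Units that may execute it."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["opcode"]: parse_function(row["Function"]) for row in reader}


def units_for(opcode, table):
    """List the names of the single-bit units set for this opcode."""
    return [u.name for u in Function if u.value and table[opcode] & u]


table = load_table(CSV_TEXT)
print(units_for("and", table))
print(units_for("add", table))
```

Because each unit owns exactly one bit, "can this unit take the opcode" is a single AND against the mask, which is what makes setting multiple bits in the CSV cheap to decode.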
Comment 1 Michael Nolan 2020-05-14 15:47:48 BST
On 5/14/20 10:43 AM, Luke Kenneth Casson Leighton wrote:
> btw, you'll like this: because we are doing a parallel processor,
> different Function Units *can* actually cover multiple tasks.  let's
> not go into that right now, however if we convert power_enum Function
> into a unary field (1<<0, 1<<1, 1<<2) then we can, in the CSV files,
> set multiple bits.
>
> a good example would be AND and OR and XOR: those are prime candidates
> to put into *both* the Logical *and* Arithmetic pipelines.  recorded
> it for now:
> https://bugs.libre-soc.org/show_bug.cgi?id=310


So this is sort of like what Intel has on their processors. More complicated
instructions can only execute on one port, but simple and frequently used
ones can run on several.
Comment 2 Luke Kenneth Casson Leighton 2020-05-14 15:59:21 BST
(In reply to Michael Nolan from comment #1)
> On 5/14/20 10:43 AM, Luke Kenneth Casson Leighton wrote:
> > a good example would be AND and OR and XOR: those are prime candidates
> > to put into *both* the Logical *and* Arithmetic pipelines.  recorded
> > it for now:
> > https://bugs.libre-soc.org/show_bug.cgi?id=310
> 
> 
> So this is sort of like what Intel has on their processors. More complicated
> instructions can only execute on one port, but simple and frequently used
> ones can run on several.

yees, exaaactly :)

the Function Unit tracking latches (an array of input latches plus
corresponding output latches) are there not just to be the front-ends to
multiple parallel copies of simple operations: they're there in case the
complicated ones start backing up.

an in-order processor is forced to stall if the complicated pipelines take
too long.  even in microwatt, you can see "if operation == MUL or DIV, stall".

we, by contrast, can continue shoving out simple operations into *multiple*
Function Units.  some of those, the ones that do not depend on the output
of the complicated instructions, can actually begin execution immediately.

however for the ones that *do* depend on the complicated ones we *continue
to shove them into Function Unit buffers/latches*.

the more FU latches available, the more we can "run ahead".

only when there is no FU latch available, *only then* does execution stall.
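
A toy model of the "run ahead" behaviour described above, as a sketch only. It assumes that an operation waiting on a pending result (or itself long-running) holds its latch for the whole run, and that an independent operation completes and frees its latch straight away. The function name and the example program are hypothetical.

```python
def ops_issued_before_stall(program, n_latches):
    """Count how many operations issue before issue has to stall.

    program: list of (name, holds_latch) pairs, in issue order.
    holds_latch is True for an operation that stays in its FU latch
    (long-running, or waiting on a result that is not ready yet).
    """
    occupied = 0
    issued = 0
    for _name, holds_latch in program:
        if occupied == n_latches:
            break  # no FU latch available: only now does issue stall
        if holds_latch:
            occupied += 1
        issued += 1
    return issued


PROGRAM = [
    ("mul", True),        # complicated, long-running
    ("add1", False),      # independent: runs immediately
    ("use_mul1", True),   # depends on mul: parked in a latch
    ("add2", False),
    ("use_mul2", True),
    ("add3", False),
]

for latches in (1, 2, 3, 4):
    print(latches, ops_issued_before_stall(PROGRAM, latches))
```

With one latch this degenerates to the in-order case (stall right behind the MUL); each extra latch lets issue get further ahead before stalling.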


the problem comes if an FU Group has multiple functions, some of which can
cause blockage and some of which do not.  if that FU Group gets blocked and
is entirely full (all FU register latches "waiting"), the next operation
for that group *will* cause an execution stall.

to solve that: all you do is... make it possible for *another* FU Group
to do the exact same operation.

for simple operations this is perfectly fine.
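
The blocked-group case and its fix can be sketched as below. This is an illustrative model, not the actual Comp Unit code: `FUGroup`, `issue`, the operation names and the latch counts are all made up for the example.

```python
class FUGroup:
    """A group of Function Units sharing a set of operations (hypothetical)."""

    def __init__(self, name, ops, n_latches):
        self.name = name
        self.ops = set(ops)
        self.n_latches = n_latches
        self.waiting = []  # operations currently held in this group's latches

    def has_free_latch(self):
        return len(self.waiting) < self.n_latches


def issue(op, groups):
    """Place op in the first capable group with a free latch.

    Returns the group's name, or None when every capable group is full,
    which is the case where execution has to stall.
    """
    for group in groups:
        if op in group.ops and group.has_free_latch():
            group.waiting.append(op)
            return group.name
    return None


# Logical group filled up by two blocked operations.
logical = FUGroup("logical", {"and", "or", "xor", "popcount"}, n_latches=2)
issue("popcount", [logical])
issue("popcount", [logical])

alu_add_only = FUGroup("alu", {"add"}, n_latches=2)
alu_with_logic = FUGroup("alu", {"add", "and", "or", "xor"}, n_latches=2)

print(issue("and", [logical, alu_add_only]))    # only Logical can take it
print(issue("and", [logical, alu_with_logic]))  # ALU covers AND as well
```

In the first call the only capable group is full, so the model reports a stall; in the second the duplicated AND logic in the ALU group absorbs the operation.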

this is why in an early iteration of LD/ST Comp Unit, it was capable of
performing ADD *and* it was capable of doing LD/ST.  however POWER9 ADD
is a little more complex than just "ADD", so it wasn't appropriate to
add the carry-in/carry-out/ov/so to LD/ST Comp Unit just for that.
Comment 3 Luke Kenneth Casson Leighton 2020-05-29 14:42:00 BST
a good one to try here is to add logical AND/OR/XOR to the
*ALU* pipeline.  this should be very simple, however it is
the unit tests (and formal correctness proof) that will
need the most alteration.

these:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/logical/main_stage.py;hb=HEAD#l49

to be added here:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/alu/main_stage.py;hb=HEAD#l58
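
A plain-Python sketch of what "duplicated logic" means here, and of the kind of cross-check the altered unit tests would need. It is not nmigen and not the contents of either main_stage.py: the `Op` enum, both functions and the 64-bit mask are assumptions for illustration, and carry/overflow handling is left out entirely.

```python
from enum import Enum, auto

MASK64 = (1 << 64) - 1  # assumed 64-bit datapath


class Op(Enum):
    # Hypothetical operation names for this sketch.
    ADD = auto()
    AND = auto()
    OR = auto()
    XOR = auto()


def logical_main_stage(op, a, b):
    """Stand-in for the Logical pipeline's switch statement."""
    if op is Op.AND:
        return a & b
    if op is Op.OR:
        return a | b
    if op is Op.XOR:
        return a ^ b
    raise ValueError(f"Logical pipeline does not handle {op}")


def alu_main_stage(op, a, b):
    """Stand-in for the ALU switch, with AND/OR/XOR duplicated into it."""
    if op is Op.ADD:
        return (a + b) & MASK64  # carry-in/carry-out/ov/so omitted
    if op is Op.AND:
        return a & b
    if op is Op.OR:
        return a | b
    if op is Op.XOR:
        return a ^ b
    raise ValueError(f"ALU pipeline does not handle {op}")


def pipelines_agree(values):
    """Check both pipelines give identical results for the shared ops."""
    shared = (Op.AND, Op.OR, Op.XOR)
    return all(
        alu_main_stage(op, a, b) == logical_main_stage(op, a, b)
        for op in shared for a in values for b in values
    )


print(pipelines_agree([0, 1, 0x5555, MASK64]))
```

The equivalence check is the part that stands in for the extra test and proof work: once an operation lives in two pipelines, both copies have to be shown to produce the same result.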