Bug 588 - add SVP64 to PowerDecoder2
Summary: add SVP64 to PowerDecoder2
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL: https://libre-soc.org/openpower/sv/im...
Depends on:
Blocks: 583 617
 
Reported: 2021-01-30 00:20 GMT by Luke Kenneth Casson Leighton
Modified: 2021-03-17 13:21 GMT (History)

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Luke Kenneth Casson Leighton 2021-01-30 00:20:14 GMT
PowerDecoder2 needs to be able to understand SVP64, particularly register
numbers (isvec).  also the "modes" need sub-decoding, and predicate
selection etc.

* Reg EXTRA: done except out2
* CR EXTRA: done
* SPR EXTRA: TODO
* Predicate selection: TODO
* Element-width overrides: TODO
* Mode decoding incl. LDST: done, testing TODO
Comment 1 Luke Kenneth Casson Leighton 2021-01-30 00:22:54 GMT
commit 63aeeaa31a60065b03421d3a5497327078d0b0e8 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Jan 30 00:17:20 2021 +0000

    add first SVP64 7-bit register context decoder to PowerDecoder2
Comment 2 Luke Kenneth Casson Leighton 2021-01-30 00:38:33 GMT
commit 982a3a872f8969ab61e9f1c42194e1522be38de9 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Jan 30 00:36:22 2021 +0000

    add SVP64 EXTRA decoding to RB, RC and RT (out) in PowerDecode2
    DecodeOut2 will have to wait because it is more complex

Cesar i have the INT registers in the 3 input columns done, and one
output, but not the 2nd output yet (LDST-with-update), or the CRs.
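
A minimal sketch of the kind of register-number extension the commit describes: a 5-bit register field is widened to 7 bits using two extra bits from the prefix, plus an isvec flag. The function name `extend_reg` and the placement of the extra bits (above the field for scalars, below it for vectors) are assumptions made for illustration, not taken from power_decoder2.py; the SV specification is the authority on the real EXTRA2/EXTRA3 layouts.

```python
def extend_reg(reg5, isvec, extra):
    """Widen a 5-bit register number to 7 bits (hypothetical helper).

    reg5  : register field from the 32-bit instruction (0..31)
    isvec : True if the prefix marks this operand as a vector
    extra : two additional bits supplied by the prefix (0..3)

    Assumed layout (illustration only):
      scalar -> extra bits become the top two bits
      vector -> extra bits become the bottom two bits
    """
    if not 0 <= reg5 < 32:
        raise ValueError("reg5 must fit in 5 bits")
    if not 0 <= extra < 4:
        raise ValueError("extra must fit in 2 bits")
    if isvec:
        return (reg5 << 2) | extra
    return (extra << 5) | reg5
```

With all-zero extra bits and isvec clear, the result equals the original register number, so an unprefixed 32-bit instruction decodes exactly as before.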
Comment 3 Luke Kenneth Casson Leighton 2021-01-30 14:00:55 GMT
commit b90ce1976820244dbd710d2c612933db7d5eece9 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Jan 30 13:55:55 2021 +0000

    add SVP64 CR EXTRA field-extension, from 3-bit to 7-bit (plus isvec)
    in PowerDecoder2

added CR incoming register extending, CR outgoing is next.  test_issuer.py
is still working fine.
Comment 4 Luke Kenneth Casson Leighton 2021-01-30 21:21:27 GMT
moved CR EXTRA into PowerDecoder2 so that satellite decoders do not have unnecessary copies of SVP64 decode modules.
Comment 5 Cesar Strauss 2021-02-03 10:37:19 GMT
The augmented decoder will stay stateless (purely combinatorial) right? So, it will need both the 32-bit prefix and the 32-bit suffix at the same time, correct?

Or, will it be split in two stages, so you first decode the prefix (if any), then you take the result and use it to post-process the result of the scalar decoder?
Comment 6 Luke Kenneth Casson Leighton 2021-02-03 12:31:38 GMT
(In reply to Cesar Strauss from comment #5)
> The augmented decoder will stay stateless (purely combinatorial) right? 

yes absolutely

> it will need both the 32-bit prefix and the 32-bit suffix at the same time,
> correct?

yes.  at the moment the only augmentation needed is EXTRA2/3 fields.

however later in the future certain combinations of vec2/3/4 will cause DIFFERENT sub-operations.

for example CROSSPRODUCT, CORDIC with complex numbers, also and especially the mapreduce modes.
 
> Or, will it be split in two stages, so you first decode the prefix (if any),

yes

> then you take the result and use it to post-process the result of the scalar
> decoder?

exactly.  you can see i have started this process in ISACaller

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/caller.py;h=7730ce198d8d70a4db02a80ab54c0450d678b6b2;hb=9f19947c9887e61f66247ee1ce82ae60bedaf3c6#l611

i could have used PowerDecoder2 to do that task, by adding a CSV file (major1.csv) entry plus a NNN-Form plus some fields.

but, to be honest, when we get to multi-issue, PowerDecoder2 is total overkill, it is better to have a separate vastly simpler SVP64 prefix identifier system.

we discussed that a few months back on the Compressed bug and jacob came up with a carry-propagation algorithm for multi-issue.
Comment 7 Luke Kenneth Casson Leighton 2021-02-03 21:33:43 GMT
(In reply to Cesar Strauss from comment #5)

> Or, will it be split in two stages, so you first decode the prefix (if any),
> then you take the result and use it to post-process the result of the scalar
> decoder?

first thing: identify the prefix using this:
https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=9cc04f05fff07d38c685614190007e107ee8b891

then if that is successfully identified as an svp64 instruction, pass in
the next 32 bits *and* the 24-bit ReMap into PowerDecoder2.

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder2.py;h=2f6c0bdec572db0ab605e83087ec7b72758e704c;hb=9cc04f05fff07d38c685614190007e107ee8b891#l793

now, in theory this could be done in 1 clock cycle, with some MUXes. but for the FSM it is perfectly fine to take more.

note however:

* the SVP64PowerDecoder2 is used in the *first* FSM (simply to identify
  "is this instruction 32 or 64 bit").

  - if it identifies an svp64 prefix it stores the 24-bit ReMap field
    in a latch, then reads *another* 32 bits

* PowerDecoder2 is used in the *second* FSM, receiving zero in the RM
  field if the *first* FSM identified a 32-bit operation.

first FSM reads from instruction fetch and identifies length.

second FSM does decode-and-execute *only*.


but, long before that is done, the split into two FSMs, and processing of 32-bit instructions *only*, must be carried out.  no involvement of svp64 at all.
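
The two-FSM arrangement described in this comment can be sketched in plain Python (not nmigen) as follows. The first stage identifies whether a 32-bit word is an svp64 prefix and, if so, holds the 24-bit RM field and reads another 32 bits; the second stage always receives an (RM, instruction) pair, with RM zero for a plain 32-bit operation. The names `identify_prefix` and `fetch_stage` are hypothetical. The prefix layout used here (major opcode 1, MSB0 bits 7 and 9 set, RM taken from MSB0 bits 6, 8 and 10 to 31) is an assumption for illustration and should be checked against the SV specification.

```python
EXT01 = 0b000001  # assumed major opcode carrying the svp64 prefix


def msb0_bit(word, n, width=32):
    """Return bit n of word in MSB0 numbering (bit 0 is the most significant)."""
    return (word >> (width - 1 - n)) & 1


def identify_prefix(word):
    """Return (is_svp64, rm) for one 32-bit word (assumed layout, see lead-in)."""
    major = (word >> 26) & 0x3F
    is_svp64 = (major == EXT01
                and msb0_bit(word, 7) == 1
                and msb0_bit(word, 9) == 1)
    if not is_svp64:
        return False, 0
    # gather the 24 RM bits: MSB0 bit 6, bit 8, then bits 10..31
    rm = (msb0_bit(word, 6) << 23) | (msb0_bit(word, 8) << 22) | (word & 0x3FFFFF)
    return True, rm


def fetch_stage(words):
    """First FSM: yield (rm, insn) pairs for the decode-and-execute stage.

    A plain 32-bit instruction is passed on with rm == 0.  A prefix causes
    its RM field to be held while *another* 32 bits are read.
    """
    it = iter(words)
    for word in it:
        is_svp64, rm = identify_prefix(word)
        if is_svp64:
            suffix = next(it, None)
            if suffix is None:
                raise ValueError("svp64 prefix without a following instruction")
            yield rm, suffix
        else:
            yield 0, word
```

Because the second stage only ever sees (RM, 32-bit instruction) pairs, it can be developed and tested on 32-bit instructions alone, with RM fixed at zero, before any prefix handling exists.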
Comment 8 Luke Kenneth Casson Leighton 2021-03-07 16:15:05 GMT
mode decoder here:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_svp64_rm.py;hb=HEAD

mostly recognises the differences between standard RM Mode, LDST-immediate and LDST-indexed.