PowerDecoder2 needs to be able to understand SVP64, particularly register numbers (isvec). also the "modes" need sub-decoding, and predicate selection etc.

* Reg EXTRA: done except out2
* CR EXTRA: done
* SPR EXTRA: TODO
* Predicate selection: TODO
* Element-width overrides: TODO
* Mode decoding incl. LDST: done, testing TODO
commit 63aeeaa31a60065b03421d3a5497327078d0b0e8 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Jan 30 00:17:20 2021 +0000

    add first SVP64 7-bit register context decoder to PowerDecoder2
commit 982a3a872f8969ab61e9f1c42194e1522be38de9 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Jan 30 00:36:22 2021 +0000

    add SVP64 EXTRA decoding to RB, RC and RT (out) in PowerDecode2
    DecodeOut2 will have to wait because it is more complex

Cesar i have the INT registers in the 3 input columns done, and one output, but not the 2nd output yet (LDST-with-update), or the CRs.
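For readers following along, here is a minimal Python sketch of what extending a 5-bit INT register number to 7 bits (plus isvec) could look like. It is not the PowerDecoder2 code. The function name `extend_int_reg` is hypothetical, and the bit layout is an assumption taken from my reading of the SVP64 EXTRA2/EXTRA3 tables rather than from this thread: the top bit of the EXTRA field selects vector, scalar registers take the extension as high bits, and vector registers take it as low bits.

```python
def extend_int_reg(reg5, extra, extra_bits):
    """Extend a 5-bit register number to 7 bits. Returns (reg7, isvec).

    Assumed layout (not confirmed in this thread): the top bit of the
    EXTRA field marks vector/scalar, the remaining bit(s) extend the
    register number.
    """
    assert 0 <= reg5 < 32
    assert extra_bits in (2, 3)
    assert 0 <= extra < (1 << extra_bits)

    isvec = bool(extra >> (extra_bits - 1))       # top bit: vector or scalar
    ext = extra & ((1 << (extra_bits - 1)) - 1)   # remaining extension bit(s)

    if isvec:
        if extra_bits == 2:
            ext <<= 1            # assumed: EXTRA2 vector is {reg, e, 0}
        reg7 = (reg5 << 2) | ext  # vector: extension forms the low bits
    else:
        reg7 = (ext << 5) | reg5  # scalar: extension forms the high bits
    return reg7, isvec
```

Under these assumptions a scalar r5 with an all-zero EXTRA field stays r5, which is why an unprefixed instruction (all EXTRA bits zero) decodes exactly as before.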
commit b90ce1976820244dbd710d2c612933db7d5eece9 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Jan 30 13:55:55 2021 +0000

    add SVP64 CR EXTRA field-extension, from 3-bit to 7-bit (plus isvec) in PowerDecoder2

added CR incoming register extending, CR outgoing is next. test_issuer.py is still working fine.
moved CR EXTRA into PowerDecoder2 so that the satellite decoders do not have unnecessary copies of the SVP64 decode modules.
The augmented decoder will stay stateless (purely combinatorial) right? So, it will need both the 32-bit prefix and the 32-bit suffix at the same time, correct? Or, will it be split in two stages, so you first decode the prefix (if any), then you take the result and use it to post-process the result of the scalar decoder?
(In reply to Cesar Strauss from comment #5)
> The augmented decoder will stay stateless (purely combinatorial) right?

yes absolutely

> it will need both the 32-bit prefix and the 32-bit suffix at the same time,
> correct?

yes. at the moment the only augmentation needed is EXTRA2/3 fields. however in future certain combinations of vec2/3/4 will cause DIFFERENT sub-operations. for example CROSSPRODUCT, CORDIC with complex numbers, and especially the mapreduce modes.

> Or, will it be split in two stages, so you first decode the prefix (if any),

yes

> then you take the result and use it to post-process the result of the scalar
> decoder?

exactly. you can see i have started this process in ISACaller

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/caller.py;h=7730ce198d8d70a4db02a80ab54c0450d678b6b2;hb=9f19947c9887e61f66247ee1ce82ae60bedaf3c6#l611

i could have used PowerDecoder2 to do that task, by adding a CSV file (major1.csv) entry plus a NNN-Form plus some fields. but, to be honest, when we get to multi-issue, PowerDecoder2 is total overkill: it is better to have a separate, vastly simpler SVP64 prefix identifier system. we discussed that a few months back on the Compressed bug, and jacob came up with a carry-propagation algorithm for multi-issue.
(In reply to Cesar Strauss from comment #5)
> Or, will it be split in two stages, so you first decode the prefix (if any),
> then you take the result and use it to post-process the result of the scalar
> decoder?

first thing: identify the prefix using this:

https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=9cc04f05fff07d38c685614190007e107ee8b891

then if that is successfully identified as an svp64 instruction, pass in the next 32 bits *and* the 24-bit ReMap into PowerDecoder2.

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder2.py;h=2f6c0bdec572db0ab605e83087ec7b72758e704c;hb=9cc04f05fff07d38c685614190007e107ee8b891#l793

now, in theory this could be done in 1 clock cycle, with some MUXes. but for the FSM it is perfectly fine to take more.

note however:

* the SVP64PowerDecoder2 is used in the *first* FSM (simply to identify "is this instruction 32 or 64 bit").
  - if it identifies an svp64 prefix it stores the 24-bit ReMap field in a latch, then reads *another* 32 bits
* PowerDecoder2 is used in the *second* FSM, receiving zero in the RM field if the *first* FSM identified a 32-bit operation.

first FSM reads from instruction fetch and identifies length. second FSM does decode-and-execute *only*.

but, long before that is done, the split into two FSMs, and processing of 32-bit instructions *only*, must be carried out. no involvement of svp64 at all.
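The two-FSM split described above can be modelled in plain Python as a sanity check of the data flow. This is a software sketch only, not the nmigen FSM. The names `identify_prefix` and `fetch_stage` are hypothetical, and the bit positions are assumptions not stated in this thread: the prefix is taken to sit under major opcode 1, with MSB0 bits 7 and 9 both set, and the remaining 24 of the low 26 bits forming the RM field.

```python
SVP64_MAJOR = 1  # assumption: the prefix lives under major opcode 1


def identify_prefix(word):
    """First stage: return (is_svp64, rm24) for one 32-bit word.

    Assumed encoding: MSB0 bits 7 and 9 are both 1 for an SVP64 prefix,
    and RM is MSB0 bits 6, 8 and 10..31 packed into 24 bits.
    """
    major = (word >> 26) & 0x3F
    id_hi = (word >> 24) & 1   # MSB0 bit 7
    id_lo = (word >> 22) & 1   # MSB0 bit 9
    if not (major == SVP64_MAJOR and id_hi == 1 and id_lo == 1):
        return False, 0
    rm = ((((word >> 25) & 1) << 23)    # MSB0 bit 6
          | (((word >> 23) & 1) << 22)  # MSB0 bit 8
          | (word & 0x3FFFFF))          # MSB0 bits 10..31
    return True, rm


def fetch_stage(words):
    """First FSM: yield (rm24, insn32) pairs for the second FSM.

    A 32-bit-only instruction is passed on with RM = 0.
    """
    it = iter(words)
    for word in it:
        is_svp64, rm = identify_prefix(word)
        if is_svp64:
            suffix = next(it, None)   # read *another* 32 bits
            if suffix is None:
                raise ValueError("prefix without a following instruction")
            yield rm, suffix          # RM was "latched" from the prefix
        else:
            yield 0, word
```

The second FSM would then consume each `(rm24, insn32)` pair and do decode-and-execute only, never looking at the instruction stream itself.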
mode decoder here: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_svp64_rm.py;hb=HEAD mostly recognises the differences between standard RM Mode, LDST-immediate and LDST-indexed.