Bug 583 - Implement simple VL for-loop in nMigen for TestIssuer
Summary: Implement simple VL for-loop in nMigen for TestIssuer
Status: IN_PROGRESS
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Cesar Strauss
URL: https://libre-soc.org/openpower/sv/im...
Depends on: 588 609 639
Blocks:
  Show dependency treegraph
 
Reported: 2021-01-26 11:47 GMT by Cesar Strauss
Modified: 2021-08-06 09:42 BST (History)
2 users (show)

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Cesar Strauss 2021-01-26 11:47:40 GMT
The goal is to implement the following pseudo-code

   for i = 0 to VL-1:
       GPR(RT+i) = GPR(RA+i) + GPR(RB+i)

The tasks are:

1) DONE add SVSTATE as an SPR

2) DONE augment PowerDecoder to decode the SVP64 prefix

3) DONE augment the TestIssuer FSM to:

    a) if a SVP64 prefix was decoded, read and decode the next word, before really issuing the instruction

    b) issue the instruction and wait for completion.

    c) If any source or destination was a vector, keep issuing the instruction and increment SVSTATE.srcstep. Break when it reaches VL-1

    d) set SVSTATE.srcstep to zero again


4) DONE modify the read/write point in the regfile to add SVSTATE.srcstep to
the reg number

5) TODO add SETVL instruction

6) TODO figure out how to deal with exceptions in the middle of a vector instruction:

   a) save the SVSTATE before entering an exception.

   b) when exiting, restore SVSTATE, while setting the PC to the current instruction, so it's decoded and issued again.

7) TODO modify the decoder to spot when an illegal SVP64 combination occurs.  these are MTMSR MFMSR setvl all branch operations at least.

8) TODO have the issue FSM do VL=0 detection to skip instructions
Comment 1 Cesar Strauss 2021-01-26 12:14:29 GMT
(In reply to Cesar Strauss from comment #0)
> 4) modify the read/write point in the regfile to add SVSTATE.srcstep to
> the reg number

Also: increase the the number of registers in the register file to 128.
Comment 2 Cesar Strauss 2021-01-26 12:52:56 GMT
(In reply to Cesar Strauss from comment #0)
>     a) if a SVP64 prefix was decoded, read and decode the next word, before
> really issuing the instruction

Also, if VL is zero, do not issue any vector instruction, just keep fetching the next instruction.
Comment 3 Luke Kenneth Casson Leighton 2021-01-26 15:15:56 GMT
(In reply to Cesar Strauss from comment #0)

> 6) figure out how to deal with exceptions in the middle of a vector
> instruction:

this *may* require (read: is going to) adding an SRR2 into which SVSTATE is copied/transferred.

this is because SVSTATE contains, in effect, a Sub-Program-Counter.

PC is saved and swapped in SRR0... therefore because SVSTATE is a Sub-PC... you see how that works.


>    a) save the SVSTATE before entering an exception.

in SRR2.

>    b) when exiting, restore SVSTATE, while setting the PC to the current
> instruction, so it's decoded and issued again.

from SRR2.

actually we should probably call it SVSRR0.  i will document this in the SPRs page.
https://libre-soc.org/openpower/sv/sprs/

done.
Comment 4 Luke Kenneth Casson Leighton 2021-01-26 17:24:56 GMT
Cesar, the Power Decoder works by reading CSV files, simply passing a list of dictionaries to each PowerDecoder instance:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder.py;h=028de9b3b0f78cc488096ff74a2c57a458a75f77;hb=c9a4beb81e9c11a3ac70494dcf64850d8c3a6d1e#l518

however the svp64 "EXTRA" fields, as you can see in svp64.py, are organised differently, keyed by the instruction (the comment field of v3.0B).

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/sv/trans/svp64.py;h=4955df75f7703fdab954fb814391ad4eab54326e;hb=c9a4beb81e9c11a3ac70494dcf64850d8c3a6d1e#l133

my feeling here is that get_csv should be replaced in power_decoder.py with a function that "merges" the extra fields by looking up the instruction name and simply dropping in the RM fields.

rather than modifying the CSV files which are already long.

what do you think?
Comment 5 Luke Kenneth Casson Leighton 2021-01-27 03:56:40 GMT
(In reply to Cesar Strauss from comment #2)
> (In reply to Cesar Strauss from comment #0)
> >     a) if a SVP64 prefix was decoded, read and decode the next word, before
> > really issuing the instruction

yes.  basically, with a FSM though you have to be careful.  the actual PC is at the EXT001 (the prefix) not the suffix part!

but you still have to *read* both the prefix and suffix.

one possible solution is to have two FSMs, one for reading instructions, one for processing them.  they communicate by signals to say "start" and "done" just like ready/valid.

that way, passing 64 bit or 32 bit it is all the same.

> Also, if VL is zero, do not issue any vector instruction, just keep fetching
> the next instruction.

yes, because the for loop is from 0 to 0.

of course if it is a v3.0B 32bit instruction VL does not apply so will always be executed.
Comment 6 Luke Kenneth Casson Leighton 2021-01-27 03:58:16 GMT
(In reply to Cesar Strauss from comment #1)

> Also: increase the the number of registers in the register file to 128.

this we should do as a separate step, it is quite disruptive.  increases SRAM BRAM or DFF size a LOT.
Comment 7 Luke Kenneth Casson Leighton 2021-01-27 12:52:59 GMT
commit 085b986713680d8a64448c85a3d82f2b7fea4732 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Jan 27 12:49:05 2021 +0000

    provide "merger" of SVP64 RM info into v3.0B CSV files

i did the merging part, it's not in use yet.  also it is not
re-encoded to Power Enums, that is a separate step.
Comment 8 Luke Kenneth Casson Leighton 2021-01-27 16:07:26 GMT
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_svp64.py;hb=HEAD

hmm hmm as-is this is not very useful.

what is needed is to be able to get bits from EXTRA2/3 fields and say "this one extends RA".

right now it is inverted, key us value, value is key.

a bit more post-processing is needed, examining the v3.0B fields IN1, IN2, etc, to create:

SVP64_IN1, SVP64_IN2

and for those fields to contain the index 0, 1, 2, 3 pointing to which bits of RM to read.
Comment 9 Luke Kenneth Casson Leighton 2021-01-27 18:21:21 GMT
ok if you run this:
lkcl@fizzy:~/src/libresoc/soc/src/soc$ python3 decoder/power_svp64.py 

OrderedDict([('opcode', '0b1001'), ('unit', 'SHIFT_ROT'), ('internal op', 'OP_RLCR'), ('in1', 'NONE'), ('in2', 'RB'), ('in3', 'RS'), ('out', 'RA'), ('CR in', 'NONE'), ('CR out', 'CR0'), ('inv A', '0'), ('inv out', '0'), ('cry in', 'ZERO'), ('cry out', '0'), ('ldst len', 'NONE'), ('BR', '0'), ('sgn ext', '0'), ('upd', '0'), ('rsrv', '0'), ('32b', '0'), ('sgn', '0'), ('rc', 'RC'), ('lk', '0'), ('sgl pipe', '0'), ('comment', 'rldcr'), ('form', 'MD'), ('EXTRA0', 'd:RA;d:CR0'), ('EXTRA1', 's:RB'), ('EXTRA2', 's:RS'), ('EXTRA3', '0'), ('SV_Ptype', '1P'), ('SV_Etype', 'EXTRA3'), ('sv_in1', None), ('sv_in2', 1), ('sv_in3', 2), ('sv_out', 0)])

you can see now, the in1/2/3/out fields have been joined by sv_in1/2/3/out
which makes it dead-easy to know, in combination with SV_Etype, where to go
to get the SVP64 "extra" fields for a given register RA/B/C/RS/RT.

CRs are next

commit 41f1bc2c070cc47d1990955d60c899ad8fab88bb (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Jan 27 18:16:47 2021 +0000

    move svp64 reg-decode function to more appropriate location
    use it in SVP64RM get_svp64_csv to decode EXTRA bit-field positions
Comment 10 Luke Kenneth Casson Leighton 2021-01-27 18:39:40 GMT
excellent.  this is crxor.  see how sv_cr_in = [1,2] and sv_cr_out = 0
this is decoded from EXTRA0=d:BT, EXTRA1=s:BA, EXTRA2=s:BB


OrderedDict([('opcode', '0b0011000001'), ('unit', 'CR'), ('internal op', 'OP_CROP'), ('in1', 'NONE'), ('in2', 'NONE'), ('in3', 'NONE'), ('out', 'NONE'), ('CR in', 'BA_BB'), ('CR out', 'BT'), ('inv A', '0'), ('inv out', '0'), ('cry in', 'ZERO'), ('cry out', '0'), ('ldst len', 'NONE'), ('BR', '0'), ('sgn ext', '0'), ('upd', '0'), ('rsrv', '0'), ('32b', '0'), ('sgn', '0'), ('rc', 'NONE'), ('lk', '0'), ('sgl pipe', '0'), ('comment', 'crxor'), ('form', 'XL'), ('EXTRA0', 'd:BT'), ('EXTRA1', 's:BA'), ('EXTRA2', 's:BB'), ('EXTRA3', '0'), ('SV_Ptype', '1P'), ('SV_Etype', 'EXTRA3'), ('sv_in1', None), ('sv_in2', None), ('sv_in3', None), ('sv_out', None), ('sv_cr_in', (1, 2)), ('sv_cr_out', 0)])

commit af95c57b2027a2e5effa56f7ce324fd696216de4 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Jan 27 18:38:34 2021 +0000

    add svp64 CR field identification for EXTRA2/3 decoding
Comment 11 Luke Kenneth Casson Leighton 2021-01-27 22:46:14 GMT
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/regfile/regfiles.py;h=e4ed80b18138c3d713538aaf9221ea40f2b2a24e;hb=refs/heads/master#l47

cesar if you can add SVSTATE to that list?  and also correct the comment, it should be 3 entries, whoops :)

it is currently 4 long, DEC and TB are in fastregs, now.

PC=0
MSR=1
SVSTATE=2 

and, at lines 52/54 add an extra rd and extra wr port, call one w_sv the other r_sv

then, here:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/issuer.py;h=d662dff4c20e7904e6ca7cb3ae2ceaa6f03b6e06;hb=refs/heads/master#l110

ta-daaa, you can get those ports, w_sv and r_sv, and now you can read and write SVSTATE.
Comment 12 Luke Kenneth Casson Leighton 2021-01-27 22:48:16 GMT
https://libre-soc.org/openpower/sv/sprs/

also a Record is needed for SVSTATE according to the bitfields in sprs page above.  SVStateRecord
Comment 13 Luke Kenneth Casson Leighton 2021-01-28 16:04:08 GMT
(In reply to Luke Kenneth Casson Leighton from comment #12)
> https://libre-soc.org/openpower/sv/sprs/
> 
> also a Record is needed for SVSTATE according to the bitfields in sprs page
> above.  SVStateRecord

commit 80fbef229a74420ab5708c32432938cedfbc5e0b (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Thu Jan 28 15:53:50 2021 +0000

    add SVState SPR Record, SVSTATERec

done.  Cesar basically i am putting "pieces" in place.  joining them
together once the pieces are there is then much easier.
Comment 14 Luke Kenneth Casson Leighton 2021-01-28 16:08:47 GMT
(In reply to Luke Kenneth Casson Leighton from comment #11)
> https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/regfile/regfiles.py;
> h=e4ed80b18138c3d713538aaf9221ea40f2b2a24e;hb=refs/heads/master#l47
> 
> cesar if you can add SVSTATE to that list?  and also correct the comment, it
> should be 3 entries, whoops :)

done.  i might as well :)  not added into TestIssuer though.

commit 116c0f476db4984306c3547565fbe77dcc4cf800 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Thu Jan 28 16:07:29 2021 +0000

    add SVSTATE to StateRegs
    (also fix comments)
Comment 15 Luke Kenneth Casson Leighton 2021-01-29 11:34:35 GMT
very trivial change here to use the augmented get_csv.  slowly getting there.

commit a044ebddf465b15ae4ee567f8ed3ca2417ebe20b (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Fri Jan 29 11:33:22 2021 +0000

    use new svp64-augmented csv reader in PowerDecoder
Comment 16 Luke Kenneth Casson Leighton 2021-01-30 00:49:24 GMT
Cesar would you like to plan / draw out a FSM which splits TestIssuer into
2 FSMs?

the first FSM should be IDLE and INSN_READ, to be joined by INSN_WAIT

the second FSM should be what is currently INSN_START and INSN_ACTIVE.
these will need to be joined in this 2nd FSM by an IDLE state which
the first FSM triggers.

with ready/valid signalling it should be possible to get FSM1 to trigger
FSM2 to move to the new state, and likewise FSM2 to trigger FSM1 to move
back to IDLE.

the first phase is *not* to add SV support straight away.  the idea
is to change the *v3.0B* TestIssuer to 2 FSMs, with no change in behaviour.
v3.0B instruction read/issue *only*.

then once that is done it becomes much easier to get the 1st FSM to read
64-bit instructions, than it is right now.

one other nice thing is that the two FSMs will allow us to do pipelined versions later, much more easily.
Comment 17 Cesar Strauss 2021-01-30 17:50:00 GMT
(In reply to Luke Kenneth Casson Leighton from comment #16)
> Cesar would you like to plan / draw out a FSM which splits TestIssuer into
> 2 FSMs?

Sure, I'm on it.
Comment 18 Luke Kenneth Casson Leighton 2021-01-30 18:04:12 GMT
(In reply to Cesar Strauss from comment #17)
> (In reply to Luke Kenneth Casson Leighton from comment #16)
> > Cesar would you like to plan / draw out a FSM which splits TestIssuer into
> > 2 FSMs?
> 
> Sure, I'm on it.

star.  it should be a very easy code-morph, one that, strictly speaking,
it is not necessary to "understand" the code at all!  it is a slightly
weird and uncomfortable feeling that goes away once the unit tests pass :)
i do this all the time.
Comment 19 Cesar Strauss 2021-02-01 10:56:32 GMT
Fixed and enabled a looping test to ease iterative development.

The looping is important to ensure that a PC update on the issue/execute FSM is correctly picked up on the fetch/decode FSM.

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simulator/test_sim.py;h=1fe38e73880e734134bea00fd241ffb2e1a5b836;hb=HEAD#l417
Comment 20 Luke Kenneth Casson Leighton 2021-02-02 08:05:11 GMT
(In reply to Cesar Strauss from comment #0)

>     c) If any source or destination was a vector, keep issuing the
> instruction and increment SVSTATE.srcstep. Break when it reaches VL-1

or if the destination is a scalar.  but only after the instruction is executed.  when predication is added this makes more sense because the loop stops at the first nonzero source predicate bit.
 
>     d) set SVSTATE.srcstep to zero again

and only then allow PC to advance.
Comment 21 Luke Kenneth Casson Leighton 2021-02-02 08:12:59 GMT
(In reply to Cesar Strauss from comment #19)
> Fixed and enabled a looping test to ease iterative development.

ah good, i do not know why it was not working.
Comment 22 Cesar Strauss 2021-02-06 21:38:08 GMT
(In reply to Cesar Strauss from comment #17)
> (In reply to Luke Kenneth Casson Leighton from comment #16)
> > Cesar would you like to plan / draw out a FSM which splits TestIssuer into
> > 2 FSMs?
> 
> Sure, I'm on it.

It works!

No regressions that I could see.

I had to add a register between instruction memory and decode, but I think it's a good thing, despite taking an extra cycle.
Comment 23 Luke Kenneth Casson Leighton 2021-02-07 01:16:49 GMT
(In reply to Cesar Strauss from comment #22)

> It works!

ha!  totally awesome!

> No regressions that I could see.
> 
> I had to add a register between instruction memory and decode, but I think
> it's a good thing, despite taking an extra cycle.

yeah not a problem.  we're not focussing on performance at the moment.
there is a way to not need the extra cycle, you get each stage of the
FSM to place the data combinatorially, so that the "users" of that data
have it available immediately.  it does however make things a bit messy.

fascinatingly, if you look at the diff, it's a very clean separation.

ok so _now_ the SVP64PowerDecoder can be added (combinatorially) in
fetch_fsm, which analyses the incoming 32-bit instruction.
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder2.py;hb=HEAD#l1284

then if the "is_svp64_mode" is set, take a copy of svp64_rm first (in a register), read another 32 bits, and pass in the svp64_rm into the PowerDecoder2  (self.pdecode2).

that should still do absolutely nothing: zero impact (no regressions)

oh one minor complication: the PC must *not* advance even though reading the next 32 bits.  the PC *must* still point to the start of the 64-bit operation.
this is going to be slightly complicated... actually maybe not, because branches and traps are not possible to SV extend.
Comment 24 Luke Kenneth Casson Leighton 2021-02-11 16:15:33 GMT
commit 7539c4c4b9e76b42b3447d13b8195bafec6cec17 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Thu Feb 11 15:57:25 2021 +0000

    comments in TestIssuer for SVP64PrefixDecoder

Cesar i have added in some TODO comments and the SVP64PrefixDecoder

@@ -22,7 +22,7 @@ from nmigen.cli import main
 import sys
 
 from soc.decoder.power_decoder import create_pdecode
-from soc.decoder.power_decoder2 import PowerDecode2
+from soc.decoder.power_decoder2 import PowerDecode2, SVP64PrefixDecoder
 from soc.decoder.decode2execute1 import IssuerDecode2ToOperand
 from soc.decoder.decode2execute1 import Data
 from soc.experiment.testmem import TestMemory # test only for instructions
@@ -88,6 +88,7 @@ class TestIssuerInternal(Elaboratable):
         self.cur_state = CoreState("cur") # current state (MSR/PC/EINT)
         self.pdecode2 = PowerDecode2(pdecode, state=self.cur_state,
                                      opkls=IssuerDecode2ToOperand)
+        self.svp64 = SVP64PrefixDecoder() # for decoding SVP64 prefix
 
         # Test Instruction memory
         self.imem = ConfigFetchUnit(pspec).fu


SVP64PrefixDecoder should be used combinatorially, it is very basic (5 lines),
to detect the presence of an SVP64 prefix.

if "yes" then the sv_rm field recorded (sync to register) and read another 32 bits insn stream
Comment 25 Cesar Strauss 2021-02-11 23:09:55 GMT
(In reply to Luke Kenneth Casson Leighton from comment #24)
> Cesar i have added in some TODO comments and the SVP64PrefixDecoder

Sure, I'm on it.

It would be interesting to have a test case to test against, like the one you have on test_caller_svp64.py. Does the simulator already work on it?
Comment 26 Luke Kenneth Casson Leighton 2021-02-11 23:15:04 GMT
(In reply to Cesar Strauss from comment #25)

> It would be interesting to have a test case to test against, like the one
> you have on test_caller_svp64.py. Does the simulator already work on it?

that unit test can be split out if done carefully or just duplicated (slightly different comparison needed)

no not yet.  or, it passes because it does only one element at the moment.
Comment 27 Luke Kenneth Casson Leighton 2021-02-12 15:14:53 GMT
(In reply to Cesar Strauss from comment #25)

> It would be interesting to have a test case to test against, like the one
> you have on test_caller_svp64.py. Does the simulator already work on it?

does now.

commit 9078b2935beb4ba89dcd2af91bb5e3a0bcffbe71 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Fri Feb 12 14:41:48 2021 +0000

    add srcstep and correct PC-advancing during Sub-PC looping in ISACaller

i added 3 unit tests:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/test_caller_svp64.py;hb=HEAD

* one all vector
* one dest=scalar src1/2=vector
* one dest=vector src1=scalar src2=vector

strictly speaking for full coverage, QTY 8x tests are required.... per opcode!
that's going to be one hell of a lot of unit tests, particularly for 4-operand
instructions (QTY 16x for all permuations s/v s/v s/v s/v)

i just added this though:

commit 3a940211d59b532fb0b5d06890bbe1eb75cbcd4e (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Fri Feb 12 15:02:43 2021 +0000

    validate all registers to make sure no damage occurs in SVP64 ISACaller

that's to make sure that running SVP64 doesn't damage the regfile.

now you should be able to safely "compare" against the same tests, they have
to be done slightly differently.  note that you'll need to manually set up
SVSTATE.vl/mvl otherwise there's no loop.

if you can start a new test case class, in fu/alu/test/test_pipe_caller.py
SVP64ALUTestCase, and add the same 3 from decoder/isa/test_caller_svp64.py
you *should* be able to just call self.add_case() just remember to establish
the SVSTATE class instance just like in test_caller_svp64.py

now, the only thing is: the test_pipe_caller.py will *not work* with SVP64,
it's not designed to and not intended to.  test_pipe_caller.py is not
connected to regfiles, and it's only TestIssuer that has the decoding
of SVP64.

so maybe putting the unit tests in a different file, fu/alu/test/svp64_cases.py
is probably a good idea?
Comment 28 Luke Kenneth Casson Leighton 2021-02-12 15:24:04 GMT
you'll need this:

commit f67cd9f81a517e3b6e75c58bdc1d0d836fae26b7 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Fri Feb 12 15:23:05 2021 +0000

    add SVSTATE to TestCase infrastructure for use in TestIssuer

and this (not tested):

commit 99c712d52e4a9ff932162118677f228a332f1c01 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Fri Feb 12 15:23:44 2021 +0000

    add one SVP64 ALU test case to get started
Comment 29 Luke Kenneth Casson Leighton 2021-02-12 17:57:43 GMT
Cesar here's where in ISACaller, after decoding (all done already by
PowerDecoder2, now), the srcstep is added on:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/caller.py;h=03bf16d2ed98d6746fb6bdbfca2a312db8c9e424;hb=HEAD#l892

you can see further down the write version.  then, even further down,
the "loop" which exits when SVSTATE.srcstep == VL-1

however the equivalent location - and i must apologise for this - is
handled by regspec_decode_read() and regspec_decode_write():

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_regspec_map.py;h=05ff4814e2504e3acbc5924b614031b460f8bd7c;hb=HEAD#l51

that's actually called here, get_byregfiles():
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/core.py;h=d3a66bb1e9916c50277b1d5b1dc2a859707fc90b;hb=HEAD#l493

which returns a dictionary of tuples *kine 516), which in turn gets used by
connect_read/write_ports(), which in turn calls connect_read/write_port()

that tuple is unpacked here:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/core.py;h=d3a66bb1e9916c50277b1d5b1dc2a859707fc90b;hb=HEAD#l388

(also at line 260 for connect_readport)

now, *at the moment* it's relatively straightforward: regspec_decode_read/write could in theory be modified to return a Mux() based on is_vec being set.

     Mux(e.read1_isvec, e.svstate.srcstep + e.read1_data, e.read1_data)

something like that

for elwidth overrides that's going to be a *lot* more complex... but for
now, that Mux should "do the trick"
Comment 30 Cesar Strauss 2021-02-12 21:34:49 GMT
(In reply to Luke Kenneth Casson Leighton from comment #28)
> you'll need this:
> 
> commit f67cd9f81a517e3b6e75c58bdc1d0d836fae26b7 (HEAD -> master)
> Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
> Date:   Fri Feb 12 15:23:05 2021 +0000
> 
>     add SVSTATE to TestCase infrastructure for use in TestIssuer
> 
> and this (not tested):
> 
> commit 99c712d52e4a9ff932162118677f228a332f1c01 (HEAD -> master)
> Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
> Date:   Fri Feb 12 15:23:44 2021 +0000
> 
>     add one SVP64 ALU test case to get started

Please push these changes, thanks.

Great progress on the simulator. It will be invaluable when testing the next steps.

The SVP64 prefix fetch and decode is already working locally. I'll just need to test it a little more, add some comments, polish, etc, before pushing.
Comment 31 Luke Kenneth Casson Leighton 2021-02-12 22:01:20 GMT
(In reply to Cesar Strauss from comment #30)

> Please push these changes, thanks.

whoops

> Great progress on the simulator. It will be invaluable when testing the next
> steps.
> 
> The SVP64 prefix fetch and decode is already working locally. I'll just need
> to test it a little more, add some comments, polish, etc, before pushing.

star.  don't be tempted to also add the reg counting/advancement... yet!
do make sure that test_issuer.py still passes (except that annoying bpermd which is a bug in ISACaller not the hardware)
Comment 32 Cesar Strauss 2021-02-13 22:04:43 GMT
(In reply to Luke Kenneth Casson Leighton from comment #31)
> (In reply to Cesar Strauss from comment #30)
> > The SVP64 prefix fetch and decode is already working locally. I'll just need
> > to test it a little more, add some comments, polish, etc, before pushing.

It works! Pushed.

> do make sure that test_issuer.py still passes (except that annoying bpermd
> which is a bug in ISACaller not the hardware)

I does. It also passes test_issuer_svp64.py, which contains a test with a prefix, but with VL=1 for now.

I added a check in check_regs() at test_core.py to also compare the PC with the simulated PC. This is to check that the instruction size was correctly taken into account when calculating the next instruction address. Could be useful for testing branches as well.
Comment 33 Luke Kenneth Casson Leighton 2021-02-13 22:26:30 GMT
(In reply to Cesar Strauss from comment #32)

> It works! Pushed.

wha-hey!  again, 
 
> > do make sure that test_issuer.py still passes (except that annoying bpermd
> > which is a bug in ISACaller not the hardware)
> 
> I does. It also passes test_issuer_svp64.py, which contains a test with a
> prefix, but with VL=1 for now.

excellent.  next step is the regfile stepping, but first i'd like to make
that a module.
 
> I added a check in check_regs() at test_core.py to also compare the PC with
> the simulated PC. 

i saw that.  also, really, MSR and SVSTATE should be checked here, too.
(all three are part of the State Regfile).

> This is to check that the instruction size was correctly
> taken into account when calculating the next instruction address. Could be
> useful for testing branches as well.

yes, good point.
Comment 34 Luke Kenneth Casson Leighton 2021-02-13 22:45:13 GMT
(In reply to Luke Kenneth Casson Leighton from comment #33)
> (In reply to Cesar Strauss from comment #32)
> 
> > It works! Pushed.
> 
> wha-hey!  again, 
>  
> > > do make sure that test_issuer.py still passes (except that annoying bpermd
> > > which is a bug in ISACaller not the hardware)
> > 
> > I does. It also passes test_issuer_svp64.py, which contains a test with a
> > prefix, but with VL=1 for now.
> 
> excellent.  next step is the regfile stepping, but first i'd like to make
> that a module.

nuts.

see regspec_decode_read() - the Mux() on the isvec, this is too complex to
put into the function.

instead what i will do is augment PowerDecoder2 so that it contains the
Mux, there.
Comment 35 Luke Kenneth Casson Leighton 2021-02-14 13:04:13 GMT
Cesar i've added SVSTATE to CoreState, plus a read of the State regfile
at the exact point that MSR is done (same technique).

that is now available in PowerDecoder2 (because PowerDecoderSubset
takes a CoreState Record: might have to remove that, make it only
PowerDecoder2, have to see)

*now* i can do that Mux(is_vec, srcstep+read_data1, read_data1)
and in theory that should just "work".

can you do "setup" of SVSTATE? for now, just hack it with a yield
just after setup_memory/setup_regs in test_runner.py
the fields of the test.svstate should be transferred into
core.regs.state

(oh btw, see class RegFiles in regfiles.py, you can refer to
 core.regs.rf['state'] as just "core.regs.state")

see in setup_regs()?

    intregs = core.regs.int
    for i in range(32):
        if intregs.unary:
            yield intregs.regs[i].reg.eq(test.regs[i])
        else:
            yield intregs.memory._array[i].eq(test.regs[i])

so it can be done as (because state is a RegFileArray)

    stateregs = core.regs.state
    yield stateregs.regs[State.SVSTATE].reg.eq(test.svstate)

something like that.
Comment 36 Luke Kenneth Casson Leighton 2021-02-14 13:27:49 GMT
ok this is done:

diff --git a/src/soc/decoder/power_decoder2.py b/src/soc/decoder/power_decoder2.py
index 3dcc5a37..640acef8 100644
--- a/src/soc/decoder/power_decoder2.py
+++ b/src/soc/decoder/power_decoder2.py
@@ -1103,6 +1103,10 @@ class PowerDecode2(PowerDecodeSubset):
         if hasattr(do, "lk"):
             comb += dec_o2.lk.eq(do.lk)
 
+        # get SVSTATE srcstep (TODO: elwidth, dststep etc.) needed below
+        srcstep = Signal.like(self.state.svstate.srcstep)
+        comb += srcstep.eq(self.state.svstate.srcstep)
+
         # registers a, b, c and out and out2 (LD/ST EA)
         for to_reg, fromreg, svdec in (
             (e.read_reg1, dec_a.reg_out, in1_svdec),
@@ -1113,8 +1117,13 @@ class PowerDecode2(PowerDecodeSubset):
             comb += svdec.extra.eq(extra)        # EXTRA field of SVP64 RM
             comb += svdec.etype.eq(op.SV_Etype)  # EXTRA2/3 for this insn
             comb += svdec.reg_in.eq(fromreg.data) # 3-bit (CR0/BC/BFA)
-            comb += to_reg.data.eq(svdec.reg_out) # 7-bit output
             comb += to_reg.ok.eq(fromreg.ok)
+            # detect if Vectorised: add srcstep if yes.  TODO: a LOT.
+            # this trick only holds when elwidth=default and in single-pred
+            with m.If(svdec.isvec):
+                comb += to_reg.data.eq(srcstep+svdec.reg_out) # 7-bit output
+            with m.Else():
+                comb += to_reg.data.eq(svdec.reg_out) # 7-bit output
 
         comb += in1_svdec.idx.eq(op.sv_in1)  # SVP64 reg #1 (matches in1_sel)
         comb += in2_svdec.idx.eq(op.sv_in2)  # SVP64 reg #2 (matches in2_sel)

that *should* now work, there is quite a lot to do when elwidth overrides
are implemented, but for now, you should be able to adjust the FSM to count
from 0 to VL-1.

remember it should alter srcstep and write svstate to the core.state regfile,
copy the regfile write technique used on DEC and TB, there (see tb_dec_fsm)

                comb += state_w_sv.addr.eq(StateRegs.SVSTATE)
                comb += state_w_sv.wen.eq(1)
                comb += state_w_sv.data_i.eq(new_svstate)
Comment 37 Luke Kenneth Casson Leighton 2021-02-14 14:10:34 GMT
commit 60e753c5b2f9ebe91557ca864dbff77c1559819c (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sun Feb 14 14:06:39 2021 +0000

    add indicator to PowerDecoder2 when no outputs are Vectorised

--- a/src/soc/decoder/power_decoder2.py
+++ b/src/soc/decoder/power_decoder2.py
@@ -1003,6 +1003,7 @@ class PowerDecode2(PowerDecodeSubset):
         self.in3_isvec = Signal(1, name="reg_c_isvec")
         self.o_isvec = Signal(1, name="reg_o_isvec")
         self.o2_isvec = Signal(1, name="reg_o2_isvec")
+        self.no_out_vec = Signal(1, name="no_out_vec") # no outputs are vectors
 
Cesar this is the equivalent of svp64_dest_vector in ISACaller:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/caller.py;h=03bf16d2ed98d6746fb6bdbfca2a312db8c9e424;hb=dbb140cd79611e59071d40416cd975f0f524faf2#l966

it can be used to detect that the 0..VL-1 loop should be ended early (scalar
dest). later when predication is added, the loop still continues and the
end-early test only triggers when the dest predicate bit is "1", even on scalar
dest.

(that's how we do VINSERT).
Comment 38 Luke Kenneth Casson Leighton 2021-02-23 17:30:44 GMT
https://libre-soc.org/irclog/%23libre-soc.2021-02-23.log.html#t2021-02-23T10:44:23

some notes:

* Fetch reads PC, MSR and SVSTATE.
  - MSR is needed to determine LE/BE of EXT01
  - SVSTATE may safely be passed to Issue

* Issue will immediately trigger Execute
  - monitoring of completion may involve
    updating (writing) of SVSTATE.srcstep
  - setvl, branch or trap NEVER involve
    a SVP64 Loop.
  - completion of execution will need
    + comparison against SVSTATE.vl
    + check if PowerDecoder2.no_out_vec
    + pc_changed or sv_changed which will
      indicate EXCEPTION
  - increment SVSTATE.srcstep WITHOUT
    needing to read it again (that was
    done in FETCH), EXCEPT if sv_changed
    because this was EXCEPTION.

* Execute *MAY* involve branch or trap or setvl
  - PC MSR and SVSTATE *may* be written
  - BUT, crucially, NEVER by an SVP64
  - therefore writing of PC or SVSTATE 
    indicates exit from loop
    (back to fetch)

basically there are two mutually exclusive situations:

1) setvl, branch or trap, any of these write to PC, MSR and SVSTATE.

  here is "Scalar Identity behaviour".
  illegal instruction if attempted to
  add an SVP64 prefix

2) everything else.

   here it is perfectly safe to read PC
   MSR and SVSTATE in FETCH FSM because
   they will never be written to
   (except on Exceptions)

   ISSUE can safely increment and write
   (except for an Exception, hence
    sv_changed detects that)

this will mesh well with speculative behaviour, and OoO, so we should recognise that even in the FSM because it will guide incremental development for pipelines and OoO.

of course only some instructions raise exceptions but we deal with that later.
Comment 39 Luke Kenneth Casson Leighton 2021-02-23 19:12:58 GMT
also:

* FETCH FSM will initiate handshake to ISSUE FSM

* ISSUE FSM will initiate handshake to EXECUTE FSM

* EXECUTE FSM will initiate handshake back to ISSUE FSM on completion
  - and also set (optionally) pc_changed, insn_done, and (optionally)
    sv_changed

* ISSUE FSM checks loop conditions.  if exiting it will *PASS ON*
  insn_done and pc_changed to FETCH FSM, and initiate
  handshake back to ISSUE FSM

it is important that there be a split there, insn_done is probably
enough... 

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/issuer.py;h=5ec00ca4ac6ccac038d6ca1f5b05dae413a191e6;hb=09b5ec626157143d593cb294f664a4b611937d0c#l258

yes it is.
Comment 40 Luke Kenneth Casson Leighton 2021-02-23 19:14:25 GMT
(In reply to Luke Kenneth Casson Leighton from comment #39)

> it is important that there be a split there,

i think.   you'll have to check.
Comment 41 Cesar Strauss 2021-02-26 11:18:49 GMT
(In reply to Luke Kenneth Casson Leighton from comment #40)
> (In reply to Luke Kenneth Casson Leighton from comment #39)
> 
> > it is important that there be a split there,
> 
> i think.   you'll have to check.

I have split the new issue FSM out from the issue/execute FSM, and marked the places where the looping will be checked/done, and where PC/SRCSTEP will be updated. test_issuer.py and test_issuer_svp64.py are still happy.
Comment 42 Luke Kenneth Casson Leighton 2021-02-26 13:28:48 GMT
(In reply to Cesar Strauss from comment #41)

> I have split the new issue FSM out from the issue/execute FSM, 

excellent that's a good incremental approach.

> and marked
> the places where the looping will be checked/done, and where PC/SRCSTEP will
> be updated. test_issuer.py and test_issuer_svp64.py are still happy.

superb.

i added that PowerDecoder.no_out_vec also needs to be checked (after execute)
i was going to refer you to the code where that's used in ISACaller but ha ha
i did it as this instead *sigh*

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/caller.py;h=ccdbff1484797c070fc9c93f3165708f1acddf27;hb=HEAD#l1020

svp64_dest_vector needs to be replaced with using no_out_vec

i'm just going to move fetch_insn_o out so that the raw instruction is
dropped into pdecode permanently, see what happens.
Comment 43 Luke Kenneth Casson Leighton 2021-02-26 13:47:25 GMT
i've also moved update_svstate and new_svstate into the issue FSM,
initialised new_svstate from cur_state, and also sync'd it *back*
into cur_state if update_svstate is set.

basically cur_state.svstate is a mini write-back cache where because
we are in the issue FSM there's nothing else that should be modifying
it for normal loops, but exceptions *will* cause it to change (and
loops end, then, back to Fetch).
Comment 44 Luke Kenneth Casson Leighton 2021-02-27 12:54:03 GMT
cesar i took out the manual creation of svp64_dest_vector in ISACaller
and put in PowerDecoder.no_out_vec instead, confirmed that the unit
test test_caller_svp64.py still works.

line 1064:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/caller.py;h=acaac1f7212c68c9c631d1626adc470f7fa5751f;hb=HEAD#l1064

the issue FSM should now be ridiculously simple.
Comment 45 Luke Kenneth Casson Leighton 2021-03-03 12:49:35 GMT
hi Cesar i took a look at the pc_i.ok this morning, yes you are right, things will go awry with SVSTATE because we are currently not also setting SVSTATE from the debug interface.

SVSTATE.srcstep (and dststep) are actual Sub-PC steppers so not resetting those is quite serious.

for now what i suggest is to reset SVSTATE.srcstep and dststep to zero when pc_i.ok is raised high (i will take care of this)

also as you suggested have pc_i.ok come on to the issue FSM and exit the loop.

later we can add SVSTATE to the DMI (and JTAG), then the rule will have to be set, "please over DMI modify PC first then SVSTATE second".

also then have to get test_core.py to use the DMI interface to set SVSTATE rather than put it into a SPR.
Comment 46 Luke Kenneth Casson Leighton 2021-03-03 14:37:51 GMT
i just added svstate_i which is the same as pc_i so that this can also
be set whilst the processor is STOPPED.

then, test_runner.py uses svstate_i.ok to set svstate and releases it
at the same time as pc_i.ok when START is sent.

in theeeeorryyyy that should solve the reset problem for starting a new
program, because (just like pc_i) when the new svstate_i is passed in,
it is no longer read from the StateRegs regfile.

that was probably the problem.
Comment 47 Cesar Strauss 2021-03-19 20:35:05 GMT
(In reply to Cesar Strauss from comment #41)
> I have split the new issue FSM out from the issue/execute FSM, and marked
> the places where the looping will be checked/done, and where PC/SRCSTEP will
> be updated.

This is old news by now, but, for the record, the Issue FSM now implements the SRCSTEP VL loop, as well as the VL==0 vector skip loop.

Probably, the VL==0 loop could have been done in Fetch instead. Luke, do you want me to do this, before moving to predication, or leave it for later?
Comment 48 Luke Kenneth Casson Leighton 2021-03-19 21:41:31 GMT
(In reply to Cesar Strauss from comment #47)

> Probably, the VL==0 loop could have been done in Fetch instead. Luke, do you
> want me to do this, before moving to predication, or leave it for later?

let's start predication, see how that goes, i will update comment 0 to make sure it's TODO or at least raised a new bugreport
Comment 49 Luke Kenneth Casson Leighton 2021-05-04 17:53:06 BST
Cesar i've just added SVSRR0 to PowerDecoder2 so that when OP_RFID or
OP_TRAP occur, it can issue a read (or write) of fast regs

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=d7811a9a511fe2949cc6adce2f0626edbb9702a1

commit d7811a9a511fe2949cc6adce2f0626edbb9702a1 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue May 4 17:51:16 2021 +0100

    add SVSRR0 to OP_RFID and OP_TRAP reg read/write
Comment 50 Luke Kenneth Casson Leighton 2021-05-05 01:18:40 BST
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/common_output_stage.py;hb=HEAD

cesar, here is where (for ALUs) the zeroing would be hooked in, towards the end.

when we do SIMD it will be partial zeroing based on 1, 2, 4 or 8 bits of predicate mask, which will also need to be passed in to the Input Record.

only when all 1 2 4 or 8 predicate mask bits are zero could we skip the entire operation and directly write zero to thev
 regfile.

this tells us that it is an optimisation to do that.

so, first step, don't worry about optimisation, simply test in the common output stage, if zeroing and predicate bit is clear, put a zero in the ouutput instead of the result.

ironically i cannot remember if the CR0 has to be zero'd (or, more to the point, the CR0 set to ==zero)