The goal is to implement the following pseudo-code for i = 0 to VL-1: GPR(RT+i) = GPR(RA+i) + GPR(RB+i) The tasks are: 1) DONE add SVSTATE as an SPR 2) DONE augment PowerDecoder to decode the SVP64 prefix 3) DONE augment the TestIssuer FSM to: a) if a SVP64 prefix was decoded, read and decode the next word, before really issuing the instruction b) issue the instruction and wait for completion. c) If any source or destination was a vector, keep issuing the instruction and increment SVSTATE.srcstep. Break when it reaches VL-1 d) set SVSTATE.srcstep to zero again 4) DONE modify the read/write point in the regfile to add SVSTATE.srcstep to the reg number 5) TODO add SETVL instruction 6) TODO figure out how to deal with exceptions in the middle of a vector instruction: a) save the SVSTATE before entering an exception. b) when exiting, restore SVSTATE, while setting the PC to the current instruction, so it's decoded and issued again. 7) TODO modify the decoder to spot when an illegal SVP64 combination occurs. these are MTMSR MFMSR setvl all branch operations at least. 8) TODO have the issue FSM do VL=0 detection to skip instructions
(In reply to Cesar Strauss from comment #0) > 4) modify the read/write point in the regfile to add SVSTATE.srcstep to > the reg number Also: increase the the number of registers in the register file to 128.
(In reply to Cesar Strauss from comment #0) > a) if a SVP64 prefix was decoded, read and decode the next word, before > really issuing the instruction Also, if VL is zero, do not issue any vector instruction, just keep fetching the next instruction.
(In reply to Cesar Strauss from comment #0) > 6) figure out how to deal with exceptions in the middle of a vector > instruction: this *may* require (read: is going to) adding an SRR2 into which SVSTATE is copied/transferred. this is because SVSTATE contains, in effect, a Sub-Program-Counter. PC is saved and swapped in SRR0... therefore because SVSTATE is a Sub-PC... you see how that works. > a) save the SVSTATE before entering an exception. in SRR2. > b) when exiting, restore SVSTATE, while setting the PC to the current > instruction, so it's decoded and issued again. from SRR2. actually we should probably call it SVSRR0. i will document this in the SPRs page. https://libre-soc.org/openpower/sv/sprs/ done.
Cesar, the Power Decoder works by reading CSV files, simply passing a list of dictionaries to each PowerDecoder instance: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder.py;h=028de9b3b0f78cc488096ff74a2c57a458a75f77;hb=c9a4beb81e9c11a3ac70494dcf64850d8c3a6d1e#l518 however the svp64 "EXTRA" fields, as you can see in svp64.py, are organised differently, keyed by the instruction (the comment field of v3.0B). https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/sv/trans/svp64.py;h=4955df75f7703fdab954fb814391ad4eab54326e;hb=c9a4beb81e9c11a3ac70494dcf64850d8c3a6d1e#l133 my feeling here is that get_csv should be replaced in power_decoder.py with a function that "merges" the extra fields by looking up the instruction name and simply dropping in the RM fields. rather than modifying the CSV files which are already long. what do you think?
(In reply to Cesar Strauss from comment #2) > (In reply to Cesar Strauss from comment #0) > > a) if a SVP64 prefix was decoded, read and decode the next word, before > > really issuing the instruction yes. basically, with a FSM though you have to be careful. the actual PC is at the EXT001 (the prefix) not the suffix part! but you still have to *read* both the prefix and suffix. one possible solution is to have two FSMs, one for reading instructions, one for processing them. they communicate by signals to say "start" and "done" just like ready/valid. that way, passing 64 bit or 32 bit it is all the same. > Also, if VL is zero, do not issue any vector instruction, just keep fetching > the next instruction. yes, because the for loop is from 0 to 0. of course if it is a v3.0B 32bit instruction VL does not apply so will always be executed.
(In reply to Cesar Strauss from comment #1) > Also: increase the the number of registers in the register file to 128. this we should do as a separate step, it is quite disruptive. increases SRAM BRAM or DFF size a LOT.
commit 085b986713680d8a64448c85a3d82f2b7fea4732 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Wed Jan 27 12:49:05 2021 +0000 provide "merger" of SVP64 RM info into v3.0B CSV files i did the merging part, it's not in use yet. also it is not re-encoded to Power Enums, that is a separate step.
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_svp64.py;hb=HEAD hmm hmm as-is this is not very useful. what is needed is to be able to get bits from EXTRA2/3 fields and say "this one extends RA". right now it is inverted, key us value, value is key. a bit more post-processing is needed, examining the v3.0B fields IN1, IN2, etc, to create: SVP64_IN1, SVP64_IN2 and for those fields to contain the index 0, 1, 2, 3 pointing to which bits of RM to read.
ok if you run this: lkcl@fizzy:~/src/libresoc/soc/src/soc$ python3 decoder/power_svp64.py OrderedDict([('opcode', '0b1001'), ('unit', 'SHIFT_ROT'), ('internal op', 'OP_RLCR'), ('in1', 'NONE'), ('in2', 'RB'), ('in3', 'RS'), ('out', 'RA'), ('CR in', 'NONE'), ('CR out', 'CR0'), ('inv A', '0'), ('inv out', '0'), ('cry in', 'ZERO'), ('cry out', '0'), ('ldst len', 'NONE'), ('BR', '0'), ('sgn ext', '0'), ('upd', '0'), ('rsrv', '0'), ('32b', '0'), ('sgn', '0'), ('rc', 'RC'), ('lk', '0'), ('sgl pipe', '0'), ('comment', 'rldcr'), ('form', 'MD'), ('EXTRA0', 'd:RA;d:CR0'), ('EXTRA1', 's:RB'), ('EXTRA2', 's:RS'), ('EXTRA3', '0'), ('SV_Ptype', '1P'), ('SV_Etype', 'EXTRA3'), ('sv_in1', None), ('sv_in2', 1), ('sv_in3', 2), ('sv_out', 0)]) you can see now, the in1/2/3/out fields have been joined by sv_in1/2/3/out which makes it dead-easy to know, in combination with SV_Etype, where to go to get the SVP64 "extra" fields for a given register RA/B/C/RS/RT. CRs are next commit 41f1bc2c070cc47d1990955d60c899ad8fab88bb (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Wed Jan 27 18:16:47 2021 +0000 move svp64 reg-decode function to more appropriate location use it in SVP64RM get_svp64_csv to decode EXTRA bit-field positions
excellent. this is crxor. see how sv_cr_in = [1,2] and sv_cr_out = 0 this is decoded from EXTRA0=d:BT, EXTRA1=s:BA, EXTRA2=s:BB OrderedDict([('opcode', '0b0011000001'), ('unit', 'CR'), ('internal op', 'OP_CROP'), ('in1', 'NONE'), ('in2', 'NONE'), ('in3', 'NONE'), ('out', 'NONE'), ('CR in', 'BA_BB'), ('CR out', 'BT'), ('inv A', '0'), ('inv out', '0'), ('cry in', 'ZERO'), ('cry out', '0'), ('ldst len', 'NONE'), ('BR', '0'), ('sgn ext', '0'), ('upd', '0'), ('rsrv', '0'), ('32b', '0'), ('sgn', '0'), ('rc', 'NONE'), ('lk', '0'), ('sgl pipe', '0'), ('comment', 'crxor'), ('form', 'XL'), ('EXTRA0', 'd:BT'), ('EXTRA1', 's:BA'), ('EXTRA2', 's:BB'), ('EXTRA3', '0'), ('SV_Ptype', '1P'), ('SV_Etype', 'EXTRA3'), ('sv_in1', None), ('sv_in2', None), ('sv_in3', None), ('sv_out', None), ('sv_cr_in', (1, 2)), ('sv_cr_out', 0)]) commit af95c57b2027a2e5effa56f7ce324fd696216de4 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Wed Jan 27 18:38:34 2021 +0000 add svp64 CR field identification for EXTRA2/3 decoding
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/regfile/regfiles.py;h=e4ed80b18138c3d713538aaf9221ea40f2b2a24e;hb=refs/heads/master#l47 cesar if you can add SVSTATE to that list? and also correct the comment, it should be 3 entries, whoops :) it is currently 4 long, DEC and TB are in fastregs, now. PC=0 MSR=1 SVSTATE=2 and, at lines 52/54 add an extra rd and extra wr port, call one w_sv the other r_sv then, here: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/issuer.py;h=d662dff4c20e7904e6ca7cb3ae2ceaa6f03b6e06;hb=refs/heads/master#l110 ta-daaa, you can get those ports, w_sv and r_sv, and now you can read and write SVSTATE.
https://libre-soc.org/openpower/sv/sprs/ also a Record is needed for SVSTATE according to the bitfields in sprs page above. SVStateRecord
(In reply to Luke Kenneth Casson Leighton from comment #12) > https://libre-soc.org/openpower/sv/sprs/ > > also a Record is needed for SVSTATE according to the bitfields in sprs page > above. SVStateRecord commit 80fbef229a74420ab5708c32432938cedfbc5e0b (HEAD -> master, origin/master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Thu Jan 28 15:53:50 2021 +0000 add SVState SPR Record, SVSTATERec done. Cesar basically i am putting "pieces" in place. joining them together once the pieces are there is then much easier.
(In reply to Luke Kenneth Casson Leighton from comment #11) > https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/regfile/regfiles.py; > h=e4ed80b18138c3d713538aaf9221ea40f2b2a24e;hb=refs/heads/master#l47 > > cesar if you can add SVSTATE to that list? and also correct the comment, it > should be 3 entries, whoops :) done. i might as well :) not added into TestIssuer though. commit 116c0f476db4984306c3547565fbe77dcc4cf800 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Thu Jan 28 16:07:29 2021 +0000 add SVSTATE to StateRegs (also fix comments)
very trivial change here to use the augmented get_csv. slowly getting there. commit a044ebddf465b15ae4ee567f8ed3ca2417ebe20b (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Fri Jan 29 11:33:22 2021 +0000 use new svp64-augmented csv reader in PowerDecoder
Cesar would you like to plan / draw out a FSM which splits TestIssuer into 2 FSMs? the first FSM should be IDLE and INSN_READ, to be joined by INSN_WAIT the second FSM should be what is currently INSN_START and INSN_ACTIVE. these will need to be joined in this 2nd FSM by an IDLE state which the first FSM triggers. with ready/valid signalling it should be possible to get FSM1 to trigger FSM2 to move to the new state, and likewise FSM2 to trigger FSM1 to move back to IDLE. the first phase is *not* to add SV support straight away. the idea is to change the *v3.0B* TestIssuer to 2 FSMs, with no change in behaviour. v3.0B instruction read/issue *only*. then once that is done it becomes much easier to get the 1st FSM to read 64-bit instructions, than it is right now. one other nice thing is that the two FSMs will allow us to do pipelined versions later, much more easily.
(In reply to Luke Kenneth Casson Leighton from comment #16) > Cesar would you like to plan / draw out a FSM which splits TestIssuer into > 2 FSMs? Sure, I'm on it.
(In reply to Cesar Strauss from comment #17) > (In reply to Luke Kenneth Casson Leighton from comment #16) > > Cesar would you like to plan / draw out a FSM which splits TestIssuer into > > 2 FSMs? > > Sure, I'm on it. star. it should be a very easy code-morph, one that, strictly speaking, it is not necessary to "understand" the code at all! it is a slightly weird and uncomfortable feeling that goes away once the unit tests pass :) i do this all the time.
Fixed and enabled a looping test to ease iterative development. The looping is important to ensure that a PC update on the issue/execute FSM is correctly picked up on the fetch/decode FSM. https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simulator/test_sim.py;h=1fe38e73880e734134bea00fd241ffb2e1a5b836;hb=HEAD#l417
(In reply to Cesar Strauss from comment #0) > c) If any source or destination was a vector, keep issuing the > instruction and increment SVSTATE.srcstep. Break when it reaches VL-1 or if the destination is a scalar. but only after the instruction is executed. when predication is added this makes more sense because the loop stops at the first nonzero source predicate bit. > d) set SVSTATE.srcstep to zero again and only then allow PC to advance.
(In reply to Cesar Strauss from comment #19) > Fixed and enabled a looping test to ease iterative development. ah good, i do not know why it was not working.
(In reply to Cesar Strauss from comment #17) > (In reply to Luke Kenneth Casson Leighton from comment #16) > > Cesar would you like to plan / draw out a FSM which splits TestIssuer into > > 2 FSMs? > > Sure, I'm on it. It works! No regressions that I could see. I had to add a register between instruction memory and decode, but I think it's a good thing, despite taking an extra cycle.
(In reply to Cesar Strauss from comment #22) > It works! ha! totally awesome! > No regressions that I could see. > > I had to add a register between instruction memory and decode, but I think > it's a good thing, despite taking an extra cycle. yeah not a problem. we're not focussing on performance at the moment. there is a way to not need the extra cycle, you get each stage of the FSM to place the data combinatorially, so that the "users" of that data have it available immediately. it does however make things a bit messy. fascinatingly, if you look at the diff, it's a very clean separation. ok so _now_ the SVP64PowerDecoder can be added (combinatorially) in fetch_fsm, which analyses the incoming 32-bit instruction. https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_decoder2.py;hb=HEAD#l1284 then if the "is_svp64_mode" is set, take a copy of svp64_rm first (in a register), read another 32 bits, and pass in the svp64_rm into the PowerDecoder2 (self.pdecode2). that should still do absolutely nothing: zero impact (no regressions) oh one minor complication: the PC must *not* advance even though reading the next 32 bits. the PC *must* still point to the start of the 64-bit operation. this is going to be slightly complicated... actually maybe not, because branches and traps are not possible to SV extend.
commit 7539c4c4b9e76b42b3447d13b8195bafec6cec17 (HEAD -> master, origin/master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Thu Feb 11 15:57:25 2021 +0000 comments in TestIssuer for SVP64PrefixDecoder Cesar i have added in some TODO comments and the SVP64PrefixDecoder @@ -22,7 +22,7 @@ from nmigen.cli import main import sys from soc.decoder.power_decoder import create_pdecode -from soc.decoder.power_decoder2 import PowerDecode2 +from soc.decoder.power_decoder2 import PowerDecode2, SVP64PrefixDecoder from soc.decoder.decode2execute1 import IssuerDecode2ToOperand from soc.decoder.decode2execute1 import Data from soc.experiment.testmem import TestMemory # test only for instructions @@ -88,6 +88,7 @@ class TestIssuerInternal(Elaboratable): self.cur_state = CoreState("cur") # current state (MSR/PC/EINT) self.pdecode2 = PowerDecode2(pdecode, state=self.cur_state, opkls=IssuerDecode2ToOperand) + self.svp64 = SVP64PrefixDecoder() # for decoding SVP64 prefix # Test Instruction memory self.imem = ConfigFetchUnit(pspec).fu SVP64PrefixDecoder should be used combinatorially, it is very basic (5 lines), to detect the presence of an SVP64 prefix. if "yes" then the sv_rm field recorded (sync to register) and read another 32 bits insn stream
(In reply to Luke Kenneth Casson Leighton from comment #24) > Cesar i have added in some TODO comments and the SVP64PrefixDecoder Sure, I'm on it. It would be interesting to have a test case to test against, like the one you have on test_caller_svp64.py. Does the simulator already work on it?
(In reply to Cesar Strauss from comment #25) > It would be interesting to have a test case to test against, like the one > you have on test_caller_svp64.py. Does the simulator already work on it? that unit test can be split out if done carefully or just duplicated (slightly different comparison needed) no not yet. or, it passes because it does only one element at the moment.
(In reply to Cesar Strauss from comment #25) > It would be interesting to have a test case to test against, like the one > you have on test_caller_svp64.py. Does the simulator already work on it? does now. commit 9078b2935beb4ba89dcd2af91bb5e3a0bcffbe71 (HEAD -> master, origin/master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Fri Feb 12 14:41:48 2021 +0000 add srcstep and correct PC-advancing during Sub-PC looping in ISACaller i added 3 unit tests: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/test_caller_svp64.py;hb=HEAD * one all vector * one dest=scalar src1/2=vector * one dest=vector src1=scalar src2=vector strictly speaking for full coverage, QTY 8x tests are required.... per opcode! that's going to be one hell of a lot of unit tests, particularly for 4-operand instructions (QTY 16x for all permuations s/v s/v s/v s/v) i just added this though: commit 3a940211d59b532fb0b5d06890bbe1eb75cbcd4e (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Fri Feb 12 15:02:43 2021 +0000 validate all registers to make sure no damage occurs in SVP64 ISACaller that's to make sure that running SVP64 doesn't damage the regfile. now you should be able to safely "compare" against the same tests, they have to be done slightly differently. note that you'll need to manually set up SVSTATE.vl/mvl otherwise there's no loop. if you can start a new test case class, in fu/alu/test/test_pipe_caller.py SVP64ALUTestCase, and add the same 3 from decoder/isa/test_caller_svp64.py you *should* be able to just call self.add_case() just remember to establish the SVSTATE class instance just like in test_caller_svp64.py now, the only thing is: the test_pipe_caller.py will *not work* with SVP64, it's not designed to and not intended to. test_pipe_caller.py is not connected to regfiles, and it's only TestIssuer that has the decoding of SVP64. so maybe putting the unit tests in a different file, fu/alu/test/svp64_cases.py is probably a good idea?
you'll need this: commit f67cd9f81a517e3b6e75c58bdc1d0d836fae26b7 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Fri Feb 12 15:23:05 2021 +0000 add SVSTATE to TestCase infrastructure for use in TestIssuer and this (not tested): commit 99c712d52e4a9ff932162118677f228a332f1c01 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Fri Feb 12 15:23:44 2021 +0000 add one SVP64 ALU test case to get started
Cesar here's where in ISACaller, after decoding (all done already by PowerDecoder2, now), the srcstep is added on: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/caller.py;h=03bf16d2ed98d6746fb6bdbfca2a312db8c9e424;hb=HEAD#l892 you can see further down the write version. then, even further down, the "loop" which exits when SVSTATE.srcstep == VL-1 however the equivalent location - and i must apologise for this - is handled by regspec_decode_read() and regspec_decode_write(): https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_regspec_map.py;h=05ff4814e2504e3acbc5924b614031b460f8bd7c;hb=HEAD#l51 that's actually called here, get_byregfiles(): https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/core.py;h=d3a66bb1e9916c50277b1d5b1dc2a859707fc90b;hb=HEAD#l493 which returns a dictionary of tuples *kine 516), which in turn gets used by connect_read/write_ports(), which in turn calls connect_read/write_port() that tuple is unpacked here: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/core.py;h=d3a66bb1e9916c50277b1d5b1dc2a859707fc90b;hb=HEAD#l388 (also at line 260 for connect_readport) now, *at the moment* it's relatively straightforward: regspec_decode_read/write could in theory be modified to return a Mux() based on is_vec being set. Mux(e.read1_isvec, e.svstate.srcstep + e.read1_data, e.read1_data) something like that for elwidth overrides that's going to be a *lot* more complex... but for now, that Mux should "do the trick"
(In reply to Luke Kenneth Casson Leighton from comment #28) > you'll need this: > > commit f67cd9f81a517e3b6e75c58bdc1d0d836fae26b7 (HEAD -> master) > Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> > Date: Fri Feb 12 15:23:05 2021 +0000 > > add SVSTATE to TestCase infrastructure for use in TestIssuer > > and this (not tested): > > commit 99c712d52e4a9ff932162118677f228a332f1c01 (HEAD -> master) > Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> > Date: Fri Feb 12 15:23:44 2021 +0000 > > add one SVP64 ALU test case to get started Please push these changes, thanks. Great progress on the simulator. It will be invaluable when testing the next steps. The SVP64 prefix fetch and decode is already working locally. I'll just need to test it a little more, add some comments, polish, etc, before pushing.
(In reply to Cesar Strauss from comment #30) > Please push these changes, thanks. whoops > Great progress on the simulator. It will be invaluable when testing the next > steps. > > The SVP64 prefix fetch and decode is already working locally. I'll just need > to test it a little more, add some comments, polish, etc, before pushing. star. don't be tempted to also add the reg counting/advancement... yet! do make sure that test_issuer.py still passes (except that annoying bpermd which is a bug in ISACaller not the hardware)
(In reply to Luke Kenneth Casson Leighton from comment #31) > (In reply to Cesar Strauss from comment #30) > > The SVP64 prefix fetch and decode is already working locally. I'll just need > > to test it a little more, add some comments, polish, etc, before pushing. It works! Pushed. > do make sure that test_issuer.py still passes (except that annoying bpermd > which is a bug in ISACaller not the hardware) I does. It also passes test_issuer_svp64.py, which contains a test with a prefix, but with VL=1 for now. I added a check in check_regs() at test_core.py to also compare the PC with the simulated PC. This is to check that the instruction size was correctly taken into account when calculating the next instruction address. Could be useful for testing branches as well.
(In reply to Cesar Strauss from comment #32) > It works! Pushed. wha-hey! again, > > do make sure that test_issuer.py still passes (except that annoying bpermd > > which is a bug in ISACaller not the hardware) > > I does. It also passes test_issuer_svp64.py, which contains a test with a > prefix, but with VL=1 for now. excellent. next step is the regfile stepping, but first i'd like to make that a module. > I added a check in check_regs() at test_core.py to also compare the PC with > the simulated PC. i saw that. also, really, MSR and SVSTATE should be checked here, too. (all three are part of the State Regfile). > This is to check that the instruction size was correctly > taken into account when calculating the next instruction address. Could be > useful for testing branches as well. yes, good point.
(In reply to Luke Kenneth Casson Leighton from comment #33) > (In reply to Cesar Strauss from comment #32) > > > It works! Pushed. > > wha-hey! again, > > > > do make sure that test_issuer.py still passes (except that annoying bpermd > > > which is a bug in ISACaller not the hardware) > > > > I does. It also passes test_issuer_svp64.py, which contains a test with a > > prefix, but with VL=1 for now. > > excellent. next step is the regfile stepping, but first i'd like to make > that a module. nuts. see regspec_decode_read() - the Mux() on the isvec, this is too complex to put into the function. instead what i will do is augment PowerDecoder2 so that it contains the Mux, there.
Cesar i've added SVSTATE to CoreState, plus a read of the State regfile at the exact point that MSR is done (same technique). that is now available in PowerDecoder2 (because PowerDecoderSubset takes a CoreState Record: might have to remove that, make it only PowerDecoder2, have to see) *now* i can do that Mux(is_vec, srcstep+read_data1, read_data1) and in theory that should just "work". can you do "setup" of SVSTATE? for now, just hack it with a yield just after setup_memory/setup_regs in test_runner.py the fields of the test.svstate should be transferred into core.regs.state (oh btw, see class RegFiles in regfiles.py, you can refer to core.regs.rf['state'] as just "core.regs.state") see in setup_regs()? intregs = core.regs.int for i in range(32): if intregs.unary: yield intregs.regs[i].reg.eq(test.regs[i]) else: yield intregs.memory._array[i].eq(test.regs[i]) so it can be done as (because state is a RegFileArray) stateregs = core.regs.state yield stateregs.regs[State.SVSTATE].reg.eq(test.svstate) something like that.
ok this is done: diff --git a/src/soc/decoder/power_decoder2.py b/src/soc/decoder/power_decoder2.py index 3dcc5a37..640acef8 100644 --- a/src/soc/decoder/power_decoder2.py +++ b/src/soc/decoder/power_decoder2.py @@ -1103,6 +1103,10 @@ class PowerDecode2(PowerDecodeSubset): if hasattr(do, "lk"): comb += dec_o2.lk.eq(do.lk) + # get SVSTATE srcstep (TODO: elwidth, dststep etc.) needed below + srcstep = Signal.like(self.state.svstate.srcstep) + comb += srcstep.eq(self.state.svstate.srcstep) + # registers a, b, c and out and out2 (LD/ST EA) for to_reg, fromreg, svdec in ( (e.read_reg1, dec_a.reg_out, in1_svdec), @@ -1113,8 +1117,13 @@ class PowerDecode2(PowerDecodeSubset): comb += svdec.extra.eq(extra) # EXTRA field of SVP64 RM comb += svdec.etype.eq(op.SV_Etype) # EXTRA2/3 for this insn comb += svdec.reg_in.eq(fromreg.data) # 3-bit (CR0/BC/BFA) - comb += to_reg.data.eq(svdec.reg_out) # 7-bit output comb += to_reg.ok.eq(fromreg.ok) + # detect if Vectorised: add srcstep if yes. TODO: a LOT. + # this trick only holds when elwidth=default and in single-pred + with m.If(svdec.isvec): + comb += to_reg.data.eq(srcstep+svdec.reg_out) # 7-bit output + with m.Else(): + comb += to_reg.data.eq(svdec.reg_out) # 7-bit output comb += in1_svdec.idx.eq(op.sv_in1) # SVP64 reg #1 (matches in1_sel) comb += in2_svdec.idx.eq(op.sv_in2) # SVP64 reg #2 (matches in2_sel) that *should* now work, there is quite a lot to do when elwidth overrides are implemented, but for now, you should be able to adjust the FSM to count from 0 to VL-1. remember it should alter srcstep and write svstate to the core.state regfile, copy the regfile write technique used on DEC and TB, there (see tb_dec_fsm) comb += state_w_sv.addr.eq(StateRegs.SVSTATE) comb += state_w_sv.wen.eq(1) comb += state_w_sv.data_i.eq(new_svstate)
commit 60e753c5b2f9ebe91557ca864dbff77c1559819c (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Sun Feb 14 14:06:39 2021 +0000 add indicator to PowerDecoder2 when no outputs are Vectorised --- a/src/soc/decoder/power_decoder2.py +++ b/src/soc/decoder/power_decoder2.py @@ -1003,6 +1003,7 @@ class PowerDecode2(PowerDecodeSubset): self.in3_isvec = Signal(1, name="reg_c_isvec") self.o_isvec = Signal(1, name="reg_o_isvec") self.o2_isvec = Signal(1, name="reg_o2_isvec") + self.no_out_vec = Signal(1, name="no_out_vec") # no outputs are vectors Cesar this is the equivalent of svp64_dest_vector in ISACaller: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/caller.py;h=03bf16d2ed98d6746fb6bdbfca2a312db8c9e424;hb=dbb140cd79611e59071d40416cd975f0f524faf2#l966 it can be used to detect that the 0..VL-1 loop should be ended early (scalar dest). later when predication is added, the loop still continues and the end-early test only triggers when the dest predicate bit is "1", even on scalar dest. (that's how we do VINSERT).
https://libre-soc.org/irclog/%23libre-soc.2021-02-23.log.html#t2021-02-23T10:44:23 some notes: * Fetch reads PC, MSR and SVSTATE. - MSR is needed to determine LE/BE of EXT01 - SVSTATE may safely be passed to Issue * Issue will immediately trigger Execute - monitoring of completion may involve updating (writing) of SVSTATE.srcstep - setvl, branch or trap NEVER involve a SVP64 Loop. - completion of execution will need + comparison against SVSTATE.vl + check if PowerDecoder2.no_out_vec + pc_changed or sv_changed which will indicate EXCEPTION - increment SVSTATE.srcstep WITHOUT needing to read it again (that was done in FETCH), EXCEPT if sv_changed because this was EXCEPTION. * Execute *MAY* involve branch or trap or setvl - PC MSR and SVSTATE *may* be written - BUT, crucially, NEVER by an SVP64 - therefore writing of PC or SVSTATE indicates exit from loop (back to fetch) basically there are two mutually exclusive situations: 1) setvl, branch or trap, any of these write to PC, MSR and SVSTATE. here is "Scalar Identity behaviour". illegal instruction if attempted to add an SVP64 prefix 2) everything else. here it is perfectly safe to read PC MSR and SVSTATE in FETCH FSM because they will never be written to (except on Exceptions) ISSUE can safely increment and write (except for an Exception, hence sv_changed detects that) this will mesh well with speculative behaviour, and OoO, so we should recognise that even in the FSM because it will guide incremental development for pipelines and OoO. of course only some instructions raise exceptions but we deal with that later.
also: * FETCH FSM will initiate handshake to ISSUE FSM * ISSUE FSM will initiate handshake to EXECUTE FSM * EXECUTE FSM will initiate handshake back to ISSUE FSM on completion - and also set (optionally) pc_changed, insn_done, and (optionally) sv_changed * ISSUE FSM checks loop conditions. if exiting it will *PASS ON* insn_done and pc_changed to FETCH FSM, and initiate handshake back to ISSUE FSM it is important that there be a split there, insn_done is probably enough... https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/issuer.py;h=5ec00ca4ac6ccac038d6ca1f5b05dae413a191e6;hb=09b5ec626157143d593cb294f664a4b611937d0c#l258 yes it is.
(In reply to Luke Kenneth Casson Leighton from comment #39) > it is important that there be a split there, i think. you'll have to check.
(In reply to Luke Kenneth Casson Leighton from comment #40) > (In reply to Luke Kenneth Casson Leighton from comment #39) > > > it is important that there be a split there, > > i think. you'll have to check. I have split the new issue FSM out from the issue/execute FSM, and marked the places where the looping will be checked/done, and where PC/SRCSTEP will be updated. test_issuer.py and test_issuer_svp64.py are still happy.
(In reply to Cesar Strauss from comment #41) > I have split the new issue FSM out from the issue/execute FSM, excellent that's a good incremental approach. > and marked > the places where the looping will be checked/done, and where PC/SRCSTEP will > be updated. test_issuer.py and test_issuer_svp64.py are still happy. superb. i added that PowerDecoder.no_out_vec also needs to be checked (after execute) i was going to refer you to the code where that's used in ISACaller but ha ha i did it as this instead *sigh* https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/caller.py;h=ccdbff1484797c070fc9c93f3165708f1acddf27;hb=HEAD#l1020 svp64_dest_vector needs to be replaced with using no_out_vec i'm just going to move fetch_insn_o out so that the raw instruction is dropped into pdecode permanently, see what happens.
i've also moved update_svstate and new_svstate into the issue FSM, initialised new_svstate from cur_state, and also sync'd it *back* into cur_state if update_svstate is set. basically cur_state.svstate is a mini write-back cache where because we are in the issue FSM there's nothing else that should be modifying it for normal loops, but exceptions *will* cause it to change (and loops end, then, back to Fetch).
cesar i took out the manual creation of svp64_dest_vector in ISACaller and put in PowerDecoder.no_out_vec instead, confirmed that the unit test test_caller_svp64.py still works. line 1064: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/isa/caller.py;h=acaac1f7212c68c9c631d1626adc470f7fa5751f;hb=HEAD#l1064 the issue FSM should now be ridiculously simple.
hi Cesar i took a look at the pc_i.ok this morning, yes you are right, things will go awry with SVSTATE because we are currently not also setting SVSTATE from the debug interface. SVSTATE.srcstep (and dststep) are actual Sub-PC steppers so not resetting those is quite serious. for now what i suggest is to reset SVSTATE.srcstep and dststep to zero when pc_i.ok is raised high (i will take care of this) also as you suggested have pc_i.ok come on to the issue FSM and exit the loop. later we can add SVSTATE to the DMI (and JTAG), then the rule will have to be set, "please over DMI modify PC first then SVSTATE second". also then have to get test_core.py to use the DMI interface to set SVSTATE rather than put it into a SPR.
i just added svstate_i which is the same as pc_i so that this can also be set whilst the processor is STOPPED. then, test_runner.py uses svstate_i.ok to set svstate and releases it at the same time as pc_i.ok when START is sent. in theeeeorryyyy that should solve the reset problem for starting a new program, because (just like pc_i) when the new svstate_i is passed in, it is no longer read from the StateRegs regfile. that was probably the problem.
(In reply to Cesar Strauss from comment #41) > I have split the new issue FSM out from the issue/execute FSM, and marked > the places where the looping will be checked/done, and where PC/SRCSTEP will > be updated. This is old news by now, but, for the record, the Issue FSM now implements the SRCSTEP VL loop, as well as the VL==0 vector skip loop. Probably, the VL==0 loop could have been done in Fetch instead. Luke, do you want me to do this, before moving to predication, or leave it for later?
(In reply to Cesar Strauss from comment #47) > Probably, the VL==0 loop could have been done in Fetch instead. Luke, do you > want me to do this, before moving to predication, or leave it for later? let's start predication, see how that goes, i will update comment 0 to make sure it's TODO or at least raised a new bugreport
Cesar i've just added SVSRR0 to PowerDecoder2 so that when OP_RFID or OP_TRAP occur, it can issue a read (or write) of fast regs https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=d7811a9a511fe2949cc6adce2f0626edbb9702a1 commit d7811a9a511fe2949cc6adce2f0626edbb9702a1 (HEAD -> master, origin/master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Tue May 4 17:51:16 2021 +0100 add SVSRR0 to OP_RFID and OP_TRAP reg read/write
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/common_output_stage.py;hb=HEAD cesar, here is where (for ALUs) the zeroing would be hooked in, towards the end. when we do SIMD it will be partial zeroing based on 1, 2, 4 or 8 bits of predicate mask, which will also need to be passed in to the Input Record. only when all 1 2 4 or 8 predicate mask bits are zero could we skip the entire operation and directly write zero to thev regfile. this tells us that it is an optimisation to do that. so, first step, don't worry about optimisation, simply test in the common output stage, if zeroing and predicate bit is clear, put a zero in the ouutput instead of the result. ironically i cannot remember if the CR0 has to be zero'd (or, more to the point, the CR0 set to ==zero)