Needs to handle POWER instructions https://libre-riscv.org/3d_gpu/architecture/decoder/
The POWER ISA includes instructions to load and store multiple registers to/from memory (`lmw` and `smw`). `lmw rt, D(rA)` loads registers rt, r(t+1), r(t+2), ..., r31 from subsequent addresses starting at rA + signed(D). `smw` behaves similarly but for stores These instructions, if implemented, would significantly complicate the decoder by forcing it to either generate $31-n$ load or store ops while stalling instruction fetch. Alternatively it could generate a single op that after taking a trip through the pipeline would be modified (i.e. incrementing rt) and being placed back into the issue queue. I suspect the latter would not play nicely with the OOO machinery because it wouldn't place a reservation on a register until it was time to store that register, so subsequent instructions could modify the registers before they are stored.
(In reply to Michael Nolan from comment #1) > The POWER ISA includes instructions to load and store multiple registers > to/from memory (`lmw` and `smw`). > A similar situation exists with the load and store string instructions which are currently "Phased Out" (I assume we wouldn't be implementing these, or emulating them with an OS trap)
(In reply to Michael Nolan from comment #1) > The POWER ISA includes instructions to load and store multiple registers > to/from memory (`lmw` and `smw`). > > `lmw rt, D(rA)` loads registers rt, r(t+1), r(t+2), ..., r31 from subsequent > addresses starting at rA + signed(D). `smw` behaves similarly but for stores ah funnily enough this is basically identical to LD.MULTI which is covered by the out-of-order execution engine and by SimpleV. > These instructions, if implemented, would significantly complicate the > decoder by forcing it to either generate $31-n$ load or store ops while > stalling instruction fetch. Alternatively it could generate a single op that > after taking a trip through the pipeline would be modified (i.e. > incrementing rt) and being placed back into the issue queue. I suspect the > latter would not play nicely with the OOO machinery because it wouldn't > place a reservation on a register until it was time to store that register, > so subsequent instructions could modify the registers before they are stored. the total opposite is true: it plays very nicely with the OoO engine and plays merry hell with an in-order design. no stalling is required. all that happens is: we translate this into a SimpleV instruction (VL=32) and hit the SV hardware-loop system with it.
(In reply to Luke Kenneth Casson Leighton from comment #3) > (In reply to Michael Nolan from comment #1) > no stalling is required. okok that's not quite true :) if the internal multi-issue width is say only 4 64-bit wide, then yes, the SimpleV VL-hardware-loop engine (which takes over at this point, the lmw decode is "done" by that point) will have to issue 4 instructions at a time. and, if there are not enough LOAD/STORE Function Units "completed" yet, then _yes_, the issue engine has to stall. however this is just day-to-day run-of-the-mill as far as the 6600 engine is concerned. if there are no Function Units free to accept the current instruction, then there are simply too many outstanding instructions being executed for it to be possible to drop more into the Dependency Matrices. so... *then* you stall. that's how it worked back in 1966, the only difference being now that we will have a multi-issue engine and so can accept more instructions per clock.
(In reply to Luke Kenneth Casson Leighton from comment #4) > (In reply to Luke Kenneth Casson Leighton from comment #3) > > (In reply to Michael Nolan from comment #1) > > > no stalling is required. > > okok that's not quite true :) > > if the internal multi-issue width is say only 4 64-bit wide, then yes, > the SimpleV VL-hardware-loop engine (which takes over at this point, > the lmw decode is "done" by that point) will have to issue 4 instructions > at a time. > > and, if there are not enough LOAD/STORE Function Units "completed" yet, > then _yes_, the issue engine has to stall. Huh ok. So there's a SimpleV translator between the decoder and the dependency matrices/issue queue?
(In reply to Michael Nolan from comment #5) > Huh ok. So there's a SimpleV translator between the decoder and the > dependency matrices/issue queue? yes. that's how SimpleV works - it's what it is. think about it: SimpleV is a hardware for-loop. the Program Counter pauses, and a *SUB-EXECUTION* loop occurs (for i in range(VL)) however each output from that sub-execution loop goes... ... into the instruction issue. ta-daaa :) therefore, lmw can quite literally be replaced with a SimpleV "SVPrefix" instruction. SVP{VL=32}.LD r1, r2#D. done.
Oh no. RLWINM & friends is going to be fun to do with the partitioned shifter. Maybe it warrants its own module?
(In reply to Michael Nolan from comment #7) > Oh no. RLWINM & friends is going to be fun to do with the partitioned > shifter. Maybe it warrants its own module? deefinitely. there is actually a way to "construct" it from *two* PartitionedShifters: * create an "inversion" (bitwise inversion) of both the mask *and* A-input * run through the SAME (existing) DynamicPartitionedShifter this is how you do "right shift" then, you basically do a hybrid combination of left-shift ORed with right-shift, where the right-shift part is done (32-B) and the left-shift part is (B) however... because of the dynamic partitioning, obviously that's (16-B) for 16-bit-wide partitions etc. etc. now, we *could* hypothetically do that as micro-ops, exactly as we discussed a few days ago: * put the data first through the left-shifter and store it as a partial result * put the data next through the left-shifter (after inversion) * do the OR (and ANDing) operation as a third phase i am tempted to suggest trying that, first, rather than writing a "special dedicated RLWINM" operation. do you want to give that a shot? see if it's first possible to create a Dynamic Right-Shifter out of a Dynamic Left-Shifter?
(In reply to Luke Kenneth Casson Leighton from comment #6) > therefore, lmw can quite literally be replaced with a SimpleV > "SVPrefix" instruction. SVP{VL=32}.LD r1, r2#D. One thing to consider: lmw and similar is probably defined to read the address register before writing to any registers, whereas SimpleV may not be defined that way (but probably should be), this matters when the load overwrites the address register part way through.
(In reply to Luke Kenneth Casson Leighton from comment #8) > * put the data first through the left-shifter and store it as a partial > result > * put the data next through the left-shifter (after inversion) > * do the OR (and ANDing) operation as a third phase > > i am tempted to suggest trying that, first, rather than writing a > "special dedicated RLWINM" operation. > > do you want to give that a shot? see if it's first possible to create a > Dynamic Right-Shifter out of a Dynamic Left-Shifter? I can. We're definitely going to need a right shift instruction regardless of how we do rlwinm. Should I create a separate bug report for that or update progress under either this one or 173?
(In reply to Jacob Lifshay from comment #9) > > One thing to consider: lmw and similar is probably defined to read the > address register before writing to any registers, whereas SimpleV may not be > defined that way (but probably should be), this matters when the load > overwrites the address register part way through. The POWER ISA declares a LMW like that (or a LMW including register 0) to be invalid because of this. Glad you brought it up though
(In reply to Jacob Lifshay from comment #9) > (In reply to Luke Kenneth Casson Leighton from comment #6) > > therefore, lmw can quite literally be replaced with a SimpleV > > "SVPrefix" instruction. SVP{VL=32}.LD r1, r2#D. > > One thing to consider: lmw and similar is probably defined to read the > address register before writing to any registers, whereas SimpleV may not be > defined that way (but probably should be), this matters when the load > overwrites the address register part way through. that's handled by the LD Function Unit. Reservations are put on the registers (and on the memory address... *when calculated*). you need to read mitch alsup's book chapters to fully understand the process. and yes, the Dependency Matrix will notice that there is a write hazard on the address being overwritten, and will correctly "hold up" the instruction that depends on the register. from what you're saying, michael, this is "too complicated" for POWER ISA to cope with, so they declare it "invalid"? if so this does not surprise me at all because you absolutely cannot have an ISA that specifically depends on an OoO execution model.
(In reply to Michael Nolan from comment #10) > I can. We're definitely going to need a right shift instruction regardless > of how we do rlwinm. Should I create a separate bug report for that or > update progress under either this one or 173? yes make it a separate one because this bugreport is about instruction decode, not ALU implementation. do cross-reference this bugreport though.
(In reply to Luke Kenneth Casson Leighton from comment #12) > (In reply to Jacob Lifshay from comment #9) > > (In reply to Luke Kenneth Casson Leighton from comment #6) > > > therefore, lmw can quite literally be replaced with a SimpleV > > > "SVPrefix" instruction. SVP{VL=32}.LD r1, r2#D. > > > > One thing to consider: lmw and similar is probably defined to read the > > address register before writing to any registers, whereas SimpleV may not be > > defined that way (but probably should be), this matters when the load > > overwrites the address register part way through. > > that's handled by the LD Function Unit. Reservations are put on the > registers (and on the memory address... *when calculated*). My issue is not that it can't be implemented correctly in HW, but that having the instruction switch the address used halfway through makes it much harder to use in a compiler due to the inputs being overwritten part way through execution. It's the difference between load1 and load2 in: int regs[]; // what I think SimpleV should be defined to do void load1(int address_reg, int dest_reg, int N) { int address = regs[address_reg]; for(int i = 0; i < N; i++) regs[dest_reg + i] = *(int *)(address + sizeof(int) * i); } // what SimpleV is currently defined to do void load2(int address_reg, int dest_reg, int N) { for(int i = 0; i < N; i++) regs[dest_reg + i] = *(int *)(regs[address_reg] + sizeof(int) * i); } where load2 overwrites the address register halfway through so all the memory accesses after that use the newly loaded value as the base address.
(In reply to Jacob Lifshay from comment #14) > My issue is not that it can't be implemented correctly in HW, but that > having the instruction switch the address used halfway through makes it much > harder to use in a compiler due to the inputs being overwritten part way > through execution. oh good point. hmmm, that makes much more sense as to why it would be considered invalid to do.
(In reply to Jacob Lifshay from comment #14) > My issue is not that it can't be implemented correctly in HW, but that > having the instruction switch the address used halfway through makes it much > harder to use in a compiler due to the inputs being overwritten part way > through execution. > > It's the difference between load1 and load2 in: note that switching to load1 will require a CSR to store the address if the instruction is interrupted in the middle because the address register may have already been overwritten. The load1 semantics should apply to all instructions with scalar/subvector inputs and vector outputs, where all scalar inputs are read before any outputs are written. vector (VL-length) inputs are still read and written one subvector/scalar at a time so we won't need giant temporary buffers.
(In reply to Jacob Lifshay from comment #14) > // what I think SimpleV should be defined to do (... it's not) > void load1(int address_reg, int dest_reg, int N) > { > int address = regs[address_reg]; > for(int i = 0; i < N; i++) > regs[dest_reg + i] = *(int *)(address + sizeof(int) * i); > } > > // what SimpleV is currently defined to do (... it's not) > void load2(int address_reg, int dest_reg, int N) > { > for(int i = 0; i < N; i++) > regs[dest_reg + i] = *(int *)(regs[address_reg] + sizeof(int) * i); > } it has support for both. the first is called 'indirect mode', the second is 'stride mode'.
(In reply to Jacob Lifshay from comment #16) > (In reply to Jacob Lifshay from comment #14) > > My issue is not that it can't be implemented correctly in HW, but that > > having the instruction switch the address used halfway through makes it much > > harder to use in a compiler due to the inputs being overwritten part way > > through execution. > > > > It's the difference between load1 and load2 in: > > note that switching to load1 will require a CSR to store the address if the > instruction is interrupted in the middle because the address register may > have already been overwritten. the modifications by mitch alsup to the 6600 system do not require that to happen. specifically: the "shadow" mechanism. the "shadow" mechanism allows for "damaging" consequences to be *calculated* but *prevented and prohibited* from actually hitting memory, register file, or other critical non-reversible resource. if there is anything that could raise an exception which could cause "damage", it raises a "shadow". ONLY when the result of the operation is GUARANTEED to complete is the shadow released. there *is* no possibility, therefore, of damage occurring by an instruction being "interrupted in the middle". the operation therefore either completes atomically... or its results discarded entirely, and, because the shadow is also cast across *subsequent* instructions as well, all instructions in sequence after the one being discarded are also discarded.
(In reply to Luke Kenneth Casson Leighton from comment #17) > (In reply to Jacob Lifshay from comment #14) > > > // what I think SimpleV should be defined to do > > (... it's not) > > > void load1(int address_reg, int dest_reg, int N) > > { > > int address = regs[address_reg]; > > for(int i = 0; i < N; i++) > > regs[dest_reg + i] = *(int *)(address + sizeof(int) * i); > > } > > > > // what SimpleV is currently defined to do > > (... it's not) > > > void load2(int address_reg, int dest_reg, int N) > > { > > for(int i = 0; i < N; i++) > > regs[dest_reg + i] = *(int *)(regs[address_reg] + sizeof(int) * i); > > } > > it has support for both. the first is called 'indirect mode', the > second is 'stride mode'. umm... strided mode is conventionally where the addresses loaded follow the sequence base + i * increment where increment can be much bigger than sizeof(int) indirect mode is conventionally where the addresses are taken from successive vector elements so each address can be independently specified. what load2 is is where the address register is a scalar that doesn't change (address_reg is never reassigned), but regs[address_reg] is assigned to halfway through, changing the address used for subsequent loads. This is what's specified for compact loads (for lack of a better term).
ehhm.. ehmehmehm.... i am getting confused, sorry, i was looking for certain patterns, spotted half of what i expected to be there and thought it was the whole lot. load2 is... oh wait! it's... yes they are both strided, yes i see the address computation is in the loop. ok yes got it, yes load2 is how it's done, with the Dependency Matrices protecting both the memory reads (from writes) *and* protecting the register reads and writes... in full sequence.
ok the right-shift looks great, michael, i was pleasantly surprised that the bit- swap technique worked so well, particularly for the dynamic-partitioning. can i make a suggestion, of adding tables of the ISA here: https://libre-riscv.org/openpower/isatables/ i have found that it helps enormously. basically, Appendix F from Power Version 2.07B?
um... :) https://github.com/antonblanchard/chiselwatt/blob/master/src/main/scala/Control.scala
and, err... https://github.com/antonblanchard/chiselwatt/blob/master/src/main/scala/Instructions.scala err anyone got 5 minutes to write a python script to turn that into nmigen? :)
https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl y'know... autogenerating these from a straight csv file which is included directly into the wiki via the "data" tag should be pretty easy...
https://libre-riscv.org/openpower/isatables/ using anton's decode1.vhdl, just for laughs i extracted the major opcodes into a csv file.
(In reply to Luke Kenneth Casson Leighton from comment #25) > https://libre-riscv.org/openpower/isatables/ > using anton's decode1.vhdl, just for laughs i extracted the major opcodes > into a csv file. I've got something in http://git.libre-riscv.org/?p=soc.git;a=blob;f=src/decoder/power_major_decoder.py;h=0e0f1f22e5f9d1a92d8362f49cdb6b62c42b0ec4;hb=HEAD I had to modify your table a little bit to remove the spaces after the commas. Would it break the display if I did the same thing to the wiki one?
(In reply to Michael Nolan from comment #26) > (In reply to Luke Kenneth Casson Leighton from comment #25) > > https://libre-riscv.org/openpower/isatables/ > > using anton's decode1.vhdl, just for laughs i extracted the major opcodes > > into a csv file. > > I've got something in > http://git.libre-riscv.org/?p=soc.git;a=blob;f=src/decoder/ > power_major_decoder.py;h=0e0f1f22e5f9d1a92d8362f49cdb6b62c42b0ec4;hb=HEAD that was quick and pretty ridiculously easy, wasn't it? :) > I had to modify your table a little bit to remove the spaces after the > commas. Would it break the display if I did the same thing to the wiki one? heck no, the whole point is to be able to do wget on the wiki version (rather than have two out-of-date copies)
(In reply to Luke Kenneth Casson Leighton from comment #27) oder.py;h=0e0f1f22e5f9d1a92d8362f49cdb6b62c42b0ec4;hb=HEAD > > that was quick and pretty ridiculously easy, wasn't it? :) yup > heck no, the whole point is to be able to do wget on the wiki version > (rather than have two out-of-date copies) It doesn't look like I have write access to the wiki git, and I can't figure out how to edit the CSV on the wiki. What do I need to do to update that one?
(In reply to Luke Kenneth Casson Leighton from comment #27) > heck no, the whole point is to be able to do wget on the wiki version > (rather than have two out-of-date copies) ok bear in mind i just added the heading "comment" which was missing. also added minor_19. btw remember that in python int("0b100", 2) will convert to the value 4 so we can leave the values in minor_19 in binary, even detect the bit-length all the minor_xx tables are exactly the same format as the major one.
(In reply to Michael Nolan from comment #28) > It doesn't look like I have write access to the wiki git, you do now. git clone gitolite3@git.libre-riscv.org:922/libreriscv.git i think. check HDL_workflow.
I'm moving the enums for each field in the table to a separate file. Would it be acceptable to do a `from power_enums import *` in the decoder and testbench?
(In reply to Luke Kenneth Casson Leighton from comment #31) > (In reply to Michael Nolan from comment #28) > > > It doesn't look like I have write access to the wiki git, > > you do now. git clone gitolite3@git.libre-riscv.org:922/libreriscv.git > i think. check HDL_workflow. Thanks, updated.
(In reply to Michael Nolan from comment #32) > I'm moving the enums for each field in the table to a separate file. great. > Would it be acceptable to do a `from power_enums import *` in the decoder and > testbench? no, it's really not a good idea. grep'd "class power_enum.py >f ", did a bit of vim voodoo, created this: from soc.decode.power_enums import (Function, InternalOp, In1Sel, In2Sel, In3Sel, OutSel, LdstLen, RC, CryIn, )
hmm we need to sort out the soc namespace, move the entire subdirectories to something that goes "from soc.decoder.power_decoder import xxxx" btw relative imports are generally to be avoided: full import namespaces otherwise things get messed up.
like the http get in the decode, michael. i would have used a Makefile and wget, httplib works just as well.
btw for RISCV we don't need to do one, as there is at least one migen RISCV processor out there. https://github.com/lambdaconcept/minerva/blob/master/minerva/units/decoder.py dang that took far too long to find! entire search space "riscv nmigen" is polluted with irrelevant items. anyway found it eventually. so. question for you, illustrating why "import *" is problematic. i have never seen the function "matcher" before. can you tell me, immediately, without doing a google search, if it is a standard python function, or if it is part of nmigen? it's impossible to tell, isn't it? even if it was a keyword, the very fact that there is even *one* import * DESTROYS all confidence because nmigen *might* have an overriden function named "matcher". anyway. with some adaptation and checking the license file first, we should be able to use that code.
https://github.com/lambdaconcept/minerva/blob/master/minerva/units/decoder.py#L140 found it :)
(In reply to Luke Kenneth Casson Leighton from comment #27) > (In reply to Michael Nolan from comment #26) > > I had to modify your table a little bit to remove the spaces after the > > commas. Would it break the display if I did the same thing to the wiki one? > > heck no, the whole point is to be able to do wget on the wiki version > (rather than have two out-of-date copies) Wouldn't it be better to just link the wiki to the source code in soc.git (or other repo)? that way you can build the SoC from source without needing to connect to our server. We should try to have all the required code in the code repos rather than the wiki repo. Alternatively, if ikiwiki can manage autoupdating it, have soc be a submodule of the wiki git repo.
(In reply to Jacob Lifshay from comment #39) > (In reply to Luke Kenneth Casson Leighton from comment #27) > > (In reply to Michael Nolan from comment #26) > > > I had to modify your table a little bit to remove the spaces after the > > > commas. Would it break the display if I did the same thing to the wiki one? > > > > heck no, the whole point is to be able to do wget on the wiki version > > (rather than have two out-of-date copies) > > Wouldn't it be better to just link the wiki to the source code in soc.git > (or other repo)? that way you can build the SoC from source without needing > to connect to our server. We should try to have all the required code in the > code repos rather than the wiki repo. good point... is it easy to do? > Alternatively, if ikiwiki can manage autoupdating it, have soc be a > submodule of the wiki git repo. hmm.. hmm... tempted to suggest "nuts to it" and just have a copy, marking it as "canonical source: http://....."
i saw you added minor_30.csv (great), i added 58 and 62. https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl hmm i think perhaps we could do instead with a class rather than explicit members of PowerDecoder, i moved them into PowerOps. perhaps later PowerOps might best be derived from Record, although whitequark is being quite intransigent about how Record should be created (and used), which causes us quite a lot of problems in using it. leaving that aside and moving on... when doing nop, attn and SIM_CONFIG, these are explicit single-ops that need assignment in a derivative of PowerDecoder https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl#L391 in VHDL, "&" appears not to be the logical "AND" operator, it appears to be "append together". line 375 for example.
hiya michael, http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-March/004661.html ok the minor_31 opcode is just absolutely ridiculous, so i took a look at pages 1385-1386 of Version 2.07B and noticed that there are patterns in the columns. add occupies the entire LSB "01010" column for example. sooOooo... what i propose is a two-level hierarchy as follows: * pre-process the rows by * first identifying the first 5 bits as keys (let's say N are identifed) * splitting the rows into N groups (by 5-bit key) then it becomes possible to create a case-of-case-statements, each with a separate PowerDecoder(), "automatically" from those groups. in fact, i suspect that it *might* be possible to specify how to decode *all* of the instructions in this fashion, with some sort of "specification" that says "these bits get decoded with this table", then "these bits get decoded with this table" etc. etc. in a cascade. i am quite tempted to suggest instead to split minor_31.csv into separate minor_31_5LSBs.csv files rather than do it automatically. what's your thoughts there?
ok i committed some experimental code (if False:) which gives the general idea, where PowerDecoder would iterate recursively through sub-encoders that have the bit-width specified that they are to look into. minor_19 i altered from anton's original so that the "19_valid" check is not needed. however, again, like 31, because it is 10 bits wide, it now needs a "pre-processing" stage, taking the bottom 5 bits, first, and creating a list-of-lists which act as *another* level of recursive hierarchy. unfortunately for Power ISA, it looks like they have something like four, five or even six levels of these decoders, i.e. 5 separate areas of the opcode that you must decode sequentially in order to work out the "true" operation. if you try to "flatten" any of those, you end up with the horrible 129-bit OR gates (!) anyway. do let me know what you think, michael.
(In reply to Luke Kenneth Casson Leighton from comment #43) > if you try to "flatten" any of those, you end up with the horrible > 129-bit OR gates (!) That may not be that big of a problem, since yosys should change that to a 4-level tree of 3 or 4-input OR gates. One other option is to use one or more ROMs for parts of the decoding process, since they are optimized for those cases where you need dense lookup tables. Something like a 7-bit address would work fine.
(In reply to Jacob Lifshay from comment #44) > (In reply to Luke Kenneth Casson Leighton from comment #43) > > if you try to "flatten" any of those, you end up with the horrible > > 129-bit OR gates (!) > > That may not be that big of a problem, since yosys should change that to a > 4-level tree of 3 or 4-input OR gates. when it comes to synthesis: true. you can tell however that this ISA was not "designed", it organically grew. > One other option is to use one or more ROMs for parts of the decoding > process, since they are optimized for those cases where you need dense > lookup tables. Something like a 7-bit address would work fine. i think... these are effectively Consts being assigned in a suite of Case statements, i have no idea if a ROM lookup would be more efficient, the address decode tree might actually be more gates, whereas these minor opcodes, they are a sparsely populated table with a considerable number of blanks, and a number of "early outs" if you know what i mean. so i suspect that a full address lookup cascade would be a lot more gates. i kinda like the code compactness of the Switch and Case system that Michael created, loading from csv files. it's kinda hilarious - reminds me of when i took the .txt file version of Microsoft's SMB Spec and literally spewed out c code structs and associated network parser from it :)
(In reply to Luke Kenneth Casson Leighton from comment #43) > ok i committed some experimental code (if False:) which gives > the general idea, where PowerDecoder would iterate recursively > through sub-encoders that have the bit-width specified that > they are to look into. > > > anyway. do let me know what you think, michael. So I tried this in 8af7da, and it works... Current minor_31_decoder (split 5 bits and 5 bits): Number of cells: 787 Longest topological path in top (length=16) Old way: Number of cells: 972 Longest topological path in top (length=15) So it's a little smaller, but a little deeper
(In reply to Michael Nolan from comment #46) > So I tried this in 8af7da, and it works... excellent! > Current minor_31_decoder (split 5 > bits and 5 bits): > Number of cells: 787 > Longest topological path in top (length=16) ok that's surprisingly not a lot smaller, i would have expected it to drop quite a lot. should work great on minor_19 as well. anyway: with the ability to do "sub-decoding" the idea that i had was to literally chain the entire lot together, in a tree of decoders, specified at the top level like in that "if False:" block. the idea being, we just have to write a tree-like array of specifications, pass that into *one* PowerDecoder, and it will (recursively) pass the next part of the spec-array to more PowerDecoder instances. otherwise we need to set that up "by hand"... with a top-level hand-coded Switch / Case statement, which, once done, i guarantee you will go, "hmm, this looks pretty much exactly like a PowerDecoder instance" :) you want to have a go at that?
hiya michael i've just spotted that minor_19.csv isn't quite ready yet, i'm working on getting it up to scratch so that it's in a position to do the hierarchical tree idea. i may just "cheat" and create a minor_19_00000.csv table.
(In reply to Luke Kenneth Casson Leighton from comment #47) > > the idea being, we just have to write a tree-like array of specifications, > pass that into *one* PowerDecoder, and it will (recursively) pass the next > part of the spec-array to more PowerDecoder instances. > > otherwise we need to set that up "by hand"... with a top-level hand-coded > Switch / Case statement, which, once done, i guarantee you will go, > "hmm, this looks pretty much exactly like a PowerDecoder instance" :) > > you want to have a go at that? I have something like that semi-working in b2bb2b, but it doesn't include extra.csv or minor_19_00000. (In reply to Luke Kenneth Casson Leighton from comment #48) > hiya michael i've just spotted that minor_19.csv isn't quite ready yet, > i'm working on getting it up to scratch so that it's in a position > to do the hierarchical tree idea. i may just "cheat" and create > a minor_19_00000.csv table. I get what extra.csv is for, though I'm not entirely sure how to integrate it with the decoder. What is minor_19_00000.csv for though?
(In reply to Michael Nolan from comment #49) > (In reply to Luke Kenneth Casson Leighton from comment #47) > > > > > the idea being, we just have to write a tree-like array of specifications, > > pass that into *one* PowerDecoder, and it will (recursively) pass the next > > part of the spec-array to more PowerDecoder instances. > > > > otherwise we need to set that up "by hand"... with a top-level hand-coded > > Switch / Case statement, which, once done, i guarantee you will go, > > "hmm, this looks pretty much exactly like a PowerDecoder instance" :) > > > > you want to have a go at that? > > I have something like that semi-working in b2bb2b, nice, will take a look. > but it doesn't include > extra.csv or minor_19_00000. ok. > I get what extra.csv is for, though I'm not entirely sure how to integrate > it with the decoder. yeah it is effectively at the same "level" as the major opcodes, and needs to be done.. mmmm... before them? in effect it is extra case statements that need to go into major.csv. the only "problem" is, the widths are different. therefore, we need to have *two* switch statements. i *think* this is doable by making all args of PowerDecode into lists, disabling the "default" switch, and initialising the "default" at the beginning. > What is minor_19_00000.csv for though? just as extra.csv is to be added in at the same level as major.csv, minor_19_00000 is likewise needing to be added to minor_19 ad a 2nd set of switches, but with *different* widths. hence the idea of making PowerDecode args into lists then putting a for-loop around the switch generation. what do you think?
so a *list* of subdecoders. comments inline. git pull.
(In reply to Luke Kenneth Casson Leighton from comment #50) > (In reply to Michael Nolan from comment #49) > > (In reply to Luke Kenneth Casson Leighton from comment #47) > > > > > > > > the idea being, we just have to write a tree-like array of specifications, > > > pass that into *one* PowerDecoder, and it will (recursively) pass the next > > > part of the spec-array to more PowerDecoder instances. > > > > > > otherwise we need to set that up "by hand"... with a top-level hand-coded > > > Switch / Case statement, which, once done, i guarantee you will go, > > > "hmm, this looks pretty much exactly like a PowerDecoder instance" :) > > > > > > you want to have a go at that? > > > > I have something like that semi-working in b2bb2b, > > nice, will take a look. > > > but it doesn't include > > extra.csv or minor_19_00000. > > ok. > > > > I get what extra.csv is for, though I'm not entirely sure how to integrate > > it with the decoder. > > yeah it is effectively at the same "level" as the major opcodes, and needs > to be done.. mmmm... before them? > > in effect it is extra case statements that need to go into major.csv. > > the only "problem" is, the widths are different. therefore, we need to have > *two* switch statements. > > i *think* this is doable by making all args of PowerDecode into lists, > disabling the "default" switch, and initialising the "default" at the > beginning. > > > What is minor_19_00000.csv for though? > > just as extra.csv is to be added in at the same level as major.csv, > minor_19_00000 is likewise needing to be added to minor_19 ad a 2nd set of > switches, but with *different* widths. > > hence the idea of making PowerDecode args into lists then putting a for-loop > around the switch generation. > > what do you think? I see now. So instead of passing in the bit selection and csv name, we'd pass in a list of bit selections and csvs, and it would cascade to the next csv if the first one doesn't match?
whoops i realised that it's the arguments to PowerDecoder that need to be passed in as a *list*, via either an object... actually, Subdecoder itself would almost do, if it also had "width". however i'd suggest creating something new called "Decoder".
(In reply to Michael Nolan from comment #52) > I see now. So instead of passing in the bit selection and csv name, we'd > pass in a list of bit selections and csvs, yes. and that includes width, and the *list* becomes the (almost only) argument to PowerDecoder. the width per PowerDecoder list element (a new namedtuple called Decoder?) has to be added because it specifies the width of the Switch statement. > and it would cascade to the next > csv if the first one doesn't match? or, just... they're all done in parallel, and we assume that there are no overlaps. actually, hmmm, probably a cascade is safer, althought it would introduce _another_ gate delay *sigh*. i think we can get away with assuming that the list-of-matches are unique even though they would be in separate switch statements.
hm even subdecoders needs to be one of the arguments to the "Decoder" namedtuple. i put in a Decoder namedtuple in commit 7da4ec503badf
doh. no on width being part of Decoders namedtuple. yes however on subdecoders... frickin 3am am just gonna do it, csn you check it when you have a mo?
oof. ok. couldn't help it, was staring at the code instead of sleep. sorry, michael :) can you do a review?
nngggh mmoooooo wtf why is the Power manual saying major opcodes are in bits 0-5 however microwatt and chiselwatt all specify the fields in *reverse* order?? moooo? https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl majorop := unsigned(f_in.insn(31 downto 26)); page 15 of Power V2.07B 0 6 11 16 21 31 OPCD RT RA /// XO / OPCD RT RA RB XO /
(In reply to Luke Kenneth Casson Leighton from comment #58) > nngggh mmoooooo wtf why is the Power manual saying major opcodes are in bits > 0-5 however microwatt and chiselwatt all specify the fields in *reverse* > order?? > moooo? > https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl > majorop := unsigned(f_in.insn(31 downto 26)); > > page 15 of Power V2.07B > > 0 6 11 16 21 31 > OPCD RT RA /// XO / > OPCD RT RA RB XO / Right, endianness. Ugh.
(In reply to Michael Nolan from comment #59) > > page 15 of Power V2.07B > > > > 0 6 11 16 21 31 > > OPCD RT RA /// XO / > > OPCD RT RA RB XO / > > Right, endianness. Ugh. ah, maaan - it's byte-endian, isn't it? so it's not quite as straightforward as bit-reversal. *sigh*. i'm tempted to suggest complying with what's in the spec rather than following exactly what anton did - then again that implies a lot of work. or having a close comparative look at chiselwatt.
We should check the spec to see what its conventions are -- some use bit 0 as MSB, some as LSB. Also, the memory interface may or may not byte swap.
(In reply to Jacob Lifshay from comment #61) > We should check the spec to see what its conventions are -- some use bit 0 > as MSB, some as LSB. Also, the memory interface may or may not byte swap. i *think* it's a combination of these two things, from V2.07B, P35, section 1.3.2 "Notation": - Bits in instructions, fields, and bit strings are numbered from left to right, starting with bit 0 - The leftmost bit of a sequence of bits is the most significant bit of the sequence. which is about as obtuse and technically correct as it's possible to get. having *separate* numbering for the ordering of the bits then specifying that the bits are ordered in MSB?? wtf?? :)
https://libre-riscv.org/openpower/isatables/fields/ (accessible as the straight .txt file from the git repo) i literally cut/paste that from the pdf (xpdf) and tidied it up. it would have been much better to have the source of the original document, particularly if it was in .tex format, but oh well. i've also added the "form" onto the end of the CSV files. some of them are missing because i think they're from POWER 3.0 (whoops)
(In reply to Luke Kenneth Casson Leighton from comment #63) > i literally cut/paste that from the pdf (xpdf) and tidied it up. > it would have been much better to have the source of the original > document, particularly if it was in .tex format, but oh well. They might be willing to give us the source, try asking :)
make sure to do form-lookup for: * attn * sim_cfg done (found in V3.0B) * bperm * cmpeqb * darn * dzbz * extswsli * mcrxrx * modud * moduw * modsd * modsw * setb
(In reply to Jacob Lifshay from comment #64) > They might be willing to give us the source, try asking :) true! :) (mailing list server's offline at the moment though)
ok i've written a parser for the fields.txt file, the idea being that it will create a series of objects that can be walked inside the code to access fields by an appropriate name, depending on the context (the instruction, basically). those data structures should not need to go actually into the HDL (i don't think), they are there to be used in the appropriate code (the load function and so on) or perhaps as a case/switch statement like in PowerDecoder to get the appropriate register entries and other fields. have to see how it goes, i think.
*face-palm*. i went to all the trouble of putting the fields into a text file yesterday, thinking that there was missing information, wrote a parser for it... then realised that the 3.0B specification had everything that's needed. example: EO (11:15) Expanded opcode field Formats: VX, X, XX2 that was the information i thought was missing (the Formats). the idea was: parse the entirety of the opcode-format tables, extract the Formats that they apply to. then in the nmigen code, it's possible to create structures where we can do: (start, end) Formats.VX.EO[0:1] but... duhhh, that's already in section 1.7 of 3.0B duhhh, you can see above. *sigh*. well, at least we have a consistency checker.
ok i now have the other section of the specification decoded (1.7 Instruction Fields in 3.0B, 1.6.8 Instruction Fields in 2.07B) this basically will allow us to construct python objects that, when used, will allow the following: FormX.FRSp[0:-1] this will return bits 6 thru 10 of the instruction (*sigh* actually bits 31-6 thru 31-10) because all the indices are appropriately recorded in memory. this even works on multi-bit fields such as this: dc,dm,dx (25,29,11:15) Immediate fields that a Data Class Mask. Formats: XX2 which would be accessed as: FormXX2.dc_dm_dx[0:7] where FormXX2.dc_dm_dx[0] would return bit 25 (okok 31-25) FormXX2.dc_dm_dx[1] would return bit 29 (okok 31-29) and so on. it's still a bit of work to do. i think this will be a heck of a lot easier than messing about trying to memorise bit-field numbering, constantly having to refer back to the PDF spec.
https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/decoder/power_decoder2.py;h=34ac66836870324a55840d890ffc741a98845fb5;hb=HEAD ok made a start on the register / immediate / spr / etc. decoder. it will need careful double-checking (help appreciated) not just against anton's original work (decode2.vhdl and other files), against the PDF V3.0B manual as well.
https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/decoder/power_decoder2.py;h=5d706d9386e2f65e666d4a06408d43c1547c3913;hb=HEAD#l315 pulling it all together, some pieces still missing, needs a bit of rework: classes that return "register num plus register valid" as an object, that sort of thing, otherwise there's a lot of things to copy, laboriously.
(In reply to Luke Kenneth Casson Leighton from comment #71) > classes that return "register num plus register valid" as an object, that sorted.
Huh, apparently the register numbers need to be bit reversed in our decoder.
(In reply to Michael Nolan from comment #73) > Huh, apparently the register numbers need to be bit reversed in our decoder. eeeverything needs to be bit-reversed. see 1.3.2 "notation" - Bits in instructions, fields, and bit strings are numbered from left to right, starting with bit 0 - The leftmost bit of a sequence of bits is the most significant bit of the sequence. *gibber*. so i put the reversing into.. err... where was it... SignalBitRange.__getitem__() - you can see (i just commented it) if you can double-check that, it would be really good. i tend to get these things wrong. ultimately, we'll find out pretty damn quick when it comes to executing actual code.
(In reply to Luke Kenneth Casson Leighton from comment #74) > (In reply to Michael Nolan from comment #73) > > Huh, apparently the register numbers need to be bit reversed in our decoder. > > eeeverything needs to be bit-reversed. see 1.3.2 "notation" Well yes... But I thought we did that already, evidently not. > if you can double-check that, it would be really good. i tend > to get these things wrong. Ok
From what I understand, it's just the way bits are numbered that's reversed, the MSB is still the MSB. So, just the field order is reversed, individual fields are not since they are treated as a multi-bit number and the MSB is still the MSB (and other bits also keep their place in a number). Example: Power bit order: bit # 0 1 2 3 4 5 6 7 0 1 0 1 1 1 1 0 fields: A:0-3, B:4, C:5-7 field A 0-3: 0101 = 5 field B 4: 1 = 1 field C 5-7: 110 = 6 switched bit numbering order: nmigen/RISC-V bit order: bit # 7 6 5 4 3 2 1 0 0 1 0 1 1 1 1 0 fields: C:0-2, B:3, A:4-7 fields have same value but different order: field C 0-2: 110 = 6 field B 3: 1 = 1 field A 4-7: 0101 = 5
(In reply to Jacob Lifshay from comment #76) > From what I understand, it's just the way bits are numbered that's reversed, > the MSB is still the MSB. So, just the field order is reversed, individual > fields are not since they are treated as a multi-bit number and the MSB is > still the MSB (and other bits also keep their place in a number). ngggh mindbender. so by doing width-idx-1 it has reversed the bitorder of each bit in the field. ye gods. can you and michael take a look at this properly, suggest reviewing the pearpc and gem5 source code, and chiselwatt.
In any case, what matters is that the instructions are decoded correctly. Some example disassembly from objdump (edited to convert decimal to hex): test.o: file format elf64-powerpcle Disassembly of section .text: 0000000000000000 <f>: 0: 01 00 20 3d addis r9,0,1 4: 45 23 29 61 ori r9,r9,0x2345 8: d2 49 84 7c mulld r4,r4,r9 c: d2 19 64 7c mulld r3,r4,r3 10: 20 00 80 4e bclr 20,lt Note how the immediate values for ori and addis are not bit-reversed (they are in little-endian byte order, though). I haven't checked, but I'm quite sure the register fields are also not bit-reversed.
Ok, so I've got an instruction, for example: add 4, 16, 7 which gets assembled to: 0x7c903a14 or: value: 0111 1100 1001 0000 0011 1010 0001 0100 field: oooo oott ttta aaaa mmm mmmm mmm number: 0123 4567 8901 2345 6789 0123 4567 8901 o: Opcode = 31 m: Opcode minor = 266/0b0100001010 - ADD - form XO t: RT = 0b00100 = 4 This all looks right, right? Somehow the decoder is getting write_reg=8 (write_reg should be RT) out of this, and I'm not sure how. 0b00100 is 4 regardless of bit ordering... I double checked, and pdecoder.dec.RT is SignalBitRange([(0, 6), (1, 7), (2, 8), (3, 9), (4, 10)]) which seems right. Any idea of where to look?
rright. ok. so i think, to "fix" this, in SignalBitRange, the slice indices - start, end (and direction) all need to be reversed *as well*. oink.
https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/power_fieldsn.py;h=a5e03a114ec54d915deb13f20c99e8ab59078dc5;hb=HEAD#l31 try subtracting start and end from from len-1, and sign inverting step, at line 31
(In reply to Michael Nolan from comment #79) > Ok, so I've got an instruction, for example: > add 4, 16, 7 > which gets assembled to: > 0x7c903a14 > or: > value: 0111 1100 1001 0000 0011 1010 0001 0100 > field: oooo oott ttta aaaa mmm mmmm mmm > number: 0123 4567 8901 2345 6789 0123 4567 8901 > > o: Opcode = 31 > m: Opcode minor = 266/0b0100001010 - ADD - form XO > t: RT = 0b00100 = 4 > > This all looks right, right? > > Somehow the decoder is getting write_reg=8 (write_reg should be RT) out of > this, and I'm not sure how. 0b00100 is 4 regardless of bit ordering... > > I double checked, and pdecoder.dec.RT is > SignalBitRange([(0, 6), (1, 7), (2, 8), (3, 9), (4, 10)]) > which seems right. Any idea of where to look? Oh I see. It's bit reversing the whole word
(In reply to Luke Kenneth Casson Leighton from comment #81) > https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/ > power_fieldsn.py;h=a5e03a114ec54d915deb13f20c99e8ab59078dc5;hb=HEAD#l31 > > try subtracting start and end from from len-1, and sign inverting step, at > line 31 It's giving a key error because the index (31) isn't in the "dict" (I printed out the object and it gave SignalBitRange([(0, 31)]), and I assume it treats that as {0: 31}). What's the purpose of line 1? k = OrderedDict.__getitem__(self, t) res.append(self.signal[k]) # reverse-order here I think it'd work ok if I just grabbed the bit from the signal directly, like so: res.append(self.signal[t])
(In reply to Michael Nolan from comment #83) > (In reply to Luke Kenneth Casson Leighton from comment #81) > > https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/decoder/ > > power_fieldsn.py;h=a5e03a114ec54d915deb13f20c99e8ab59078dc5;hb=HEAD#l31 > > > > try subtracting start and end from from len-1, and sign inverting step, at > > line 31 > > It's giving a key error because the index (31) isn't in the "dict" (I > printed out the object and it gave SignalBitRange([(0, 31)]), and I assume > it treats that as {0: 31}). > > What's the purpose of line 1? > k = OrderedDict.__getitem__(self, t) > gets the remapping of what bit you want in the field onto what ACTUAL bit you want in the instruction. res.append(self.signal[k]) # reverse-order here > I think it'd work ok if I just grabbed the bit from the signal directly, > like so: > res.append(self.signal[t]) nope. that will get you bit 0 of the instruction when you request bit 0 ofa **FIELD** in the instruction. so, apologies it was reverse the order of t not k
line 32 for t in range(lenself - stop - 1, lenself - start - 1) except this is wrong because bits are not stepped properly so it is probably l = list(range(start, stop, step) l = l.reverse() # probably for t in l: t = len(self) - t - 1 carry on for the else bit just do the t sub-from-self before lookup, too
really like the gnu gas test idea. could you also plan for covering BE as well as LE? but... later, we have enough brainmelting going on
(In reply to Michael Nolan from comment #82) > Oh I see. It's bit reversing the whole word yes. that is what width-k-1 is for. and then for the field which is local that needs to be reversed *back*. this is what we are missing. k is into the instruction (0..31) t is into the FIELD and just to make life rreally interesting it can cover MULTIPLE fields. joy.
michael, fired up laptop brielfly when stopped git commited field-index reverse please check it, should work.
ok i fixed some dumb off-by-one errors in decode_fieldsn.py and the gas example works (yay!) am tempted to suggest throwing a ton more random stuff at it.
(In reply to Luke Kenneth Casson Leighton from comment #89) > ok i fixed some dumb off-by-one errors in decode_fieldsn.py and the gas > example works (yay!) am tempted to suggest throwing a ton more random stuff > at it. Yep, looks like you fixed the register thing. I'll get to adding more instructions later today
(In reply to Michael Nolan from comment #90) > Yep, looks like you fixed the register thing. dang that was a good guess > I'll get to adding more instructions later today i love this idea. btw did we raise a specific bugreport for using gnu-as?
(In reply to Luke Kenneth Casson Leighton from comment #91) > > i love this idea. btw did we raise a specific bugreport for using gnu-as? Yes: http://bugs.libre-riscv.org/show_bug.cgi?id=212
y'know what, michael... you see in soc experiments comp6600 there is the beginnings of a Simulator? i have an idea, it might be worthwhile to run with that to its logical obvious implication, namely to actually have a simulator, in soc, in python, which interprets the InternalOps. we then have a reference implementation which we can document properly etc. with links to the PDF. also it means we have to write the code and so understsnd it. speed is not crucial as the nmigen simulator will be screamingly slow. we can use cocotb later. also at some point we can drop-in power-gem5. what does everyone think?
(In reply to Luke Kenneth Casson Leighton from comment #93) > y'know what, michael... you see in soc experiments comp6600 there is the > beginnings of a Simulator? > > i have an idea, it might be worthwhile to run with that to its logical > obvious implication, namely to actually have a simulator, in soc, in python, > which interprets the InternalOps. > > we then have a reference implementation which we can document properly etc. > with links to the PDF. > > also it means we have to write the code and so understsnd it. > > speed is not crucial as the nmigen simulator will be screamingly slow. we > can use cocotb later. > > also at some point we can drop-in power-gem5. > > what does everyone think? Yeah, seems reasonable. This wouldn't be cycle accurate or anything right, just a simulator for our eventual backend?
(In reply to Michael Nolan from comment #94) > Yeah, seems reasonable. This wouldn't be cycle accurate or anything right, well, cycle accurate just means that everything on each clock is exactly as it would be after one clock it's actually harder to do a nonaccurate simulator as it involves JIT translation etc. cycle accurate means doing TLBs in the simulator, too, and that means we have something to test the hardware against, there, too. > just a simulator for our eventual backend? yes. however adding a power frontend on it as we found ain't so bad. can use the same code. if you look here you can see how basic the test simulator is https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/experiment/score6600.py;h=babce2439233f84c262c08795b873a6167197ec9;hb=HEAD#l859 adjusting that so it uses InternalOp is... er... not hard? :)
(In reply to Luke Kenneth Casson Leighton from comment #95) > yes. however adding a power frontend on it as we found ain't so bad. can > use the same code. > > if you look here you can see how basic the test simulator is > https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/experiment/score6600. > py;h=babce2439233f84c262c08795b873a6167197ec9;hb=HEAD#l859 > > adjusting that so it uses InternalOp is... er... not hard? :) Luke, looks like you and I are doing different things...? It looks to me like score6600 is actual hardware, not a simulator. I have basic simulator I'm working on here https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/internalop_sim.py;h=813dd19c40e21283d000fd597074a2cb38595a8a;hb=e95b4372c5163bb6f753c92d8445d4af7f117457 - I took some of the elements from score6600 (like the register and memory file) and modified them to get a basic simulator, which I'm testing along side a real power machine. Were you going for something different...?
(In reply to Michael Nolan from comment #96) > (In reply to Luke Kenneth Casson Leighton from comment #95) > > > yes. however adding a power frontend on it as we found ain't so bad. can > > use the same code. > > > > if you look here you can see how basic the test simulator is > > https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/experiment/score6600. > > py;h=babce2439233f84c262c08795b873a6167197ec9;hb=HEAD#l859 > > > > adjusting that so it uses InternalOp is... er... not hard? :) > > Luke, looks like you and I are doing different things...? It looks to me > like score6600 is actual hardware, not a simulator. i'd literally just moved RegSim and MemorySim into their own folder, you put internalop_sim.py in the same location. they should be pretty much identical in function. > I have basic simulator > I'm working on here > https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/ > internalop_sim.py;h=813dd19c40e21283d000fd597074a2cb38595a8a; > hb=e95b4372c5163bb6f753c92d8445d4af7f117457 - I took some of the elements > from score6600 (like the register and memory file) and modified them to get > a basic simulator, which I'm testing along side a real power machine. Were > you going for something different...? take a look at sim.py it is near identical to what you did :) although you have, i think, also added a decoder on it? so what you wrote can take actual power instructions, is that right? sim.py "tracks" what the real hardware does. so the idea is that after running the exact same instructions through each, and waiting for the instruction queue to indicate "no longer busy processing", we can do a memory-memory compare and a regfile-regfile compare and they should be *exactly* the same. i wrote a function which does exactly that, one for mem and one for regfile. unfortunately because this is a multiissue OoO design we cannot take a snapshot at a given time and say "yeah this will be exactly the same", you have to pause, wait for outstanding ops, *then* snapshot. that "pause" means unfortunately that it affects internal state, because FUs are emptying rather than being filled with new operations. but, it is the best we can do. oh btw, one really important thing to remember: at sone point the sim will do dynamic SIMD just like in the PartitionedSignal. code to reuse in InternalOpSim has already been written... in the PartitionedSignal unit tests! and, actually, hm, we could probably just lift that code wholesale into a class then rework the psig unit test to use it, *and* use that same code here.
https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/test_sim.py;hb=HEAD y'know... the comment "checked this against qemu" got me thinking. this looks like it could be used to actually run any of: * qemu * the simulator * the score6600 engine * pearpc * gem5 i know qemu can be run in bare metal mode, console only, for example. it would not be cycle accurate however would give us a jumpstart on running really basic programs.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/virtualization_administration_guide/sect-vish-dump crashdump --memory only would get a .elf file with memory and registers. executing a bare metal .bin file under qemu with a small memory would give something not too massive to parse https://github.com/eliben/pyelftools/blob/master/examples/dwarf_decode_address.py hmm it seems there are at least 2 different python elf parsers. the only thing we have to watch, these are big dependencies we're pulling in (i'm on a mobile 4G connection). luckily i have qemu already installed
one issue with qemu is that it doesn't properly implement everything -- for example, it doesn't appear to implement the FPSCR.FR bit (I've been reading its source as an additional reference for simple-soft-float).
(In reply to Jacob Lifshay from comment #100) > one issue with qemu is that it doesn't properly implement everything -- for > example, it doesn't appear to implement the FPSCR.FR bit yuck > (I've been reading > its source as an additional reference for simple-soft-float). ok so when it comes to FP we can't expect that to work. by then we should be a long way forward at least with running something.
(In reply to Luke Kenneth Casson Leighton from comment #98) > https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/test_sim. > py;hb=HEAD > > y'know... the comment "checked this against qemu" got me thinking. this > looks like it could be used to actually run any of: > > * qemu what do you think, michael: does firing off qemu to run those same (tiny) binaries sound practical?
(In reply to Luke Kenneth Casson Leighton from comment #102) > (In reply to Luke Kenneth Casson Leighton from comment #98) > > https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/test_sim. > > py;hb=HEAD > > > > y'know... the comment "checked this against qemu" got me thinking. this > > looks like it could be used to actually run any of: > > > > * qemu > > what do you think, michael: does firing off qemu to run those same > (tiny) binaries sound practical? It doesn't seem like a bad idea, especially since we don't have another power simulator to compare to yet. Getting the instructions/data into qemu is pretty easy, just compile and link a bare metal kernel. I've got a folder with my manual method for this already, and I can add it to the git repo. However, getting the data (registers/memory) out of qemu might be a bit harder. Probably the best way would be to use the gdb socket that qemu makes available for reading registers and memory. I know GDB is written with python in mind, so it *should* be doable to control it from python, but I haven't investigated that yet.
just taking a closer look at python-libvirt, and trying to find the examples i saw which turn qemu params into xml files also this: https://www.programcreek.com/python/example/28110/libvirt.open canCreateVM - is able to run virsh, which will get a coredump (including regfile as well as memory). see earlier comments, yes do drop the qemu example into a subfolder. be with you in a sec, just investigating.
(In reply to Luke Kenneth Casson Leighton from comment #104) > see earlier comments, yes do drop the qemu example into a subfolder. It's now uploaded here: https://git.libre-riscv.org/?p=soc.git;a=tree;f=src/soc/simulator/qemu_test;h=50cde50b45e9af13b3c070c8b9618a956f63d3b5;hb=40c5ec1f0df4bed1d78b1d0d019b399450d87ced
https://libvirt.org/drvqemu.html shows how to convert qemu to xml if you have some args already ("virsh domxml-from-native")
currently attempting to get python-libvirt to "start" launch.sh. wooow is this painful. obtuseness beyond obtuseness. /usr/bin/qemu-system-ppc64 \ -name guest=Linux-0.2,debug-threads=on \ -S \ -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-6-Linux-0.2/master-key.aes \ -machine powernv9,accel=tcg,usb=off,dump-guest-core=off \ -m 1024 \ -overcommit mem-lock=off \ -smp 1,sockets=1,cores=1,threads=1 \ -uuid ce1326f0-a9a0-11e3-a5e2-0800200c9a66 \ -display none \ -no-user-config \ -nodefaults \ -chardev socket,id=charmonitor,fd=33,server,nowait \ -mon chardev=charmonitor,id=monitor,mode=control \ -rtc base=utc \ -no-shutdown \ -boot strict=on \ -kernel /home/lkcl/testdisk.raw \ -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ -msg timestamp=on 2020-03-25T19:38:07.721499Z qemu-system-ppc64: error creating device tree: (fdt_property_string(fdt, "system-id", buf)): FDT_ERR_BADSTATE 2020-03-25 19:38:07.758+0000: shutting down, reason=failed https://bugzilla.redhat.com/show_bug.cgi?id=1624539 nggggggh. attempting to downgrade qemu-system-ppc so that the older version of libfdt can be used apt-get install libfdt1=1.4.2-1 qemu-system-ppc=1:3.1+dfsg-8+deb10u3 qemu-system=1:3.1+dfsg-8+deb10u3 qemu-system-misc=1:3.1+dfsg-8+deb10u3
https://bugzilla.redhat.com/show_bug.cgi?id=1585991 woooow....
I'm going to try using https://sourceware.org/gdb/wiki/PythonGdb to see if I can't get something working that way.
I've got a rudimentary qemu-gdb interface working in qemu.py: https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/qemu.py;h=70e29b0d76b65227a06d3ba3bd95315cf5af37e8;hb=3f7e6c771c7f8df65e67e6dc350964a54aee8066 It's able to load a kernel into qemu, set up a breakpoint at the start of it, single step it, and dump the register contents.
deep breath.... install dependencies (saves messing about) with apt-get build-dep qemu git clone from here: [remote "origin"] url = https://github.com/legoater/qemu.git fetch = +refs/heads/*:refs/remotes/origin/* git checkout this branch: * origin/powernv-next configure/build with this: ./configure --enable-system --target-list=ppc64-softmmu,ppc64-linux-user,ppc64le-linux-user edit hw/ppc/pnv.c and comment out these lines: /* if (qemu_uuid_set) { _FDT((fdt_property_string(fdt, "system-id", buf))); } */ run python3 domstart.py (files to be attached shortly) dump the state to a file: virsh # dump Linux-0.2 mem.hex --memory-only --crash objdump the results with this: objdump -x mem.hex which gives the following (including regfile and memory): architecture: powerpc:common64, flags 0x00000000: start address 0x0000000000000000 Program Header: NOTE off 0x00000000000000b0 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**0 filesz 0x0000000000000710 memsz 0x0000000000000710 flags --- LOAD off 0x00000000000007c0 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**0 filesz 0x0000000040000000 memsz 0x0000000040000000 flags --- Sections: Idx Name Size VMA LMA File off Algn 0 note0 00000710 0000000000000000 0000000000000000 000000b0 2**0 CONTENTS, READONLY 1 .reg/0 00000180 0000000000000000 0000000000000000 00000134 2**2 CONTENTS 2 .reg 00000180 0000000000000000 0000000000000000 00000134 2**2 CONTENTS 3 .reg2/0 00000108 0000000000000000 0000000000000000 000002d0 2**2 CONTENTS 4 .reg2 00000108 0000000000000000 0000000000000000 000002d0 2**2 CONTENTS 5 load1 40000000 0000000000000000 0000000000000000 000007c0 2**0 CONTENTS, ALLOC, LOAD, READONLY and thus can be parsed with python elf libraries mentioned in previous comments.
Created attachment 47 [details] libvirt xml file
Created attachment 48 [details] copy of python libvirt domain start example obtained from here: https://github.com/libvirt/libvirt-python/tree/master/examples
actually, i like what you wrote a whooole lot better, michael: https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/soc/simulator/qemu.py;h=c9b1b573dd22d3da0adb0415a0a6b8bb005c99c6;hb=4b50a170c413f47e0527c843be49babc493c684b the only modification i needed: set architecture powerpc:common64 target remote localhost:1234 layout asm b *0x20000000 c and to use debian package "gdb-multilib" which *sigh* had to come from debian/unstable. building cross-compilers etc. from source has always stressed me out :) if we do that, we need a proper documented reproducible process. how did you get powerpc64-linux-gnu-gdb?
i enabled export PYTHONTRACEMALLOC=25 https://docs.python.org/3/using/cmdline.html#envvar-PYTHONTRACEMALLOC /usr/lib/python3.7/subprocess.py:858: ResourceWarning: subprocess 1551551 is still running ResourceWarning, source=self) Object allocated at (most recent call last): File "simulator/test_sim.py", lineno 106 unittest.main() File "/usr/lib/python3.7/unittest/main.py", lineno 101 self.runTests() File "/usr/lib/python3.7/unittest/main.py", lineno 271 self.result = testRunner.run(self.test) File "/usr/lib/python3.7/unittest/runner.py", lineno 176 test(result) File "/usr/lib/python3.7/unittest/suite.py", lineno 84 return self.run(*args, **kwds) File "/usr/lib/python3.7/unittest/suite.py", lineno 122 test(result) File "/usr/lib/python3.7/unittest/suite.py", lineno 84 return self.run(*args, **kwds) File "/usr/lib/python3.7/unittest/suite.py", lineno 122 test(result) File "/usr/lib/python3.7/unittest/case.py", lineno 663 return self.run(*args, **kwds) File "/usr/lib/python3.7/unittest/case.py", lineno 615 testMethod() File "simulator/test_sim.py", lineno 58 self.run_test_program(program, [1, 2, 3, 4]) File "simulator/test_sim.py", lineno 95 with run_program(prog) as q: File "/home/lkcl/src/libreriscv/soc/src/soc/simulator/qemu.py", lineno 54 q = QemuController(program.binfile.name) File "/home/lkcl/src/libreriscv/soc/src/soc/simulator/qemu.py", lineno 16 stdin=subprocess.PIPE) simulator/test_sim.py:58: ResourceWarning: unclosed file <_io.BufferedWriter name=4> self.run_test_program(program, [1, 2, 3, 4]) this gets rid of it: self.qemu_popen.stdout.close() self.qemu_popen.stdin.close() except one of the qemus is still running, thissays you have to call qemu_popen.communicate(): https://docs.python.org/3/library/subprocess.html fixed with a quick commit.
i've also added pygdbmi to the dependencies in setup.py and added a way to get hold of and compile up powerpc64 gdb. michael (and anyone else) could you take a look? https://git.libre-riscv.org/?p=soc.git;a=blob;f=README.md;hb=HEAD
(In reply to Luke Kenneth Casson Leighton from comment #115) > > except one of the qemus is still running, thissays you have > to call qemu_popen.communicate(): > https://docs.python.org/3/library/subprocess.html > > fixed with a quick commit. Oh, thank you! I was struggling to figure out where this was coming from.
(In reply to Luke Kenneth Casson Leighton from comment #116) > i've also added pygdbmi to the dependencies in setup.py and added a way > to get hold of and compile up powerpc64 gdb. michael (and anyone else) > could you take a look? > https://git.libre-riscv.org/?p=soc.git;a=blob;f=README.md;hb=HEAD Yep, that looks right
>coooool. > >confirmed able to duplicate that, here. > >i did encounter KeyError, register-values not present but it is spurious >and unreliably occurring. Huh. It sounds like that's from the register read code, where it reaches in to a largeish dict to grab the register values. I'll have to see if I can replicate it on my end. > >i added an assert to investigate and, sigh, was no longer seeing failures >even after running 20 times. > >the other really useful addition would be memory-dump / memory location >reading. That shouldn't be too hard to add. Similar to the register comparison code, it'll need to only do it to specific locations as comparing all of memory would not only take too long, but disagree because qemu loads the kernel and some other stuff into memory and the simulator does not. >btw change of topic, i noticed, just like in the original RegSim, OP_ADD is >not joined by OP_SUB because Anton chose to do "add1 to op1" as well as >"invert op1" in the InternalOp csv. > >therefore we need to pass the whole of the decode row to the >execute_alu_sim function, rather than just the enum InternalOp value. Ah ok.
(In reply to Michael Nolan from comment #118) > (In reply to Luke Kenneth Casson Leighton from comment #116) > >the other really useful addition would be memory-dump / memory location > >reading. > > That shouldn't be too hard to add. Similar to the register comparison code, yes just the right voodoo gdb incantation. i think it allows byte word dword selection. > it'll need to only do it to specific locations as comparing all of memory > would not only take too long, but disagree because qemu loads the kernel and > some other stuff into memory and the simulator does not. urk :) > > >btw change of topic, i noticed, just like in the original RegSim, OP_ADD is > >not joined by OP_SUB because Anton chose to do "add1 to op1" as well as > >"invert op1" in the InternalOp csv. > > > >therefore we need to pass the whole of the decode row to the > >execute_alu_sim function, rather than just the enum InternalOp value. > > Ah ok. yeh :) should be very straightforward (you want to do it?) a separate if elif elif, "if op1.value == Op1.ONE op1 += 1" can skip carry for now, add as TODO. then also if row['op1neg'] op1 ^= 0xffffffffffffffff something like that. or op1 = (~op1) & 0xffffffffffffffff that _should_ be sufficient to get sub, subi and neg all working, with not a lot of effort.
(In reply to Luke Kenneth Casson Leighton from comment #120) > > should be very straightforward (you want to do it?) a separate if elif elif, > "if op1.value == Op1.ONE op1 += 1" > > can skip carry for now, add as TODO. > > then also if row['op1neg'] op1 ^= 0xffffffffffffffff > > something like that. > > or op1 = (~op1) & 0xffffffffffffffff > > that _should_ be sufficient to get sub, subi and neg all working, with not a > lot of effort. Working now. Need to sort out setting the carry and overflow flags at some point though, that'll be more fun...
(In reply to Michael Nolan from comment #121) > > that _should_ be sufficient to get sub, subi and neg all working, with not a > > lot of effort. > > Working now. yay! oh wait... F ====================================================================== FAIL: test_sub (__main__.DecoderTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "simulator/test_sim.py", line 98, in test_sub self.run_test_program(program, [1, 2, 3, 4, 5]) File "simulator/test_sim.py", line 105, in run_test_program qemu_register_compare(simulator, q, reglist) File "simulator/test_sim.py", line 111, in qemu_register_compare simulator.regfile.assert_gpr(reg, qemu_val) File "/home/lkcl/src/libreriscv/soc/src/soc/simulator/internalop_sim.py", line 74, in assert_gpr assert reg_val == val, msg AssertionError: reg r3 got 10000000000004444, expecting 4444 ---------------------------------------------------------------------- Ran 5 tests in 15.696s FAILED (failures=1) how many bits is that 0x 1 0000 0000 0000 4444 ha, that's amusing. the result is a roll-over from 64-bit arithmetic > Need to sort out setting the carry and overflow flags at some > point though, that'll be more fun... yehyeh. a SPR storing the carry, first. take a look at pearpc, it should show a way to do it (that's easy to interpret) https://github.com/sebastianbiallas/pearpc
i hacked masking the write to take out the upper 1 in write_reg.
(In reply to Luke Kenneth Casson Leighton from comment #122) > how many bits is that > 0x 1 0000 0000 0000 4444 > > ha, that's amusing. the result is a roll-over from 64-bit arithmetic > > > Need to sort out setting the carry and overflow flags at some > > point though, that'll be more fun... > > yehyeh. a SPR storing the carry, first. take a look at pearpc, > it should show a way to do it (that's easy to interpret) > https://github.com/sebastianbiallas/pearpc Ooops, some of my experiments with carry must have snuck in, sorry. Speaking of, I'm getting some weird behavior with carry on qemu. If I run some instructions that generate a carry or overflow, then have gdb read the xer register, it returns 0. But if I execute a `mfxer r5` or the like and print out that register, it gives me the correct value for xer. I'm trying to confirm whether this is a bug with qemu or not.
(In reply to Michael Nolan from comment #124) > (In reply to Luke Kenneth Casson Leighton from comment #122) > > > how many bits is that > > 0x 1 0000 0000 0000 4444 > > > > ha, that's amusing. the result is a roll-over from 64-bit arithmetic > > > > > Need to sort out setting the carry and overflow flags at some > > > point though, that'll be more fun... > > > > yehyeh. a SPR storing the carry, first. take a look at pearpc, > > it should show a way to do it (that's easy to interpret) > > https://github.com/sebastianbiallas/pearpc > > Ooops, some of my experiments with carry must have snuck in, sorry. no worries > Speaking > of, I'm getting some weird behavior with carry on qemu. If I run some > instructions that generate a carry or overflow, then have gdb read the xer > register, it returns 0. But if I execute a `mfxer r5` or the like and print > out that register, it gives me the correct value for xer. I'm trying to > confirm whether this is a bug with qemu or not. hmmm. if it's a bug in qemu it's one that, i suspect, would have been detected a loooong time ago. but... you never know... let's look at the spec: p70 3.0B "addc": RT <= (RA) + (RB) The sum (RA) + (RB) is placed into register RT. Special Registers Altered: CA CA32 CR0 (if Rc=1) SO OV OV32 (if OE=1) ok so SO (summary overflow), OV (overflow) and OV32 (overflow 32-bit) all get set, in XER. these are listed on p45 3.2.2 however that's not the "carry" flag: that's *CR0*. which is in... errr... the Condition Register p45 2.3.1 let's check the pearpc source (yes it's only 32-bit). ppc_alu.c whereisitwhereisit... ok! https://github.com/sebastianbiallas/pearpc/blob/master/src/cpu/cpu_generic/ppc_alu.cc#L56 ahh, then as a *separate* concern, if you see the implementation of addcx, that updatex XER_CA *and* updates CR0: https://github.com/sebastianbiallas/pearpc/blob/master/src/cpu/cpu_generic/ppc_alu.cc#L86
here's ppc_update_cr0: https://github.com/sebastianbiallas/pearpc/blob/master/src/cpu/cpu_generic/ppc_opc.h#L26 #define CR_CR0_LT (1<<31) #define CR_CR0_GT (1<<30) #define CR_CR0_EQ (1<<29) #define CR_CR0_SO (1<<28) * clears the CR-CR0 bits * result is zero: sets bit 29 * result top bit is set (indicating -ve as signed): sets bit 31 * otherwise (+ve signed): sets bit 30 * after that (in *addition*): if the XER SO bit was set, set bit 28 of CR0 interestingly, then, it looks like you have to check the XER flag *before* checking the carry. and XER is set based on... err... comparing operand A with the result. how odd. https://github.com/sebastianbiallas/pearpc/blob/master/src/cpu/cpu_generic/ppc_alu.cc#L86
oh - i just spotted this in p30, section 2.3.1. it seems to concur with ppc_update_cr0 *except* of course, it's been adapted for 32/64-bit modes. if (64-bit mode) then M <= 0 else M <= 32 if (target_register)[M:63] < 0 then c <= 0b100 # bit 31 else if (target_register)[M:63] > 0 then c <= 0b010 # bit 30 else c <= 0b001 # bit 29 CR0 <= c || XERSO # bit 28
(In reply to Luke Kenneth Casson Leighton from comment #125) > hmmm. if it's a bug in qemu it's one that, i suspect, would have been > detected > a loooong time ago. but... you never know... > > ok so SO (summary overflow), OV (overflow) and OV32 (overflow 32-bit) > all get set, in XER. these are listed on p45 3.2.2 > Sorry, was actually overflow that I was checking: li 1, 1 sldi 1, 1, 63 or 2, 1, 1 addco. 3, 2, 1 mfxer 5 That should add -0x80000000000000000 to itself and overflow, and the "o" means that OV and SO should actually be set. Like I said before, running it in qemu and reading the xer register gives a result of 0, and reading r5 gives a result of 0xe0000000. > however that's not the "carry" flag: that's *CR0*. which is in... errr... > the Condition Register p45 2.3.1 It does also list CA and CA32, which should correspond to the flags in xer as well.
https://github.com/qemu/qemu/blob/master/target/ppc/translate.c#L950 eurrrgh... macros-on-macros-on-macros... *sigh*. ok so we're looking at a trail that goes through GEN_HANDLER, GEN_OPCODE, GEN_INT_ARITH_ADD but ultimately lands at geo_op_addic which in turn lands in gen_op_arith_add. here's where addco sets the flags compute_ca=1, compute_ov=1 *but* sets add_ca=0 whereas addeo *does* set all three (line #926) https://github.com/qemu/qemu/blob/master/target/ppc/translate.c#L924 the flag "cpu_ca" comes in globally (initialised at line 132). looking at the spec, the difference between addco and addeo is supposed to be: addco RT <= (RA) + (RB) The sum (RA) + (RB) is placed into register RT. addeo RT <= (RA) + (RB) + CA The sum (RA) + (RB) + CA is placed into register RT. hmmm need to look at microwatt, will do that in a minute.
https://github.com/antonblanchard/microwatt/blob/master/ppc_fx_insns.vhdl#L110 ppc_adde(): return 65-bit RA + 65-bit RB + carry https://github.com/antonblanchard/microwatt/blob/master/ppc_fx_insns.vhdl#L140 ppc_add(): return 64-bit RA + 64-bit RB hmm ppc_add() doesn't appear to be used (at all) in execute1.vhdl decode_input_carry(ic, XERC) is: if ic=ZERO return 0 if ic=ONE return 1 if ic=CA return XERC.ca returning XERC.ca might be a significant difference, there... ah! yes, if you look in the "cry_in" column, adde and addeo are supposed to take cry_in=CA *not* cry_in=ZERO. that's likely to be it.
btw don't be afraid to commit unit tests that don't "quotes work quotes". when it comes to unit tests, the whole idea is to have "things-that-demonstrate-a-need-to-fix-code" rather than "have everything absolutely perfect". so do go ahead and commit some unit tests that actually break, but you expect them to pass at some point in the future. i say that because in the past i noticed that you commented out some unit tests which "failed". just leave them to fail, as a reminder. also there's a way to tag them with bugreports (with decorators). i'll document this in HDL_workflow.
(In reply to Luke Kenneth Casson Leighton from comment #131) > btw don't be afraid to commit unit tests that don't "quotes work quotes". > > when it comes to unit tests, the whole idea is to have > "things-that-demonstrate-a-need-to-fix-code" rather than "have everything > absolutely perfect". > > so do go ahead and commit some unit tests that actually break, but you > expect them to pass at some point in the future. > > i say that because in the past i noticed that you commented out some unit > tests which "failed". just leave them to fail, as a reminder. also > there's a way to tag them with bugreports (with decorators). > > i'll document this in HDL_workflow. Actually, I'd decorate them with something like: import unittest @unittest.skip("not working yet") def test_fn(): body-here or @unittest.expectedFailure # FIXME: not working yet def test_fn(): body-here That way, once I add support for GitLab CI we will get a message when one of the commits fails any of the tests, allowing us to more quickly detect that something unexpectedly broke. If it's filled with a whole bunch of tests that we know will fail, then that makes the CI nearly worthless because it will always be filled with a bunch of failures obscuring the ones we care about.
I've tried to run the tests that use GDB, they still fail. Traceback (most recent call last): File "test_sim.py", line 98, in test_sub self.run_tst_program(program, [1, 2, 3, 4, 5]) File "test_sim.py", line 105, in run_tst_program qemu_register_compare(simulator, q, reglist) File "test_sim.py", line 110, in qemu_register_compare qemu_val = qemu.get_register(reg) File "/path/to/soc/src/soc/simulator/qemu.py", line 35, in get_register res = self.gdb.write('-data-list-register-values x {}'.format(num)) File "/usr/local/lib/python3.7/dist-packages/pygdbmi/gdbcontroller.py", line 247, in write timeout_sec=timeout_sec, raise_error_on_timeout=raise_error_on_timeout File "/usr/local/lib/python3.7/dist-packages/pygdbmi/gdbcontroller.py", line 285, in get_gdb_response "Did not get response from gdb after %s seconds" % timeout_sec pygdbmi.gdbcontroller.GdbTimeoutError: Did not get response from gdb after 1 seconds
(In reply to Tobias Platen from comment #133) > I've tried to run the tests that use GDB, they still fail. working fine here tobias - you'll need to work out what's missing, what's going on, by using appropriate print / debug statements. when you find out please do document it.
We shouldn't be pulling the the ISA from the wiki. This is because I can edit the Wiki and break the codebase.
if someone starts doing that i will ban them, recover the wiki with a git reversion, then add a password to the frontend to prevent it happening again. ikiwiki was specifically chosen because of the use of git as a backend. i prefer to trust people.
closing this, completed (POWER9, not doing RISCV). EUR 200 michael, EUR 200 lkcl
increasing to 500 because EUR 100 for jacob's design insights