plan (edited as needed by programmerjake): * DONE: TI-Form * DONE: TI (not TII) field (arguably necessary) * DONE: OP_TERNARY (not OP_TERNARYI) in CSV/power_enums * DONE: rename to ternlog[i] and TLI renamed in both openpower and the wiki * basic adaptable module (probably in nmutil) - DONE: module in nmutil - DONE: replace with Array version - DONE: unit test - DONE: formal * DONE: add encoding of ternlogi to SVP64Asm class (as a 32bit op) * DONE: add ternlogi to shiftrot pipe * DONE: put shiftrot rotator module code back with its comment see comment #22 * DONE: add "pspec" option to enable/disable draft instruction, see comment #22 * DONE: fu unit tests * DONE: fu formal * DONE: add "ExpectedState" to bitmanip_cases.py (yes irony it needs a python implementation of ternlogi but it'll be 4 lines fortunately) this is so that - DONE add a stand-alone Simulator *only* runner of bitmanip_cases.py * DONE tidy up lut.py in nmutil as it is a public API comment #45 * DONE: re-add Rc=1 to ternlogi (maybe) Links/commits: * https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/lut.py;h=5705d3d96e21c050b229e65b9b88d249dbdfe356;hb=3e45917af6e91df910b6fc77d031ee3a656c4116 * https://git.libre-soc.org/?p=nmutil.git;a=commitdiff;h=892a3cd2f7d76a842c9f838d808af70ea87bf7f6 * https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/test/test_lut.py;h=2863e83567d6ab0666d7a9749ec07a8c572e47b4;hb=3e45917af6e91df910b6fc77d031ee3a656c4116 * https://git.libre-soc.org/?p=nmutil.git;a=commitdiff;h=878e4c6776203e12abb4cf4e78f6d251121418c1 * https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/test/test_core.py;h=a35545024e23ba05a8ad71ac96e2500e0c5b4612;hb=75bdc1747f32a4fb6cf848ed8b5c68ef2f683f4c#l245 * https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/shift_rot/main_stage.py;h=b8ec704199a800c6df652524612581e07d885bb2;hb=880bf8469c65dbb96c9853b32618d07a0c742cf0 * https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=79b540bb5d8f8ba139f56ff1e8010bde24df3d22
these data structures and code can probably be utilised directly, no modifications at all, in fact just simply import them in the pipeline *or* even: simply add OP_TERNARY *to* shift_rot because it has the exact register profile. if deploying that trick, it would be best to add a PSpec run/compile-time option to make it possible to activate/deactivate (accessible as self.pspec inside main_stage.py) https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/shift_rot/pipe_data.py;hb=HEAD https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/shift_rot/input_stage.py;hb=HEAD https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/shift_rot/output_stage.py;h=d207d4ff416e6f503b9d459fd885bf9ac6e366c7;hb=HEAD if adding OP_TERNARY to shift_rot it would be here: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/shift_rot/main_stage.py;hb=HEAD adding to an existing pipeline will save time, so makes a lot of sense, not having more code to write (or review). unit tests however should really go into this directory: https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/openpower/test/shift_rot;hb=HEAD but have their own separate file (i.e. not go into shift_rot_cases.py). the unit tests for this one are going to be BIG. 256 combinations of the selector.
updated the plan https://bugs.libre-soc.org/show_bug.cgi?id=745#c0 also, ternary is not a very good name (too generic), imho a better name would be ternlog (like x86's instruction) or lut3
Added BitwiseLut, which is a generic n-ary bitwise lut -- ternary is just BitwiseLut(input_count=3, width=64) https://git.libre-soc.org/?p=nmutil.git;a=commitdiff;h=3e45917af6e91df910b6fc77d031ee3a656c4116
Added formal test for BitwiseLut: https://git.libre-soc.org/?p=nmutil.git;a=commitdiff;h=878e4c6776203e12abb4cf4e78f6d251121418c1
(In reply to Jacob Lifshay from comment #4) > Added formal test for BitwiseLut: > https://git.libre-soc.org/?p=nmutil.git;a=commitdiff; > h=878e4c6776203e12abb4cf4e78f6d251121418c1 excellent. btw the convention for those is that they go in a directory named "formal" rather than a directory named "test". (In reply to Jacob Lifshay from comment #2) > updated the plan https://bugs.libre-soc.org/show_bug.cgi?id=745#c0 > also, ternary is not a very good name (too generic), imho a better name > would be ternlog (like x86's instruction) or lut3 lut3's good. using x86 names... hmmm. (In reply to Jacob Lifshay from comment #3) > Added BitwiseLut, which is a generic n-ary bitwise lut -- ternary is just > BitwiseLut(input_count=3, width=64) > > https://git.libre-soc.org/?p=nmutil.git;a=commitdiff; > h=3e45917af6e91df910b6fc77d031ee3a656c4116 + for i in range(self.input_count): + if sel_values[i]: + lut_index |= 2 ** i] that could simply be done in one line as: lut_index.eq(Cat(*sel_values) also... i'm not really sure this is a correct implementation. there are 256 unit tests required: one for each and every 4th input, because the 4th input (lut) is 8-bit wide and therefore there are 256 possible values. also, i'm surprised it works without using Array() (yes there are shorter ways to do this): self._lut_underlying_signal = Signal(8) lut_bits = [] for i in range(len(self._lut_underlying_signal)): lut_bits.append(self.lut_underlying_signal[i]) self.lut = Array(lut_bits) without that dynamic indexing i would not expect it to work, at all [Array creates a pmux at the back-end]. if it does work then i expect that what you've written is a re-implementation of pmux (Array) using a lot more lines of code than is necessary (30 instead of 2) the whole module should be literally this - really not kidding, only around 4 lines: lut = Array([lut_input[i] for i in range(lut_width)]) for i in range(input_width): output[i].eq(lut[Cat(in1[i], in2[i], in3[i])]) that's it. that's the entire implementation. also, what is the Repl() for? the idea is to take each bit of the 3 inputs and put each of them through a lut Array lookup. the inputs should be of the exact length such that no use of Repl() is required: that would be an additional Module()'s job, or the user of this Module's responsibility. Repl() - a secondary task - complicates both the unit test and the Formal Correctness Proof because both the unit test and the Formal Correctness Proof have two tasks to deal with, not one. [unix philosophy: "do one job and do it well"]
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/cr/main_stage.py;h=5f1edc7adb6fed02a2fc08b2a5ff0891905eabe9;hb=04c4144f0d962bb181c69de4ce5a998077314e24#l96 96 # look up the output bit in the lookup table 97 bit_o = Signal() 98 comb += bit_o.eq(Mux(bit_b, 99 Mux(bit_a, lut[3], lut[1]), 100 Mux(bit_a, lut[2], lut[0]))) hmm. this should not have been done this way: it is manual construction (replication) of a pmux. that prevents the HDL tools from identifying a pmux opportunity, which would in turn mean that if an optimised ASIC-grade pmux Cell is ever created, the tools would not use it. this code should be replaced with: lut_array = Array([lut[i] for i in range(4)] bit_o.eq[lut_array[Cat(bit_b, bit_a)) or, better, the use of the LUT module once converted to (much simpler) use of Array.
(In reply to Luke Kenneth Casson Leighton from comment #5) > (In reply to Jacob Lifshay from comment #4) > > Added formal test for BitwiseLut: > > https://git.libre-soc.org/?p=nmutil.git;a=commitdiff; > > h=878e4c6776203e12abb4cf4e78f6d251121418c1 > > excellent. btw the convention for those is that they go > in a directory named "formal" rather than a directory named > "test". k, i put them in the test file cuz I already had it open...didn't think about that. Also, I rely on the formal proof to ensure complete coverage since there are too many cases to check in a unit test. > > (In reply to Jacob Lifshay from comment #2) > > updated the plan https://bugs.libre-soc.org/show_bug.cgi?id=745#c0 > > also, ternary is not a very good name (too generic), imho a better name > > would be ternlog (like x86's instruction) or lut3 > > lut3's good. using x86 names... hmmm. the benefit of using x86's name is someone who has seen the x86 instruction before will know exactly what it does, without having to look it up. The other names are worse cuz they don't indicate that the operation is bitwise... the log in ternlog indicates bitwise logic. > (In reply to Jacob Lifshay from comment #3) > > Added BitwiseLut, which is a generic n-ary bitwise lut -- ternary is just > > BitwiseLut(input_count=3, width=64) > > > > https://git.libre-soc.org/?p=nmutil.git;a=commitdiff; > > h=3e45917af6e91df910b6fc77d031ee3a656c4116 > > + for i in range(self.input_count): > + if sel_values[i]: > + lut_index |= 2 ** i] > > that could simply be done in one line as: > > lut_index.eq(Cat(*sel_values) no, it can't cuz sel_values is a tuple of bools and lut_index is an int, they are not nmigen Values. > > also... i'm not really sure this is a correct implementation. there > are 256 unit tests required: one for each and every 4th input, because > the 4th input (lut) is 8-bit wide and therefore there are 256 possible > values. there's actually 2^56 possible input values cuz 3 * 16-bit inputs + 8-bit lut. I decided that was too many to test all of, so I rely on the formal proof for complete coverage and just pick 100 pseudorandom cases (using sha256 for reproducibility) to test in the unit test. > also, i'm surprised it works without using Array() > (yes there are shorter ways to do this): > > self._lut_underlying_signal = Signal(8) > lut_bits = [] > for i in range(len(self._lut_underlying_signal)): > lut_bits.append(self.lut_underlying_signal[i]) > self.lut = Array(lut_bits) > > without that dynamic indexing i would not expect it to work, > at all [Array creates a pmux at the back-end]. > > if it does work then i expect that what you've written is > a re-implementation of pmux (Array) using a lot more lines of > code than is necessary (30 instead of 2) > > the whole module should be literally this - really not kidding, > only around 4 lines: > > lut = Array([lut_input[i] for i in range(lut_width)]) > for i in range(input_width): > output[i].eq(lut[Cat(in1[i], in2[i], in3[i])]) > > that's it. that's the entire implementation. > > > also, what is the Repl() for? well, the way I implemented it is by building a binary tree of output-width wide BitwiseMux selecting based on the inputs, where each Signal in the tree is output-width wide -- the Repl is to splat each lut bit into an output-width wide value so the BitwiseMux tree has all the right input bits. if you use yosys show, you should get a good idea of the structure: output | /----+----\ in0-/ mux_0bxxx \ /--0-------1--\ | | +---+ +-------------+ | | /----+----\ /----+----\ in1-/ mux_0bxx0 \ in1-/ mux_0bxx1 \ /--0-------1--\ /--0-------1--\ | | | | | | | +-------------------------------+ | | +---------------------+ | +---+ +-------------+ | | | | | | /----+----\ /----+----\ /----+----\ /----+----\ in2-/ mux_0bx00 \ in2-/ mux_0bx10 \ in2-/ mux_0bx01 \ in2-/ mux_0bx11 \ /--0-------1--\ /--0-------1--\ /--0-------1--\ /--0-------1--\ | | | | | | | | splat splat splat splat splat splat splat splat | | | | | | | | lut[0] lut[4] lut[2] lut[6] lut[1] lut[5] lut[3] lut[7]
(In reply to Jacob Lifshay from comment #7) > the benefit of using x86's name is someone who has seen the x86 instruction > before will know exactly what it does, without having to look it up. The > other names are worse cuz they don't indicate that the operation is > bitwise... the log in ternlog indicates bitwise logic. yep, like it. ternlog (and ternlogi) it is > there's actually 2^56 possible input values cuz 3 * 16-bit inputs + 8-bit > lut. I decided that was too many to test all of, hell yes. > so I rely on the formal proof for complete coverage which is what they're for. > and just pick 100 pseudorandom cases (using > sha256 for reproducibility) to test in the unit test. good idea. > > also, what is the Repl() for? > well, the way I implemented it is by building a binary tree of output-width > wide BitwiseMux selecting based on the inputs, where each Signal in the tree > is output-width wide if pmux did not exist, or was not covered in nmigen by Array(), or if there was some massive inefficiency expected by yosys to completely fail to create an optimal binary-fan-out mux-tree, then it would be a good idea. > -- the Repl is to splat each lut bit into an > output-width wide value so the BitwiseMux tree has all the right input bits. that can be covered by specifying a fixed width as an integer argument for both input and output. that results in the number of lines of code required literally reducing down to four (4) if someone wants Repl() externally they can do that... with another Module or in the Module. forcing the API to deal with a case that may - or may not - be even needed introduces API and design complexity. needs to go. 4 lines of code. that's all.
(In reply to Luke Kenneth Casson Leighton from comment #8) > (In reply to Jacob Lifshay from comment #7) > > > the benefit of using x86's name is someone who has seen the x86 instruction > > before will know exactly what it does, without having to look it up. The > > other names are worse cuz they don't indicate that the operation is > > bitwise... the log in ternlog indicates bitwise logic. > > yep, like it. ternlog (and ternlogi) it is k, i'll rename it to ternlog in both the openpower-isa.git and the wiki. imho the TI-form and TI-field should probably be renamed to TLI. > if pmux did not exist, or was not covered in nmigen by Array(), > or if there was some massive inefficiency expected by yosys to > completely fail to create an optimal binary-fan-out mux-tree, > then it would be a good idea. yup, I'll replace it with using Array as suggested. Repl is only needed for the BitwiseMux-tree version. > > > -- the Repl is to splat each lut bit into an > > output-width wide value so the BitwiseMux tree has all the right input bits. > > that can be covered by specifying a fixed width as an integer argument > for both input and output. umm, the input/outputs are already fixed-width with the width as an integer argument passed into __init__, i think you've misunderstood how Repl was used here... > forcing the API to deal with a case that may - or may not - be even needed > introduces API and design complexity. it was needed to convert from each single-bit slice of lut into the width-wide inputs of the BitwiseMux tree, cuz otherwise you'd end up just assigning the lsb (cuz slices are unsigned) and only the lsb of the output would be correct, the rest of the output bits would always be zeros.
(In reply to Jacob Lifshay from comment #9) > yup, I'll replace it with using Array as suggested. Done: https://git.libre-soc.org/?p=nmutil.git;a=commitdiff;h=892a3cd2f7d76a842c9f838d808af70ea87bf7f6
renamed ternary->ternlog and TI->TLI in both openpower-isa.git and the wiki
WIP: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=309b35ee91f1ad6448379a1d804c304dd9681396 https://git.libre-soc.org/?p=soc.git;a=commit;h=e281a933b5e0b7b0c85040116a404873f4ee0f17
(In reply to Jacob Lifshay from comment #12) > WIP: > https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff; > h=309b35ee91f1ad6448379a1d804c304dd9681396 i forgot, an additional TODO (now added): basically the contents of this patch should be shifted to SVP64Asm (as a 32bit instruction) there are about 6-10 instructions now in that class, and the lst argument can be chucked at it and it will create a .long xxxxxxxxxx with the right "stuff". > https://git.libre-soc.org/?p=soc.git;a=commit; > h=e281a933b5e0b7b0c85040116a404873f4ee0f17 mm... a case could be made either way for the benefits of having a separate pipeline vs putting ops into ones with identical profiles. basically, instructions must be grouped by their register profile, *not* by the name of the PDF Page Section in the ISA Manual. and OP_TERNLOG is an identical register profile to shiftrot. when more instructions are added, this will start to make sense. e.g. adding the CR-based variant of OP_TERNLOG, are another three regs to be added to regspec? that would make SIX input regs and two output regs! RA RB RC BA BB BC this would be quite insane to have a pipeline and a Dependency Matrix Row with EIGHT registers! see now why things have to be grouped with other instructions with a similar register profile? it is to keep the Dependency Hazard Tracking down to sane levels.
(In reply to Luke Kenneth Casson Leighton from comment #13) > (In reply to Jacob Lifshay from comment #12) > > https://git.libre-soc.org/?p=soc.git;a=commit; > > h=e281a933b5e0b7b0c85040116a404873f4ee0f17 > > mm... a case could be made either way for the benefits of having > a separate pipeline vs putting ops into ones with identical > profiles. I was thinking I would put all integer bitmanip ops in one pipeline, and all cr bitmanip ops in another.
(In reply to Jacob Lifshay from comment #14) > (In reply to Luke Kenneth Casson Leighton from comment #13) > > (In reply to Jacob Lifshay from comment #12) > > > https://git.libre-soc.org/?p=soc.git;a=commit; > > > h=e281a933b5e0b7b0c85040116a404873f4ee0f17 > > > > mm... a case could be made either way for the benefits of having > > a separate pipeline vs putting ops into ones with identical > > profiles. > > I was thinking I would put all integer bitmanip ops in one pipeline, and all > cr bitmanip ops in another. if you examine the source code - or were familiar with it already - you would have seen how core.py works and how it auto-constructs register file port access by analysing the regspecs and creating broadcast buses. due to there being 10 pipelines already there are currently a whopping 35 register file read port accesses: this is after merging RA RB and RC into one single register file port (which, clearly, requires 3 cycles delay just to get all the operands). for an in-order core assuming a 3R1W or 4R1W regfile that goes up to a staggering 50 register file port read accesses, if that "merging" trick is removed (in order to get latency down). the consequences of what you are proposing is that the number of register file ports goes up to almost 60 register file port read accesses. that is just read port accessors. the policy / convention of grouping by regspec profile - not by function - which i have repeated four times now - is there to ensure that there is no out-of-control explosion of port proliferation, which requires massive wide OR gates and Muxes to handle. when i provide summaries as instructions to you, it is to save time explaining these things in detail. it would be good if you could listen because you are still writing 3-5 times as much code as is required, and then i have to spend more time explaining why that code has to be cut back, usually to what i said would be needed, when i said it the first time.
i put together an explanatory video for you, i recommend viewing it and core.py https://youtu.be/7Th1b-jq40k
(In reply to Luke Kenneth Casson Leighton from comment #16) responded in new thread cuz it's more closely related to the core pipeline ALU/FU design than ternlog: https://lists.libre-soc.org/pipermail/libre-soc-dev/2021-November/004178.html
https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/test/test_lut.py;h=14896da28d3eff637048c746d1e7a2aa5865f133;hb=892a3cd2f7d76a842c9f838d808af70ea87bf7f6#l33 jacob could you please remove that code entirely and use the standard FHDLTestCase class just like all other formal correctness proofs, which you should have known existed because you must have been reading the source code and constantly examining it for over 2 years now, so there is no excuse. again this is unnecessary duplicated code which has been written and precious time has been wasted on more duplication, then more time spent identifying the duplication, then more time asking you to remove the duplication, and more time by you spent removing the duplication.
(In reply to Luke Kenneth Casson Leighton from comment #18) > https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/test/test_lut.py;h=14896da28d3eff637048c746d1e7a2aa5865f133;hb=892a3cd2f7d76a842c9f838d808af70ea87bf7f6#l33 > > jacob could you please remove that code entirely and use the standard > FHDLTestCase class just like all other formal correctness proofs, > which you should have known existed because you must have been > reading the source code and constantly examining it for over 2 years > now, so there is no excuse. Yeah, I can switch back, though FHDLTestCase has some serious shortcomings, including not working correctly when tests are run in parallel cuz it uses the same directory (fixed by formal's using get_test_path which uses a separate path for each test case fn as well as each time it's called in a single test fn), as well as breaking the API of the assertRaises, assertRaisesRegex, and assertWarns methods (cuz they yield None rather than what they should). > > again this is unnecessary duplicated code which has been written > and precious time has been wasted on more duplication, it took me all of 30s to copy from where I already wrote it for unrelated code, so basically zero time was spent in that particular aspect. > then more > time spent identifying the duplication, then more time asking you > to remove the duplication, and more time by you spent removing the > duplication. yeah, yeah... I'll be busy for the rest of today and tomorrow, so will get to it on friday.
(In reply to Jacob Lifshay from comment #19) > Yeah, I can switch back, though FHDLTestCase has some serious shortcomings, > including not working correctly when tests are run in parallel cuz it uses > the same directory (fixed by formal's using get_test_path which uses a > separate path for each test case fn as well as each time it's called in a > single test fn), as well as breaking the API of the assertRaises, > assertRaisesRegex, and assertWarns methods (cuz they yield None rather than > what they should). then you should have fixed it! at the very least you should have raised it as a bugreport, not least so that money can be allocated to it for you to do the damn work! > > > > again this is unnecessary duplicated code which has been written > > and precious time has been wasted on more duplication, > > it took me all of 30s to copy from where I already wrote it for unrelated > code, so basically zero time was spent in that particular aspect. by you. creating a maintenance headache that i have to deal with. which is very annoying to me > > then more > > time spent identifying the duplication, then more time asking you > > to remove the duplication, and more time by you spent removing the > > duplication. > > yeah, yeah... > > I'll be busy for the rest of today and tomorrow, so will get to it on friday. please speed up. it has been about six weeks and only one instruction has [not] been written.
(In reply to Luke Kenneth Casson Leighton from comment #20) > (In reply to Jacob Lifshay from comment #19) > > > Yeah, I can switch back, though FHDLTestCase has some serious shortcomings, > > then you should have fixed it! Ok, I fixed it and switched the tests to use the newly-fixed FHDLTestCase. I also moved ternlogi to the shiftrot pipe and removed the now-redundant bitmanip pipe, I still need to add the fu unit test.
(In reply to Jacob Lifshay from comment #21) > (In reply to Luke Kenneth Casson Leighton from comment #20) > > (In reply to Jacob Lifshay from comment #19) > > > > > Yeah, I can switch back, though FHDLTestCase has some serious shortcomings, > > > > then you should have fixed it! > > Ok, I fixed it and switched the tests to use the newly-fixed FHDLTestCase. brilliant. > I also moved ternlogi to the shiftrot pipe great. really, a compile-time option needs adding which can disable it. added to pspec. 22 def __init__(self, pspec): 22.5 self.draft_bitmanip = hasattr(pspec, "draft_bitmanip") and psec.draftbitmanip == True 23 super().__init__(pspec, "main") 92.5 if self.draft_bitmanip: 93 with m.Case(MicrOp.OP_TERNLOG): 94 # TODO: this only works for ternaryi, change to get lut value that has the advantage of identifying draft instructions as well as making it abundantly clear that they are, in fact, draft (unofficial) see test_core.py as to how to set them up: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/test/test_core.py;h=a35545024e23ba05a8ad71ac96e2500e0c5b4612;hb=75bdc1747f32a4fb6cf848ed8b5c68ef2f683f4c#l245 the new option draft_bitmanip=True will need to be added btw, this move should not have been left uncommented in the commit message: it has nothing to do with ternlogi. it should have been done as a completely separate commit, "moving rotator module connectivity) actually, please put that back so to line 102, back associated with the Cat() because it now looks like the Cat statement has no comment or explanation associated with it. (edit: or move the Cat() and add a comment about it setting "mode") 102 comb += Cat(rotator.right_shift, 103 rotator.clear_left, @@ -65,6 +71,10 @@ class ShiftRotMainStage(PipeModBase): comb += o.ok.eq(1) # defaults to enabled + # outputs from the microwatt rotator module + comb += [o.data.eq(rotator.result_o), + self.o.xer_ca.data.eq(Repl(rotator.carry_out_o, 2))] + # instruction rotate type mode = Signal(4, reset_less=T > and removed the now-redundant > bitmanip pipe, I still need to add the fu unit test. great.
(In reply to Luke Kenneth Casson Leighton from comment #22) > (In reply to Jacob Lifshay from comment #21) > > I also moved ternlogi to the shiftrot pipe > > great. really, a compile-time option needs adding which can disable > it. added to pspec. I added that to shift_rot's pspec, I have yet to figure out how to pass that in from all the places where ShiftRot pipelines are created...will try to figure that out. > see test_core.py as to how to set them up: > > https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/test/test_core.py;h=a35545024e23ba05a8ad71ac96e2500e0c5b4612;hb=75bdc1747f32a4fb6cf848ed8b5c68ef2f683f4c#l245 > > the new option draft_bitmanip=True will need to be added > btw, this move should not have been left uncommented in the commit message: > it has nothing to do with ternlogi. > > (edit: or move the Cat() and add a comment about it setting "mode") moved the Cat
(In reply to Jacob Lifshay from comment #23) > (In reply to Luke Kenneth Casson Leighton from comment #22) > > (In reply to Jacob Lifshay from comment #21) > > > I also moved ternlogi to the shiftrot pipe > > > > great. really, a compile-time option needs adding which can disable > > it. added to pspec. > > I added that to shift_rot's pspec, I have yet to figure out how to pass that > in from all the places where ShiftRot pipelines are created...will try to > figure that out. for anything related to TestIssuer unit test running, as well as ISACaller running, the "base" class is here: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/runner.py;h=b8378a93ee550c67ac4eb48097005e9f631f0879;hb=74adc2b94a4409162119157f3f75e05a06e4c841#l135 which for TestIssuer simulations the extra argument must be passed here: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/test/test_runner.py;h=962d47a20e9e11eaf992c91c5e424c4c7bbdae5f;hb=20ffa8ec223be44fc37df5184ca09d648e85a844#l292 and a command-line argument added here: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/test/test_issuer.py;h=b8245cbf1a5356f39a6695531ecda1e96460617b;hb=20ffa8ec223be44fc37df5184ca09d648e85a844#l37 (actually to all of these) https://git.libre-soc.org/?p=soc.git;a=tree;f=src/soc/simple/test;hb=HEAD hmmm TestMemPSpec must move into openpower-isa, hmmm. importing from soc in the openpower-isa repo is... Bad(tm) then there will also be fu/compunit tests, as well as individual pipeline-tests that need it enabling. basically, anywhere you can find a "grep -r TestmemPspec" > > (edit: or move the Cat() and add a comment about it setting "mode") > > moved the Cat great. need to keep things together, comments are critically important in a project of this size. https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=20ffa8ec223be44fc37df5184ca09d648e85a844 +def get_pspec_draft_bitmanip(pspec): + return getattr(pspec, "draft_bitmanip", False) it can't be done that way (at least, i don't think so, can you please check and put in a code-comment if it does) TestMemPspec actually is a subclass of unittest Mock() which has overrides on __getattr__ etc. and consequently you cannot just access attributes in the "normal" way. you'll need to check that getattr() and hasattr() do the same thing: they probably don't, which is most likely why i specifically used the (odd-looking) "if hasattr(pspec, "attribute") and pspec.attribute == True)" class ShiftRotMainStage(PipeModBase): def __init__(self, pspec): super().__init__(pspec, "main") + self.draft_bitmanip = get_pspec_draft_bitmanip(pspec) self.fields = DecodeFields(SignalBitRange, [self.i.ctx.op.insn]) self.fields.create_specs() please comment that. explain what is going on (that it's draft instructions, include the URL of draft bitmanip instructions as well) and also cross-reference at the top code-block back to the URL of the relevant bugreport comment as to what work is going on here: something like "draft bitmanip instructions are added here because of similar register profiles, they are not approved yet by OPF, and can be disabled at compile-time. draft spec is at {URL}, bugreport is at {URL#comment}" this is so that we don't get completely and hopelessly lost even during code-review right now, as well as later when other people wonder what the heck is going on. hmmm also can you please add this URL, it's missing https://bugs.libre-soc.org/show_bug.cgi?id=339 and that it implements Power ISA v3.0B Book I Section 3.3.14 p101-110 (shiftrot was one of the very early pipelines and the practice of linking the context (bugreport, Spec page numbers) hadn't been established yet).
(In reply to Luke Kenneth Casson Leighton from comment #24) > https://git.libre-soc.org/?p=soc.git;a=commitdiff; > h=20ffa8ec223be44fc37df5184ca09d648e85a844 > > +def get_pspec_draft_bitmanip(pspec): > + return getattr(pspec, "draft_bitmanip", False) > > it can't be done that way (at least, i don't think so, can you please > check and put in a code-comment if it does) yup, it won't work that way, if pspec is a Mock instance here. Mock will return new Mock instances for any attribute access. Mock is plain and simple the wrong tool for the job, dataclasses are better. From my reading through the code, it appears as though the pspec passed into ShiftRot isn't the same pspec created for the whole core, so there shouldn't be an issue for that particular use case. > TestMemPspec actually is a subclass of unittest Mock() which has overrides > on __getattr__ etc. and consequently you cannot just access attributes > in the "normal" way. > > you'll need to check that getattr() and hasattr() do the same thing: > they probably don't, which is most likely why i specifically used the > (odd-looking) "if hasattr(pspec, "attribute") and pspec.attribute == True)" (hasattr(pspec, "attr") and pspec.attr) isn't any better than getattr(pspec, "attr", default), cuz hasattr always returns true for any attr string.
(In reply to Jacob Lifshay from comment #25) > yup, it won't work that way, if pspec is a Mock instance here. Mock will > return new Mock instances for any attribute access. Mock is plain and simple > the wrong tool for the job, dataclasses are better. Mock was there, it does the job, it saved 1 day writing a custom class that does exactly the same thing. now it will be 3-4 days going through the entire codebase removing it and rerunning all unit tests. which we csnnot afford to do. this is about the lowest possible priority task of all the lowest priority tasks and even discussing how great a replacement dataclass should be by comparison is a waste of time. if you had helped out at the time then you could have written such a class. but you did not help. therefore under extreme pressure i had to make drastic shortcut decisions and now we are stuck with it. lesson learned, i trust. > From my reading through the code, it appears as though the pspec passed into > ShiftRot isn't the same pspec created for the whole core, so there shouldn't > be an issue for that particular use case. yes, annoyingly confusing choice of variable name, 18 months ago. i think i meant to merge them, then changed my mind, and never corrected the variable names. now there are hundreds of pspecs in use and it will be a pig to separate them all. > (hasattr(pspec, "attr") and pspec.attr) isn't any better than getattr(pspec, > "attr", default), cuz hasattr always returns true for any attr string. now you know why i did "pspec.somevar == True" not "if pspec.somevar" because that will always return true. it does need to be *in* TestMemPSpec and passed through, not in the pipeline spec (sorry about the confusion) because TestMemPSpec is what gets set/activated from the commandline (and various unit tests). in this way there is a choice to compile with or without draft bitmanip. exactly the same as nosvp64 and other options. see issuer_verilog.py for the full list, an option is needed --with-draft-bitmanip by default it should be enabled (incl in test_issuer.py) because it will be a pain to type every time.
the class where these two types of pspecs merge is AllFunctionUnits. https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/compunits/compunits.py;h=be3d4e69c5806dcce1cf68642e0a514ac37249e2;hb=4c98cc88be5aba23807c6f4bf97e9de6ba13fd73#l281 you can see here: 293 def __init__(self, pspec, pilist=None, div_fsm=True): 294 addrwid = pspec.addr_wid 295 units = pspec.units 296 microwatt_mmu = hasattr(pspec, "mmu") and pspec.mmu == True 297 print("AllFunctionUnits.microwatt_mmu="+str(microwatt_mmu)) that is taking: * the address width from the TestMemPSpec * the units dictionary from the TestMemPSpec * whether mmu is requested to be added as a special-case the address width is specifically handed to LDSTFunctionUnit(s): 344 for i, pi in enumerate(pilist): 345 self.fus["ldst%d" % (i)] = LDSTFunctionUnit(pi, addrwid, i) and you can see how in both FunctionUnitBaseMulti and Single the pspec - this one is the *pipe* specification - is created and passed through to the ALU: 135 class FunctionUnitBaseMulti(ReservationStations2): 157 def __init__(self, speckls, pipekls, num_rows): 158 id_wid = num_rows.bit_length() 159 pspec = speckls(id_wid=id_wid) # spec (NNNPipeSpec instance) 160 opsubset = pspec.opsubsetkls # get the operand subset class 161 regspec = pspec.regspec # get the regspec 162 alu = pipekls(pspec) # create actual NNNBasePipe 163 self.pspec = pspec there's a couple of options here: 1) pass the TestMemPSpec instance through to *everything* (an extra argument to every function unit class, in addition to idx 2) add a function that passes it in, post-instantiation (which then passes it through to the self.alu instance with a follow-on call) i'd recommend (1) because the constructor of the PipeSpec and also the ALU may need to adapt. this will *need* code comments explaining what the hell is going on, it's the first time these two (identical-named) variables have "met"
love the commit comments, yes a dog's dinner would be prettier :) it'll have to do for now, with enough warnings and explanatory notes: changing low-level structures in python code is too risky (or needs very careful intermediary planning)
I decided that just passing the cpu's pspec into everything would work best, so added it as parent_pspec variables, since it being parent makes it more obvious what's going on. Hopefully I didn't miss anything important in my refactor... I also added forwarding via __getattr__ so code expecting attributes to be set on pspec will automatically get them from parent_pspec too. class CommonPipeSpec: """CommonPipeSpec: base class for all pipeline specifications see README.md for explanation of members. """ def __init__(self, id_wid, parent_pspec): self.pipekls = SimpleHandshakeRedir self.id_wid = id_wid self.opkls = lambda _: self.opsubsetkls() self.op_wid = get_rec_width(self.opkls(None)) # hmm.. self.stage = None self.parent_pspec = parent_pspec # forward attributes from parent_pspec def __getattr__(self, name): return getattr(self.parent_pspec, name)
I added the ternlogi spec pseudo-code. While I'm at it, I become annoyed with constantly having -.csv and sv_decode.vhdl be untracked files in openpower/isatables/ Should I add them to .gitignore or should I commit them?
I got the ternlogi tests to run, they all seem to work correctly, except that somehow the simulator ends up with SO set even though nothing writes to it, Luke, please help debug that
(In reply to Jacob Lifshay from comment #30) > I added the ternlogi spec pseudo-code. > > While I'm at it, I become annoyed with constantly having -.csv erm... > and sv_decode.vhdl that one should be .gitignored > be untracked files in openpower/isatables/ > Should I add them to .gitignore or should I commit them? ermmm...ermermerm.... these are actually important to have properly examined, lwzu for example is clearly a mistake. can you please raise a bugreport about them, to be investigated in the meantime i'll put something in sv_analysis.py (edit: please cross-ref this commit in the bugreport https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=81c20b9e15d931ef58195644767a58944d933336) insn,CONDITIONS,Ptype,Etype,0,1,2,3,in1,in2,in3,out,CR in,CR out,out2 dcbz,,0,EXTRA2,0,0,0,0,RA_OR_ZERO,RB,0,0,0,0,0 lwzu,SVP64BREV,0,EXTRA2,0,0,0,0,RA_OR_ZERO,0,RC,RT,0,0,RA lbzu,SVP64BREV,0,EXTRA2,0,0,0,0,RA_OR_ZERO,0,RC,RT,0,0,RA lhzu,SVP64BREV,0,EXTRA2,0,0,0,0,RA_OR_ZERO,0,RC,RT,0,0,RA lhau,SVP64BREV,0,EXTRA2,0,0,0,0,RA_OR_ZERO,0,RC,RT,0,0,RA lfsu,SVP64BREV,0,EXTRA2,0,0,0,0,RA,0,RC,FRT,0,0,RA lfdu,SVP64BREV,0,EXTRA2,0,0,0,0,RA,0,RC,FRT,0,0,RA 1/6=mtfsb1,,0,EXTRA2,0,0,0,0,0,0,0,0,0,CR1,0 2/0=mcrfs,,0,EXTRA2,0,0,0,0,0,0,0,0,0,BF,0 2/6=mtfsb0,,0,EXTRA2,0,0,0,0,0,0,0,0,0,CR1,0 4/6=mtfsfi,,0,EXTRA2,0,0,0,0,0,0,0,0,0,CR1,0
(In reply to Luke Kenneth Casson Leighton from comment #32) > (In reply to Jacob Lifshay from comment #30) > > I added the ternlogi spec pseudo-code. > > > > While I'm at it, I become annoyed with constantly having -.csv > > erm... > > > and sv_decode.vhdl > > that one should be .gitignored Done. > > > be untracked files in openpower/isatables/ > > Should I add them to .gitignore or should I commit them? > > ermmm...ermermerm.... these are actually important to have > properly examined, lwzu for example is clearly a mistake. > can you please raise a bugreport about them, Done.
The FU unit tests pass now. openpower-isa.git: commit 3d9d4a0864e4e5d4ab6dc13348ec3289748285d6 (HEAD -> master, origin/master, origin/HEAD) Author: Jacob Lifshay <programmerjake@gmail.com> Date: Fri Dec 10 12:34:23 2021 -0800 change ternlogi to not have Rc field
added to SVP64Asm
soc.git: commit 798afa4ccda4a0e8065e03dcf90fb60922ba4b00 (HEAD -> master, origin/master, origin/HEAD) Author: Jacob Lifshay <programmerjake@gmail.com> Date: Fri Dec 10 13:54:22 2021 -0800 add ternlogi to shift_rot formal test
(In reply to Jacob Lifshay from comment #36) > add ternlogi to shift_rot formal test brilliant. give me a chance to review everything first before moving it to "FIXED" because "FIXED" has the annoying side-effect of removing the bug from public searches, and i want to make sure the wiki TODO list has a crossreference. in the meantime do pick another instruction, you know the ropes, now, which is great. feel free to leave e.g. min/max to someone whom we can give it to as a "beginner" exercise
(edit: done, except 7-idx) result <- [0] * XLEN idx <- [0] * 3 do i = 0 to XLEN - 1 idx[0] <- (RT)[i] idx[1] <- (RA)[i] idx[2] <- (RB)[i] result[i] <- (TLI & ROTL64(1, idx)) != 0 the style in the Power ISA is to use less lines, and that's possible here with: do i = 0 to XLEN - 1 idx <- (RT)[i] || (RA)[i] || (RB)[i] result[i] <- (TLI & ROTL64(1, idx)) != 0 also, using ROTL64 seems overkill (and is supposed to be for shiftrot) i suspect this will do: do i = 0 to XLEN - 1 idx <- (RT)[i] || (RA)[i] || (RB)[i] result[i] <- TLI[idx] don't be tempted to substitute idx there, the parser won't be able to cope because it can't identify the size of idx. if those 3 lines don't work, then this should "fix" that: pre-declaring idx to a known size. idx <- [0] * 3 do i = 0 to XLEN - 1 idx <- (RT)[i] || (RA)[i] || (RB)[i] result[i] <- TLI[idx] no i don't want the parser to turn into a 2-pass Monster right now.
commit 8abcd7b74c5416c8a5348efc467854ad3080f12f (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Sat Dec 11 15:13:15 2021 +0000 use concat in ternlogi to reduce code size works. result <- [0] * XLEN - idx <- [0] * 3 do i = 0 to XLEN - 1 - idx[0] <- (RT)[i] - idx[1] <- (RA)[i] - idx[2] <- (RB)[i] + idx <- (RT)[i] || (RA)[i] || (RB)[i]
ahh, and *this* works. @@ -12,7 +12,7 @@ Pseudo-code: result <- [0] * XLEN do i = 0 to XLEN - 1 idx <- (RT)[i] || (RA)[i] || (RB)[i] - result[i] <- (TLI & ROTL64(1, idx)) != 0 + result[i] <- TLI[7-idx] RT <- result nuts. note the 7-idx. IBM will have a fit about that, it's LSB0-to-MSB0 conversion and i'm pretty sure they'll want (insist on) MSB0 being in the spec. deep breath: the spec needs to read: result[i] <- TLI[idx] and... sigh... all the unit tests, HDL, everything, changed to match that. i know. it's annoying, but we don't want to irritate the IBM members of the ISA WG by not following the conventions of 30 years.
unit tests pass and run great btw! python3 fu/shift_rot/test/test_pipe_calr.py >& /tmp/f Ran 3 tests in 8.743s OK
also (sorry!) just saw that bitmanip_cases.py doesn't have a stand-alone Simulator-only test, which it should, and also have an ExpectedState (see alu_cases.py for examples) see decoder/isa/test_caller_alu.py and literally copy it, or, if you feel inclined, work out a common function and update test_caller_bitmanip.py and test_caller_alu.py and test_caller_shiftrot.py to use that common function, rather than duplicate 30 lines of near-identical code. no big deal if you duplicate, there. https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/alu/alu_cases.py;hb=HEAD https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_alu.py;hb=HEAD
(In reply to Luke Kenneth Casson Leighton from comment #40) > ahh, and *this* works. > > @@ -12,7 +12,7 @@ Pseudo-code: > result <- [0] * XLEN > do i = 0 to XLEN - 1 > idx <- (RT)[i] || (RA)[i] || (RB)[i] > - result[i] <- (TLI & ROTL64(1, idx)) != 0 > + result[i] <- TLI[7-idx] > RT <- result > > nuts. note the 7-idx. IBM will have a fit about that, > it's LSB0-to-MSB0 conversion and i'm pretty sure they'll > want (insist on) MSB0 being in the spec. > > deep breath: the spec needs to read: > > result[i] <- TLI[idx] I'd strongly argue we need little-endian anyway for essentially the same reasons as SV integer predicates being in little-endian, if we're ever going to have the non-immediate form. If we know for sure that we'll only have ternlogi, then it doesn't matter as much cuz the compiler can easily do the endian swap. That's why I used ROTL64 there, cuz it gives natural-looking little-endian, and the left-shift operator apparently doesn't exist.
(In reply to Jacob Lifshay from comment #43) > I'd strongly argue we need little-endian anyway for essentially the same > reasons as SV integer predicates being in little-endian, if we're ever going > to have the non-immediate form. in some fashion or other, even if it's similar to bpermd. i do wish this stuff wasn't so bleedin hard to get yer head round. thoughts out loud: * the lowest bit in e.g. RC has to be the lowest index * therefore it will be 63-0 * therefore it is indeed ok to have MSB number inversion * and, actually, that's consistent with everything else where we just did a massive XLEN-i substitution * which i had totally forgotten about so yes if questioned there exists a justification for keeping 7-idx because it will match with XLEN-id for the nonimmed version am all caught up now. ROTL64 just looks messy and unreadable. specifications are supposed to be clear.
just going over lut.py: delete these. again they are unnecessary and violate LSP 29 assert isinstance(input_count, int) 30 assert isinstance(width, int) all of these can go 40 def lut_index(i): 41 return Signal(input_count, name=f"lut_index_{i}") 42 self._lut_indexes = [lut_index(i) for i in range(width)] insert this between 47 and 48 47 for i in range(self.width): lut_idx = Signal(input_count, name=f"lut_index_{i}") 48 for j in range(self.input_count): remove self. in front of lut_idx also remove self. on front of lut and move into elaborate where self. is also removed. 38 self.lut = Signal(2 ** input_count) the style that you are using is exposing private local signals unnecessarily. because this is a public library, there are not enough code-comments. as with grev.py: * link to bugreport * add explicit copyright and link to NLnet and NGI POINTER * """docstring what the module does""" * explanatory comments
(In reply to Luke Kenneth Casson Leighton from comment #45) I simplified the code and added docs. > also remove self. on front of lut and move into elaborate > where self. is also removed. > > 38 self.lut = Signal(2 ** input_count) No, self.lut is an *input*, it would break if I had it private.
(In reply to Jacob Lifshay from comment #46) > (In reply to Luke Kenneth Casson Leighton from comment #45) > I simplified the code and added docs. nice. it's tidy and consistent. looks great. ah, just spotted: * missing copyright notice * using php-style functions-inside-text (blech!) * spaces in front of docstrings (i know, it looks untidy, but that's the style) * although saying "see BitwiseLUT docstring" is fine in this case (because the API is identical), actually saying *that* the API is identical (and why) is missing. > No, self.lut is an *input*, it would break if I had it private. that would be why it would need to have been documented :)
oh: and definitely don't use docstrings *under* (after) a line of code to explain it: put a # comment *before* the code, or at the end of the line (if there's space to do so). self.input """ something after-the-fact about input: NO" # something about the input which is coming up next.... self.input # something short about the input
(In reply to Luke Kenneth Casson Leighton from comment #48) > oh: and definitely don't use docstrings *under* (after) a line of code > to explain it But that's the official Python standard for documenting attributes in a way that tools can extract the docstrings: https://www.python.org/dev/peps/pep-0257/ > String literals occurring immediately after a simple assignment at the top > level of a module, class, or __init__ method are called "attribute > docstrings".
(In reply to Jacob Lifshay from comment #49) > (In reply to Luke Kenneth Casson Leighton from comment #48) > > oh: and definitely don't use docstrings *under* (after) a line of code > > to explain it > > But that's the official Python standard for documenting attributes in a way > that tools can extract the docstrings: > https://www.python.org/dev/peps/pep-0257/ A: because it's totally counter-intuitive Q: why is it awful to have the docstring after the attribute it refers to? > > String literals occurring immediately after a simple assignment at the top > > level of a module, class, or __init__ method are called "attribute > > docstrings". i've never seen this used - ever - in 20 years. it's great that you're reading the specs, but some things like this just in practice never get used, and they'll confuse the hell out people.
(In reply to Luke Kenneth Casson Leighton from comment #50) > (In reply to Jacob Lifshay from comment #49) > > (In reply to Luke Kenneth Casson Leighton from comment #48) > > > String literals occurring immediately after a simple assignment at the top > > > level of a module, class, or __init__ method are called "attribute > > > docstrings". > > i've never seen this used - ever - in 20 years. it's great that you're > reading the specs, but some things like this just in practice never get used, > and they'll confuse the hell out people. They do get used pretty often: https://repo.or.cz/docutils.git/blob/3c60c830ac4cafaede096539d09c45f51544686a:/docutils/docutils/transforms/frontmatter.py#l382 https://github.com/pdoc3/pdoc/blob/4aa70de2221a34a3003a7e5f52a9b91965f0e359/pdoc/__init__.py#L502 Also, probably most importantly, they're the format that basically *all* tools understand, sphinx/docutils, pdoc3, vscode, etc.
(In reply to Jacob Lifshay from comment #51) > They do get used pretty often: > https://repo.or.cz/docutils.git/blob/ > 3c60c830ac4cafaede096539d09c45f51544686a:/docutils/docutils/transforms/ > frontmatter.py#l382 > > https://github.com/pdoc3/pdoc/blob/4aa70de2221a34a3003a7e5f52a9b91965f0e359/ > pdoc/__init__.py#L502 blech :) > Also, probably most importantly, they're the format that basically *all* > tools understand, sphinx/docutils, pdoc3, vscode, etc. very true. i also deduced why they must have done post-docstring, because that's what's done on functions and classes. the docstring comes directly after the function declaration or class declaration [it's still "blech" :) ] i know what it was: the lack of space between them inputs = ... """the inputs""" outputs = .... """the outputs""" etc. this just confused the hell out of me. i tried restoring the docstrings and adding some extra spaces and it just irritated me that there was extra space, given that it becomes nine (9!!) lines instead of three (3). this by contrast is compact, clear, and fits in under 80 chars + self.inputs = tuple(inp(i) for i in range(input_count)) # inputs + self.lut = Signal(2 ** input_count) # lookup input + self.output = Signal(width) # output
I added the stand-alone simulator test. lkcl, please mark this as completed after you check and add some funding, cuz I'll need more soon.
Since the OE getting set when we ask for Rc=1 thing is a bug that I figured out how to work around, I think we should add Rc=1 back to ternlogi, for consistency with OpenPower base instructions and./or./xor./etc.
(In reply to Jacob Lifshay from comment #54) > Since the OE getting set when we ask for Rc=1 thing is a bug that I figured > out how to work around, I think we should add Rc=1 back to ternlogi, for > consistency with OpenPower base instructions and./or./xor./etc. Additional description: https://bugs.libre-soc.org/show_bug.cgi?id=765#c0 Work around by adding ternlog to hard-coded list: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_decoder2.py;h=a636fdf5160cbfa680cdbf8ea36bced0304ca7f7;hb=HEAD#l573
(In reply to Jacob Lifshay from comment #54) > Since the OE getting set when we ask for Rc=1 thing is a bug that I figured > out how to work around, I think we should add Rc=1 back to ternlogi, for > consistency with OpenPower base instructions and./or./xor./etc. nowhere near enough encoding space in opcode 22. if moved to an alternative major opcode tbere is room.
(In reply to Luke Kenneth Casson Leighton from comment #56) > (In reply to Jacob Lifshay from comment #54) > > Since the OE getting set when we ask for Rc=1 thing is a bug that I figured > > out how to work around, I think we should add Rc=1 back to ternlogi, for > > consistency with OpenPower base instructions and./or./xor./etc. > > nowhere near enough encoding space in opcode 22. > > if moved to an alternative major opcode tbere is room. Ok, I'll add a note to the bitmanip page on the wiki.
sorry jacob i moved OP_TERNLOG to its own major op (to be chosen) good news is, Rc=1 is now possible. GT LE actually have partial meaning there which is nice. i am putting sone thought into the stranger ternlog ops as well, the dynamic one in particular, i would like to see a way to do FPGA emulation without static immediates (the 8bit lut3 be a register) also the extra space allows 3in 1out for CRs. BT BA BB BC
(In reply to Luke Kenneth Casson Leighton from comment #58) > sorry jacob i moved OP_TERNLOG to its own > major op (to be chosen) lets pick major 5 for now, since that means I have to change less rn. > good news is, Rc=1 > is now possible. Ok, added Rc=1 to the code. https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=79b540bb5d8f8ba139f56ff1e8010bde24df3d22 I also had to clean up where you accidentally used HTML comments instead of python comments in the pseudocode, which caused compilation to fail (evidently you forgot to try compiling it...) https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=4e181c8e868230e930292df33f9985c3bb78e30c Afaict this is complete now, Luke, can you allocate some budget to this and mark this bug resolved, so I can get paid soon?
(In reply to Jacob Lifshay from comment #59) > (In reply to Luke Kenneth Casson Leighton from comment #58) > > sorry jacob i moved OP_TERNLOG to its own > > major op (to be chosen) > > lets pick major 5 for now, since that means I have to change less rn. ack. > > good news is, Rc=1 > > is now possible. > > Ok, added Rc=1 to the code. > > https://git.libre-soc.org/?p=openpower-isa.git;a=commit; > h=79b540bb5d8f8ba139f56ff1e8010bde24df3d22 excellent. > I also had to clean up where you accidentally used HTML comments instead of > python comments in the pseudocode, no, i did not make that mistake. HTML comments as long as they are on one line are intentionally supported. > which caused compilation to fail > (evidently you forgot to try compiling it...) yes, i expected it to work. > https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff; > h=4e181c8e868230e930292df33f9985c3bb78e30c those comments aren't intended to go into the pseudocode, they are as intended. HTML-style comments *are* supported (and desirable) and the parser should be fixed. can you please revert that. > Afaict this is complete now, Luke, can you allocate some budget to this and > mark this bug resolved, so I can get paid soon? that's down to michiel signing off the MoU.
(In reply to Luke Kenneth Casson Leighton from comment #60) > (In reply to Jacob Lifshay from comment #59) > > https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff; > > h=4e181c8e868230e930292df33f9985c3bb78e30c > > those comments aren't intended to go into the pseudocode, > they are as intended. HTML-style comments *are* supported > (and desirable) and the parser should be fixed. can you > please revert that. sure, once the parser is fixed. until then, I'm going to leave that in so I can actually work on the pseudo-code for clmul and friends, which is what I'm planning on working on next. I'll create a bug report for fixing the parser.
https://libre-soc.org/irclog/%23libre-soc.2022-05-17.log.html#t2022-05-17T00:41:29 > programmerjake oh, lkcl, reading through your modifications to > bitmanip, crbinlog imho should take the look-up table from a GPR rather > than a CR. this will make it much easier to generate. 00:41 > programmerjake much easier to use in user code because CR's are > generally conditions rather than arbitrary binary values. 00:41 unfortunately there's nowhere near enough space in the opcode | 0.5|6.8 | 9.11|12.14|15.17|18.22|23...30 |31| | -- | -- | --- | --- | --- |-----| --------|--| | NN | BT | BA | BB | BC |m0-m2|00101110 |m3| * the 4-bit mask selects which CR Field gets updated * BC contains the LUT2 which happens to be 4-bit * BT BA BB BC are a total of 12 bits. i'm nervous about using some of the 5-bit XO table (the one above the 10-bit XO which this instruction is already intruding on) but more than that, the crweird instructions allow for more powerful nibble-based transfer between GPR and CR Fields, including merging and also that special mode where QTY 2of CR Fields can be put into the 8 LSBs of a GPR. i kinda like the idea, though, let me take a look / reminder of what crweirds do.
turns out v3.1B has xxeval, which is a ternlogi-equivalent for VSX
(In reply to Jacob Lifshay from comment #63) > turns out v3.1B has xxeval, which is a ternlogi-equivalent for VSX ah! yes, i encountered that briefly, couldn't find it. i did notice that the ternlog table is bit-inverted (MSB0), it would be useful to keep exactly the same table, by bit-inverting ternlogi a[i] b[i] c[i] to c[i] b[i] a[i]
https://libre-soc.org/openpower/sv/bitmanip/ternlogi_simplification_experiment/ https://git.libre-soc.org/?p=libreriscv.git;a=commit;h=8268a10d4827a095f7c776d3424e7b5abc581056 Author: Jacob Lifshay <programmerjake@gmail.com> Date: Thu Mar 9 00:10:47 2023 -0800 add experiment for seeing if changing ternlogi to have 7-bit immediate could even work turns out, it probably can. I wrote a script that looks for possible expand_encoded_imm() functions and filled in one I found. assuming I counted correctly and the function I found actually is correct, it should only need 9 xor gates! so, imho this is trivial to decode even if we (unwisely) decided to mush it into the decoder instead of the ALU.
(In reply to Jacob Lifshay from comment #65) > assuming I counted correctly and the function I found actually is correct, > it should only need 9 xor gates! so, imho this is trivial to decode even if > we (unwisely) decided to mush it into the decoder instead of the ALU. we may still decide the (slightly) reduced flexibility and increased cognitive complexity isn't worth it and just use an 8-bit immediate, which is fine with me.
lkcl: can I just mark this as done and you allocate some budget to it?
using the same figures of EUR 2.70 per line of code from bug #1025 and the estimate of 384 lines from: https://lists.libre-soc.org/pipermail/libre-soc-dev/2023-August/005607.html I'm assigning EUR 1000 which is rounded to the nearest EUR 100
(In reply to Jacob Lifshay from comment #66) > we may still decide the (slightly) reduced flexibility and increased > cognitive complexity isn't worth it and just use an 8-bit immediate, which > is fine with me. svp64:vector-immediates can extend it later without changing the opcode. (In reply to Jacob Lifshay from comment #68) > I'm assigning EUR 1000 which is rounded to the nearest EUR 100 awesome.