Bug 745 - OP_TERNLOG instruction
Summary: OP_TERNLOG instruction
Status: IN_PROGRESS
Alias: None
Product: Libre-SOC's second ASIC
Classification: Unclassified
Component: source code (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Jacob Lifshay
URL: https://libre-soc.org/openpower/sv/bi...
Depends on: 753
Blocks: 741 771
  Show dependency treegraph
 
Reported: 2021-11-12 13:44 GMT by Luke Kenneth Casson Leighton
Modified: 2022-05-21 16:44 BST (History)
3 users (show)

See Also:
NLnet milestone: NLnet.2021.02A.052.CryptoRouter
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation: 741
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2021-11-12 13:44:48 GMT
plan (edited as needed by programmerjake):
* DONE: TI-Form
* DONE: TI (not TII) field (arguably necessary)
* DONE: OP_TERNARY (not OP_TERNARYI) in CSV/power_enums
* DONE: rename to ternlog[i] and TLI
renamed in both openpower and the wiki
* basic adaptable module (probably in nmutil)
  - DONE: module in nmutil
  - DONE: replace with Array version
  - DONE: unit test
  - DONE: formal
* DONE: add encoding of ternlogi to SVP64Asm class (as a 32bit op)
* DONE: add ternlogi to shiftrot pipe
* DONE: put shiftrot rotator module code back with its comment see comment #22
* DONE: add "pspec" option to enable/disable draft instruction, see comment #22
* DONE: fu unit tests
* DONE: fu formal
* DONE: add "ExpectedState" to bitmanip_cases.py (yes irony it
  needs a python implementation of ternlogi but it'll be 4 lines
  fortunately) this is so that
  - DONE add a stand-alone Simulator *only* runner of bitmanip_cases.py
* DONE tidy up lut.py in nmutil as it is a public API comment #45
* DONE: re-add Rc=1 to ternlogi (maybe)

Links/commits:

* https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/lut.py;h=5705d3d96e21c050b229e65b9b88d249dbdfe356;hb=3e45917af6e91df910b6fc77d031ee3a656c4116
* https://git.libre-soc.org/?p=nmutil.git;a=commitdiff;h=892a3cd2f7d76a842c9f838d808af70ea87bf7f6
* https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/test/test_lut.py;h=2863e83567d6ab0666d7a9749ec07a8c572e47b4;hb=3e45917af6e91df910b6fc77d031ee3a656c4116
* https://git.libre-soc.org/?p=nmutil.git;a=commitdiff;h=878e4c6776203e12abb4cf4e78f6d251121418c1
* https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/test/test_core.py;h=a35545024e23ba05a8ad71ac96e2500e0c5b4612;hb=75bdc1747f32a4fb6cf848ed8b5c68ef2f683f4c#l245
* https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/shift_rot/main_stage.py;h=b8ec704199a800c6df652524612581e07d885bb2;hb=880bf8469c65dbb96c9853b32618d07a0c742cf0
* https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=79b540bb5d8f8ba139f56ff1e8010bde24df3d22
Comment 1 Luke Kenneth Casson Leighton 2021-11-12 13:46:45 GMT
these data structures and code can probably be utilised directly,
no modifications at all, in fact just simply import them in
the pipeline

*or*

even: simply add OP_TERNARY *to* shift_rot because it has
the exact register profile.  if deploying that trick, it
would be best to add a PSpec run/compile-time option to make
it possible to activate/deactivate (accessible as self.pspec
inside main_stage.py)

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/shift_rot/pipe_data.py;hb=HEAD

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/shift_rot/input_stage.py;hb=HEAD

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/shift_rot/output_stage.py;h=d207d4ff416e6f503b9d459fd885bf9ac6e366c7;hb=HEAD

if adding OP_TERNARY to shift_rot it would be here:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/shift_rot/main_stage.py;hb=HEAD

adding to an existing pipeline will save time, so makes a lot of
sense, not having more code to write (or review).

unit tests however should really go into this directory:
https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/openpower/test/shift_rot;hb=HEAD

but have their own separate file (i.e. not go into shift_rot_cases.py).
the unit tests for this one are going to be BIG.  256 combinations
of the selector.
Comment 2 Jacob Lifshay 2021-11-16 01:17:32 GMT
updated the plan https://bugs.libre-soc.org/show_bug.cgi?id=745#c0
also, ternary is not a very good name (too generic), imho a better name would be ternlog (like x86's instruction) or lut3
Comment 3 Jacob Lifshay 2021-11-17 03:13:18 GMT
Added BitwiseLut, which is a generic n-ary bitwise lut -- ternary is just BitwiseLut(input_count=3, width=64)

https://git.libre-soc.org/?p=nmutil.git;a=commitdiff;h=3e45917af6e91df910b6fc77d031ee3a656c4116
Comment 4 Jacob Lifshay 2021-11-17 04:11:25 GMT
Added formal test for BitwiseLut:
https://git.libre-soc.org/?p=nmutil.git;a=commitdiff;h=878e4c6776203e12abb4cf4e78f6d251121418c1
Comment 5 Luke Kenneth Casson Leighton 2021-11-17 12:16:37 GMT
(In reply to Jacob Lifshay from comment #4)
> Added formal test for BitwiseLut:
> https://git.libre-soc.org/?p=nmutil.git;a=commitdiff;
> h=878e4c6776203e12abb4cf4e78f6d251121418c1

excellent.  btw the convention for those is that they go
in a directory named "formal" rather than a directory named
"test".

(In reply to Jacob Lifshay from comment #2)
> updated the plan https://bugs.libre-soc.org/show_bug.cgi?id=745#c0
> also, ternary is not a very good name (too generic), imho a better name
> would be ternlog (like x86's instruction) or lut3

lut3's good.  using x86 names... hmmm.

(In reply to Jacob Lifshay from comment #3)
> Added BitwiseLut, which is a generic n-ary bitwise lut -- ternary is just
> BitwiseLut(input_count=3, width=64)
> 
> https://git.libre-soc.org/?p=nmutil.git;a=commitdiff;
> h=3e45917af6e91df910b6fc77d031ee3a656c4116

+                for i in range(self.input_count):
+                    if sel_values[i]:
+                        lut_index |= 2 ** i]

that could simply be done in one line as:

     lut_index.eq(Cat(*sel_values)

also... i'm not really sure this is a correct implementation.  there
are 256 unit tests required: one for each and every 4th input, because
the 4th input (lut) is 8-bit wide and therefore there are 256 possible
values.

also, i'm surprised it works without using Array()
(yes there are shorter ways to do this):

self._lut_underlying_signal = Signal(8)
lut_bits = []
for i in range(len(self._lut_underlying_signal)):
    lut_bits.append(self.lut_underlying_signal[i])
self.lut = Array(lut_bits)

without that dynamic indexing i would not expect it to work,
at all [Array creates a pmux at the back-end].

if it does work then i expect that what you've written is
a re-implementation of pmux (Array) using a lot more lines of
code than is necessary (30 instead of 2)

the whole module should be literally this - really not kidding,
only around 4 lines:

    lut = Array([lut_input[i] for i in range(lut_width)])
    for i in range(input_width):
        output[i].eq(lut[Cat(in1[i], in2[i], in3[i])])

that's it.  that's the entire implementation.


also, what is the Repl() for?  the idea is to take each bit
of the 3 inputs and put each of them through a lut Array
lookup.  the inputs should be of the exact length such that
no use of Repl() is required: that would be an additional
Module()'s job, or the user of this Module's responsibility.

Repl() - a secondary task - complicates both the unit test and the
Formal Correctness Proof because both the unit test and the Formal
Correctness Proof have two tasks to deal with, not one.

[unix philosophy: "do one job and do it well"]
Comment 6 Luke Kenneth Casson Leighton 2021-11-17 13:51:01 GMT
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/cr/main_stage.py;h=5f1edc7adb6fed02a2fc08b2a5ff0891905eabe9;hb=04c4144f0d962bb181c69de4ce5a998077314e24#l96

  96                 # look up the output bit in the lookup table
  97                 bit_o = Signal()
  98                 comb += bit_o.eq(Mux(bit_b,
  99                                      Mux(bit_a, lut[3], lut[1]),
 100                                      Mux(bit_a, lut[2], lut[0])))


hmm.  this should not have been done this way: it is manual construction
(replication) of a pmux.

that prevents the HDL tools from identifying a pmux opportunity, which
would in turn mean that if an optimised ASIC-grade pmux Cell is ever
created, the tools would not use it.

this code should be replaced with:

     lut_array = Array([lut[i] for i in range(4)]
     bit_o.eq[lut_array[Cat(bit_b, bit_a))

or, better, the use of the LUT module once converted to (much simpler)
use of Array.
Comment 7 Jacob Lifshay 2021-11-17 17:14:53 GMT
(In reply to Luke Kenneth Casson Leighton from comment #5)
> (In reply to Jacob Lifshay from comment #4)
> > Added formal test for BitwiseLut:
> > https://git.libre-soc.org/?p=nmutil.git;a=commitdiff;
> > h=878e4c6776203e12abb4cf4e78f6d251121418c1
> 
> excellent.  btw the convention for those is that they go
> in a directory named "formal" rather than a directory named
> "test".

k, i put them in the test file cuz I already had it open...didn't think about that. Also, I rely on the formal proof to ensure complete coverage since there are too many cases to check in a unit test.
> 
> (In reply to Jacob Lifshay from comment #2)
> > updated the plan https://bugs.libre-soc.org/show_bug.cgi?id=745#c0
> > also, ternary is not a very good name (too generic), imho a better name
> > would be ternlog (like x86's instruction) or lut3
> 
> lut3's good.  using x86 names... hmmm.

the benefit of using x86's name is someone who has seen the x86 instruction before will know exactly what it does, without having to look it up. The other names are worse cuz they don't indicate that the operation is bitwise... the log in ternlog indicates bitwise logic.

> (In reply to Jacob Lifshay from comment #3)
> > Added BitwiseLut, which is a generic n-ary bitwise lut -- ternary is just
> > BitwiseLut(input_count=3, width=64)
> > 
> > https://git.libre-soc.org/?p=nmutil.git;a=commitdiff;
> > h=3e45917af6e91df910b6fc77d031ee3a656c4116
> 
> +                for i in range(self.input_count):
> +                    if sel_values[i]:
> +                        lut_index |= 2 ** i]
> 
> that could simply be done in one line as:
> 
>      lut_index.eq(Cat(*sel_values)

no, it can't cuz sel_values is a tuple of bools and lut_index is an int, they are not nmigen Values.
> 
> also... i'm not really sure this is a correct implementation.  there
> are 256 unit tests required: one for each and every 4th input, because
> the 4th input (lut) is 8-bit wide and therefore there are 256 possible
> values.

there's actually 2^56 possible input values cuz 3 * 16-bit inputs + 8-bit lut. I decided that was too many to test all of, so I rely on the formal proof for complete coverage and just pick 100 pseudorandom cases (using sha256 for reproducibility) to test in the unit test.

> also, i'm surprised it works without using Array()
> (yes there are shorter ways to do this):
> 
> self._lut_underlying_signal = Signal(8)
> lut_bits = []
> for i in range(len(self._lut_underlying_signal)):
>     lut_bits.append(self.lut_underlying_signal[i])
> self.lut = Array(lut_bits)
> 
> without that dynamic indexing i would not expect it to work,
> at all [Array creates a pmux at the back-end].
> 
> if it does work then i expect that what you've written is
> a re-implementation of pmux (Array) using a lot more lines of
> code than is necessary (30 instead of 2)
> 
> the whole module should be literally this - really not kidding,
> only around 4 lines:
> 
>     lut = Array([lut_input[i] for i in range(lut_width)])
>     for i in range(input_width):
>         output[i].eq(lut[Cat(in1[i], in2[i], in3[i])])
> 
> that's it.  that's the entire implementation.
> 
> 
> also, what is the Repl() for?
well, the way I implemented it is by building a binary tree of output-width wide BitwiseMux selecting based on the inputs, where each Signal in the tree is output-width wide -- the Repl is to splat each lut bit into an output-width wide value so the BitwiseMux tree has all the right input bits. if you use yosys show, you should get a good idea of the structure:

       output
          |
     /----+----\
in0-/ mux_0bxxx \
   /--0-------1--\
      |       |
      +---+   +-------------+
          |                 |
     /----+----\       /----+----\
in1-/ mux_0bxx0 \ in1-/ mux_0bxx1 \
   /--0-------1--\   /--0-------1--\
      |       |         |       |
      |       |         |       +-------------------------------+
      |       |         +---------------------+                 |
      +---+   +-------------+                 |                 |
          |                 |                 |                 |
     /----+----\       /----+----\       /----+----\       /----+----\
in2-/ mux_0bx00 \ in2-/ mux_0bx10 \ in2-/ mux_0bx01 \ in2-/ mux_0bx11 \
   /--0-------1--\   /--0-------1--\   /--0-------1--\   /--0-------1--\
      |       |         |       |         |       |         |       |
    splat   splat     splat   splat     splat   splat     splat   splat
      |       |         |       |         |       |         |       |
    lut[0]  lut[4]    lut[2]  lut[6]    lut[1]  lut[5]    lut[3]  lut[7]
Comment 8 Luke Kenneth Casson Leighton 2021-11-17 17:25:41 GMT
(In reply to Jacob Lifshay from comment #7)

> the benefit of using x86's name is someone who has seen the x86 instruction
> before will know exactly what it does, without having to look it up. The
> other names are worse cuz they don't indicate that the operation is
> bitwise... the log in ternlog indicates bitwise logic.

yep, like it.  ternlog (and ternlogi) it is


> there's actually 2^56 possible input values cuz 3 * 16-bit inputs + 8-bit
> lut. I decided that was too many to test all of, 

hell yes.

> so I rely on the formal proof for complete coverage 

which is what they're for.

> and just pick 100 pseudorandom cases (using
> sha256 for reproducibility) to test in the unit test.

good idea.
 

> > also, what is the Repl() for?
> well, the way I implemented it is by building a binary tree of output-width
> wide BitwiseMux selecting based on the inputs, where each Signal in the tree
> is output-width wide

if pmux did not exist, or was not covered in nmigen by Array(),
or if there was some massive inefficiency expected by yosys to
completely fail to create an optimal binary-fan-out mux-tree,
then it would be a good idea.

> -- the Repl is to splat each lut bit into an
> output-width wide value so the BitwiseMux tree has all the right input bits.

that can be covered by specifying a fixed width as an integer argument
for both input and output.

that results in the number of lines of code required literally reducing
down to four (4)

if someone wants Repl() externally they can do that... with another Module
or in the Module.

forcing the API to deal with a case that may - or may not - be even needed
introduces API and design complexity.

needs to go.  4 lines of code.  that's all.
Comment 9 Jacob Lifshay 2021-11-17 18:33:49 GMT
(In reply to Luke Kenneth Casson Leighton from comment #8)
> (In reply to Jacob Lifshay from comment #7)
> 
> > the benefit of using x86's name is someone who has seen the x86 instruction
> > before will know exactly what it does, without having to look it up. The
> > other names are worse cuz they don't indicate that the operation is
> > bitwise... the log in ternlog indicates bitwise logic.
> 
> yep, like it.  ternlog (and ternlogi) it is

k, i'll rename it to ternlog in both the openpower-isa.git and the wiki. imho the TI-form and TI-field should probably be renamed to TLI.

> if pmux did not exist, or was not covered in nmigen by Array(),
> or if there was some massive inefficiency expected by yosys to
> completely fail to create an optimal binary-fan-out mux-tree,
> then it would be a good idea.

yup, I'll replace it with using Array as suggested. Repl is only needed for the BitwiseMux-tree version.

> 
> > -- the Repl is to splat each lut bit into an
> > output-width wide value so the BitwiseMux tree has all the right input bits.
> 
> that can be covered by specifying a fixed width as an integer argument
> for both input and output.

umm, the input/outputs are already fixed-width with the width as an integer argument passed into __init__, i think you've misunderstood how Repl was used here...

> forcing the API to deal with a case that may - or may not - be even needed
> introduces API and design complexity.

it was needed to convert from each single-bit slice of lut into the width-wide inputs of the BitwiseMux tree, cuz otherwise you'd end up just assigning the lsb 
(cuz slices are unsigned) and only the lsb of the output would be correct, the rest of the output bits would always be zeros.
Comment 10 Jacob Lifshay 2021-11-17 19:01:53 GMT
(In reply to Jacob Lifshay from comment #9)
> yup, I'll replace it with using Array as suggested.

Done:
https://git.libre-soc.org/?p=nmutil.git;a=commitdiff;h=892a3cd2f7d76a842c9f838d808af70ea87bf7f6
Comment 11 Jacob Lifshay 2021-11-17 19:28:03 GMT
renamed ternary->ternlog and TI->TLI in both openpower-isa.git and the wiki
Comment 13 Luke Kenneth Casson Leighton 2021-11-17 22:15:20 GMT
(In reply to Jacob Lifshay from comment #12)
> WIP:
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=309b35ee91f1ad6448379a1d804c304dd9681396

i forgot, an additional TODO (now added): basically the contents
of this patch should be shifted to SVP64Asm (as a 32bit instruction)
there are about 6-10 instructions now in that class, and the lst
argument can be chucked at it and it will create a .long xxxxxxxxxx
with the right "stuff".

> https://git.libre-soc.org/?p=soc.git;a=commit;
> h=e281a933b5e0b7b0c85040116a404873f4ee0f17

mm... a case could be made either way for the benefits of having
a separate pipeline vs putting ops into ones with identical
profiles.

basically, instructions must be grouped by their register profile,
*not* by the name of the PDF Page Section in the ISA Manual.

and OP_TERNLOG is an identical register profile to shiftrot.

when more instructions are added, this will start to make sense.
e.g. adding the CR-based variant of OP_TERNLOG, are another
three regs to be added to regspec?

that would make SIX input regs and two output regs!

RA RB RC BA BB BC

this would be quite insane to have a pipeline and a Dependency
Matrix Row with EIGHT registers!

see now why things have to be grouped with other instructions
with a similar register profile?

it is to keep the Dependency Hazard Tracking down to sane levels.
Comment 14 Jacob Lifshay 2021-11-20 00:14:52 GMT
(In reply to Luke Kenneth Casson Leighton from comment #13)
> (In reply to Jacob Lifshay from comment #12)
> > https://git.libre-soc.org/?p=soc.git;a=commit;
> > h=e281a933b5e0b7b0c85040116a404873f4ee0f17
> 
> mm... a case could be made either way for the benefits of having
> a separate pipeline vs putting ops into ones with identical
> profiles.

I was thinking I would put all integer bitmanip ops in one pipeline, and all cr bitmanip ops in another.
Comment 15 Luke Kenneth Casson Leighton 2021-11-20 01:01:25 GMT
(In reply to Jacob Lifshay from comment #14)
> (In reply to Luke Kenneth Casson Leighton from comment #13)
> > (In reply to Jacob Lifshay from comment #12)
> > > https://git.libre-soc.org/?p=soc.git;a=commit;
> > > h=e281a933b5e0b7b0c85040116a404873f4ee0f17
> > 
> > mm... a case could be made either way for the benefits of having
> > a separate pipeline vs putting ops into ones with identical
> > profiles.
> 
> I was thinking I would put all integer bitmanip ops in one pipeline, and all
> cr bitmanip ops in another.

if you examine the source code - or were familiar with it already - you
would have seen how core.py works and how it auto-constructs register file
port access by analysing the regspecs and creating broadcast buses.
 
due to there being 10 pipelines already there are currently a whopping
35 register file read port accesses: this is after merging RA RB and RC into
one single register file port (which, clearly, requires 3 cycles delay
just to get all the operands).

for an in-order core assuming a 3R1W or 4R1W regfile that goes up to
a staggering 50 register file port read accesses, if that "merging" trick
is removed (in order to get latency down).

the consequences of what you are proposing is that the number of register
file ports goes up to almost 60 register file port read accesses.

that is just read port accessors.

the policy / convention of grouping by regspec profile - not by function -
which i have repeated four times now - is there to ensure that there is
no out-of-control explosion of port proliferation, which requires massive
wide OR gates and Muxes to handle.

when i provide summaries as instructions to you, it is to save time
explaining these things in detail.

it would be good if you could listen because you are still writing
3-5 times as much code as is required, and then i have to spend more
time explaining why that code has to be cut back, usually to what
i said would be needed, when i said it the first time.
Comment 16 Luke Kenneth Casson Leighton 2021-11-20 14:49:21 GMT
i put together an explanatory video for you, i recommend viewing it
and core.py https://youtu.be/7Th1b-jq40k
Comment 17 Jacob Lifshay 2021-11-23 05:47:33 GMT
(In reply to Luke Kenneth Casson Leighton from comment #16)

responded in new thread cuz it's more closely related to the core pipeline ALU/FU design than ternlog:
https://lists.libre-soc.org/pipermail/libre-soc-dev/2021-November/004178.html
Comment 18 Luke Kenneth Casson Leighton 2021-11-24 21:04:35 GMT
https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/test/test_lut.py;h=14896da28d3eff637048c746d1e7a2aa5865f133;hb=892a3cd2f7d76a842c9f838d808af70ea87bf7f6#l33

jacob could you please remove that code entirely and use the standard
FHDLTestCase class just like all other formal correctness proofs,
which you should have known existed because you must have been
reading the source code and constantly examining it for over 2 years
now, so there is no excuse. 

again this is unnecessary duplicated code which has been written
and precious time has been wasted on more duplication, then more
time spent identifying the duplication, then more time asking you
to remove the duplication, and more time by you spent removing the
duplication.
Comment 19 Jacob Lifshay 2021-11-25 01:55:05 GMT
(In reply to Luke Kenneth Casson Leighton from comment #18)
> https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/test/test_lut.py;h=14896da28d3eff637048c746d1e7a2aa5865f133;hb=892a3cd2f7d76a842c9f838d808af70ea87bf7f6#l33
> 
> jacob could you please remove that code entirely and use the standard
> FHDLTestCase class just like all other formal correctness proofs,
> which you should have known existed because you must have been
> reading the source code and constantly examining it for over 2 years
> now, so there is no excuse.

Yeah, I can switch back, though FHDLTestCase has some serious shortcomings, including not working correctly when tests are run in parallel cuz it uses the same directory (fixed by formal's using get_test_path which uses a separate path for each test case fn as well as each time it's called in a single test fn), as well as breaking the API of the assertRaises, assertRaisesRegex, and assertWarns methods (cuz they yield None rather than what they should).
> 
> again this is unnecessary duplicated code which has been written
> and precious time has been wasted on more duplication,

it took me all of 30s to copy from where I already wrote it for unrelated code, so basically zero time was spent in that particular aspect.

> then more
> time spent identifying the duplication, then more time asking you
> to remove the duplication, and more time by you spent removing the
> duplication.

yeah, yeah...

I'll be busy for the rest of today and tomorrow, so will get to it on friday.
Comment 20 Luke Kenneth Casson Leighton 2021-11-25 07:05:52 GMT
(In reply to Jacob Lifshay from comment #19)
 
> Yeah, I can switch back, though FHDLTestCase has some serious shortcomings,
> including not working correctly when tests are run in parallel cuz it uses
> the same directory (fixed by formal's using get_test_path which uses a
> separate path for each test case fn as well as each time it's called in a
> single test fn), as well as breaking the API of the assertRaises,
> assertRaisesRegex, and assertWarns methods (cuz they yield None rather than
> what they should).

then you should have fixed it!

at the very least you should have raised it as a bugreport, not
least so that money can be allocated to it for you to
do the damn work!

> > 
> > again this is unnecessary duplicated code which has been written
> > and precious time has been wasted on more duplication,
> 
> it took me all of 30s to copy from where I already wrote it for unrelated
> code, so basically zero time was spent in that particular aspect.

by you.  creating a maintenance headache that i have to deal with.
which is very annoying to me

> > then more
> > time spent identifying the duplication, then more time asking you
> > to remove the duplication, and more time by you spent removing the
> > duplication.
> 
> yeah, yeah...
> 
> I'll be busy for the rest of today and tomorrow, so will get to it on friday.

please speed up.  it has been about six weeks and only one instruction
has [not] been written.
Comment 21 Jacob Lifshay 2021-12-02 03:10:01 GMT
(In reply to Luke Kenneth Casson Leighton from comment #20)
> (In reply to Jacob Lifshay from comment #19)
>  
> > Yeah, I can switch back, though FHDLTestCase has some serious shortcomings,
>
> then you should have fixed it!

Ok, I fixed it and switched the tests to use the newly-fixed FHDLTestCase.

I also moved ternlogi to the shiftrot pipe and removed the now-redundant bitmanip pipe, I still need to add the fu unit test.
Comment 22 Luke Kenneth Casson Leighton 2021-12-02 11:26:31 GMT
(In reply to Jacob Lifshay from comment #21)
> (In reply to Luke Kenneth Casson Leighton from comment #20)
> > (In reply to Jacob Lifshay from comment #19)
> >  
> > > Yeah, I can switch back, though FHDLTestCase has some serious shortcomings,
> >
> > then you should have fixed it!
> 
> Ok, I fixed it and switched the tests to use the newly-fixed FHDLTestCase.

brilliant.

> I also moved ternlogi to the shiftrot pipe 

great.  really, a compile-time option needs adding which can disable
it.  added to pspec.

  22     def __init__(self, pspec):
  22.5       self.draft_bitmanip = hasattr(pspec, "draft_bitmanip")
                       and psec.draftbitmanip == True
  23         super().__init__(pspec, "main")

  92.5       if self.draft_bitmanip:
  93             with m.Case(MicrOp.OP_TERNLOG):
  94                 # TODO: this only works for ternaryi, change to get lut value

that has the advantage of identifying draft instructions as well as
making it abundantly clear that they are, in fact, draft (unofficial)

see test_core.py as to how to set them up:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/test/test_core.py;h=a35545024e23ba05a8ad71ac96e2500e0c5b4612;hb=75bdc1747f32a4fb6cf848ed8b5c68ef2f683f4c#l245

the new option draft_bitmanip=True will need to be added



btw, this move should not have been left uncommented in the commit message:
it has nothing to do with ternlogi.

it should have been done as a completely separate commit,
"moving rotator module connectivity)

actually, please put that back so to line 102, back associated with
the Cat() because it now looks like the Cat statement has no comment
or explanation associated with it.

(edit: or move the Cat() and add a comment about it setting "mode")

 102         comb += Cat(rotator.right_shift,
 103                     rotator.clear_left,

@@ -65,6 +71,10 @@ class ShiftRotMainStage(PipeModBase):
 
         comb += o.ok.eq(1)  # defaults to enabled
 
+        # outputs from the microwatt rotator module
+        comb += [o.data.eq(rotator.result_o),
+                 self.o.xer_ca.data.eq(Repl(rotator.carry_out_o, 2))]
+
         # instruction rotate type
         mode = Signal(4, reset_less=T


> and removed the now-redundant
> bitmanip pipe, I still need to add the fu unit test.

great.
Comment 23 Jacob Lifshay 2021-12-07 03:41:13 GMT
(In reply to Luke Kenneth Casson Leighton from comment #22)
> (In reply to Jacob Lifshay from comment #21)
> > I also moved ternlogi to the shiftrot pipe 
> 
> great.  really, a compile-time option needs adding which can disable
> it.  added to pspec.

I added that to shift_rot's pspec, I have yet to figure out how to pass that in from all the places where ShiftRot pipelines are created...will try to figure that out.

> see test_core.py as to how to set them up:
> 
> https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/test/test_core.py;h=a35545024e23ba05a8ad71ac96e2500e0c5b4612;hb=75bdc1747f32a4fb6cf848ed8b5c68ef2f683f4c#l245
> 
> the new option draft_bitmanip=True will need to be added


> btw, this move should not have been left uncommented in the commit message:
> it has nothing to do with ternlogi.
> 
> (edit: or move the Cat() and add a comment about it setting "mode")

moved the Cat
Comment 24 Luke Kenneth Casson Leighton 2021-12-07 11:26:04 GMT
(In reply to Jacob Lifshay from comment #23)
> (In reply to Luke Kenneth Casson Leighton from comment #22)
> > (In reply to Jacob Lifshay from comment #21)
> > > I also moved ternlogi to the shiftrot pipe 
> > 
> > great.  really, a compile-time option needs adding which can disable
> > it.  added to pspec.
> 
> I added that to shift_rot's pspec, I have yet to figure out how to pass that
> in from all the places where ShiftRot pipelines are created...will try to
> figure that out.

for anything related to TestIssuer unit test running, as well as ISACaller
running, the "base" class is here:

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/runner.py;h=b8378a93ee550c67ac4eb48097005e9f631f0879;hb=74adc2b94a4409162119157f3f75e05a06e4c841#l135

which for TestIssuer simulations the extra argument must be passed here:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/test/test_runner.py;h=962d47a20e9e11eaf992c91c5e424c4c7bbdae5f;hb=20ffa8ec223be44fc37df5184ca09d648e85a844#l292

and a command-line argument added here:

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/simple/test/test_issuer.py;h=b8245cbf1a5356f39a6695531ecda1e96460617b;hb=20ffa8ec223be44fc37df5184ca09d648e85a844#l37

(actually to all of these)

https://git.libre-soc.org/?p=soc.git;a=tree;f=src/soc/simple/test;hb=HEAD

hmmm TestMemPSpec must move into openpower-isa, hmmm.  importing from
soc in the openpower-isa repo is... Bad(tm)

then there will also be fu/compunit tests, as well as individual
pipeline-tests that need it enabling.

basically, anywhere you can find a "grep -r TestmemPspec"

> > (edit: or move the Cat() and add a comment about it setting "mode")
> 
> moved the Cat

great.  need to keep things together, comments are critically important
in a project of this size.

https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=20ffa8ec223be44fc37df5184ca09d648e85a844

+def get_pspec_draft_bitmanip(pspec):
+    return getattr(pspec, "draft_bitmanip", False)

it can't be done that way (at least, i don't think so, can you please
check and put in a code-comment if it does)

TestMemPspec actually is a subclass of unittest Mock() which has overrides
on __getattr__ etc. and consequently you cannot just access attributes
in the "normal" way.

you'll need to check that getattr() and hasattr() do the same thing:
they probably don't, which is most likely why i specifically used the
(odd-looking) "if hasattr(pspec, "attribute") and pspec.attribute == True)"

 class ShiftRotMainStage(PipeModBase):
     def __init__(self, pspec):
         super().__init__(pspec, "main")
+        self.draft_bitmanip = get_pspec_draft_bitmanip(pspec)
         self.fields = DecodeFields(SignalBitRange, [self.i.ctx.op.insn])
         self.fields.create_specs()

please comment that.  explain what is going on (that it's draft instructions,
include the URL of draft bitmanip instructions as well) and also cross-reference
at the top code-block back to the URL of the relevant bugreport comment as
to what work is going on here: something like "draft bitmanip instructions
are added here because of similar register profiles, they are not approved yet
by OPF, and can be disabled at compile-time.  draft spec is at {URL},
bugreport is at {URL#comment}"

this is so that we don't get completely and hopelessly lost even during
code-review right now, as well as later when other people wonder what the
heck is going on.

hmmm also can you please add this URL, it's missing
https://bugs.libre-soc.org/show_bug.cgi?id=339
and that it implements Power ISA v3.0B Book I Section 3.3.14 p101-110

(shiftrot was one of the very early pipelines and the practice of linking
the context (bugreport, Spec page numbers) hadn't been established yet).
Comment 25 Jacob Lifshay 2021-12-08 01:46:37 GMT
(In reply to Luke Kenneth Casson Leighton from comment #24)
> https://git.libre-soc.org/?p=soc.git;a=commitdiff;
> h=20ffa8ec223be44fc37df5184ca09d648e85a844
> 
> +def get_pspec_draft_bitmanip(pspec):
> +    return getattr(pspec, "draft_bitmanip", False)
> 
> it can't be done that way (at least, i don't think so, can you please
> check and put in a code-comment if it does)

yup, it won't work that way, if pspec is a Mock instance here. Mock will return new Mock instances for any attribute access. Mock is plain and simple the wrong tool for the job, dataclasses are better.

From my reading through the code, it appears as though the pspec passed into ShiftRot isn't the same pspec created for the whole core, so there shouldn't be an issue for that particular use case.

> TestMemPspec actually is a subclass of unittest Mock() which has overrides
> on __getattr__ etc. and consequently you cannot just access attributes
> in the "normal" way.
> 
> you'll need to check that getattr() and hasattr() do the same thing:
> they probably don't, which is most likely why i specifically used the
> (odd-looking) "if hasattr(pspec, "attribute") and pspec.attribute == True)"

(hasattr(pspec, "attr") and pspec.attr) isn't any better than getattr(pspec, "attr", default), cuz hasattr always returns true for any attr string.
Comment 26 Luke Kenneth Casson Leighton 2021-12-08 02:39:33 GMT
(In reply to Jacob Lifshay from comment #25)

> yup, it won't work that way, if pspec is a Mock instance here. Mock will
> return new Mock instances for any attribute access. Mock is plain and simple
> the wrong tool for the job, dataclasses are better.

Mock was there, it does the job, it saved 1 day writing a custom class
that does exactly the same thing.

now it will be 3-4 days going through the entire codebase removing it
and rerunning all unit tests.  which we csnnot afford to do.

this is about the lowest possible priority task of all the lowest
priority tasks and even discussing how great a replacement dataclass
should be by comparison is a waste of time.

if you had helped out at the time then you could have written such
a class.  but you did not help.  therefore under extreme pressure
i had to make drastic shortcut decisions and now we are stuck with it.

lesson learned, i trust.
 
> From my reading through the code, it appears as though the pspec passed into
> ShiftRot isn't the same pspec created for the whole core, so there shouldn't
> be an issue for that particular use case.

yes, annoyingly confusing choice of variable name, 18 months ago.
i think i meant to merge them, then changed my mind, and never corrected
the variable names.  now there are hundreds of pspecs in use and it will
be a pig to separate them all.

> (hasattr(pspec, "attr") and pspec.attr) isn't any better than getattr(pspec,
> "attr", default), cuz hasattr always returns true for any attr string.

now you know why i did "pspec.somevar == True" not "if pspec.somevar"
because that will always return true.

it does need to be *in* TestMemPSpec and passed through, not in the
pipeline spec (sorry about the confusion) because TestMemPSpec
is what gets set/activated from the commandline (and various unit
tests).

in this way there is a choice to compile with or without draft
bitmanip.

exactly the same as nosvp64 and other options.
see issuer_verilog.py for the full list, an option
is needed --with-draft-bitmanip

by default it should be enabled (incl in test_issuer.py)
because it will be a pain to type every time.
Comment 27 Luke Kenneth Casson Leighton 2021-12-08 11:01:58 GMT
the class where these two types of pspecs merge is AllFunctionUnits.

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/compunits/compunits.py;h=be3d4e69c5806dcce1cf68642e0a514ac37249e2;hb=4c98cc88be5aba23807c6f4bf97e9de6ba13fd73#l281

you can see here:

 293     def __init__(self, pspec, pilist=None, div_fsm=True):
 294         addrwid = pspec.addr_wid
 295         units = pspec.units
 296         microwatt_mmu = hasattr(pspec, "mmu") and pspec.mmu == True
 297         print("AllFunctionUnits.microwatt_mmu="+str(microwatt_mmu))

that is taking:

* the address width from the TestMemPSpec
* the units dictionary from the TestMemPSpec
* whether mmu is requested to be added

as a special-case the address width is specifically handed to
LDSTFunctionUnit(s):

 344         for i, pi in enumerate(pilist):
 345             self.fus["ldst%d" % (i)] = LDSTFunctionUnit(pi, addrwid, i)

and you can see how in both FunctionUnitBaseMulti and Single
the pspec - this one is the *pipe* specification - is created
and passed through to the ALU:

 135 class FunctionUnitBaseMulti(ReservationStations2):
 157     def __init__(self, speckls, pipekls, num_rows):
 158         id_wid = num_rows.bit_length()
 159         pspec = speckls(id_wid=id_wid)           # spec (NNNPipeSpec instance)
 160         opsubset = pspec.opsubsetkls             # get the operand subset class
 161         regspec = pspec.regspec                  # get the regspec
 162         alu = pipekls(pspec)                # create actual NNNBasePipe
 163         self.pspec = pspec

there's a couple of options here:

1) pass the TestMemPSpec instance through to *everything* (an extra argument
   to every function unit class, in addition to idx

2) add a function that passes it in, post-instantiation (which then
   passes it through to the self.alu instance with a follow-on call)

i'd recommend (1) because the constructor of the PipeSpec and also
the ALU may need to adapt.

this will *need* code comments explaining what the hell is going on,
it's the first time these two (identical-named) variables have "met"
Comment 28 Luke Kenneth Casson Leighton 2021-12-08 20:56:49 GMT
love the commit comments, yes a dog's dinner would be prettier :)
it'll have to do for now, with enough warnings and explanatory notes:
changing low-level structures in python code is too risky (or needs
very careful intermediary planning)
Comment 29 Jacob Lifshay 2021-12-09 04:06:20 GMT
I decided that just passing the cpu's pspec into everything would work best, so added it as parent_pspec variables, since it being parent makes it more obvious what's going on. Hopefully I didn't miss anything important in my refactor...

I also added forwarding via __getattr__ so code expecting attributes to be set on pspec will automatically get them from parent_pspec too.

class CommonPipeSpec:
    """CommonPipeSpec: base class for all pipeline specifications
    see README.md for explanation of members.
    """

    def __init__(self, id_wid, parent_pspec):
        self.pipekls = SimpleHandshakeRedir
        self.id_wid = id_wid
        self.opkls = lambda _: self.opsubsetkls()
        self.op_wid = get_rec_width(self.opkls(None))  # hmm..
        self.stage = None
        self.parent_pspec = parent_pspec

    # forward attributes from parent_pspec
    def __getattr__(self, name):
        return getattr(self.parent_pspec, name)
Comment 30 Jacob Lifshay 2021-12-09 04:57:17 GMT
I added the ternlogi spec pseudo-code.

While I'm at it, I become annoyed with constantly having -.csv and sv_decode.vhdl be untracked files in openpower/isatables/
Should I add them to .gitignore or should I commit them?
Comment 31 Jacob Lifshay 2021-12-09 06:07:53 GMT
I got the ternlogi tests to run, they all seem to work correctly, except that somehow the simulator ends up with SO set even though nothing writes to it, Luke, please help debug that
Comment 32 Luke Kenneth Casson Leighton 2021-12-09 10:34:00 GMT
(In reply to Jacob Lifshay from comment #30)
> I added the ternlogi spec pseudo-code.
> 
> While I'm at it, I become annoyed with constantly having -.csv 

erm...

> and sv_decode.vhdl 

that one should be .gitignored

> be untracked files in openpower/isatables/
> Should I add them to .gitignore or should I commit them?

ermmm...ermermerm....  these are actually important to have
properly examined, lwzu for example is clearly a mistake.
can you please raise a bugreport about them, to be investigated
in the meantime i'll put something in sv_analysis.py
(edit: please cross-ref this commit in the bugreport
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=81c20b9e15d931ef58195644767a58944d933336)

insn,CONDITIONS,Ptype,Etype,0,1,2,3,in1,in2,in3,out,CR in,CR out,out2
dcbz,,0,EXTRA2,0,0,0,0,RA_OR_ZERO,RB,0,0,0,0,0
lwzu,SVP64BREV,0,EXTRA2,0,0,0,0,RA_OR_ZERO,0,RC,RT,0,0,RA
lbzu,SVP64BREV,0,EXTRA2,0,0,0,0,RA_OR_ZERO,0,RC,RT,0,0,RA
lhzu,SVP64BREV,0,EXTRA2,0,0,0,0,RA_OR_ZERO,0,RC,RT,0,0,RA
lhau,SVP64BREV,0,EXTRA2,0,0,0,0,RA_OR_ZERO,0,RC,RT,0,0,RA
lfsu,SVP64BREV,0,EXTRA2,0,0,0,0,RA,0,RC,FRT,0,0,RA
lfdu,SVP64BREV,0,EXTRA2,0,0,0,0,RA,0,RC,FRT,0,0,RA
1/6=mtfsb1,,0,EXTRA2,0,0,0,0,0,0,0,0,0,CR1,0
2/0=mcrfs,,0,EXTRA2,0,0,0,0,0,0,0,0,0,BF,0
2/6=mtfsb0,,0,EXTRA2,0,0,0,0,0,0,0,0,0,CR1,0
4/6=mtfsfi,,0,EXTRA2,0,0,0,0,0,0,0,0,0,CR1,0
Comment 33 Jacob Lifshay 2021-12-10 20:44:38 GMT
(In reply to Luke Kenneth Casson Leighton from comment #32)
> (In reply to Jacob Lifshay from comment #30)
> > I added the ternlogi spec pseudo-code.
> > 
> > While I'm at it, I become annoyed with constantly having -.csv 
> 
> erm...
> 
> > and sv_decode.vhdl 
> 
> that one should be .gitignored

Done.
> 
> > be untracked files in openpower/isatables/
> > Should I add them to .gitignore or should I commit them?
> 
> ermmm...ermermerm....  these are actually important to have
> properly examined, lwzu for example is clearly a mistake.
> can you please raise a bugreport about them,

Done.
Comment 34 Jacob Lifshay 2021-12-10 20:47:03 GMT
The FU unit tests pass now.
openpower-isa.git:
commit 3d9d4a0864e4e5d4ab6dc13348ec3289748285d6 (HEAD -> master, origin/master, origin/HEAD)
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Fri Dec 10 12:34:23 2021 -0800

    change ternlogi to not have Rc field
Comment 35 Jacob Lifshay 2021-12-10 21:22:12 GMT
added to SVP64Asm
Comment 36 Jacob Lifshay 2021-12-10 21:59:03 GMT
soc.git:
commit 798afa4ccda4a0e8065e03dcf90fb60922ba4b00 (HEAD -> master, origin/master, origin/HEAD)
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Fri Dec 10 13:54:22 2021 -0800

    add ternlogi to shift_rot formal test
Comment 37 Luke Kenneth Casson Leighton 2021-12-10 22:48:01 GMT
(In reply to Jacob Lifshay from comment #36)

>     add ternlogi to shift_rot formal test

brilliant.  give me a chance to review everything first before moving
it to "FIXED" because "FIXED" has the annoying side-effect of removing
the bug from public searches, and i want to make sure the wiki TODO
list has a crossreference.

in the meantime do pick another instruction, you know the ropes, now,
which is great. feel free to leave e.g. min/max to someone whom
we can give it to as a "beginner" exercise
Comment 38 Luke Kenneth Casson Leighton 2021-12-11 14:47:55 GMT
(edit: done, except 7-idx)

    result <- [0] * XLEN
    idx <- [0] * 3
    do i = 0 to XLEN - 1
      idx[0] <- (RT)[i]
      idx[1] <- (RA)[i]
      idx[2] <- (RB)[i]
      result[i] <- (TLI & ROTL64(1, idx)) != 0

the style in the Power ISA is to use less lines, and that's possible
here with:

    do i = 0 to XLEN - 1
      idx <- (RT)[i] || (RA)[i] || (RB)[i]
      result[i] <- (TLI & ROTL64(1, idx)) != 0

also, using ROTL64 seems overkill (and is supposed to be for shiftrot)

i suspect this will do:

    do i = 0 to XLEN - 1
      idx <- (RT)[i] || (RA)[i] || (RB)[i]
      result[i] <- TLI[idx]

don't be tempted to substitute idx there, the parser won't be able
to cope because it can't identify the size of idx.  if those 3 lines
don't work, then this should "fix" that: pre-declaring idx to a known
size.

    idx <- [0] * 3
    do i = 0 to XLEN - 1
      idx <- (RT)[i] || (RA)[i] || (RB)[i]
      result[i] <- TLI[idx]

no i don't want the parser to turn into a 2-pass Monster right now.
Comment 39 Luke Kenneth Casson Leighton 2021-12-11 15:13:55 GMT
commit 8abcd7b74c5416c8a5348efc467854ad3080f12f (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Dec 11 15:13:15 2021 +0000

    use concat in ternlogi to reduce code size

works.

     result <- [0] * XLEN
-    idx <- [0] * 3
     do i = 0 to XLEN - 1
-      idx[0] <- (RT)[i]
-      idx[1] <- (RA)[i]
-      idx[2] <- (RB)[i]
+      idx <- (RT)[i] || (RA)[i] || (RB)[i]
Comment 40 Luke Kenneth Casson Leighton 2021-12-11 15:19:16 GMT
ahh, and *this* works.

@@ -12,7 +12,7 @@ Pseudo-code:
     result <- [0] * XLEN
     do i = 0 to XLEN - 1
       idx <- (RT)[i] || (RA)[i] || (RB)[i]
-      result[i] <- (TLI & ROTL64(1, idx)) != 0
+      result[i] <- TLI[7-idx]
     RT <- result
 
nuts.  note the 7-idx.  IBM will have a fit about that,
it's LSB0-to-MSB0 conversion and i'm pretty sure they'll
want (insist on) MSB0 being in the spec.

deep breath: the spec needs to read:

      result[i] <- TLI[idx]

and... sigh... all the unit tests, HDL, everything, changed to
match that.

i know.  it's annoying, but we don't want to irritate the
IBM members of the ISA WG by not following the conventions
of 30 years.
Comment 41 Luke Kenneth Casson Leighton 2021-12-11 15:29:35 GMT
unit tests pass and run great btw!

python3 fu/shift_rot/test/test_pipe_calr.py  >& /tmp/f
Ran 3 tests in 8.743s
OK
Comment 42 Luke Kenneth Casson Leighton 2021-12-11 22:18:28 GMT
also (sorry!) just saw that bitmanip_cases.py doesn't have a stand-alone
Simulator-only test, which it should, and also have an ExpectedState
(see alu_cases.py for examples)

see decoder/isa/test_caller_alu.py and literally copy it, or, if you feel
inclined, work out a common function and update test_caller_bitmanip.py
and test_caller_alu.py and test_caller_shiftrot.py to use that common
function, rather than duplicate 30 lines of near-identical code.  no
big deal if you duplicate, there.

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/alu/alu_cases.py;hb=HEAD
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_alu.py;hb=HEAD
Comment 43 Jacob Lifshay 2021-12-12 00:45:55 GMT
(In reply to Luke Kenneth Casson Leighton from comment #40)
> ahh, and *this* works.
> 
> @@ -12,7 +12,7 @@ Pseudo-code:
>      result <- [0] * XLEN
>      do i = 0 to XLEN - 1
>        idx <- (RT)[i] || (RA)[i] || (RB)[i]
> -      result[i] <- (TLI & ROTL64(1, idx)) != 0
> +      result[i] <- TLI[7-idx]
>      RT <- result
>  
> nuts.  note the 7-idx.  IBM will have a fit about that,
> it's LSB0-to-MSB0 conversion and i'm pretty sure they'll
> want (insist on) MSB0 being in the spec.
> 
> deep breath: the spec needs to read:
> 
>       result[i] <- TLI[idx]

I'd strongly argue we need little-endian anyway for essentially the same reasons as SV integer predicates being in little-endian, if we're ever going to have the non-immediate form. If we know for sure that we'll only have ternlogi, then it doesn't matter as much cuz the compiler can easily do the endian swap.

That's why I used ROTL64 there, cuz it gives natural-looking little-endian, and the left-shift operator apparently doesn't exist.
Comment 44 Luke Kenneth Casson Leighton 2021-12-12 01:08:08 GMT
(In reply to Jacob Lifshay from comment #43)
 
> I'd strongly argue we need little-endian anyway for essentially the same
> reasons as SV integer predicates being in little-endian, if we're ever going
> to have the non-immediate form.

in some fashion or other, even if it's similar to bpermd.

i do wish this stuff wasn't so bleedin hard to get yer head round.

thoughts out loud:

* the lowest bit in e.g. RC has to be the lowest index
* therefore it will be 63-0
* therefore it is indeed ok to have MSB number inversion
* and, actually, that's consistent with everything else
  where we just did a massive XLEN-i substitution
* which i had totally forgotten about

so yes if questioned there exists a justification for keeping
7-idx because it will match with XLEN-id for the nonimmed version

am all caught up now.

ROTL64 just looks messy and unreadable. specifications are supposed
to be clear.
Comment 45 Luke Kenneth Casson Leighton 2021-12-12 01:47:36 GMT
just going over lut.py:

delete these. again they are unnecessary and violate LSP

  29         assert isinstance(input_count, int)
  30         assert isinstance(width, int)

all of these can go

  40         def lut_index(i):
  41             return Signal(input_count, name=f"lut_index_{i}")
  42         self._lut_indexes = [lut_index(i) for i in range(width)]

insert this between 47 and 48


  47         for i in range(self.width):
                 lut_idx = Signal(input_count, name=f"lut_index_{i}")
  48             for j in range(self.input_count):

remove self. in front of lut_idx

also remove self. on front of lut and move into elaborate
where self. is also removed.

  38         self.lut = Signal(2 ** input_count)

the style that you are using is exposing private local signals
unnecessarily.


because this is a public library, there are not enough code-comments.
as with grev.py:

* link to bugreport
* add explicit copyright and link to NLnet and NGI POINTER
* """docstring what the module does"""
* explanatory comments
Comment 46 Jacob Lifshay 2021-12-17 01:23:40 GMT
(In reply to Luke Kenneth Casson Leighton from comment #45)
I simplified the code and added docs.

> also remove self. on front of lut and move into elaborate
> where self. is also removed.
> 
>   38         self.lut = Signal(2 ** input_count)

No, self.lut is an *input*, it would break if I had it private.
Comment 47 Luke Kenneth Casson Leighton 2021-12-17 12:02:02 GMT
(In reply to Jacob Lifshay from comment #46)
> (In reply to Luke Kenneth Casson Leighton from comment #45)
> I simplified the code and added docs.

nice.  it's tidy and consistent. looks great.  ah, just spotted:

* missing copyright notice
* using php-style functions-inside-text (blech!)
* spaces in front of docstrings (i know, it looks untidy, but that's
  the style)
* although saying "see BitwiseLUT docstring" is fine in this case
  (because the API is identical), actually saying *that* the API
  is identical (and why) is missing.

> No, self.lut is an *input*, it would break if I had it private.

that would be why it would need to have been documented :)
Comment 48 Luke Kenneth Casson Leighton 2021-12-17 12:20:33 GMT
oh: and definitely don't use docstrings *under* (after) a line of code
to explain it: put a # comment *before* the code, or at the end of the
line (if there's space to do so).

    self.input
    """ something after-the-fact about input: NO"

    # something about the input which is coming up next....
    self.input # something short about the input
Comment 49 Jacob Lifshay 2021-12-17 16:22:49 GMT
(In reply to Luke Kenneth Casson Leighton from comment #48)
> oh: and definitely don't use docstrings *under* (after) a line of code
> to explain it

But that's the official Python standard for documenting attributes in a way that tools can extract the docstrings:
https://www.python.org/dev/peps/pep-0257/
> String literals occurring immediately after a simple assignment at the top
> level of a module, class, or __init__ method are called "attribute
> docstrings".
Comment 50 Luke Kenneth Casson Leighton 2021-12-17 16:43:17 GMT
(In reply to Jacob Lifshay from comment #49)
> (In reply to Luke Kenneth Casson Leighton from comment #48)
> > oh: and definitely don't use docstrings *under* (after) a line of code
> > to explain it
> 
> But that's the official Python standard for documenting attributes in a way
> that tools can extract the docstrings:
> https://www.python.org/dev/peps/pep-0257/

A: because it's totally counter-intuitive
Q: why is it awful to have the docstring after the attribute it refers to?

> > String literals occurring immediately after a simple assignment at the top
> > level of a module, class, or __init__ method are called "attribute
> > docstrings".

i've never seen this used - ever - in 20 years.  it's great that you're
reading the specs, but some things like this just in practice never get used,
and they'll confuse the hell out people.
Comment 51 Jacob Lifshay 2021-12-17 17:19:21 GMT
(In reply to Luke Kenneth Casson Leighton from comment #50)
> (In reply to Jacob Lifshay from comment #49)
> > (In reply to Luke Kenneth Casson Leighton from comment #48)
> > > String literals occurring immediately after a simple assignment at the top
> > > level of a module, class, or __init__ method are called "attribute
> > > docstrings".
> 
> i've never seen this used - ever - in 20 years.  it's great that you're
> reading the specs, but some things like this just in practice never get used,
> and they'll confuse the hell out people.

They do get used pretty often:
https://repo.or.cz/docutils.git/blob/3c60c830ac4cafaede096539d09c45f51544686a:/docutils/docutils/transforms/frontmatter.py#l382

https://github.com/pdoc3/pdoc/blob/4aa70de2221a34a3003a7e5f52a9b91965f0e359/pdoc/__init__.py#L502

Also, probably most importantly, they're the format that basically *all* tools understand, sphinx/docutils, pdoc3, vscode, etc.
Comment 52 Luke Kenneth Casson Leighton 2021-12-17 23:21:56 GMT
(In reply to Jacob Lifshay from comment #51)

> They do get used pretty often:
> https://repo.or.cz/docutils.git/blob/
> 3c60c830ac4cafaede096539d09c45f51544686a:/docutils/docutils/transforms/
> frontmatter.py#l382
> 
> https://github.com/pdoc3/pdoc/blob/4aa70de2221a34a3003a7e5f52a9b91965f0e359/
> pdoc/__init__.py#L502

blech :)
 
> Also, probably most importantly, they're the format that basically *all*
> tools understand, sphinx/docutils, pdoc3, vscode, etc.

very true.  i also deduced why they must have done post-docstring, because
that's what's done on functions and classes.  the docstring comes directly
after the function declaration or class declaration [it's still "blech" :) ]

i know what it was: the lack of space between them

inputs = ...
"""the inputs"""
outputs = ....
"""the outputs"""

etc. this just confused the hell out of me.  i tried restoring the docstrings
and adding some extra spaces and it just irritated me that there was extra
space, given that it becomes nine (9!!) lines instead of three (3).

this by contrast is compact, clear, and fits in under 80 chars

+        self.inputs = tuple(inp(i) for i in range(input_count)) # inputs
+        self.lut = Signal(2 ** input_count)                     # lookup input
+        self.output = Signal(width)                             # output
Comment 53 Jacob Lifshay 2022-01-06 02:47:24 GMT
I added the stand-alone simulator test. lkcl, please mark this as completed after you check and add some funding, cuz I'll need more soon.
Comment 54 Jacob Lifshay 2022-03-10 00:37:26 GMT
Since the OE getting set when we ask for Rc=1 thing is a bug that I figured out how to work around, I think we should add Rc=1 back to ternlogi, for consistency with OpenPower base instructions and./or./xor./etc.
Comment 55 Jacob Lifshay 2022-03-10 00:40:59 GMT
(In reply to Jacob Lifshay from comment #54)
> Since the OE getting set when we ask for Rc=1 thing is a bug that I figured
> out how to work around, I think we should add Rc=1 back to ternlogi, for
> consistency with OpenPower base instructions and./or./xor./etc.

Additional description:
https://bugs.libre-soc.org/show_bug.cgi?id=765#c0

Work around by adding ternlog to hard-coded list:
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_decoder2.py;h=a636fdf5160cbfa680cdbf8ea36bced0304ca7f7;hb=HEAD#l573
Comment 56 Luke Kenneth Casson Leighton 2022-03-10 01:20:38 GMT
(In reply to Jacob Lifshay from comment #54)
> Since the OE getting set when we ask for Rc=1 thing is a bug that I figured
> out how to work around, I think we should add Rc=1 back to ternlogi, for
> consistency with OpenPower base instructions and./or./xor./etc.

nowhere near enough encoding space in opcode 22.

if moved to an alternative major opcode tbere is room.
Comment 57 Jacob Lifshay 2022-03-10 01:41:11 GMT
(In reply to Luke Kenneth Casson Leighton from comment #56)
> (In reply to Jacob Lifshay from comment #54)
> > Since the OE getting set when we ask for Rc=1 thing is a bug that I figured
> > out how to work around, I think we should add Rc=1 back to ternlogi, for
> > consistency with OpenPower base instructions and./or./xor./etc.
> 
> nowhere near enough encoding space in opcode 22.
> 
> if moved to an alternative major opcode tbere is room.

Ok, I'll add a note to the bitmanip page on the wiki.
Comment 58 Luke Kenneth Casson Leighton 2022-03-12 11:27:44 GMT
sorry jacob i moved OP_TERNLOG to its own
major op (to be chosen) good news is, Rc=1
is now possible. GT LE actually have partial
meaning there which is nice.

i am putting sone thought into the stranger
ternlog ops as well, the dynamic one in particular,
i would like to see a way to do FPGA emulation
without static immediates (the 8bit lut3 be a register)

also the extra space allows 3in 1out for CRs.
BT BA BB BC
Comment 59 Jacob Lifshay 2022-05-03 09:40:18 BST
(In reply to Luke Kenneth Casson Leighton from comment #58)
> sorry jacob i moved OP_TERNLOG to its own
> major op (to be chosen)

lets pick major 5 for now, since that means I have to change less rn.

> good news is, Rc=1
> is now possible. 

Ok, added Rc=1 to the code.

https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=79b540bb5d8f8ba139f56ff1e8010bde24df3d22

I also had to clean up where you accidentally used HTML comments instead of python comments in the pseudocode, which caused compilation to fail (evidently you forgot to try compiling it...)

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=4e181c8e868230e930292df33f9985c3bb78e30c

Afaict this is complete now, Luke, can you allocate some budget to this and mark this bug resolved, so I can get paid soon?
Comment 60 Luke Kenneth Casson Leighton 2022-05-03 09:46:59 BST
(In reply to Jacob Lifshay from comment #59)
> (In reply to Luke Kenneth Casson Leighton from comment #58)
> > sorry jacob i moved OP_TERNLOG to its own
> > major op (to be chosen)
> 
> lets pick major 5 for now, since that means I have to change less rn.

ack.
 
> > good news is, Rc=1
> > is now possible. 
> 
> Ok, added Rc=1 to the code.
> 
> https://git.libre-soc.org/?p=openpower-isa.git;a=commit;
> h=79b540bb5d8f8ba139f56ff1e8010bde24df3d22

excellent.
 
> I also had to clean up where you accidentally used HTML comments instead of
> python comments in the pseudocode,

no, i did not make that mistake. HTML comments as long as they are on
one line are intentionally supported.


>  which caused compilation to fail
> (evidently you forgot to try compiling it...)

yes, i expected it to work.

> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=4e181c8e868230e930292df33f9985c3bb78e30c

those comments aren't intended to go into the pseudocode,
they are as intended.  HTML-style comments *are* supported
(and desirable) and the parser should be fixed. can you
please revert that.

> Afaict this is complete now, Luke, can you allocate some budget to this and
> mark this bug resolved, so I can get paid soon?

that's down to michiel signing off the MoU.
Comment 61 Jacob Lifshay 2022-05-03 10:47:13 BST
(In reply to Luke Kenneth Casson Leighton from comment #60)
> (In reply to Jacob Lifshay from comment #59)
> > https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> > h=4e181c8e868230e930292df33f9985c3bb78e30c
> 
> those comments aren't intended to go into the pseudocode,
> they are as intended.  HTML-style comments *are* supported
> (and desirable) and the parser should be fixed. can you
> please revert that.

sure, once the parser is fixed. until then, I'm going to leave that in so I can actually work on the pseudo-code for clmul and friends, which is what I'm planning on working on next.

I'll create a bug report for fixing the parser.
Comment 62 Luke Kenneth Casson Leighton 2022-05-17 16:02:58 BST
https://libre-soc.org/irclog/%23libre-soc.2022-05-17.log.html#t2022-05-17T00:41:29

> programmerjake	oh, lkcl, reading through your modifications to
> bitmanip, crbinlog imho should take the look-up table from a GPR rather
> than a CR. this will make it much easier to generate.	00:41
> programmerjake	much easier to use in user code because CR's are
> generally conditions rather than arbitrary binary values.	00:41

unfortunately there's nowhere near enough space in the opcode

| 0.5|6.8 | 9.11|12.14|15.17|18.22|23...30  |31|
| -- | -- | --- | --- | --- |-----| --------|--|
| NN | BT | BA  | BB  | BC  |m0-m2|00101110 |m3|

* the 4-bit mask selects which CR Field gets updated
* BC contains the LUT2 which happens to be 4-bit
* BT BA BB BC are a total of 12 bits.

i'm nervous about using some of the 5-bit XO table
(the one above the 10-bit XO which this instruction
 is already intruding on)

but more than that, the crweird instructions allow for more powerful
nibble-based transfer between GPR and CR Fields, including merging
and also that special mode where QTY 2of CR Fields can be put into
the 8 LSBs of a GPR.

i kinda like the idea, though, let me take a look / reminder of what
crweirds do.