Bug 671 - convert spec pseudocode to use XLEN width
Summary: convert spec pseudocode to use XLEN width
Status: IN_PROGRESS
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: dmitry.selyutin
URL:
Depends on: 679 680 681 683 684 682 685
Blocks: 663
  Show dependency treegraph
 
Reported: 2021-08-21 12:45 BST by Luke Kenneth Casson Leighton
Modified: 2021-09-29 09:17 BST (History)
4 users (show)

See Also:
NLnet milestone: NLNet.2019.10.Wishbone
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2021-08-21 12:45:20 BST
all pseudocode needs to use XLEN width instead of
hard-coded values 64 as width.

due to MSB0 numbering GREAT CARE NEEDS TO BE TAKEN

    RS[56:63]

becomes

    RS[XLEN-8:XLEN-1]

things that operate on the entire number such as Logical XOR
or "add" instructions DO NOT need conversion.

    Pseudo-code:

    RT <- (RA|0) + EXTS(SI)

does NOT need conversion.

    prod[0:127] <- MULS((RA), EXTS(SI))
    RT <- prod[64:127]

needs conversion to:

    prod[0:XLEN*2] <- MULS(.....)
    RT <- prod[XLEN:XLEN*2-1]

use the individual unit tests, either those in openpower-isa
alone or those which run against HDL.

See https://libre-soc.org/irclog/%23libre-soc.2021-08-30.log.html#t2021-08-30T13:54:12

there are pipeline-based tests specifically for one pipeline
which correspond near 100% with the corresponding markdown
file.
Comment 1 Luke Kenneth Casson Leighton 2021-08-21 12:49:46 BST
first priority, add an XLEN integer constant (not a SelectableInt)
to the namespace in ISACaller.

commit 89622cf45a5b6c99e23d6d8865635a84ee36d909 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Aug 21 12:52:13 2021 +0100

    set XLEN=64 in ISACaller
Comment 2 Luke Kenneth Casson Leighton 2021-08-22 16:08:36 BST
annoyingly i have discovered that the pseudocode parser, which is based on GardenSnake.py, does not do RS[EXPR:EXPR] only RS[NUM:NUM]

this should be a relatively easy fix in parser.py i will investigate
Comment 3 Luke Kenneth Casson Leighton 2021-08-22 16:15:12 BST
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/pseudo/parser.py;h=4f11f6f14c996caf50ddb5a83e99c2a9e1e707c7;hb=refs/heads/master#l826

RS[0:XLEN-1] should match perfectly well there, needs investigating.
Comment 4 Luke Kenneth Casson Leighton 2021-08-22 17:00:44 BST
works perfectly:

     RA[XLEN:XLEN - 1]

fails with a syntax error:

     RA[XLEN:XLEN-1]

because it's interpreted as a number - "-1"

which matches against *two* tokens:

    {VARIABLE}{NUMBER}

*not*

    {VARIABLE}{MINUSOPERATOR}{NUMBER}

doh that's very annoying, and i'm not sure how to fix it.
Comment 5 dmitry.selyutin 2021-08-22 18:06:30 BST
I managed to "teach" parser and and lexer XLEN, but I doubt this is how we want it to be handled. I thought that XLEN should arrive from the instruction encoding.

diff --git a/src/openpower/decoder/pseudo/lexer.py b/src/openpower/decoder/pseudo/lexer.py
index aad11a0..ce79e6a 100644
--- a/src/openpower/decoder/pseudo/lexer.py
+++ b/src/openpower/decoder/pseudo/lexer.py
@@ -326,6 +326,7 @@ class PowerLexer:
         'INDENT',
         'DEDENT',
         'ENDMARKER',
+        'XLEN',
     )
 
     # Build the lexer
@@ -398,6 +399,11 @@ class PowerLexer:
         "default": "DEFAULT",
     }
 
+    def t_XLEN(self, t):
+        r'XLEN'
+        t.type = self.RESERVED.get(t.value, "XLEN")
+        return t
+
     def t_NAME(self, t):
         r'[a-zA-Z_][a-zA-Z0-9_]*'
         t.type = self.RESERVED.get(t.value, "NAME")
diff --git a/src/openpower/decoder/pseudo/parser.py b/src/openpower/decoder/pseudo/parser.py
index 4f11f6f..61376ab 100644
--- a/src/openpower/decoder/pseudo/parser.py
+++ b/src/openpower/decoder/pseudo/parser.py
@@ -721,6 +721,10 @@ class PowerParser:
                 if name in regs + fregs:
                     self.read_regs.add(name)
 
+    def p_XLEN(self, p):
+        """atom : XLEN"""
+        p[0] = ast.Constant(64)
+
     def p_atom_name(self, p):
         """atom : NAME"""
         name = p[1]
Comment 6 Luke Kenneth Casson Leighton 2021-08-22 18:25:53 BST
too low level.  already taken care of, see comment #1

what does need fixing is "XLEN-1" or in fact any expression "z-1" to be recognised as {EXPR} {MINUS} {NUMBER}

and that's probably best done by stopping the lexer.py from recognising "minus" in t_NUMBER's regexp, and teaching the parser about a new BNF number "MINUS NUMBER" which returns t[2] after first inverting the Constant value in t[2]

then have to worry about the ordering, because if the parser encounters that before it encounters extpressions we end up with exactly the same issue, sigh: {MINUS} followed by {NUMBER} gets eaten and replaced by ast.{-Constant} is exactly what we want to avoid, give {EXPR} {MINUS} {NUMBER} higher priority.

i *think* that's achieved by getting the order right that BNF functions are added into parser.py.

i think.
Comment 7 dmitry.selyutin 2021-08-22 18:31:14 BST
(In reply to Luke Kenneth Casson Leighton from comment #4)
> doh that's very annoying, and i'm not sure how to fix it.

I'm by no means an expert in PLY, but we have the following there:

r"""[-]?(\d+(\.\d*)?|\.\d+)([eE][-+]? \d+)?"""

The sign is considered to be a part of the token. If I drop [-]?, it gets parsed.
Comment 8 dmitry.selyutin 2021-08-22 18:44:04 BST
(In reply to dmitry.selyutin from comment #7)
> The sign is considered to be a part of the token. If I drop [-]?, it gets
> parsed.

RA[XLEN-XLEN:XLEN-1] = RB[XLEN-XLEN:XLEN-1]

is converted to

eq(RA[XLEN - XLEN:XLEN - 1 + 1], RB[XLEN - XLEN:XLEN - 1 + 1])

Also, I've checked that we indeed don't need XLEN to be known, it's a simple t_NAME.
Comment 9 Luke Kenneth Casson Leighton 2021-08-22 19:04:22 BST
that's two bits of good news, can you re-run pyfnwriter and a full pywriter then lots of test_caller_*.py (all of them) before committing?

once you commit that after tests pass i will run the soc test_issuer ones, and the ffmpeg media ones, to make sure all the existing pseudocode works and hasn't been damaged.

amazed that was so simple.
Comment 10 dmitry.selyutin 2021-08-22 19:24:58 BST
(In reply to Luke Kenneth Casson Leighton from comment #9)
> that's two bits of good news, can you re-run pyfnwriter and a full pywriter
> then lots of test_caller_*.py (all of them) before committing?

Sure, I'll do it! Anyway, if it runs successfully, I think we _will_ break non-trivial cases like `(a - -b)` (I don't think these are handled in parser now). I think those are not used now, but anyway.
Comment 11 dmitry.selyutin 2021-08-22 19:34:13 BST
(In reply to dmitry.selyutin from comment #10)
> I think we _will_ break non-trivial cases like `(a - -b)` (I don't think these are handled in parser now).

Yes, we do break them, and even simpler ones, like simple negative numbers. Anyway, I still think this should be done at parser level. I'm checking it.
Comment 12 Luke Kenneth Casson Leighton 2021-08-22 19:52:55 BST
(In reply to dmitry.selyutin from comment #11)
> (In reply to dmitry.selyutin from comment #10)
> > I think we _will_ break non-trivial cases like `(a - -b)` (I don't think these are handled in parser now).
> 
> Yes, we do break them, and even simpler ones, like simple negative numbers.


yyep this is what i expected, can be fixed with a BNF case in
parser.py comment #6
Comment 13 Jacob Lifshay 2021-08-22 19:57:42 BST
usually unary minus is handled as a separate operator...

See JavaScript's syntax as an example:
https://262.ecma-international.org/#sec-unary-operators

I can help out with this bug tomorrow, busy today.
Comment 14 Luke Kenneth Casson Leighton 2021-08-22 20:40:25 BST
(In reply to Jacob Lifshay from comment #13)
> usually unary minus is handled as a separate operator...
> 
> See JavaScript's syntax as an example:
> https://262.ecma-international.org/#sec-unary-operators
> 
> I can help out with this bug tomorrow, busy today.

ohh, a separate item in the BNF?

so, atom would be:

expr |
number |
negnumber |
...

or.. oh wait, no i got it.  like:

unary =
    ~x
  | !x

etc, you would also have:

    -x

in that list.

ok yeah, then that would also work for "EXPR - - NUMBER"

it also i think solves the case of not matching against EXPR NUMBER
Comment 15 Jacob Lifshay 2021-08-22 20:43:05 BST
(In reply to Luke Kenneth Casson Leighton from comment #14)
> (In reply to Jacob Lifshay from comment #13)
> > usually unary minus is handled as a separate operator...

> unary =
>     ~x
>   | !x
> 
> etc, you would also have:
> 
>     -x
> 
> in that list.

yes, exactly!
Comment 16 dmitry.selyutin 2021-08-22 20:54:36 BST
Actually the issue is simpler. We have `comparison MINUS`, which should've been `MINUS comparison`. I have no idea why it's called comparison, since it's more close to expr, but anyway. The only test that fails among $(find . -name "test_*caller*") is ./src/openpower/decoder/isa/test_caller_svstate.py. I don't see how this failure related to the changes, so I pushed them.
Comment 17 Luke Kenneth Casson Leighton 2021-08-22 21:05:29 BST
(In reply to Jacob Lifshay from comment #15)
> yes, exactly!

okaaay, cool.

rrright. well, ngggh the (very basic, initially-just-a-ply-example)
parser was heavily simplified from the original BNF used by python 2
(which would have been what it was based on, back in... 2006).

it therefore looks like unary ops are detected in the comparison
BNF:

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/pseudo/parser.py;h=4f11f6f14c996caf50ddb5a83e99c2a9e1e707c7;hb=HEAD#l638

i have nooo idea if adding "MINUS comparison" to that list would work
or not.
Comment 18 Jacob Lifshay 2021-08-22 21:14:10 BST
(In reply to dmitry.selyutin from comment #16)
> Actually the issue is simpler. We have `comparison MINUS`, which should've
> been `MINUS comparison`. I have no idea why it's called comparison, since
> it's more close to expr, but anyway. The only test that fails among $(find .
> -name "test_*caller*") is
> ./src/openpower/decoder/isa/test_caller_svstate.py. I don't see how this
> failure related to the changes, so I pushed them.

I think the unary minus should be detected in the `unary` production, otherwise you have the issue that `-a < b` is parsed as `-(a < b)`, which is not desired.
Comment 19 Luke Kenneth Casson Leighton 2021-08-22 21:21:46 BST
(In reply to dmitry.selyutin from comment #16)

> ./src/openpower/decoder/isa/test_caller_svstate.py. I don't see how this
> failure related to the changes, so I pushed them.

ok i'll check out before that (commit 89622cf45a5b6c) and run it.
it's probably a unit test that was already failing.
Comment 20 Luke Kenneth Casson Leighton 2021-08-22 21:55:32 BST
(In reply to Luke Kenneth Casson Leighton from comment #19)
> (In reply to dmitry.selyutin from comment #16)
> 
> > ./src/openpower/decoder/isa/test_caller_svstate.py. I don't see how this
> > failure related to the changes, so I pushed them.
> 
> ok i'll check out before that (commit 89622cf45a5b6c) and run it.
> it's probably a unit test that was already failing.

yep, already failing. Not Your Problem :)  mine to deal with.
Comment 21 Luke Kenneth Casson Leighton 2021-08-25 13:18:24 BST
https://libre-soc.org/irclog/%23libre-soc.2021-08-25.log.html#t2021-08-25T07:54:08

discussion about adding new "helper" routines.  if this was not a
specification that required a Formal Process for submitting (and
voting on) by an independent Foundation (OPF) i would say "yes great
idea" immediately to creating as many "helper" routines as we like
and need.

however as a *specification* the practice of creating references
that need the reader to have read a second page (somewhere buried
in a 1,300 page document) where they have no idea how to even find
the information that they need, this makes the specification hostile
to read.

bottom line, it's not just "pseudocode", we cannot just think as
programmers (only), we have to forward-think:

1) "how will this be received by the OpenPOWER Foundation ISA Working Group"
2) "how will users with near-zero-prior knowledge react to reading this"
Comment 22 dmitry.selyutin 2021-08-25 16:13:04 BST
The progress so far is available in xlen branch. I've processed parts of fixedarith and fixedlogical; the latter needed some changes on the parser side (we haven't supported expressions in loops).
I'm waiting for replies wrt test_pipe_caller_long.py, since it basically blocks me in testing mul instructions (and I suspect most of the rest).
Comment 23 Luke Kenneth Casson Leighton 2021-08-25 17:17:31 BST
(In reply to dmitry.selyutin from comment #22)
> The progress so far is available in xlen branch. 

i hate branches, you know that, right? :)

> I've processed parts of
> fixedarith and fixedlogical; the latter needed some changes on the parser
> side (we haven't supported expressions in loops).

ah that makes sense.

what you can show "works" just drop it into master branch.

> I'm waiting for replies wrt test_pipe_caller_long.py, since it basically
> blocks me in testing mul instructions (and I suspect most of the rest).

jacob can help with that.  jacob you need to adjust the dev-env-setup acript so that it can be run "straight" (and not part of an external non-libresoc redource)

running test_issuer.py (from soc) is an important one, it compares against the
HDL (which, obviously, has not changed)
Comment 24 Jacob Lifshay 2021-08-25 17:24:54 BST
(In reply to Luke Kenneth Casson Leighton from comment #23)
> jacob you need to adjust the dev-env-setup acript
> so that it can be run "straight" (and not part of an external non-libresoc
> redource)

I already adjusted dev-env-setup
https://git.libre-soc.org/?p=dev-env-setup.git;a=history;f=pia-install;h=5e8ee78d1087e7b9e6525d200468bb39f9854cd7;hb=6e6203c9017e0471070fcd1f606b37f87d723ac6
Comment 25 Luke Kenneth Casson Leighton 2021-08-25 17:30:02 BST
(In reply to Jacob Lifshay from comment #24)
>
> I already adjusted dev-env-setup
> https://git.libre-soc.org/?p=dev-env-setup.git;a=history;f=pia-install;
> h=5e8ee78d1087e7b9e6525d200468bb39f9854cd7;
> hb=6e6203c9017e0471070fcd1f606b37f87d723ac6

fantastic, dmitry can you run that and check it
Comment 26 Luke Kenneth Casson Leighton 2021-08-25 17:36:36 BST
for i in 0 to 7
   index <- (RS)[8*i:8*i+7]

in bpermd, RS may be 32 16 or 8 bit. we need to decide
what to do here.

bpermd is one of those unusual ones, not so straightforward.

see bug #680
Comment 28 dmitry.selyutin 2021-08-30 09:21:23 BST
Updated the branch with some of div instructions. The "pipe" tests are fragile and there are 900+ tests failed even on master (perhaps this means that I did something wrong upon establishing pia and soc dependencies). Any help is appreciated.
I'll continue updating the instructions; once I get to testing, I'll resolve issues, if any, cumulatively, then I'll rebase the branch atop of master.
Comment 29 dmitry.selyutin 2021-08-30 09:22:51 BST
(In reply to dmitry.selyutin from comment #28)
> once I get to testing

For now, I only check that pywriter can generate the Python code.
Comment 30 dmitry.selyutin 2021-08-30 09:26:55 BST
Also, a question on prtyd instruction. In specs we have the following: "s ⊕ (RS)i%8+7". In fixedlogical.mdwn, we have "s ^ (RS)[i*8+7]". Is it correct?
Comment 31 Luke Kenneth Casson Leighton 2021-08-30 12:03:20 BST
(In reply to dmitry.selyutin from comment #30)
> Also, a question on prtyd instruction. In specs we have the following: "s ⊕
> (RS)i%8+7". In fixedlogical.mdwn, we have "s ^ (RS)[i*8+7]". Is it correct?

look at section 1.3.2 the symbol is defined there.
we obviously had to use the standard ASCII symbol
for XOR.

however you are probably referring to the use of
"%" not "*" which looks like a bug in the v3.0B
specification.

it may have already been raised.
Comment 32 Luke Kenneth Casson Leighton 2021-08-30 12:11:16 BST
(In reply to dmitry.selyutin from comment #28)
> Updated the branch with some of div instructions. The "pipe" tests are
> fragile and there are 900+ tests failed even on master (perhaps this means
> that I did something wrong upon establishing pia and soc dependencies). Any
> help is appreciated.

jacob can you please help Dmitry by running the tests and investigating
the software library that you are responsible for, it is holding Dmitry
up.

> I'll continue updating the instructions; once I get to testing, I'll resolve
> issues, if any, cumulatively, then I'll rebase the branch atop of master.

thid is precisely why i do not like branches.
it would be much much better for the work to be done
only in master branch by running individual tests
and on confirmation of one instruction being functional
to push it, then move on to the next one.

even trying to do that, now, due to the large number of commits
in a branch, is going to be a massive headache.

divergence of branches is a huge operational inconvenience.

we now have to do an all-or-nothing approach and it is being
held up by jacob.

jacob csn you please pay attention.
Comment 33 Luke Kenneth Casson Leighton 2021-08-30 12:17:08 BST
https://libre-soc.org/irclog/%23libre-soc.2021-08-30.log.html#t2021-08-30T09:29:38

changing the element width to 32/16/8 on fields that are only
4 bit in width in the first place makes no sense.  so anything
that goes into CR leave it alone. we'll have to deal with that
separately (in a different way), but element-width overrides
will not be done on CRs.
Comment 34 Luke Kenneth Casson Leighton 2021-08-30 12:50:52 BST
a new test in power_pseudo.py

     [0]*16]

produces this:

Module(body=[Expr(value=Call(func=Name(id='concat', ctx=Load()), args=[Constant(value=0)], keywords=[keyword(arg='repeat', value=Constant(value=16))]))])
astor dump
Module(
    body=[
        Expr(
            value=Call(func=Name(id='concat'),
                args=[Constant(value=0)],
                keywords=[keyword(arg='repeat', value=Constant(value=16))]))])
to source
concat(0, repeat=16)


where the following:

     [0]*(XLEN-16)

instead produces this:

Module(body=[BinOp(left=List(elts=[Constant(value=0)], ctx=Load()), op=Mult(), right=BinOp(left=Name(id='XLEN', ctx=Load()), op=Sub(), right=Constant(value=16)))])
astor dump
Module(
    body=[
        BinOp(left=List(elts=[Constant(value=0)]),
            op=Mult,
            right=BinOp(left=Name(id='XLEN'), op=Sub, right=Constant(value=16)))])
to source
([0] * (XLEN - 16))


this is a bug.

the source code generated should have been:

concat(0, repeat=XLEN-16)
Comment 35 Luke Kenneth Casson Leighton 2021-08-30 12:54:29 BST
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/pseudo/parser.py;h=03d0fd5e6aca7128ecfa2d42d36be84e0ca0d353;hb=b3061c38ef984418e266c6f74b366a7e953840a5#l179

identify selectable int multiply pattern is not able to recognise

    [0] * (expression)

only a Constant.
Comment 36 Luke Kenneth Casson Leighton 2021-08-30 13:16:27 BST
(In reply to Luke Kenneth Casson Leighton from comment #35)
> https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/
> decoder/pseudo/parser.py;h=03d0fd5e6aca7128ecfa2d42d36be84e0ca0d353;
> hb=b3061c38ef984418e266c6f74b366a7e953840a5#l179
> 
> identify selectable int multiply pattern is not able to recognise
> 
>     [0] * (expression)
> 
> only a Constant.

sorted. 

commit 73d15d3c4c5550ef85fbd1ea68b9afca2500d0bd
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Mon Aug 30 13:05:59 2021 +0100

    also add pattern-recognition for just
    
         [0] * XLEN
    
    have to keep a close eye on this

commit 39a0dc7397c592b4b53631b2284d142fd20ac485
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Mon Aug 30 13:03:53 2021 +0100

    fix pattern-match for an expression such as "XLEN-16" when looking
    for concat substitutions
    
    [item] * NUMBER was replaced with
    concat(item, repeat=NUMBER)
    
    but [item] * (XLEN-16) was not matching
    
    by adding a HACK which spots ast.Binop then [item]*(XLEN-16) can be
    recognised
Comment 37 Luke Kenneth Casson Leighton 2021-08-30 13:22:36 BST
reviewing this one:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=6bc1419dbcd0ee84cd28c8867eaf7dae7954c4e5

-    n <- 32
-    do while n < 64
+    n <- (XLEN/2)
+    do while n < XLEN
        if (RS)[n] = 1 then
            leave
        n <- n + 1
-    RA <- n - 32
+    RA <- n - (XLEN/2)

 258 Pseudo-code:
 259 
 260     n <- (XLEN/2)
 261     do while n < XLEN
 262        if (RS)[n] = 1 then
 263            leave
 264        n <- n + 1
 265     RA <- n - (XLEN/2)
 266 

i *think* that's ok.  the concept of a "word" is effectively
redefined to "half-of-the-XLEN-width" which is something that's
being done for FP "single" numbers.

(any operation fmuls for example we are DEFINING as
"actually the "s" means "do the FP op at half of the elwidth"")

it does however get very weird for 8-bit.  plus, we need to
actually add *full* width cntlzd

noted in bug #678
Comment 38 Luke Kenneth Casson Leighton 2021-08-30 13:49:11 BST
in the new RANGE helper, the inverted direction was missing

-        if start.value > end.value:  # start greater than end, must go -ve
-            # auto-subtract-one (sigh) due to python range
-            end = ast.BinOp(p[6], ast.Add(), ast.Constant(-1))
-            arange = [start, end, ast.Constant(-1)]

this needed to be converted to:

     return range(start, end, -1)

the inverted countdown (-1) was missing

this was found by running:

     python3 fu/cr/test/test_pipe_caller.py  >& /tmp/f

where the pseudocode for mtocrf in sprset.mwdn is:

* mtocrf FXM,RS

Pseudo-code:

    n <- 7
    do i = 7 to 0         <---- needs to be range(7,-1,-1) not range(7,-1)
      if FXM[i] = 1 then
        n <- i
Comment 39 Luke Kenneth Casson Leighton 2021-08-30 14:02:11 BST
(In reply to Luke Kenneth Casson Leighton from comment #31)
> (In reply to dmitry.selyutin from comment #30)
> > Also, a question on prtyd instruction. In specs we have the following: "s ⊕
> > (RS)i%8+7". In fixedlogical.mdwn, we have "s ^ (RS)[i*8+7]". Is it correct?
> 
> look at section 1.3.2 the symbol is defined there.
> we obviously had to use the standard ASCII symbol
> for XOR.
> 
> however you are probably referring to the use of
> "%" not "*" which looks like a bug in the v3.0B
> specification.
> 
> it may have already been raised.

yes it was

http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-May/000046.html

the spec is wrong.
Comment 40 Luke Kenneth Casson Leighton 2021-08-30 14:09:31 BST
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ff49e2ca41d621532795f960c715d7b7e9c5379c

 332 * popcntw RA, RS
 333 
 334 Pseudo-code:
 335 
 336     do i = 0 to ((XLEN/32)-1)
 337        n <-  0
 338        do j = 0 to 31
 339           if (RS)[(i*32)+j] = 1 then
 340               n <- n+1
 341        RA[(i*32):(i*32)+31] <- n

this one doesn't look right.  my intuition is that it should be
redefined to be "counting halves of the input".

more like:

 336     do i = 0 to 1
 337        n <-  0
 338        do j = 0 to (XLEN/2)-1
 339           if (RS)[(i*(XLEN/2))+j] = 1 then

down at elwidth=8 it gets particularly hairy / interesting because
the counting (both of the src RS into RA) would go into nibbles.
but, actually, i think even that would work.

not cherry-picking this one over.

(edit: raised as bug #679 so as to keep track)
Comment 41 Luke Kenneth Casson Leighton 2021-08-30 14:19:56 BST
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ef0109fb8e9d6e9f941cfa90bff389d522927c8e

looks good.  extsw.  weird, but good.  redefines the concept
behind extsw as "sign-extend half of the elwidth" rather
than sign-extend from a fixed hard-coded quantity of 32"
Comment 42 Luke Kenneth Casson Leighton 2021-08-30 14:24:00 BST
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=a9670992d6c9348d7f5c6d3d657a942a0b811f8e

-    RA <- EXTZ64(n)
+    RA <- EXTZ(n)

aaahno.

it would be better to over-ride the behaviour of EXTZ64 or at least
not make that kind of "functional" change, right now.

not cherry-picking this one (cnttzd)  adding to bug #679
Comment 43 Luke Kenneth Casson Leighton 2021-08-30 16:07:42 BST
moving on to fixedarith (the mul cases)

found that auto-creation (auto-assignment) was stopping on
expressions:

    prod[0:XLEN-1] <- something

the variable "prod" does not exist so has to be auto-created
however the detection was ONLY capable of recognising

   prod[0:31] <- something

fixed: quite a big intrusive change so re-running test_issuer.py
(for the 5th time today)

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=69c1f44388226d1fde2e6aa70ce2fd5bc660bd87
Comment 44 Luke Kenneth Casson Leighton 2021-08-30 17:26:02 BST
just going through cherry-picking from xlen branch (bleh) spotted
that mulli doesn't have a short test, so made one

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=5e7b5ce86ded1f7686e4fa45538df0f0d7a5c5c2
Comment 45 Luke Kenneth Casson Leighton 2021-08-30 17:37:08 BST
created bug #681 there are no madd* unit tests in ISACaller
(stand-alone for openpower-isa) so i have cherry-picked
the madd* xlen branch code over for now, untested.
this is fine because it can't do any "damage", with no
actual implementation.
Comment 46 dmitry.selyutin 2021-08-30 20:08:25 BST
(In reply to Luke Kenneth Casson Leighton from comment #40)
> this one doesn't look right.  my intuition is that it should be
> redefined to be "counting halves of the input".

The confusion is raised by these words from the spec:

    A count of the number of one bits in each word of regis-
    ter RS is placed into the corresponding word of register
    RA. This number ranges from 0 to 32, inclusive.

I assumed "word" implies 32-bit. However, I think your rationale fits better. Let me re-check these.
Comment 47 dmitry.selyutin 2021-08-30 20:16:09 BST
(In reply to Luke Kenneth Casson Leighton from comment #39)
> (In reply to Luke Kenneth Casson Leighton from comment #31)
> > (In reply to dmitry.selyutin from comment #30)
> > however you are probably referring to the use of
> > "%" not "*" which looks like a bug in the v3.0B
> > specification.
> > 
> > it may have already been raised.
> 
> the spec is wrong.


Yep, % is the thing I meant. The issue is still present at revision C.
Comment 48 Luke Kenneth Casson Leighton 2021-08-30 20:17:41 BST
(In reply to dmitry.selyutin from comment #46)

> The confusion is raised by these words from the spec:
> 
>     A count of the number of one bits in each word of regis-
>     ter RS is placed into the corresponding word of register
>     RA. This number ranges from 0 to 32, inclusive.
> 
> I assumed "word" implies 32-bit. However, I think your rationale fits
> better. Let me re-check these.

with elwidth overrides we are wandering into a partiticularly important
strategic part of SVP64, which is *not* approved yet by the OPF ISA WG
and will need to be submitted via an official process.

what we are doing - right now - is actually *defining* how SVP64 shall
fit on top of these Power ISA scalar v3 0B operations.

i.e. it is *not* up to the creators of the Power ISA v3.0B
to say how the combination of Draft SVP64 shall behave: that is down
to us.

thus it is a slightly weird situation in which we have to think,
"64 bit versions of these instructions, must be v3.0B compliant
without a shadow of doubt, but 32 16 and 8 bit versions,
those are our responsibility to think through and define...
... oh and then submit to the OPF ISA WG when they are ready"

consequently, the actual wording of the v3.0B scalar spec is *not*
helpful for defining what shall happen at 32, 16, or 8 bit: it is
only strictly helpful at the full 64 bit
Comment 49 dmitry.selyutin 2021-08-30 20:40:39 BST
(In reply to Luke Kenneth Casson Leighton from comment #40)
> but, actually, i think even that would work.

This seems to do the trick:

   do i = 0 to 1
       n <-  0
       do j = 0 to ((XLEN/2)-1)
          if (RS)[(i*(XLEN/2))+j] = 1 then
              n <- n+1
       RA[(i*(XLEN/2)):(i*(XLEN/2))+((XLEN/2)-1)] <- n

...the last line looks particularly impressive.
Comment 50 Luke Kenneth Casson Leighton 2021-08-30 22:54:40 BST
(In reply to dmitry.selyutin from comment #49)
> (In reply to Luke Kenneth Casson Leighton from comment #40)
> > but, actually, i think even that would work.
> 
> This seems to do the trick:
> 
>    do i = 0 to 1
>        n <-  0
>        do j = 0 to ((XLEN/2)-1)
>           if (RS)[(i*(XLEN/2))+j] = 1 then
>               n <- n+1
>        RA[(i*(XLEN/2)):(i*(XLEN/2))+((XLEN/2)-1)] <- n
> 
> ...the last line looks particularly impressive.

much joy. let's write a spec that wins an obfuscation contest,
double bonus points.

much as i am tempted by that possibility,
can i suggest, a temp var or two:

    e = XLEN/2-1
    do i = 0 to 1
        s = i*XLEN/2
        n <- 0
        do j = 0 to e
           if (RS)[s+j] = 1 then
               n <- n+1
        RA[s:s+e] <- n

or other mercifully short temp var names

(edit: done)
commit cc990388b1f29c87f4d7ff53edbef9d34312b6f8 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Aug 31 12:19:49 2021 +0100

    adjusted popcntw to simplify by using temp vars
    https://bugs.libre-soc.org/show_bug.cgi?id=671#c50
    e <- (XLEN/2)-1
    s <- i*XLEN/2
Comment 51 Luke Kenneth Casson Leighton 2021-08-31 00:53:59 BST
-    EA <- b + EXTS(D)
-    RT <- [0]*56 || MEM(EA, 1)
+    EA <- b + EXTSREG(D)
+    RT <- ([0] * (XLEN-8)) || MEM(EA, 1)

nope, sorry dmitry, i said we musn't add new
helper functions, unless there is an absolute
critical need that there is absolutely no other
way.

in this case EXTS already does the job due to
already having been designed to do "dynamic"
bitlength (setting bits to 256).
Comment 52 Luke Kenneth Casson Leighton 2021-08-31 01:33:39 BST
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=fc68ee12c54f7f096a2631a9254db30fe4cde4e8

again, this one, it's important not to do changes to use
an undocumented arbitrary helper function

the idea here us, this pseudocode ACTUALLY will be submitted
as ACTUAL v3.N specification with an OFFICIAL Request For Change
to the OpenPOWER ISA.

therefore all and any "helper" routines have to have a full audit,
review, full justification, documentation, explanation, and finally
submission to the ISA WG.

i.e. a total pain in the ass.

if we can simply overload EXTS64 via operator-overloading
then when the pseudocode is submitted *without* EXTSREG
the *reviewer* says, "this is meaningless please change it"
*now* we have justification.

in addition i said we are not doing width chsnges yet: width
is currently hardcoded to 64.

the *second* phase once hardcoded numbers *ONLY* have been replaced
by XLEN will be to look at functions and try XLEN=32/16/8

one incremental step at a time, one task at a time.

the larger the task the more critically important it is to ONLY
strictly do that one regular task.
Comment 53 Jacob Lifshay 2021-08-31 07:34:48 BST
(In reply to Luke Kenneth Casson Leighton from comment #32)
> (In reply to dmitry.selyutin from comment #28)
> > Updated the branch with some of div instructions. The "pipe" tests are
> > fragile and there are 900+ tests failed even on master (perhaps this means
> > that I did something wrong upon establishing pia and soc dependencies). Any
> > help is appreciated.
> 
> jacob can you please help Dmitry by running the tests and investigating
> the software library that you are responsible for, it is holding Dmitry
> up.

I tried:
https://libre-soc.org/irclog/%23libre-soc.2021-08-31.log.html#t2021-08-31T05:48:58
Comment 54 Luke Kenneth Casson Leighton 2021-08-31 12:25:08 BST
(In reply to Jacob Lifshay from comment #53)

> I tried:
> https://libre-soc.org/irclog/%23libre-soc.2021-08-31.log.html#t2021-08-31T05:
> 48:58

brilliant, thanks jacob.

ghostmansd: fixedload.mdwn reviewed and cherry-picked over.

am finding that although it's tedious as hell the xlen branch is working
because the cherry-picking forces me to read each individual patch (and run
unit tests to confirm it).

if you can let me do the cherry-picking, that will make me do the
testing.
Comment 55 Luke Kenneth Casson Leighton 2021-08-31 13:06:21 BST
all the div/mod patches are cherry-picked as well, unit tests pass.

4 hours ago	Dmitry Selyutin	fixedarith: switch modsd to XLEN	
4 hours ago	Dmitry Selyutin	fixedarith: switch divdeX to XLEN	
4 hours ago	Dmitry Selyutin	fixedarith: switch divdX to XLEN	
4 hours ago	Dmitry Selyutin	fixedarith: switch modsw to XLEN	
4 hours ago	Dmitry Selyutin	fixedarith: switch divweuX to XLEN	
4 hours ago	Dmitry Selyutin	fixedarith: switch divweX to XLEN	
4 hours ago	Dmitry Selyutin	fixedarith: switch divwX to XLEN
Comment 56 dmitry.selyutin 2021-08-31 20:16:27 BST
(In reply to Luke Kenneth Casson Leighton from comment #54)
> (In reply to Jacob Lifshay from comment #53)
> if you can let me do the cherry-picking, that will make me do the
> testing.

I think it's what we have to do for now, but I actually dislike it. Ideally I should be able to test everything out-of-box. Since tests from soc look like way too beefy for machine I possess, I think we should eventually make talos do the job. This, however, has its difficulties as well (pia, rust, pip, other potential problems which I'm not aware about yet).
Comment 57 Luke Kenneth Casson Leighton 2021-09-01 20:37:55 BST
all shift patches in, bug #682 there's a srad bug though.
Comment 58 Luke Kenneth Casson Leighton 2021-09-03 17:58:09 BST
https://libre-soc.org/openpower/isa/comparefixed/

this one has some weirdness:

1) don't touch CRs (elwidth overrides do not apply to CRs).

2) only do RA, RB, RS

3) cmpeqb is... well, it's unusual (non-standard). should be
   rewritten as a for-loop just like is done in bpermd
   for i = 0 to (XLEN/8)

4) cmprb is *particularly* odd.  i have no problem *at all*
   with saying that this one *only* works on XLEN=64 and
   XLEN=32 and that XLEN=16 or XLEN=8 is UNDEFINED behaviour.
Comment 59 dmitry.selyutin 2021-09-04 20:52:08 BST
I'm not sure whether BCD are appropriate candidates. They all speak of _sixes_, which, depending on context, designate either 6 4-bit BCD numbers upon input to be converted to four declets (two declets per word), or four declets (two declets per word) _together_ are converted to 6 4-bit BCD numbers, or, as with addg6s, generate sixes upon addition. I don't feel like this logic can be moved to our needs, at least it'll need a significant wording update.
Comment 60 dmitry.selyutin 2021-09-04 20:57:19 BST
Moreover, even if we're into trying to somehow change the semantics, we won't still be able to put even a single DPD into 8 bits (as it occupies 10 bits). So is for BCDs, actually, because we always take at least 3-digit BCD (which makes 12 bits).
Comment 61 Luke Kenneth Casson Leighton 2021-09-04 22:19:22 BST
(In reply to dmitry.selyutin from comment #59)
> I'm not sure whether BCD are appropriate candidates. They all speak of

ok, then let's just skip them entirely.
Comment 62 Jacob Lifshay 2021-09-06 04:29:59 BST
(In reply to dmitry.selyutin from comment #59)
> I'm not sure whether BCD are appropriate candidates. They all speak of
> _sixes_, which, depending on context, designate either 6 4-bit BCD numbers
> upon input to be converted to four declets (two declets per word), or four
> declets (two declets per word) _together_ are converted to 6 4-bit BCD
> numbers, or, as with addg6s, generate sixes upon addition. I don't feel like
> this logic can be moved to our needs, at least it'll need a significant
> wording update.

In the case of addg6s, it totally can have XLEN overridden, since what it does is done on individual BCD digits (4-bit units), so elwidth override should work just fine:
elwidth =
8: corrects for BCD addition of 2 BCD digits (like x86's daa instruction)
16: corrects for BCD addition of 4 BCD digits
32: corrects for BCD addition of 8 BCD digits
64: corrects for BCD addition of 16 BCD digits (unmodified addg6s)
Comment 63 Jacob Lifshay 2021-09-06 04:37:03 BST
(In reply to Jacob Lifshay from comment #62)
> In the case of addg6s, it totally can have XLEN overridden, since what it
> does is done on individual BCD digits (4-bit units), so elwidth override
> should work just fine:
> elwidth =
> 8: corrects for BCD addition of 2 BCD digits (like x86's daa instruction)

so adding an 8-bit BCD number on x86 does:
# inputs in al and cl
add al, cl
daa
# output in al

adding an 8-bit BCD number on SVP64 does:
# inputs in r3 and r4
sv.add/elwidth=8 r5.s, r3.s, r4.s
sv.addg6s/elwidth=8 r3.s, r3.s, r4.s
sv.add/elwidth=8 r3.s, r3.s, r5.s
# output in r3
Comment 64 dmitry.selyutin 2021-09-06 04:58:30 BST
(In reply to Jacob Lifshay from comment #62)
> In the case of addg6s, it totally can have XLEN overridden, since what it
> does is done on individual BCD digits (4-bit units)

I'm talking not only of can it be done or not, but also of the specs. One might argue that we can work even with nibbles, since nibble is sufficient to hold a single-digit BCD or truncated DPD (as long as it's less than 10). But with the wording the specs uses there's not much room for that, and, before doing this, we must agree that:
1. The specs wording is to be changed (e.g. for addg6s there should be no mentioning of 16 bits, or double word); the whole wording should be register-agnostic.
2. The pseudo-code should be updated respectively.

Once we agree on these, I can fill whatever logic we find appropriate.
Comment 65 Jacob Lifshay 2021-09-06 05:09:46 BST
(In reply to dmitry.selyutin from comment #64)
> (In reply to Jacob Lifshay from comment #62)
> > In the case of addg6s, it totally can have XLEN overridden, since what it
> > does is done on individual BCD digits (4-bit units)
> 
> I'm talking not only of can it be done or not, but also of the specs.

ok, though changing the spec's wording may be a pain, I don't think we should just declare BCD instructions too hard to support with SV when there is an obvious way to extend addg6s in particular to arbitrary elwidth.

If we did just decide to skip some instructions just because we'd have to modify the spec text, I think the people on the ISA WG reviewing how SV works would take that as evidence of SV being inconsistently applied and half-baked, increasing our chances of rejection.
Comment 66 dmitry.selyutin 2021-09-06 07:58:09 BST
(In reply to Jacob Lifshay from comment #65)
> If we did just decide to skip some instructions just because we'd have to
> modify the spec text, I think the people on the ISA WG reviewing how SV
> works would take that as evidence of SV being inconsistently applied and
> half-baked, increasing our chances of rejection.

Jacob, it looks like if you've skipped parts of my messages. Here are relevant parts, please, check.

https://bugs.libre-soc.org/show_bug.cgi?id=671#c59
I don't feel like this logic can be moved to our needs, at least it'll need a significant wording update.

https://bugs.libre-soc.org/show_bug.cgi?id=671#c64
I'm talking not only of can it be done or not, but also of the specs.
1. The specs wording is to be changed (e.g. for addg6s there should be no mentioning of 16 bits, or double word); the whole wording should be register-agnostic.
2. The pseudo-code should be updated respectively.

A short summary: if we intend to ever touch these, the current wording from the spec does not leave us much room. These instructions _must_ be updated both in specs and pseudo-code. You only consider the simplest one, addg6s, but even that one needs changes. If we're OK with changing the specs and the original semantics -- let's do it.
Comment 67 Luke Kenneth Casson Leighton 2021-09-06 11:43:50 BST
(In reply to Jacob Lifshay from comment #65)

> If we did just decide to skip some instructions just because we'd have to
> modify the spec text, I think the people on the ISA WG reviewing how SV
> works would take that as evidence of SV being inconsistently applied and
> half-baked, increasing our chances of rejection.

hmm, good points

(In reply to dmitry.selyutin from comment #66)

> A short summary: if we intend to ever touch these, the current wording from
> the spec does not leave us much room. These instructions _must_ be updated
> both in specs and pseudo-code. You only consider the simplest one, addg6s,
> but even that one needs changes. If we're OK with changing the specs and the
> original semantics -- let's do it.

yes. there's no such restriction. every piece of pseudocode must be
accompanied by associated wording in english.
Comment 68 Jacob Lifshay 2021-09-06 15:02:08 BST
According to Wikipedia:
https://en.wikipedia.org/wiki/Densely_packed_decimal
(matches OpenPower/IEEE 754 definition of DPD)

DPD is designed so that one or two BCD digits can be encoded into DPD as just the LSB 4 or 8 bits respectively, meaning that truncating the input/output of cdtbcd and cbcdtd totally are useful and still cover all BCD digits:

elwidth=64:
use existing definition for compatibility

elwidth=32:
use LSB 32-bits of existing definition for compatibility with PowerPC 32-bit ISAs

elwidth=16:
use LSB 16-bits of existing definition, you end up ignoring/zeroing the MSB 2 bits of the 16-bit DPD-side input/output for all valid BCD inputs:
cbcdtd:
    # for input digits 0-9, only lsb 4 bits of DPD are set;
    # idk for 0xA-0xF so keep extra bits anyway
    # for consistency with larger definition
    output = (BCD_TO_DPD(input >> 12) & 0x3F) << 10
    output |= BCD_TO_DPD(input & 0xFFF)
cdtbcd:
    output = (DPD_TO_BCD(input >> 10) & 0xF) << 12
    output |= DPD_TO_BCD(input & 0x3FF)

elwidth=8:
use LSB 8-bits of existing definition, all valid BCD inputs convert to DPD that fits in 8-bits:
cbcdtd:
    # for input digits 0-9, only lsb 8 bits of DPD are set
    output = BCD_TO_DPD(input) & 0xFF
cdtbcd:
    output = DPD_TO_BCD(input) & 0xFF
Comment 69 Jacob Lifshay 2021-09-06 15:26:21 BST
(In reply to Jacob Lifshay from comment #68)
> elwidth=16:
> use LSB 16-bits of existing definition, you end up ignoring/zeroing the MSB
> 2 bits of the 16-bit DPD-side input/output for all valid BCD inputs:

turns out the MSB 2 bits aren't ignored for out-of-range inputs/outputs, so we need to keep them. the pseudo-code is still correct afaict.
Comment 70 Luke Kenneth Casson Leighton 2021-09-06 15:32:16 BST
jacob, dmitry is going to be away for a week, and this bug is getting pretty long, can you raise and link a separate bugreport and do the 3 BCD instructions? you have a pretty good handle on them.  drop into xlen branch, it works well.
try to minimise changes, make them really clear, create temp vars involving
XLEN/8 (or XLEN/4) etc if they recur repetitively.
Comment 71 Jacob Lifshay 2021-09-07 02:30:07 BST
(In reply to Jacob Lifshay from comment #63)
> adding an 8-bit BCD number on SVP64 does:
> # inputs in r3 and r4
> sv.add/elwidth=8 r5.s, r3.s, r4.s
> sv.addg6s/elwidth=8 r3.s, r3.s, r4.s
> sv.add/elwidth=8 r3.s, r3.s, r5.s
> # output in r3

turns out I have the wrong assembly code for OpenPower, the correct code can be derived from the programming note on page 103 (134) of OpenPower ISA v2.07B:
# inputs in r3 and r4
sv.addi/elwidth=8 r3.s, r3.s, 0x66
sv.add/elwidth=8 r5.s, r3.s, r4.s
sv.addg6s/elwidth=8 r3.s, r3.s, r4.s
sv.subf/elwidth=8 r3.s, r3.s, r5.s
# output in r3
Comment 72 dmitry.selyutin 2021-09-29 09:12:39 BST
It seems I missed extsb and extsh routines. These must also be converted. Ideas on the semantics?
Comment 73 Jacob Lifshay 2021-09-29 09:17:09 BST
(In reply to dmitry.selyutin from comment #72)
> It seems I missed extsb and extsh routines. These must also be converted.
> Ideas on the semantics?

sign-extend/truncate the input from XLEN to 8/16 bits, then sign-extend/truncate to XLEN

so extsb is the equivalent of C's:
return (xlen_t)(int8_t)input;

and extsh is:
return (xlen_t)(int16_t)input;