all pseudocode needs to use XLEN width instead of hard-coded values 64 as width. due to MSB0 numbering GREAT CARE NEEDS TO BE TAKEN RS[56:63] becomes RS[XLEN-8:XLEN-1] things that operate on the entire number such as Logical XOR or "add" instructions DO NOT need conversion. Pseudo-code: RT <- (RA|0) + EXTS(SI) does NOT need conversion. prod[0:127] <- MULS((RA), EXTS(SI)) RT <- prod[64:127] needs conversion to: prod[0:XLEN*2] <- MULS(.....) RT <- prod[XLEN:XLEN*2-1] use the individual unit tests, either those in openpower-isa alone or those which run against HDL. See https://libre-soc.org/irclog/%23libre-soc.2021-08-30.log.html#t2021-08-30T13:54:12 there are pipeline-based tests specifically for one pipeline which correspond near 100% with the corresponding markdown file.
first priority, add an XLEN integer constant (not a SelectableInt) to the namespace in ISACaller. commit 89622cf45a5b6c99e23d6d8865635a84ee36d909 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Sat Aug 21 12:52:13 2021 +0100 set XLEN=64 in ISACaller
annoyingly i have discovered that the pseudocode parser, which is based on GardenSnake.py, does not do RS[EXPR:EXPR] only RS[NUM:NUM] this should be a relatively easy fix in parser.py i will investigate
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/pseudo/parser.py;h=4f11f6f14c996caf50ddb5a83e99c2a9e1e707c7;hb=refs/heads/master#l826 RS[0:XLEN-1] should match perfectly well there, needs investigating.
works perfectly: RA[XLEN:XLEN - 1] fails with a syntax error: RA[XLEN:XLEN-1] because it's interpreted as a number - "-1" which matches against *two* tokens: {VARIABLE}{NUMBER} *not* {VARIABLE}{MINUSOPERATOR}{NUMBER} doh that's very annoying, and i'm not sure how to fix it.
I managed to "teach" parser and and lexer XLEN, but I doubt this is how we want it to be handled. I thought that XLEN should arrive from the instruction encoding. diff --git a/src/openpower/decoder/pseudo/lexer.py b/src/openpower/decoder/pseudo/lexer.py index aad11a0..ce79e6a 100644 --- a/src/openpower/decoder/pseudo/lexer.py +++ b/src/openpower/decoder/pseudo/lexer.py @@ -326,6 +326,7 @@ class PowerLexer: 'INDENT', 'DEDENT', 'ENDMARKER', + 'XLEN', ) # Build the lexer @@ -398,6 +399,11 @@ class PowerLexer: "default": "DEFAULT", } + def t_XLEN(self, t): + r'XLEN' + t.type = self.RESERVED.get(t.value, "XLEN") + return t + def t_NAME(self, t): r'[a-zA-Z_][a-zA-Z0-9_]*' t.type = self.RESERVED.get(t.value, "NAME") diff --git a/src/openpower/decoder/pseudo/parser.py b/src/openpower/decoder/pseudo/parser.py index 4f11f6f..61376ab 100644 --- a/src/openpower/decoder/pseudo/parser.py +++ b/src/openpower/decoder/pseudo/parser.py @@ -721,6 +721,10 @@ class PowerParser: if name in regs + fregs: self.read_regs.add(name) + def p_XLEN(self, p): + """atom : XLEN""" + p[0] = ast.Constant(64) + def p_atom_name(self, p): """atom : NAME""" name = p[1]
too low level. already taken care of, see comment #1 what does need fixing is "XLEN-1" or in fact any expression "z-1" to be recognised as {EXPR} {MINUS} {NUMBER} and that's probably best done by stopping the lexer.py from recognising "minus" in t_NUMBER's regexp, and teaching the parser about a new BNF number "MINUS NUMBER" which returns t[2] after first inverting the Constant value in t[2] then have to worry about the ordering, because if the parser encounters that before it encounters extpressions we end up with exactly the same issue, sigh: {MINUS} followed by {NUMBER} gets eaten and replaced by ast.{-Constant} is exactly what we want to avoid, give {EXPR} {MINUS} {NUMBER} higher priority. i *think* that's achieved by getting the order right that BNF functions are added into parser.py. i think.
(In reply to Luke Kenneth Casson Leighton from comment #4) > doh that's very annoying, and i'm not sure how to fix it. I'm by no means an expert in PLY, but we have the following there: r"""[-]?(\d+(\.\d*)?|\.\d+)([eE][-+]? \d+)?""" The sign is considered to be a part of the token. If I drop [-]?, it gets parsed.
(In reply to dmitry.selyutin from comment #7) > The sign is considered to be a part of the token. If I drop [-]?, it gets > parsed. RA[XLEN-XLEN:XLEN-1] = RB[XLEN-XLEN:XLEN-1] is converted to eq(RA[XLEN - XLEN:XLEN - 1 + 1], RB[XLEN - XLEN:XLEN - 1 + 1]) Also, I've checked that we indeed don't need XLEN to be known, it's a simple t_NAME.
that's two bits of good news, can you re-run pyfnwriter and a full pywriter then lots of test_caller_*.py (all of them) before committing? once you commit that after tests pass i will run the soc test_issuer ones, and the ffmpeg media ones, to make sure all the existing pseudocode works and hasn't been damaged. amazed that was so simple.
(In reply to Luke Kenneth Casson Leighton from comment #9) > that's two bits of good news, can you re-run pyfnwriter and a full pywriter > then lots of test_caller_*.py (all of them) before committing? Sure, I'll do it! Anyway, if it runs successfully, I think we _will_ break non-trivial cases like `(a - -b)` (I don't think these are handled in parser now). I think those are not used now, but anyway.
(In reply to dmitry.selyutin from comment #10) > I think we _will_ break non-trivial cases like `(a - -b)` (I don't think these are handled in parser now). Yes, we do break them, and even simpler ones, like simple negative numbers. Anyway, I still think this should be done at parser level. I'm checking it.
(In reply to dmitry.selyutin from comment #11) > (In reply to dmitry.selyutin from comment #10) > > I think we _will_ break non-trivial cases like `(a - -b)` (I don't think these are handled in parser now). > > Yes, we do break them, and even simpler ones, like simple negative numbers. yyep this is what i expected, can be fixed with a BNF case in parser.py comment #6
usually unary minus is handled as a separate operator... See JavaScript's syntax as an example: https://262.ecma-international.org/#sec-unary-operators I can help out with this bug tomorrow, busy today.
(In reply to Jacob Lifshay from comment #13) > usually unary minus is handled as a separate operator... > > See JavaScript's syntax as an example: > https://262.ecma-international.org/#sec-unary-operators > > I can help out with this bug tomorrow, busy today. ohh, a separate item in the BNF? so, atom would be: expr | number | negnumber | ... or.. oh wait, no i got it. like: unary = ~x | !x etc, you would also have: -x in that list. ok yeah, then that would also work for "EXPR - - NUMBER" it also i think solves the case of not matching against EXPR NUMBER
(In reply to Luke Kenneth Casson Leighton from comment #14) > (In reply to Jacob Lifshay from comment #13) > > usually unary minus is handled as a separate operator... > unary = > ~x > | !x > > etc, you would also have: > > -x > > in that list. yes, exactly!
Actually the issue is simpler. We have `comparison MINUS`, which should've been `MINUS comparison`. I have no idea why it's called comparison, since it's more close to expr, but anyway. The only test that fails among $(find . -name "test_*caller*") is ./src/openpower/decoder/isa/test_caller_svstate.py. I don't see how this failure related to the changes, so I pushed them.
(In reply to Jacob Lifshay from comment #15) > yes, exactly! okaaay, cool. rrright. well, ngggh the (very basic, initially-just-a-ply-example) parser was heavily simplified from the original BNF used by python 2 (which would have been what it was based on, back in... 2006). it therefore looks like unary ops are detected in the comparison BNF: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/pseudo/parser.py;h=4f11f6f14c996caf50ddb5a83e99c2a9e1e707c7;hb=HEAD#l638 i have nooo idea if adding "MINUS comparison" to that list would work or not.
(In reply to dmitry.selyutin from comment #16) > Actually the issue is simpler. We have `comparison MINUS`, which should've > been `MINUS comparison`. I have no idea why it's called comparison, since > it's more close to expr, but anyway. The only test that fails among $(find . > -name "test_*caller*") is > ./src/openpower/decoder/isa/test_caller_svstate.py. I don't see how this > failure related to the changes, so I pushed them. I think the unary minus should be detected in the `unary` production, otherwise you have the issue that `-a < b` is parsed as `-(a < b)`, which is not desired.
(In reply to dmitry.selyutin from comment #16) > ./src/openpower/decoder/isa/test_caller_svstate.py. I don't see how this > failure related to the changes, so I pushed them. ok i'll check out before that (commit 89622cf45a5b6c) and run it. it's probably a unit test that was already failing.
(In reply to Luke Kenneth Casson Leighton from comment #19) > (In reply to dmitry.selyutin from comment #16) > > > ./src/openpower/decoder/isa/test_caller_svstate.py. I don't see how this > > failure related to the changes, so I pushed them. > > ok i'll check out before that (commit 89622cf45a5b6c) and run it. > it's probably a unit test that was already failing. yep, already failing. Not Your Problem :) mine to deal with.
https://libre-soc.org/irclog/%23libre-soc.2021-08-25.log.html#t2021-08-25T07:54:08 discussion about adding new "helper" routines. if this was not a specification that required a Formal Process for submitting (and voting on) by an independent Foundation (OPF) i would say "yes great idea" immediately to creating as many "helper" routines as we like and need. however as a *specification* the practice of creating references that need the reader to have read a second page (somewhere buried in a 1,300 page document) where they have no idea how to even find the information that they need, this makes the specification hostile to read. bottom line, it's not just "pseudocode", we cannot just think as programmers (only), we have to forward-think: 1) "how will this be received by the OpenPOWER Foundation ISA Working Group" 2) "how will users with near-zero-prior knowledge react to reading this"
The progress so far is available in xlen branch. I've processed parts of fixedarith and fixedlogical; the latter needed some changes on the parser side (we haven't supported expressions in loops). I'm waiting for replies wrt test_pipe_caller_long.py, since it basically blocks me in testing mul instructions (and I suspect most of the rest).
(In reply to dmitry.selyutin from comment #22) > The progress so far is available in xlen branch. i hate branches, you know that, right? :) > I've processed parts of > fixedarith and fixedlogical; the latter needed some changes on the parser > side (we haven't supported expressions in loops). ah that makes sense. what you can show "works" just drop it into master branch. > I'm waiting for replies wrt test_pipe_caller_long.py, since it basically > blocks me in testing mul instructions (and I suspect most of the rest). jacob can help with that. jacob you need to adjust the dev-env-setup acript so that it can be run "straight" (and not part of an external non-libresoc redource) running test_issuer.py (from soc) is an important one, it compares against the HDL (which, obviously, has not changed)
(In reply to Luke Kenneth Casson Leighton from comment #23) > jacob you need to adjust the dev-env-setup acript > so that it can be run "straight" (and not part of an external non-libresoc > redource) I already adjusted dev-env-setup https://git.libre-soc.org/?p=dev-env-setup.git;a=history;f=pia-install;h=5e8ee78d1087e7b9e6525d200468bb39f9854cd7;hb=6e6203c9017e0471070fcd1f606b37f87d723ac6
(In reply to Jacob Lifshay from comment #24) > > I already adjusted dev-env-setup > https://git.libre-soc.org/?p=dev-env-setup.git;a=history;f=pia-install; > h=5e8ee78d1087e7b9e6525d200468bb39f9854cd7; > hb=6e6203c9017e0471070fcd1f606b37f87d723ac6 fantastic, dmitry can you run that and check it
for i in 0 to 7 index <- (RS)[8*i:8*i+7] in bpermd, RS may be 32 16 or 8 bit. we need to decide what to do here. bpermd is one of those unusual ones, not so straightforward. see bug #680
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=a14ebcbc17e690976d054be5ec361c0ecc043c97 neat.
Updated the branch with some of div instructions. The "pipe" tests are fragile and there are 900+ tests failed even on master (perhaps this means that I did something wrong upon establishing pia and soc dependencies). Any help is appreciated. I'll continue updating the instructions; once I get to testing, I'll resolve issues, if any, cumulatively, then I'll rebase the branch atop of master.
(In reply to dmitry.selyutin from comment #28) > once I get to testing For now, I only check that pywriter can generate the Python code.
Also, a question on prtyd instruction. In specs we have the following: "s ⊕ (RS)i%8+7". In fixedlogical.mdwn, we have "s ^ (RS)[i*8+7]". Is it correct?
(In reply to dmitry.selyutin from comment #30) > Also, a question on prtyd instruction. In specs we have the following: "s ⊕ > (RS)i%8+7". In fixedlogical.mdwn, we have "s ^ (RS)[i*8+7]". Is it correct? look at section 1.3.2 the symbol is defined there. we obviously had to use the standard ASCII symbol for XOR. however you are probably referring to the use of "%" not "*" which looks like a bug in the v3.0B specification. it may have already been raised.
(In reply to dmitry.selyutin from comment #28) > Updated the branch with some of div instructions. The "pipe" tests are > fragile and there are 900+ tests failed even on master (perhaps this means > that I did something wrong upon establishing pia and soc dependencies). Any > help is appreciated. jacob can you please help Dmitry by running the tests and investigating the software library that you are responsible for, it is holding Dmitry up. > I'll continue updating the instructions; once I get to testing, I'll resolve > issues, if any, cumulatively, then I'll rebase the branch atop of master. thid is precisely why i do not like branches. it would be much much better for the work to be done only in master branch by running individual tests and on confirmation of one instruction being functional to push it, then move on to the next one. even trying to do that, now, due to the large number of commits in a branch, is going to be a massive headache. divergence of branches is a huge operational inconvenience. we now have to do an all-or-nothing approach and it is being held up by jacob. jacob csn you please pay attention.
https://libre-soc.org/irclog/%23libre-soc.2021-08-30.log.html#t2021-08-30T09:29:38 changing the element width to 32/16/8 on fields that are only 4 bit in width in the first place makes no sense. so anything that goes into CR leave it alone. we'll have to deal with that separately (in a different way), but element-width overrides will not be done on CRs.
a new test in power_pseudo.py [0]*16] produces this: Module(body=[Expr(value=Call(func=Name(id='concat', ctx=Load()), args=[Constant(value=0)], keywords=[keyword(arg='repeat', value=Constant(value=16))]))]) astor dump Module( body=[ Expr( value=Call(func=Name(id='concat'), args=[Constant(value=0)], keywords=[keyword(arg='repeat', value=Constant(value=16))]))]) to source concat(0, repeat=16) where the following: [0]*(XLEN-16) instead produces this: Module(body=[BinOp(left=List(elts=[Constant(value=0)], ctx=Load()), op=Mult(), right=BinOp(left=Name(id='XLEN', ctx=Load()), op=Sub(), right=Constant(value=16)))]) astor dump Module( body=[ BinOp(left=List(elts=[Constant(value=0)]), op=Mult, right=BinOp(left=Name(id='XLEN'), op=Sub, right=Constant(value=16)))]) to source ([0] * (XLEN - 16)) this is a bug. the source code generated should have been: concat(0, repeat=XLEN-16)
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/pseudo/parser.py;h=03d0fd5e6aca7128ecfa2d42d36be84e0ca0d353;hb=b3061c38ef984418e266c6f74b366a7e953840a5#l179 identify selectable int multiply pattern is not able to recognise [0] * (expression) only a Constant.
(In reply to Luke Kenneth Casson Leighton from comment #35) > https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/ > decoder/pseudo/parser.py;h=03d0fd5e6aca7128ecfa2d42d36be84e0ca0d353; > hb=b3061c38ef984418e266c6f74b366a7e953840a5#l179 > > identify selectable int multiply pattern is not able to recognise > > [0] * (expression) > > only a Constant. sorted. commit 73d15d3c4c5550ef85fbd1ea68b9afca2500d0bd Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Mon Aug 30 13:05:59 2021 +0100 also add pattern-recognition for just [0] * XLEN have to keep a close eye on this commit 39a0dc7397c592b4b53631b2284d142fd20ac485 Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Mon Aug 30 13:03:53 2021 +0100 fix pattern-match for an expression such as "XLEN-16" when looking for concat substitutions [item] * NUMBER was replaced with concat(item, repeat=NUMBER) but [item] * (XLEN-16) was not matching by adding a HACK which spots ast.Binop then [item]*(XLEN-16) can be recognised
reviewing this one: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=6bc1419dbcd0ee84cd28c8867eaf7dae7954c4e5 - n <- 32 - do while n < 64 + n <- (XLEN/2) + do while n < XLEN if (RS)[n] = 1 then leave n <- n + 1 - RA <- n - 32 + RA <- n - (XLEN/2) 258 Pseudo-code: 259 260 n <- (XLEN/2) 261 do while n < XLEN 262 if (RS)[n] = 1 then 263 leave 264 n <- n + 1 265 RA <- n - (XLEN/2) 266 i *think* that's ok. the concept of a "word" is effectively redefined to "half-of-the-XLEN-width" which is something that's being done for FP "single" numbers. (any operation fmuls for example we are DEFINING as "actually the "s" means "do the FP op at half of the elwidth"") it does however get very weird for 8-bit. plus, we need to actually add *full* width cntlzd noted in bug #678
in the new RANGE helper, the inverted direction was missing - if start.value > end.value: # start greater than end, must go -ve - # auto-subtract-one (sigh) due to python range - end = ast.BinOp(p[6], ast.Add(), ast.Constant(-1)) - arange = [start, end, ast.Constant(-1)] this needed to be converted to: return range(start, end, -1) the inverted countdown (-1) was missing this was found by running: python3 fu/cr/test/test_pipe_caller.py >& /tmp/f where the pseudocode for mtocrf in sprset.mwdn is: * mtocrf FXM,RS Pseudo-code: n <- 7 do i = 7 to 0 <---- needs to be range(7,-1,-1) not range(7,-1) if FXM[i] = 1 then n <- i
(In reply to Luke Kenneth Casson Leighton from comment #31) > (In reply to dmitry.selyutin from comment #30) > > Also, a question on prtyd instruction. In specs we have the following: "s ⊕ > > (RS)i%8+7". In fixedlogical.mdwn, we have "s ^ (RS)[i*8+7]". Is it correct? > > look at section 1.3.2 the symbol is defined there. > we obviously had to use the standard ASCII symbol > for XOR. > > however you are probably referring to the use of > "%" not "*" which looks like a bug in the v3.0B > specification. > > it may have already been raised. yes it was http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-May/000046.html the spec is wrong.
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ff49e2ca41d621532795f960c715d7b7e9c5379c 332 * popcntw RA, RS 333 334 Pseudo-code: 335 336 do i = 0 to ((XLEN/32)-1) 337 n <- 0 338 do j = 0 to 31 339 if (RS)[(i*32)+j] = 1 then 340 n <- n+1 341 RA[(i*32):(i*32)+31] <- n this one doesn't look right. my intuition is that it should be redefined to be "counting halves of the input". more like: 336 do i = 0 to 1 337 n <- 0 338 do j = 0 to (XLEN/2)-1 339 if (RS)[(i*(XLEN/2))+j] = 1 then down at elwidth=8 it gets particularly hairy / interesting because the counting (both of the src RS into RA) would go into nibbles. but, actually, i think even that would work. not cherry-picking this one over. (edit: raised as bug #679 so as to keep track)
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ef0109fb8e9d6e9f941cfa90bff389d522927c8e looks good. extsw. weird, but good. redefines the concept behind extsw as "sign-extend half of the elwidth" rather than sign-extend from a fixed hard-coded quantity of 32"
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=a9670992d6c9348d7f5c6d3d657a942a0b811f8e - RA <- EXTZ64(n) + RA <- EXTZ(n) aaahno. it would be better to over-ride the behaviour of EXTZ64 or at least not make that kind of "functional" change, right now. not cherry-picking this one (cnttzd) adding to bug #679
moving on to fixedarith (the mul cases) found that auto-creation (auto-assignment) was stopping on expressions: prod[0:XLEN-1] <- something the variable "prod" does not exist so has to be auto-created however the detection was ONLY capable of recognising prod[0:31] <- something fixed: quite a big intrusive change so re-running test_issuer.py (for the 5th time today) https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=69c1f44388226d1fde2e6aa70ce2fd5bc660bd87
just going through cherry-picking from xlen branch (bleh) spotted that mulli doesn't have a short test, so made one https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=5e7b5ce86ded1f7686e4fa45538df0f0d7a5c5c2
created bug #681 there are no madd* unit tests in ISACaller (stand-alone for openpower-isa) so i have cherry-picked the madd* xlen branch code over for now, untested. this is fine because it can't do any "damage", with no actual implementation.
(In reply to Luke Kenneth Casson Leighton from comment #40) > this one doesn't look right. my intuition is that it should be > redefined to be "counting halves of the input". The confusion is raised by these words from the spec: A count of the number of one bits in each word of regis- ter RS is placed into the corresponding word of register RA. This number ranges from 0 to 32, inclusive. I assumed "word" implies 32-bit. However, I think your rationale fits better. Let me re-check these.
(In reply to Luke Kenneth Casson Leighton from comment #39) > (In reply to Luke Kenneth Casson Leighton from comment #31) > > (In reply to dmitry.selyutin from comment #30) > > however you are probably referring to the use of > > "%" not "*" which looks like a bug in the v3.0B > > specification. > > > > it may have already been raised. > > the spec is wrong. Yep, % is the thing I meant. The issue is still present at revision C.
(In reply to dmitry.selyutin from comment #46) > The confusion is raised by these words from the spec: > > A count of the number of one bits in each word of regis- > ter RS is placed into the corresponding word of register > RA. This number ranges from 0 to 32, inclusive. > > I assumed "word" implies 32-bit. However, I think your rationale fits > better. Let me re-check these. with elwidth overrides we are wandering into a partiticularly important strategic part of SVP64, which is *not* approved yet by the OPF ISA WG and will need to be submitted via an official process. what we are doing - right now - is actually *defining* how SVP64 shall fit on top of these Power ISA scalar v3 0B operations. i.e. it is *not* up to the creators of the Power ISA v3.0B to say how the combination of Draft SVP64 shall behave: that is down to us. thus it is a slightly weird situation in which we have to think, "64 bit versions of these instructions, must be v3.0B compliant without a shadow of doubt, but 32 16 and 8 bit versions, those are our responsibility to think through and define... ... oh and then submit to the OPF ISA WG when they are ready" consequently, the actual wording of the v3.0B scalar spec is *not* helpful for defining what shall happen at 32, 16, or 8 bit: it is only strictly helpful at the full 64 bit
(In reply to Luke Kenneth Casson Leighton from comment #40) > but, actually, i think even that would work. This seems to do the trick: do i = 0 to 1 n <- 0 do j = 0 to ((XLEN/2)-1) if (RS)[(i*(XLEN/2))+j] = 1 then n <- n+1 RA[(i*(XLEN/2)):(i*(XLEN/2))+((XLEN/2)-1)] <- n ...the last line looks particularly impressive.
(In reply to dmitry.selyutin from comment #49) > (In reply to Luke Kenneth Casson Leighton from comment #40) > > but, actually, i think even that would work. > > This seems to do the trick: > > do i = 0 to 1 > n <- 0 > do j = 0 to ((XLEN/2)-1) > if (RS)[(i*(XLEN/2))+j] = 1 then > n <- n+1 > RA[(i*(XLEN/2)):(i*(XLEN/2))+((XLEN/2)-1)] <- n > > ...the last line looks particularly impressive. much joy. let's write a spec that wins an obfuscation contest, double bonus points. much as i am tempted by that possibility, can i suggest, a temp var or two: e = XLEN/2-1 do i = 0 to 1 s = i*XLEN/2 n <- 0 do j = 0 to e if (RS)[s+j] = 1 then n <- n+1 RA[s:s+e] <- n or other mercifully short temp var names (edit: done) commit cc990388b1f29c87f4d7ff53edbef9d34312b6f8 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Tue Aug 31 12:19:49 2021 +0100 adjusted popcntw to simplify by using temp vars https://bugs.libre-soc.org/show_bug.cgi?id=671#c50 e <- (XLEN/2)-1 s <- i*XLEN/2
- EA <- b + EXTS(D) - RT <- [0]*56 || MEM(EA, 1) + EA <- b + EXTSREG(D) + RT <- ([0] * (XLEN-8)) || MEM(EA, 1) nope, sorry dmitry, i said we musn't add new helper functions, unless there is an absolute critical need that there is absolutely no other way. in this case EXTS already does the job due to already having been designed to do "dynamic" bitlength (setting bits to 256).
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=fc68ee12c54f7f096a2631a9254db30fe4cde4e8 again, this one, it's important not to do changes to use an undocumented arbitrary helper function the idea here us, this pseudocode ACTUALLY will be submitted as ACTUAL v3.N specification with an OFFICIAL Request For Change to the OpenPOWER ISA. therefore all and any "helper" routines have to have a full audit, review, full justification, documentation, explanation, and finally submission to the ISA WG. i.e. a total pain in the ass. if we can simply overload EXTS64 via operator-overloading then when the pseudocode is submitted *without* EXTSREG the *reviewer* says, "this is meaningless please change it" *now* we have justification. in addition i said we are not doing width chsnges yet: width is currently hardcoded to 64. the *second* phase once hardcoded numbers *ONLY* have been replaced by XLEN will be to look at functions and try XLEN=32/16/8 one incremental step at a time, one task at a time. the larger the task the more critically important it is to ONLY strictly do that one regular task.
(In reply to Luke Kenneth Casson Leighton from comment #32) > (In reply to dmitry.selyutin from comment #28) > > Updated the branch with some of div instructions. The "pipe" tests are > > fragile and there are 900+ tests failed even on master (perhaps this means > > that I did something wrong upon establishing pia and soc dependencies). Any > > help is appreciated. > > jacob can you please help Dmitry by running the tests and investigating > the software library that you are responsible for, it is holding Dmitry > up. I tried: https://libre-soc.org/irclog/%23libre-soc.2021-08-31.log.html#t2021-08-31T05:48:58
(In reply to Jacob Lifshay from comment #53) > I tried: > https://libre-soc.org/irclog/%23libre-soc.2021-08-31.log.html#t2021-08-31T05: > 48:58 brilliant, thanks jacob. ghostmansd: fixedload.mdwn reviewed and cherry-picked over. am finding that although it's tedious as hell the xlen branch is working because the cherry-picking forces me to read each individual patch (and run unit tests to confirm it). if you can let me do the cherry-picking, that will make me do the testing.
all the div/mod patches are cherry-picked as well, unit tests pass. 4 hours ago Dmitry Selyutin fixedarith: switch modsd to XLEN 4 hours ago Dmitry Selyutin fixedarith: switch divdeX to XLEN 4 hours ago Dmitry Selyutin fixedarith: switch divdX to XLEN 4 hours ago Dmitry Selyutin fixedarith: switch modsw to XLEN 4 hours ago Dmitry Selyutin fixedarith: switch divweuX to XLEN 4 hours ago Dmitry Selyutin fixedarith: switch divweX to XLEN 4 hours ago Dmitry Selyutin fixedarith: switch divwX to XLEN
(In reply to Luke Kenneth Casson Leighton from comment #54) > (In reply to Jacob Lifshay from comment #53) > if you can let me do the cherry-picking, that will make me do the > testing. I think it's what we have to do for now, but I actually dislike it. Ideally I should be able to test everything out-of-box. Since tests from soc look like way too beefy for machine I possess, I think we should eventually make talos do the job. This, however, has its difficulties as well (pia, rust, pip, other potential problems which I'm not aware about yet).
all shift patches in, bug #682 there's a srad bug though.
https://libre-soc.org/openpower/isa/comparefixed/ this one has some weirdness: 1) don't touch CRs (elwidth overrides do not apply to CRs). 2) only do RA, RB, RS 3) cmpeqb is... well, it's unusual (non-standard). should be rewritten as a for-loop just like is done in bpermd for i = 0 to (XLEN/8) 4) cmprb is *particularly* odd. i have no problem *at all* with saying that this one *only* works on XLEN=64 and XLEN=32 and that XLEN=16 or XLEN=8 is UNDEFINED behaviour.
I'm not sure whether BCD are appropriate candidates. They all speak of _sixes_, which, depending on context, designate either 6 4-bit BCD numbers upon input to be converted to four declets (two declets per word), or four declets (two declets per word) _together_ are converted to 6 4-bit BCD numbers, or, as with addg6s, generate sixes upon addition. I don't feel like this logic can be moved to our needs, at least it'll need a significant wording update.
Moreover, even if we're into trying to somehow change the semantics, we won't still be able to put even a single DPD into 8 bits (as it occupies 10 bits). So is for BCDs, actually, because we always take at least 3-digit BCD (which makes 12 bits).
(In reply to dmitry.selyutin from comment #59) > I'm not sure whether BCD are appropriate candidates. They all speak of ok, then let's just skip them entirely.
(In reply to dmitry.selyutin from comment #59) > I'm not sure whether BCD are appropriate candidates. They all speak of > _sixes_, which, depending on context, designate either 6 4-bit BCD numbers > upon input to be converted to four declets (two declets per word), or four > declets (two declets per word) _together_ are converted to 6 4-bit BCD > numbers, or, as with addg6s, generate sixes upon addition. I don't feel like > this logic can be moved to our needs, at least it'll need a significant > wording update. In the case of addg6s, it totally can have XLEN overridden, since what it does is done on individual BCD digits (4-bit units), so elwidth override should work just fine: elwidth = 8: corrects for BCD addition of 2 BCD digits (like x86's daa instruction) 16: corrects for BCD addition of 4 BCD digits 32: corrects for BCD addition of 8 BCD digits 64: corrects for BCD addition of 16 BCD digits (unmodified addg6s)
(In reply to Jacob Lifshay from comment #62) > In the case of addg6s, it totally can have XLEN overridden, since what it > does is done on individual BCD digits (4-bit units), so elwidth override > should work just fine: > elwidth = > 8: corrects for BCD addition of 2 BCD digits (like x86's daa instruction) so adding an 8-bit BCD number on x86 does: # inputs in al and cl add al, cl daa # output in al adding an 8-bit BCD number on SVP64 does: # inputs in r3 and r4 sv.add/elwidth=8 r5.s, r3.s, r4.s sv.addg6s/elwidth=8 r3.s, r3.s, r4.s sv.add/elwidth=8 r3.s, r3.s, r5.s # output in r3
(In reply to Jacob Lifshay from comment #62) > In the case of addg6s, it totally can have XLEN overridden, since what it > does is done on individual BCD digits (4-bit units) I'm talking not only of can it be done or not, but also of the specs. One might argue that we can work even with nibbles, since nibble is sufficient to hold a single-digit BCD or truncated DPD (as long as it's less than 10). But with the wording the specs uses there's not much room for that, and, before doing this, we must agree that: 1. The specs wording is to be changed (e.g. for addg6s there should be no mentioning of 16 bits, or double word); the whole wording should be register-agnostic. 2. The pseudo-code should be updated respectively. Once we agree on these, I can fill whatever logic we find appropriate.
(In reply to dmitry.selyutin from comment #64) > (In reply to Jacob Lifshay from comment #62) > > In the case of addg6s, it totally can have XLEN overridden, since what it > > does is done on individual BCD digits (4-bit units) > > I'm talking not only of can it be done or not, but also of the specs. ok, though changing the spec's wording may be a pain, I don't think we should just declare BCD instructions too hard to support with SV when there is an obvious way to extend addg6s in particular to arbitrary elwidth. If we did just decide to skip some instructions just because we'd have to modify the spec text, I think the people on the ISA WG reviewing how SV works would take that as evidence of SV being inconsistently applied and half-baked, increasing our chances of rejection.
(In reply to Jacob Lifshay from comment #65) > If we did just decide to skip some instructions just because we'd have to > modify the spec text, I think the people on the ISA WG reviewing how SV > works would take that as evidence of SV being inconsistently applied and > half-baked, increasing our chances of rejection. Jacob, it looks like if you've skipped parts of my messages. Here are relevant parts, please, check. https://bugs.libre-soc.org/show_bug.cgi?id=671#c59 I don't feel like this logic can be moved to our needs, at least it'll need a significant wording update. https://bugs.libre-soc.org/show_bug.cgi?id=671#c64 I'm talking not only of can it be done or not, but also of the specs. 1. The specs wording is to be changed (e.g. for addg6s there should be no mentioning of 16 bits, or double word); the whole wording should be register-agnostic. 2. The pseudo-code should be updated respectively. A short summary: if we intend to ever touch these, the current wording from the spec does not leave us much room. These instructions _must_ be updated both in specs and pseudo-code. You only consider the simplest one, addg6s, but even that one needs changes. If we're OK with changing the specs and the original semantics -- let's do it.
(In reply to Jacob Lifshay from comment #65) > If we did just decide to skip some instructions just because we'd have to > modify the spec text, I think the people on the ISA WG reviewing how SV > works would take that as evidence of SV being inconsistently applied and > half-baked, increasing our chances of rejection. hmm, good points (In reply to dmitry.selyutin from comment #66) > A short summary: if we intend to ever touch these, the current wording from > the spec does not leave us much room. These instructions _must_ be updated > both in specs and pseudo-code. You only consider the simplest one, addg6s, > but even that one needs changes. If we're OK with changing the specs and the > original semantics -- let's do it. yes. there's no such restriction. every piece of pseudocode must be accompanied by associated wording in english.
According to Wikipedia: https://en.wikipedia.org/wiki/Densely_packed_decimal (matches OpenPower/IEEE 754 definition of DPD) DPD is designed so that one or two BCD digits can be encoded into DPD as just the LSB 4 or 8 bits respectively, meaning that truncating the input/output of cdtbcd and cbcdtd totally are useful and still cover all BCD digits: elwidth=64: use existing definition for compatibility elwidth=32: use LSB 32-bits of existing definition for compatibility with PowerPC 32-bit ISAs elwidth=16: use LSB 16-bits of existing definition, you end up ignoring/zeroing the MSB 2 bits of the 16-bit DPD-side input/output for all valid BCD inputs: cbcdtd: # for input digits 0-9, only lsb 4 bits of DPD are set; # idk for 0xA-0xF so keep extra bits anyway # for consistency with larger definition output = (BCD_TO_DPD(input >> 12) & 0x3F) << 10 output |= BCD_TO_DPD(input & 0xFFF) cdtbcd: output = (DPD_TO_BCD(input >> 10) & 0xF) << 12 output |= DPD_TO_BCD(input & 0x3FF) elwidth=8: use LSB 8-bits of existing definition, all valid BCD inputs convert to DPD that fits in 8-bits: cbcdtd: # for input digits 0-9, only lsb 8 bits of DPD are set output = BCD_TO_DPD(input) & 0xFF cdtbcd: output = DPD_TO_BCD(input) & 0xFF
(In reply to Jacob Lifshay from comment #68) > elwidth=16: > use LSB 16-bits of existing definition, you end up ignoring/zeroing the MSB > 2 bits of the 16-bit DPD-side input/output for all valid BCD inputs: turns out the MSB 2 bits aren't ignored for out-of-range inputs/outputs, so we need to keep them. the pseudo-code is still correct afaict.
jacob, dmitry is going to be away for a week, and this bug is getting pretty long, can you raise and link a separate bugreport and do the 3 BCD instructions? you have a pretty good handle on them. drop into xlen branch, it works well. try to minimise changes, make them really clear, create temp vars involving XLEN/8 (or XLEN/4) etc if they recur repetitively.
(In reply to Jacob Lifshay from comment #63) > adding an 8-bit BCD number on SVP64 does: > # inputs in r3 and r4 > sv.add/elwidth=8 r5.s, r3.s, r4.s > sv.addg6s/elwidth=8 r3.s, r3.s, r4.s > sv.add/elwidth=8 r3.s, r3.s, r5.s > # output in r3 turns out I have the wrong assembly code for OpenPower, the correct code can be derived from the programming note on page 103 (134) of OpenPower ISA v2.07B: # inputs in r3 and r4 sv.addi/elwidth=8 r3.s, r3.s, 0x66 sv.add/elwidth=8 r5.s, r3.s, r4.s sv.addg6s/elwidth=8 r3.s, r3.s, r4.s sv.subf/elwidth=8 r3.s, r3.s, r5.s # output in r3
It seems I missed extsb and extsh routines. These must also be converted. Ideas on the semantics?
(In reply to dmitry.selyutin from comment #72) > It seems I missed extsb and extsh routines. These must also be converted. > Ideas on the semantics? sign-extend/truncate the input from XLEN to 8/16 bits, then sign-extend/truncate to XLEN so extsb is the equivalent of C's: return (xlen_t)(int8_t)input; and extsh is: return (xlen_t)(int16_t)input;
Marking as deferred for now due to binutils tasks.
(In reply to Dmitry Selyutin from comment #74) > Marking as deferred for now due to binutils tasks. this task can in no way be "deferred" to a future indefinite time, with zero scheduled discussion on when that should take place. it *has* to be completed and is really important. this one will certainly get linked to the cavatools project, 2021-08-071.
Ah OK, sorry for confusion. I wanted to stress that right now I'm not working on it due to binutils, "deferred" seemed to be the best choice, since I have plans to work on this task after I complete binutils.