656 – add v3.0B BCD instructions to simulator

Bug 656 - add v3.0B BCD instructions to simulator

Summary: add v3.0B BCD instructions to simulator

Status:	RESOLVED FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code (show other bugs)
Version:	unspecified
Hardware:	PC Linux

Importance:	--- enhancement
Assignee:	Dmitry Selyutin

URL:

Depends on:	657
Blocks:
	Show dependency tree / graph

Reported:	2021-07-14 12:50 BST by Luke Kenneth Casson Leighton
Modified:	2022-06-13 20:06 BST (History)
CC List:	3 users (show)

See Also:	657
NLnet milestone:	NLNet.2019.10.046.Standards
total budget (EUR) for completion of task and all subtasks:	200
budget (EUR) for this task, excluding subtasks' budget:	200
parent task for budget allocation:	241
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	ghostmansd={amount=100, paid=2021-11-01} maciej_3mdeb=100

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Luke Kenneth Casson Leighton 2021-07-14 12:50:22 BST

3 new instructions are needed, for BCD:
* addg6s
* cdtbcd
* cbcdtd

these are on p109 book I section 3.3.9 v3.0C
https://ftp.libre-soc.org/PowerISA_public.v3.0C.pdf

to support cdtbcd and cbcdtd, two functions will need to be created:

*  BCD_TO_DPD
*  DPD_TO_BCD

these will need adding in src/openpower/decoder/helpers.py
and then added to the top of the code-generating template
in pywriter.py

to get the carry-out bit, the pseudocode for addg6s will need to be
rewritten into a "programmatic" way rather than a "here's the general
idea of how to get the 65th bit from a 64-bit add"

do i = 0 to 15
   dc[i] <- carry_out(RA[4*i:63] + RB[4*i:63])

will need to be changed to something like:

   temp <- (0b0 || RA) + (0b0 || RB)
   dc[16] <- [0]*16
   do i = 0 to 15
       dc[i] <- temp[4*i]

(something like that)

and unit tests will need writing which check that they're functional.

oh.  haha, they're already added, just not yet tested.
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/bcd.mdwn;hb=HEAD

entries will need to be added to minor_31.csv, based on this: https://github.com/antonblanchard/microwatt/blob/f9654428ff28744dcf1ad4e7fe43604415689f36/decode1.vhdl#L206

OP_ADDG6S and OP_BCD will need to be added to power_enums.py:
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_enums.py;hb=HEAD

also  addg6s cdtbcd cbcdtd to _insns in alphabetical order

TODO checklist *in this order* due to dependencies (edit as needed):

* add OP_* to power_enums.py    - TODO
* add insn asmcodes to _insns   - TODO
* add minor_31.csv entries      - TODO
* add DCT_TO_BCD helper         - TODO
* add BCD_TO_DCD helper         - TODO
* correct addg6s pseudocode     - TODO
* then move on to bug #657      - TODO

at each step, *other* unit tests (at least one) need to be run (doesn't
matter which ones) as a way to verify that no "damage" has been done,
no python syntax errors etc.

commits on each of the above to be done separately and pushed with
appropriate commit messages, matching each of the TODO checklist
bulletpoints.

progress (copy of each commit log message incl. hash) to be sent here.

Comment 1 Luke Kenneth Casson Leighton 2021-07-28 11:20:46 BST

dmitry the pseudocode for cdtbcd and cbcdtd uses a couple of helper functions
which need to be added before they can "work".  addg6s on the other hand
you can do straight away without this.

i found in "v3.0B p787 Book I Appendix B.1 and B.2" the following
pseudocode:

B.1 BCD-to-DPD Translation
The translation from a 3-digit BCD number to a 10-bit
DPD can be performed through the following Boolean
operations.

 p = (f & a & i & Â¬e) | (j & a & Â¬i) | (b & Â¬a)
 q = (g & a & i & Â¬e) | (k & a & Â¬i) | (c & Â¬a)
 r = d
 s = (j  &  Â¬a & e & Â¬i) | (f & Â¬i & Â¬e) |
      (f &  Â¬a & Â¬e) | (e & i)
 t = (k  &  Â¬a & e & Â¬i) | (g & Â¬i & Â¬e) |
      (g &  Â¬a & Â¬e) | (a & i)
 u = h
 v = a | e | i
 w = (Â¬e & j & Â¬i) | (e & i) | a
 x = (Â¬a & k & Â¬i) | (a & i) | e
 y = m

B.2 DPD-to-BCD Translation

The translation from a 10-bit DPD to a 3-digit BCD
number can be performed through the following Bool-
ean operations.

 a =  (Â¬s & v & w) | (t & v & w & s) | (v & w & Â¬x)
 b =  (p & s & x & Â¬t) | (p & Â¬w) | (p & Â¬v)
 c =  (q & s & x & Â¬t) | (q & Â¬w) | (q & Â¬v)
 d =  r
 e = (v & Â¬w & x) | (s & v & w & x) |
      (Â¬t & v & x & w)
 f = (p & t & v & w & x & Â¬s) | (s & Â¬x & v) |
      (s & Â¬v)
 g = (q & t & w & v & x & Â¬s) | (t & Â¬x & v) |
      (t & Â¬v)
 h = u
 i = (t  & v & w & x) | (s & v & w & x) |
      (v & Â¬w & Â¬x)
 j = (p  & Â¬s & Â¬t & w & v) | (s & v & Â¬w & x) |
      (p & w & Â¬x & v) | (w & Â¬v)
 k = (q  & Â¬s & Â¬t & v & w) | (t & v & Â¬w & x) |
      (q & v & w & Â¬x) | (x & Â¬v)
 m = y


after removing the silliness of *yet another* non-ASCII symbol for "¬"
being used, those will need to go into openpower/isafunctions/bcd.mdwn,
along with comments saying where they come from.  see here for example
to cut/paste:
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isafunctions/fpfromint.mdwn;hb=HEAD

then you can run "pyfnwriter" to generate the actual python code

then, in pywriter.py you will need to *add* those helper functions
*manually* (at the moment, really they should be done automatically)

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/pseudo/pywriter.py;hb=HEAD

at line 33, add
from openpower.decoder.isafunctions.bcd import BCD_TO_DPD, DPD_TO_BCD

then re-run pywriter noall bcd in order to get the decoder/isa/bcd.py
to be re-compiled with the import of the helper function added.

or, cheat, and manually edit decoder/isa/bcd.py :)

Comment 2 Jacob Lifshay 2021-07-28 21:34:29 BST

(In reply to Luke Kenneth Casson Leighton from comment #0)
> to get the carry-out bit, the pseudocode for addg6s will need to be
> rewritten into a "programmatic" way rather than a "here's the general
> idea of how to get the 65th bit from a 64-bit add"
> 
> do i = 0 to 15
>    dc[i] <- carry_out(RA[4*i:63] + RB[4*i:63])

the above pseudocode is copied from the spec and I'm assuming it is correct.

> will need to be changed to something like:
> 
>    temp <- (0b0 || RA) + (0b0 || RB)
>    dc[16] <- [0]*16
>    do i = 0 to 15
>        dc[i] <- temp[4*i]

That rewrite is incorrect, since if you add 0x10 + 0x20, it sets dc[15] = 1 even though 0 + 0 has no carry out.

corrected code (afaict):
dc <- [0]*16
carry <- 0b0
do i = 15 to 0
    sum <- (0b0 || RA[4*i:4*i+3]) + (0b0 || RB[4*i:4*i+3]) + (0b0000 || carry)
    carry <- sum[0]
    dc[i] <- carry

Comment 3 Luke Kenneth Casson Leighton 2021-07-28 23:45:25 BST

(In reply to Jacob Lifshay from comment #2)

> do i = 15 to 0
>     sum <- (0b0 || RA[4*i:4*i+3]) + (0b0 || RB[4*i:4*i+3]) + (0b0000 ||
> carry)

note 4*i:  >>>63<<<
not  4*i:  4*i+3

> > do i = 0 to 15
> >    dc[i] <- carry_out(RA[4*i:63] + RB[4*i:63])

therefore, each of the selections of RA and RB parts must be
expanded in length by 1 bit:

    temp <- (0b0 || RA[4*i:63]) + (0b0 || RB[4*i:63])

and because MSB0 order you take the bit numbered ZERO to get the MSB:

    dc[i] <- temp[0]

Comment 4 Jacob Lifshay 2021-07-28 23:52:29 BST

(In reply to Luke Kenneth Casson Leighton from comment #3)
> (In reply to Jacob Lifshay from comment #2)
> 
> > do i = 15 to 0
> >     sum <- (0b0 || RA[4*i:4*i+3]) + (0b0 || RB[4*i:4*i+3]) + (0b0000 ||
> > carry)
> 
> note 4*i:  >>>63<<<
> not  4*i:  4*i+3

the code I wrote intentionally uses RA[4*i:4*i+3] since it's doing the add 4-bits at a time and manually propagating the carries rather than doing sequence of 60, 56, 52, 48, ... 8, and 4 bit adds and relying on the add to propagate carry implicitly. So, yes, 4*i+3 is correct here.

It uses a different algorithm than in the openpower spec.

Comment 5 Luke Kenneth Casson Leighton 2021-07-29 00:11:51 BST

(In reply to Jacob Lifshay from comment #4)

> the code I wrote intentionally uses RA[4*i:4*i+3] since it's doing the add
> 4-bits at a time and manually propagating the carries rather than doing
> sequence of 60, 56, 52, 48, ... 8, and 4 bit adds and relying on the add to
> propagate carry implicitly. So, yes, 4*i+3 is correct here.
> 
> It uses a different algorithm than in the openpower spec.

then given that we are using the pseudocode as a recommendation for
changes to the OpenPOWER spec it is inadviseable to make unnecessary
changes regardless of "efficiency".

making such a change should go through the OPF ISA WG
where we pick it up from there once authorised and approved.

otherwise we have yet another discrepancy that requires yet more time
and yet more resources.

the reason why we had to make changes to mul, div and mod
pseudocode is because they were plain unuseable.

the existing pseudocode is useable.

if you want to propose a change to the Authorised ISA Spec through
the process soon to be established please feel free to do so, however
i would very much prefer that the LibreSOC pseudocode not contain
discrepancies that place a larger burden of time and energy on us
when the existing spec pseudocode generates the right values.

Comment 6 Jacob Lifshay 2021-07-29 00:25:39 BST

(In reply to Luke Kenneth Casson Leighton from comment #5)
> (In reply to Jacob Lifshay from comment #4)
> > It uses a different algorithm than in the openpower spec.
> 
> then given that we are using the pseudocode as a recommendation for
> changes to the OpenPOWER spec it is inadviseable to make unnecessary
> changes regardless of "efficiency".

well, my reason for changing algorithms was understandability due to not having different bitwidths for every iteration...that said, I did kinda forget about trying to keep the same pseudocode

Comment 7 Luke Kenneth Casson Leighton 2021-07-29 01:03:28 BST

(In reply to Jacob Lifshay from comment #6)

> well, my reason for changing algorithms was understandability due to not
> having different bitwidths for every iteration...that said, I did kinda
> forget about trying to keep the same pseudocode

in the HDL it's a different matter, we're free there to do whatever we see fit.

Comment 8 Dmitry Selyutin 2021-07-29 06:35:09 BST

So, to summarize, does the following work for us?

    dc[16] = [0]*16
    do i = 0 to 15
        temp <- (0b0 || RA[4*i:63]) + (0b0 || RB[4*i:63])
        dc[i] <- temp[0]
    c <- ([dc[0]]*4 || [dc[1]]*4 || [dc[2]]*4 || [dc[3]]*4 ||
            [dc[4]]*4 || [dc[5]]*4 || [dc[6]]*4 || [dc[7]]*4 ||
            [dc[8]]*4 || [dc[9]]*4 || [dc[10]]*4 || [dc[11]]*4 ||
            [dc[12]]*4 || [dc[13]]*4 || [dc[14]]*4 || [dc[15]]*4)
    RT <- (¬c) & 0x6666_6666_6666_6666

Comment 9 Jacob Lifshay 2021-07-29 06:56:31 BST

(In reply to dmitry.selyutin from comment #8)
> So, to summarize, does the following work for us?
> 
>     dc[16] = [0]*16
>     do i = 0 to 15
>         temp <- (0b0 || RA[4*i:63]) + (0b0 || RB[4*i:63])
>         dc[i] <- temp[0]
>     c <- ([dc[0]]*4 || [dc[1]]*4 || [dc[2]]*4 || [dc[3]]*4 ||
>             [dc[4]]*4 || [dc[5]]*4 || [dc[6]]*4 || [dc[7]]*4 ||
>             [dc[8]]*4 || [dc[9]]*4 || [dc[10]]*4 || [dc[11]]*4 ||
>             [dc[12]]*4 || [dc[13]]*4 || [dc[14]]*4 || [dc[15]]*4)
>     RT <- (¬c) & 0x6666_6666_6666_6666

Yes, it should for now, however it will likely be a total pain later when we generate C code due to `temp` changing its type each iteration. The pseudocode I suggested doesn't have that issue.

Comment 10 Luke Kenneth Casson Leighton 2021-07-29 19:02:54 BST

https://libre-soc.org/irclog/%23libre-soc.2021-07-28.log.html#t2021-07-28T19:18:02

dmitry: we have no idea how these instructions work. therefore, unit tests
can be written which bounce off of qemu, by way of comparison.

you can experiment with random values and/or fixed values, using
this as a base:
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/simulator/test_sim.py;hb=HEAD

that code will co-simulate the instruction against both qemu
*and* the simulator ISACaller, extracting the full contents
of the register files *and* memory, performing a byte-level
comparison.

if the results are different, you know one of two things:

1) qemu is f*****d.
2) ISACaller is borked.

we have actually had (1) occur, it has actually produced the wrong
answers in certain circumstances.  if that happens, we will need
to investigate further.

Comment 11 Luke Kenneth Casson Leighton 2021-07-30 21:03:22 BST

diff --git a/openpower/isa/bcd.mdwn b/openpower/isa/bcd.mdwn
index 30bb161..04753ed 100644
--- a/openpower/isa/bcd.mdwn
+++ b/openpower/isa/bcd.mdwn
@@ -33,13 +33,15 @@ XO-Form
 
 Pseudo-code:
 
+    dc[16] = [0]*16
     do i = 0 to 15

this is a comparison rather than an assignment (can't count the number
of times i've made that error), and it should be dc <- [0]*16.

"dc[16] <- [0]*16" will be trying to assign a 16-bit value into an
implicit 17th bit of dc, which is only 1 bit wide

SelectableInt is quite specific about bit-width matching its inputs
and outputs, so would throw a runtime exception.

Comment 12 Dmitry Selyutin 2021-07-30 21:46:09 BST

Hi Luke, thanks for the fix! It'll probably take me a while until I get used to the notation. I've pushed the fix; do we have some syntax-related test? Or `=` instead of `<-` is considered to be acceptable in this context?

Comment 13 Dmitry Selyutin 2021-07-30 21:54:21 BST

Ah, I misread part of your comment.

> "dc[16] <- [0]*16" will be trying to assign a 16-bit value into an implicit 17th bit of dc, which is only 1 bit wide.

Now it's clear. Thank you again!

Comment 14 Luke Kenneth Casson Leighton 2021-07-30 23:22:06 BST

no problem.  it's quite minimalist and spartan: if this was gcc there would
be massive amounts of error checking, nice error messages etc.
we don't have a huge userbase, nor the resources for that. you have
to adjust accordingly, take care, make minimal changes and carefully
inspect the output and rely on unit tests to ensure the pseudocode
is correct.

Comment 15 Luke Kenneth Casson Leighton 2021-07-31 10:26:17 BST

(In reply to dmitry.selyutin from comment #13)
> Ah, I misread part of your comment.
> 
> > "dc[16] <- [0]*16" will be trying to assign a 16-bit value into an implicit 17th bit of dc, which is only 1 bit wide.
> 
> Now it's clear. Thank you again!

the assignment operator is now being correctly used, however this is still
trying to assign 16 bits of zeros into one single bit (the 17th) of dc.

* one bit wide, 17th bit: dc[16]
* 16 bits wide (zeros)  : [0]*16
* 16 bits trying to be assigned into the 17th bit: dc[16] <- [0]*16

what you want is just

   dc <- [0] * 16

and the compiler is smart enough (note, only for very simple assignments
like this) to note the bit-width of the assignment and automatically create
a variable, dc, of width 16 bit.

you would have found this when writing the unit test which executed this
code.


diff --git a/openpower/isa/bcd.mdwn b/openpower/isa/bcd.mdwn
index 04753ed..63c42b2 100644
--- a/openpower/isa/bcd.mdwn
+++ b/openpower/isa/bcd.mdwn
@@ -33,7 +33,7 @@ XO-Form
 
 Pseudo-code:
 
-    dc[16] = [0]*16
+    dc[16] <- [0]*16
     do i = 0 to 15
         temp <- (0b0 || RA[4*i:63]) + (0b0 || RB[4*i:63])
         dc[i] <- temp[0]

Comment 16 Luke Kenneth Casson Leighton 2021-07-31 16:53:34 BST

https://libre-soc.org/irclog/latest.log.html#t2021-07-31T16:47:57

Comment 17 Luke Kenneth Casson Leighton 2021-07-31 18:53:42 BST

Spec Appendix B p 787 states:

    which supports the representation of decimal integers
    of arbitrary length. Translation operates on three
    Binary Coded Decimal (BCD) digits at a time com-
    pressing the 12 bits into 10 bits with an algorithm that
    can be applied or reversed using simple Boolean oper-
    ations. In the following examples, a 3-digit BCD num-
    ber is represented as (abcd)(efgh)(ijkm), a 10-bit DPD
    number is represented as (pqr)(stu)(v)(wxy), and the
    Boolean operations, & (AND), | (OR), and Â¬ (NOT) are
    used.

therefore, in the "Translation" boolean stuff, let us assume that
the input (val) is 12 bits:

def BCD_TO_DPD(val):
    # first digit, let us assume (guess) that the digits are in
    # left-is-Most-significant, right-is-least-significant order
    # i.e it is:
    #      ....represented as (abcd)(efgh)(ijkm)
    #                         Top   Middle Least significant digit
    # therefore:

    # top BCD digit
    a = val[0] # MSB0 order
    b = val[1] ...
    c = val[2]
    d = val[3]
    ...
    # least-significant BCD digit
    i = val[8]
    j = val[9]
    k = val[10]
    l = val[11]


and that likewise p q r .... are similarly ordered such that
    * p is MSB0-ordered bit zero (0)
    * q is MSB0-ordered bit one
    * ...
    * y is MSB0-ordered bit nine (9)

that should be sufficient to complete the function *WHICH MUST BE IN
PSEUDOCODE NOT CONVERTED TO PYTHON*.

repeat: the PSEUDOCODE from section B.1 shall be placed into a
markdown file.

NOT, repeat, NOT: "take the pseudocode and waste time converting it
by hand to a python function".

Comment 18 Luke Kenneth Casson Leighton 2021-07-31 20:03:13 BST

added SINGLE function to double2single.mdwn:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=55163c2d1f5487e4d5213b256652527843ce002b

add it into pywriter.py so that the compiler uses it:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=2fae1ac20baefa05c1f1aac00fa7f0e88d68242c


remove the function which should never have been converted by
hand in the first place from pseudocode manually to python:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=4c83f3216f802de3c6eb8dd2729107ba0a499dea

Comment 19 Dmitry Selyutin 2021-07-31 20:45:51 BST

Pushed the changes into the master. Hopefully we can move to 657. Luke, should I mark it as resolved, or this should be done by you?

Comment 20 Luke Kenneth Casson Leighton 2021-07-31 20:58:39 BST

(In reply to dmitry.selyutin from comment #19)
> Pushed the changes into the master. Hopefully we can move to 657. Luke,
> should I mark it as resolved, or this should be done by you?

once resolved we "lose" it in searches, and it makes tracking payments
very difficult.  to help deal with that, we create user pages,
see http://libre-soc.org/lkcl for an example (to template),
and that helps when creating the RFPs (Requests for Payment)

when the unit tests pass, and it's cross-ref'd on the wiki,
this one can be marked "resolved".

Comment 21 Jacob Lifshay 2021-08-01 05:18:54 BST

(In reply to Luke Kenneth Casson Leighton from comment #20)
> (In reply to dmitry.selyutin from comment #19)
> > Pushed the changes into the master. Hopefully we can move to 657. Luke,
> > should I mark it as resolved, or this should be done by you?
> 
> once resolved we "lose" it in searches, and it makes tracking payments
> very difficult.  to help deal with that, we create user pages,
> see http://libre-soc.org/lkcl for an example (to template),
> and that helps when creating the RFPs (Requests for Payment)

Actually, for the payment stuff, marking it as resolved with dmitry as the assignee and some money assigned to this bug will make it show up on the markdown generated by budget-sync (in /task_db/mdwn/ on the website) next time someone runs and uploads budget-sync's output.

Example generated markdown (exact bugs out-of-date):
## Completed but not yet added to payees list

* [Bug #145](https://bugs.libre-soc.org/show_bug.cgi?id=145):
  reference FP emulation using algebraic numbers

Comment 22 Jacob Lifshay 2021-08-01 10:34:04 BST

explanation of why addg6s is useful:
https://libre-soc.org/irclog/%23libre-soc.2021-08-01.log.html#t2021-08-01T10:10:13

> addg6s is used for doing a BCD addition, like x86's daa instruction:
> https://www.felixcloutier.com/x86/daa
> afaict to do a bcd add of a and b, you do `add r, a, b` then
> `addg6s t, a, b` then `add r, r, t`

Comment 23 Luke Kenneth Casson Leighton 2021-08-01 12:04:10 BST

(In reply to Jacob Lifshay from comment #21)

> Actually, for the payment stuff, marking it as resolved with dmitry as the
> assignee and some money assigned to this bug will make it show up on the
> markdown generated by budget-sync (in /task_db/mdwn/ on the website) next
> time someone runs and uploads budget-sync's output.

this forces me, as the site administrator, to spend my time being the one
to run that script and run the upload.

this is not a reasonable expectation.

additionally, the user page can be used by each team member to contain
work-in-progress notes, where the sync program is completely devoid of any
such information, containing (sterile) information solely and exclusively
about payments.

Comment 24 Jacob Lifshay 2021-08-01 18:10:34 BST

(In reply to Luke Kenneth Casson Leighton from comment #23)
> (In reply to Jacob Lifshay from comment #21)
> 
> > Actually, for the payment stuff, marking it as resolved with dmitry as the
> > assignee and some money assigned to this bug will make it show up on the
> > markdown generated by budget-sync (in /task_db/mdwn/ on the website) next
> > time someone runs and uploads budget-sync's output.
> 
> this forces me, as the site administrator, to spend my time being the one
> to run that script and run the upload.
> 
> this is not a reasonable expectation.

True, hence why I think the script should be automatically run, say in a daily/hourly cron job. Normally this would be taken care of by CI (GitLab CI can be set to run on a time-based schedule), but we're still waiting on getting mailing lists set up: see bug #190

Comment 25 Maciej Pijanowski 2021-08-13 08:57:36 BST

Is this task considered finished? 
According to Dmitry it is (correct me if I'm wrong please).

Luke, what is the procedure then to close the task?

Comment 26 Luke Kenneth Casson Leighton 2021-08-13 12:09:07 BST

(In reply to Maciej Pijanowski from comment #25)
> Is this task considered finished? 
> According to Dmitry it is (correct me if I'm wrong please).
> 
> Luke, what is the procedure then to close the task?

see comment #20, i have a mild preference for it to be closed when
the unit tests confirm that it's correct.