Bug 968 - document shift-and-add instruction
Summary: document shift-and-add instruction
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Specification (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Andrey Miroshnikov
URL: https://libre-soc.org/openpower/sv/bi...
Depends on:
Blocks:
 
Reported: 2022-10-23 12:06 BST by Luke Kenneth Casson Leighton
Modified: 2023-05-27 12:45 BST (History)
5 users (show)

See Also:
NLnet milestone: NLnet.2021.02A.052.CryptoRouter
total budget (EUR) for completion of task and all subtasks: 900
budget (EUR) for this task, excluding subtasks' budget: 900
parent task for budget allocation: 776
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
octavius={amount=450} lkcl={amount=450,submitted=2023-03-25,paid=2023-04-26}


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2022-10-23 12:06:27 BST
create/update spec and documentation page for shift-and-add instruction
Comment 1 Luke Kenneth Casson Leighton 2022-10-23 12:07:49 BST
andrey do you also want to have a go at creating ls004 for this one?
Comment 2 Luke Kenneth Casson Leighton 2022-10-23 12:14:04 BST
part of the justification for this instruction does come from twofish
(as well as LD-ST-address-calculate-with-a-shift)

#define ENCRYPT_RND( A,B,C,D, T0, T1, xkey, r ) \
    T0 = g0(A,xkey); T1 = g1(B,xkey);\
    C ^= T0+T1+xkey->K[8+2*(r)]; C = ROR32(C,1);\
    D = ROL32(D,1); D ^= T0+2*T1+xkey->K[8+2*(r)+1]   

and there is additional shifting occuring in creation of a q-table:

        ae = i>>4; be = i&0xf;   
        ao = ae ^ be; bo = ae ^ ROR4BY1(be) ^ ((ae<<3)&8);   
        ae = t[0][ao]; be = t[1][bo];   
        ao = ae ^ be; bo = ae ^ ROR4BY1(be) ^ ((ae<<3)&8);   
        ae = t[2][ao]; be = t[3][bo];
Comment 3 Andrey Miroshnikov 2022-10-23 17:48:21 BST
(In reply to Luke Kenneth Casson Leighton from comment #1)
> andrey do you also want to have a go at creating ls004 for this one?

Yes, will look into this. Been busy sorting out VAT returns this weekend.
Comment 4 Andrey Miroshnikov 2022-10-24 13:28:40 BST
Drafted the pseudo-code for shadd and shadduw, please give it a check:
https://libre-soc.org/openpower/sv/bitmanip/#shift-add
Comment 5 Dmitry Selyutin 2022-10-25 21:12:55 BST
What's left in the scope of this task? Should I update the code in the docs with the stuff we have in markdown?
Comment 6 Luke Kenneth Casson Leighton 2022-10-25 21:44:53 BST
(In reply to Dmitry Selyutin from comment #5)
> What's left in the scope of this task? Should I update the code in the docs
> with the stuff we have in markdown?

make sure the description is accurate, pseudocode in URL accurate,
any examples that need to go in ls004 RFC, links in wiki markdown
are there, cross-referencing.  a lot of small details.
Comment 7 Andrey Miroshnikov 2022-10-31 19:14:59 GMT
(In reply to Luke Kenneth Casson Leighton from comment #1)
> andrey do you also want to have a go at creating ls004 for this one?

(In reply to Luke Kenneth Casson Leighton from comment #6)
> any examples that need to go in ls004 RFC, links in wiki markdown
> are there, cross-referencing.  a lot of small details.

Added initial draft for ls004, containing what I know:

https://libre-soc.org/openpower/sv/rfc/ls004/
Comment 8 Luke Kenneth Casson Leighton 2022-10-31 22:58:49 GMT
(In reply to Andrey Miroshnikov from comment #7)
> (In reply to Luke Kenneth Casson Leighton from comment #1)
> > andrey do you also want to have a go at creating ls004 for this one?
> 
> (In reply to Luke Kenneth Casson Leighton from comment #6)
> > any examples that need to go in ls004 RFC, links in wiki markdown
> > are there, cross-referencing.  a lot of small details.
> 
> Added initial draft for ls004, containing what I know:
> 
> https://libre-soc.org/openpower/sv/rfc/ls004/

the word descriptioj makes no sense at all

    `shift` is determined by the 2-bit bitfield `sm`+1.
    The minimum shift as 1, maximum 4.
    The result is shifted (RB) + (RA), and is stored in register RT.

these are not "instructions", noone could follow them step by step.
try reading them, and following them.  do you know, looking at
nothing else, what will happen?

now try this:

   wben sm is zero, the contents of register RB are multiplied by two,
   added to the contents of register RA, and the result stored in RT.

this is obvious instructions which can be followed, yes?

shadduw is "unsigned word" btw
Comment 9 Andrey Miroshnikov 2022-11-01 14:58:48 GMT
(In reply to Luke Kenneth Casson Leighton from comment #8)
> the word descriptioj makes no sense at all

Yes, true.

> now try this:
> 
>    wben sm is zero, the contents of register RB are multiplied by two,
>    added to the contents of register RA, and the result stored in RT.

Changed the descriptions:
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=2ee653d030e00ffd1e1a93644088e6f2fee59650

> shadduw is "unsigned word" btw

This is not obvious to me (I'm probably not understanding). Not sure what the purpose of this instruction is.

The pseudocode of shadd doesn't indicate that it is **signed**, therefore why have shadduw?

Second line of shadduw pseudocode:
n <- (RB)[XLEN/2:XLEN-1]  # btw we agreed on not using XLEN in RFCs for now?

With XLEN=64, this would be (RB)[32:63], or the lower half (MSB ordering?).
Why ignore bits (RB)[0:31]?
Comment 10 Luke Kenneth Casson Leighton 2022-11-01 18:27:22 GMT
(In reply to Andrey Miroshnikov from comment #9)

> The pseudocode of shadd doesn't indicate that it is **signed**, 

64-bit on both operands, it makes no difference whatsoever.

> therefore why have shadduw?

compare:

   ADD( RA, SIGNEXTEND( LOWERHALF(RB) )

with:

   ADD( RA, ZEROEXTEND( LOWERHALF(RB) )

then set RB=0x0000_0000_8000_0000 and compare the two.

> Second line of shadduw pseudocode:
> n <- (RB)[XLEN/2:XLEN-1]  # btw we agreed on not using XLEN in RFCs for now?

in RFCs markdown pseudocode yes
in markdown pseudocode used by ISACaller *no*.

> With XLEN=64, this would be (RB)[32:63], or the lower half (MSB ordering?).

yep.

> Why ignore bits (RB)[0:31]?

because it is more compact when doing address offsets.  takes up less
space.  RA base 64 bit, offsets into data structure need only be 32bit.
Comment 11 Andrey Miroshnikov 2022-11-01 18:50:38 GMT
(In reply to Luke Kenneth Casson Leighton from comment #10)
> 64-bit on both operands, it makes no difference whatsoever.
Ah ok, although the operands are unsigned, the sign extension allows to do address offsetting.

> 
> compare:
> 
>    ADD( RA, SIGNEXTEND( LOWERHALF(RB) )
> 
> with:
> 
>    ADD( RA, ZEROEXTEND( LOWERHALF(RB) )
> 
> then set RB=0x0000_0000_8000_0000 and compare the two.

Thanks, that info was missing from bitmanip.mdwn.

Where are these statements located?
grep'ing for those I didn't find anything in openpower-isa repo.

> in RFCs markdown pseudocode yes
> in markdown pseudocode used by ISACaller *no*.

Thanks for clearing that up.

> because it is more compact when doing address offsets.  takes up less
> space.  RA base 64 bit, offsets into data structure need only be 32bit.

Does that mean shadduw pseudocode is missing a EXTS() statement?
There is no explicit sign-extension in the pseudocode.

Adjusted based on your explanation:
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=95a74ba3566333ac498bb1110781e563f86b4c06
Comment 12 Dmitry Selyutin 2022-11-01 19:30:58 GMT
Regarding the pseudocode:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=0389a26f6d725335ee1e3d920dc9e87861bff1d6

Otherwise it'd overflow.
Comment 13 Dmitry Selyutin 2022-11-01 19:33:54 GMT
Andrey, it seems that you handle this task, right? I fixed the pseudocode on the openpower-isa side, and you update the docs in librerisv repository.

It seems that I no longer have the write access to libreriscv repository; I vaguely recall I had it before, what happened?
Comment 14 Luke Kenneth Casson Leighton 2022-11-01 19:53:54 GMT
(In reply to Andrey Miroshnikov from comment #11)
> (In reply to Luke Kenneth Casson Leighton from comment #10)
> > 64-bit on both operands, it makes no difference whatsoever.
> Ah ok, although the operands are unsigned, the sign extension allows to do
> address offsetting.
> 
> > 
> > compare:
> > 
> >    ADD( RA, SIGNEXTEND( LOWERHALF(RB) )
> > 
> > with:
> > 
> >    ADD( RA, ZEROEXTEND( LOWERHALF(RB) )
> > 
> > then set RB=0x0000_0000_8000_0000 and compare the two.
> 
> Thanks, that info was missing from bitmanip.mdwn.

yes. you don't put massive explanations into pseudocode.
it goes into notes (here) or a discussion page.

> Where are these statements located?

they are not. i wrote them for you, just now.

> > because it is more compact when doing address offsets.  takes up less
> > space.  RA base 64 bit, offsets into data structure need only be 32bit.
> 
> Does that mean shadduw pseudocode is missing a EXTS() statement?

no it does not.

> There is no explicit sign-extension in the pseudocode.

correct.  if the desired behaviour was to perform a signed 32bit add
then not only would EXTS be used, the instruction would be called "shaddsw"
(or sigh shaddws)

> Adjusted based on your explanation:
> https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;
> h=95a74ba3566333ac498bb1110781e563f86b4c06

read again.  i said *no* on changing the pseudocode markdown for
ISACaller.  why did you then change the pseudocode markdown used
by ISACaller?

please revert

@@ -122,17 +124,21 @@ shadd r4, r1, r2, 3
 Pseudocode:
 
     shift <- sm + 1                                    # Shift is between 1-4
-    n <- (RB)[XLEN/2:XLEN-1]               # Limit RB to upper word (32-bits)
+    n <- (RB)[32:63]                           # Only use lower 32-bits of RB
     sum[0:63] <- (n << shift) + (RA)    # Shift n, add RA
     RT <- sum                                      # Result stored in RT
Comment 15 Luke Kenneth Casson Leighton 2022-11-01 19:58:48 GMT
(In reply to Dmitry Selyutin from comment #12)
> Regarding the pseudocode:
> 
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=0389a26f6d725335ee1e3d920dc9e87861bff1d6
> 
> Otherwise it'd overflow.

jacob pointed out that using a switch/case is better for
static compilation (to c) which we will be doing shortly.

(In reply to Dmitry Selyutin from comment #13)

> It seems that I no longer have the write access to libreriscv repository; I
> vaguely recall I had it before, what happened?

jacob made unauthorised modifications for about the 3rd time, not following
ptoject procedures, and i had to perform a force-master push (again). jacob
was directly responsible for communicating with everyone to ensure that they
had updated, but obviously missed you.  i only re-add people once i know
that no overwrite damage can occur.  please do check that you are at latest
master commit.
Comment 16 Luke Kenneth Casson Leighton 2022-11-01 20:01:37 GMT
(In reply to Luke Kenneth Casson Leighton from comment #14)

> @@ -122,17 +124,21 @@ shadd r4, r1, r2, 3
>  Pseudocode:
>  
>      shift <- sm + 1                                    # Shift is between
> 1-4
> -    n <- (RB)[XLEN/2:XLEN-1]               # Limit RB to upper word
> (32-bits)
> +    n <- (RB)[32:63]                           # Only use lower 32-bits of
> RB
>      sum[0:63] <- (n << shift) + (RA)    # Shift n, add RA
>      RT <- sum                                      # Result stored in RT

no, sorry, sorry, you're right: that's ls004.mdwn, not
openpower-isa bitmanip.mdwn.

distracted, hard to pay attention from an apartment.
Comment 17 Dmitry Selyutin 2022-11-01 20:06:45 GMT
(In reply to Luke Kenneth Casson Leighton from comment #15)
> (In reply to Dmitry Selyutin from comment #12)
> > Otherwise it'd overflow.
> 
> jacob pointed out that using a switch/case is better for
> static compilation (to c) which we will be doing shortly.

Any static compilation, if it's so complicated that it handles switch, should likely be able to deal with constants, shouldn't it? This switch/case is so ugly... so ugly that the only reason to keep it would be the way our integers overflow. :-) Anyway, this works now, I'm just somewhat unsatisfied that the code is explicit about widths, since this seems to be an implementation detail.

> i only re-add people once i know
> that no overwrite damage can occur.  please do check that you are at latest
> master commit.

I am at latest commit, but I don't already have to push anything, since Andrey handles this task, if I got it correctly. :-)
Comment 18 Andrey Miroshnikov 2022-11-01 20:19:43 GMT
(In reply to Luke Kenneth Casson Leighton from comment #14)
> no it does not.
> 
> > There is no explicit sign-extension in the pseudocode.
> 
> correct.  if the desired behaviour was to perform a signed 32bit add
> then not only would EXTS be used, the instruction would be called "shaddsw"
> (or sigh shaddws)

Then why did you give me an example with SIGNEXTEND and ZEROEXTEND, if they are not relevant?
A lot of my confusion comes from the naming of these two instructions.

Is there a good reason as to why shadduw uses "uw" (Unsigned Word)?

Since both shadd and shadduw are unsigned, you're putting emphasis on the shadduw being unsigned.
Of course shadduw also deals with a "word" so unsigned is probably necessary here.

While shadd does not imply the operation deals with unsigned or signed integers.

(In reply to Dmitry Selyutin from comment #17)
> Andrey handles this task, if I got it correctly. :-)

Doing ls004, though I did add the initial pseudocode as well.
If yours is working, where can I find it?
I guess you just stored "sm" first "shift <- sm" to ensure there wouldn't be overflow problems?

(Our IRC chat on the overflow problem: https://libre-soc.org/irclog/latest.log.html#t2022-11-01T19:36:54)
Comment 19 Luke Kenneth Casson Leighton 2022-11-01 20:34:16 GMT
(In reply to Dmitry Selyutin from comment #17)

> Any static compilation, if it's so complicated that it handles switch,
> should likely be able to deal with constants, shouldn't it?

a pattern-match would be needed "if variable used then output a shift
mask using a variable else use a constant", which would likely need a
2-pass compiler.

> This switch/case
> is so ugly... 

i know, but it is braindead simple to compile to c, just output the
equivalent c switch/case
Comment 20 Luke Kenneth Casson Leighton 2022-11-01 20:42:51 GMT
(In reply to Andrey Miroshnikov from comment #18)
> (In reply to Luke Kenneth Casson Leighton from comment #14)
> > no it does not.
> > 
> > > There is no explicit sign-extension in the pseudocode.
> > 
> > correct.  if the desired behaviour was to perform a signed 32bit add
> > then not only would EXTS be used, the instruction would be called "shaddsw"
> > (or sigh shaddws)
> 
> Then why did you give me an example with SIGNEXTEND and ZEROEXTEND, if they
> are not relevant?

because the question you asked was, "why is it not sign-extended".
the answer to that involves creating two sets of pseudocode, one
sign-extending one not-sign-extending, then putting in an example
value and noticing that the sign-extending one gives the WRONG result.

sign-extending:

   RB=0x0000_0000_8000_000
   SIGNEXTEND(RB) -->
   0xFFFF_FFFF_8000_0000

this is the WRONG thing to do.  shadduw is SPECIFICALLY DESIGNED
to NOT perform this signextension on RB.


> A lot of my confusion comes from the naming of these two instructions.
> 
> Is there a good reason as to why shadduw uses "uw" (Unsigned Word)?

because it explicitly makes clear that RB is **NOT** to be sign-extended
before the addition.
 
> Since both shadd and shadduw are unsigned, you're putting emphasis on the
> shadduw being unsigned.
> Of course shadduw also deals with a "word" so unsigned is probably necessary
> here.

yes.

> While shadd does not imply the operation deals with unsigned or signed
> integers.

yes. because when both RA and RB are 64-bit, there *is* no difference.

sign-extension only matters when one of the two numbers is a DIFFERENT
bitwidth.
Comment 21 Jacob Lifshay 2022-11-01 22:04:14 GMT
(In reply to Luke Kenneth Casson Leighton from comment #15)
> (In reply to Dmitry Selyutin from comment #13)
> 
> > It seems that I no longer have the write access to libreriscv repository; I
> > vaguely recall I had it before, what happened?
> 
> jacob made unauthorised modifications for about the 3rd time, not following
> ptoject procedures, and i had to perform a force-master push (again). jacob
> was directly responsible for communicating with everyone to ensure that they
> had updated, but obviously missed you.

afaict i sent an email to the mailing list asking everyone to get back to me ensuring their local repo was in good shape (this was about 6mo ago iirc), since you never responded, I never asked luke to re-add write permissions.

>  i only re-add people once i know
> that no overwrite damage can occur.  please do check that you are at latest
> master commit.
Comment 22 Dmitry Selyutin 2022-11-06 11:07:32 GMT
Assigned to Andrey. I only handled the pseudocode; not sure if this is related to documentation.
Comment 23 Paul Mackerras 2023-05-25 07:01:10 BST
Hopefully this is the right place to submit feedback about ls004; if not then Luke could add a pointer to the right place to the OPF gitea issue.

Comments from IBM architects regarding the proposal:

The ls004 instructions use "sm" as the name of the shift-amount field. sm is not a split field, so the field name should be capitalized. (I'm not sure what "sm" stands for. "shift minus (1)"? If yes, SHM1 might be a better name. (ls004 doesn't include a definition of the field for Section 1.6.3.))

An additional comment is that the architecture uses "s" to stand for "Shift" in Shift instruction mnemonics, not "sh" as in ls004. (ls003 uses "s".)