Bug 966 - create shift-and-add instruction
Summary: create shift-and-add instruction
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Specification (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Dmitry Selyutin
URL: https://libre-soc.org/openpower/sv/bi...
Depends on:
Blocks: 967
  Show dependency treegraph
 
Reported: 2022-10-23 12:00 BST by Luke Kenneth Casson Leighton
Modified: 2023-05-06 21:19 BST (History)
2 users (show)

See Also:
NLnet milestone: NLnet.2021.02A.052.CryptoRouter
total budget (EUR) for completion of task and all subtasks: 750
budget (EUR) for this task, excluding subtasks' budget: 750
parent task for budget allocation: 771
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
ghostmansd={amount=600, submitted=2022-12-10, paid=2022-12-30} lkcl={amount=150,submitted=2023-03-25,paid=2023-04-26}


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2022-10-23 12:00:26 BST
a shift-and-add instruction is needed, useful for twofish/serpent(?)
as well as single-instruction for LD-ST-address-calculation-with-a-shift

* TODO: add shaddsw or replace shadduw with shaddsw
Comment 2 Dmitry Selyutin 2022-10-25 20:35:57 BST
I'm too lazy to raise a new issue, so I'll support these in binutils in scope of this task.
Comment 4 Jacob Lifshay 2022-10-27 09:19:07 BST
I'll note the pseudocode is wrong, it should shift left and add, instead it replaces the lower bits with zero then adds.

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/bitmanip.mdwn;h=89515ec977385f3ffd7f91b253f795c7d55d69c8;hb=HEAD#l115

 115         case (0): sum[0:XLEN-1] = (n[0:XLEN-1-1] || [0]*1) + (RA)
 116         case (1): sum[0:XLEN-1] = (n[0:XLEN-2-1] || [0]*2) + (RA)
 117         case (2): sum[0:XLEN-1] = (n[0:XLEN-3-1] || [0]*3) + (RA)
 118         default:  sum[0:XLEN-1] = (n[0:XLEN-4-1] || [0]*4) + (RA)

that should be:
case (0): sum[0:XLEN-1] = (n[1:XLEN-1] || [0]*1) + (RA)
and likewise for all other cases.
Comment 5 Luke Kenneth Casson Leighton 2022-10-27 12:16:38 BST
(In reply to Jacob Lifshay from comment #4)
> I'll note the pseudocode is wrong, it should shift left and add, instead it
> replaces the lower bits with zero then adds.

ah well spotted.

> that should be:
> case (0): sum[0:XLEN-1] = (n[1:XLEN-1] || [0]*1) + (RA)
> and likewise for all other cases.

i just realised, it's perfectly fine to do this:

    n <- (RB)
    m <- sm + 1
    RT <- (n[m:XLEN-1] || [0]*m) + (RA)

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=263a8fca0f7413e62cf74a0e559e8965b0951e6c
Comment 6 Luke Kenneth Casson Leighton 2022-11-01 11:58:27 GMT
i've reverted the use of dynamic shift, instead it is an explicit
switch/case again.  ghostmansd, next time if you encounter difficulties
just take the easiest route.
Comment 7 Dmitry Selyutin 2022-11-01 19:36:13 GMT
Reverted back. Please don't revert the code unless really necessary. Peeking the easiest route after you already see the end of the difficult but correct route doesn't seem to be the best option. :-)
Details at https://bugs.libre-soc.org/show_bug.cgi?id=967#c1.
Comment 8 Jacob Lifshay 2022-11-01 23:40:23 GMT
In the meeting today I pointed out that we probably also want shaddsw since C likes to use int for array indexing.

shaddsw:
RT = RA + ((uint64_t)(int32_t)RB << (sh + 1))
Comment 9 Luke Kenneth Casson Leighton 2022-11-02 00:06:43 GMT
(In reply to Jacob Lifshay from comment #8)
> In the meeting today I pointed out that we probably also want shaddsw since
> C likes to use int for array indexing.
> 
> shaddsw:
> RT = RA + ((uint64_t)(int32_t)RB << (sh + 1))

mmm.... if added as "L", a 5th operand (shaddw RT, RA, RB, sh, L)
it reduces the number of new instructions added.

   if L = 1 then      n <- EXTS(RB[32:63)
   else               n <- [0]*32 || RB[32:63)

pseudoaliases will need to be created as well.
and yet another encoding Form. sigh.
Comment 10 Luke Kenneth Casson Leighton 2022-11-02 16:36:45 GMT
(In reply to Jacob Lifshay from comment #8)
> In the meeting today I pointed out that we probably also want shaddsw since
> C likes to use int for array indexing.
> 
> shaddsw:
> RT = RA + ((uint64_t)(int32_t)RB << (sh + 1))

the number of instructions being of some concern, this can be done
with extsw+shadd.  is there any compelling reason for an explicit
32bit shift-signed-add?  how much usage will it get in actual
compiled code?
Comment 11 Dmitry Selyutin 2022-11-02 16:56:29 GMT
Most of the cases I saw in the wild didn't really need sign. That is, many C programmers just use int since they simply unaware or are not used to size_t.
Comment 12 Dmitry Selyutin 2022-11-02 16:58:38 GMT
That said, adding a new form can be done. It will need more time, but for sure is doable. I just need some rationale that this is really required. With new form, we can have 1 instruction.
Comment 13 Dmitry Selyutin 2022-11-06 11:04:40 GMT
Is there any agreement and conclusion about this? If necessary I can create a patch which supports a new form to support shaddsw/shadduw simultaneously; this will need some tweaks and changes, so this task will take more time. Otherwise this task is done.
Comment 14 Luke Kenneth Casson Leighton 2022-11-06 11:07:20 GMT
(In reply to Dmitry Selyutin from comment #13)
> Is there any agreement and conclusion about this? If necessary I can create
> a patch which supports a new form to support shaddsw/shadduw simultaneously;

no. no justification. extsw is sufficient. we have over 100
instructions to add already

> this will need some tweaks and changes, so this task will take more time.
> Otherwise this task is done.

agreed.
Comment 15 Dmitry Selyutin 2022-11-06 11:09:27 GMT
Luke, this needs the budget, as well as #968 and #967.
Comment 16 Jacob Lifshay 2022-11-06 11:45:22 GMT
(In reply to Luke Kenneth Casson Leighton from comment #14)
> (In reply to Dmitry Selyutin from comment #13)
> > Is there any agreement and conclusion about this? If necessary I can create
> > a patch which supports a new form to support shaddsw/shadduw simultaneously;
> 
> no. no justification. extsw is sufficient. we have over 100
> instructions to add already

realistically if we're adding only one of shaddsw/shadduw, we should add shaddsw because a C compiler can trivially use it with array[int_index], whereas shadduw requires the compiler to first prove the index isn't ever negative, which is often difficult (all the easy cases are already covered by ld/st's immediate or by shadd). u32 indexes are very uncommon -- basically everything uses a constant, size_t/ptrdiff_t, or int.
Comment 17 Jacob Lifshay 2022-11-06 11:53:26 GMT
(In reply to Jacob Lifshay from comment #16)
> realistically if we're adding only one of shaddsw/shadduw, we should add
> shaddsw because a C compiler can trivially use it with array[int_index],
> whereas shadduw requires the compiler to first prove the index isn't ever
> negative, which is often difficult (all the easy cases are already covered
> by ld/st's immediate or by shadd).

those easy cases are:
1. the compiler can prove the index is a constant, in which case just using ld/st immediate is sufficient.
2. the compiler can promote all uses of the index variable to a 64-bit value throughout the code thereby avoiding needing any sign extensions (a common optimization compilers already do), in which case it can just use shadd.
Comment 18 Jacob Lifshay 2023-01-25 20:44:33 GMT
comments #16 and #17 haven't been addressed, so i think it's premature to mark this bug as fixed.
Comment 19 Dmitry Selyutin 2023-01-25 20:52:05 GMT
Please refer to comment #14. It's explicitly stated by PL it's done long ago. If you need to add something else, please raise the relevant task.
Comment 20 Jacob Lifshay 2023-01-25 21:16:12 GMT
(In reply to Dmitry Selyutin from comment #19)
> Please refer to comment #14. It's explicitly stated by PL it's done long
> ago. If you need to add something else, please raise the relevant task.

ok, reported https://bugs.libre-soc.org/show_bug.cgi?id=996