996 – add shaddsw or replace shadduw with shaddsw since i32 indexes are waay more common than u32

Bug 996 - add shaddsw or replace shadduw with shaddsw since i32 indexes are waay more common than u32

Summary: add shaddsw or replace shadduw with shaddsw since i32 indexes are waay more c...

Status:	RESOLVED FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Specification (show other bugs)
Version:	unspecified
Hardware:	Other Linux

Importance:	--- enhancement
Assignee:	Jacob Lifshay

URL:

Depends on:
Blocks:	1055
	Show dependency tree / graph

Reported:	2023-01-25 21:09 GMT by Jacob Lifshay
Modified:	2023-07-14 01:36 BST (History)
CC List:	4 users (show)

See Also:	966 968 1055 1091
NLnet milestone:	NLnet.2022-08-051.OPF
total budget (EUR) for completion of task and all subtasks:	2000
budget (EUR) for this task, excluding subtasks' budget:	2000
parent task for budget allocation:	1011
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	[jacob] amount = 1500 submitted = 2023-06-28 paid = 2023-07-12 [lkcl] amount = 500 submitted = 2023-06-22

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jacob Lifshay 2023-01-25 21:09:21 GMT

as mentioned in:
https://bugs.libre-soc.org/show_bug.cgi?id=966#c17

> realistically if we're adding only one of shaddsw/shadduw, we should add
> shaddsw because a C compiler can trivially use it with array[int_index],
> whereas shadduw requires the compiler to first prove the index isn't ever
> negative, which is often difficult (all the easy cases are already covered
> by ld/st's immediate or by shadd).

> those easy cases are:
> 1. the compiler can prove the index is a constant, in which case just using
> ld/st immediate is sufficient.
> 2. the compiler can promote all uses of the index variable to a 64-bit value
> throughout the code thereby avoiding needing any sign extensions (a common
> optimization compilers already do), in which case it can just use shadd.

Note that easy case #2 also covers u32 index variables that the compiler can prove don't wrap around.

Comment 1 Jacob Lifshay 2023-01-25 21:15:30 GMT

(In reply to Dmitry Selyutin from bug #966 comment #11)
> Most of the cases I saw in the wild didn't really need sign. That is, many C
> programmers just use int since they simply unaware or are not used to size_t.

the issue is that even if indexes are almost never negative, the C compiler often can't prove that so needs to generate code that is correct even with negative indexes. I'd guess that happens waay more often than having u32 indexes, hence why I think shaddsw is waay more important than shadduw.

Comment 2 Jacob Lifshay 2023-02-09 01:28:11 GMT

in the meeting today, we decided it would probably be best to add shaddsw instead of replacing shadduw. 

we should probably actually name it shaddw for consistency with the rest of PowerISA

Comment 3 Jacob Lifshay 2023-04-13 02:51:07 BST

https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=c20f2d447593c3c2f7d1a14e2f053cbec3dd5bb3;hp=a39d05eea22bb1f8f6200bf64cf6c877ef58ab6c

commit c20f2d447593c3c2f7d1a14e2f053cbec3dd5bb3
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Wed Apr 12 18:47:30 2023 -0700

    remove TODO for adding shaddw

commit a75f1b7ede025a9a11485511b9a7c0d765a5dc2e
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Wed Apr 12 18:46:43 2023 -0700

    add shaddw to ls004

commit a39d05eea22bb1f8f6200bf64cf6c877ef58ab6c
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Wed Apr 12 18:45:49 2023 -0700

    more formatting changes

commit 5ebffff380796b7d2547ceb87c674277950b9229
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Wed Apr 12 18:44:16 2023 -0700

    add more keywords

commit cfec6b891de04fa1a59748c0138c51d353bf5d7a
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Wed Apr 12 18:34:57 2023 -0700

    spelling and formatting

commit 48c272573c0a4aca970392ac9ca8739f11bc2af3
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Wed Apr 12 18:31:28 2023 -0700

    realign table

Comment 4 Jacob Lifshay 2023-04-13 02:52:20 BST

forgot everything but ls004...e.g. adding to simulator

Comment 5 Luke Kenneth Casson Leighton 2023-04-13 03:41:04 BST

$ make ls012.pdf

python3 ls012_optable.py
Traceback (most recent call last):
  File "ls012_optable.py", line 191, in <module>
    print_table("XO cost", header, areas, by_cost_then_priority_then_page)
  File "ls012_optable.py", line 103, in print_table
    areas = sortby(areas)
  File "ls012_optable.py", line 70, in by_cost_then_priority_then_page
    res = sorted(res, key=functools.cmp_to_key(sort_by_cost_priority_page))
  File "ls012_optable.py", line 56, in sort_by_cost_priority_page
    v = sort_by_cost(p1, p2)
  File "ls012_optable.py", line 45, in sort_by_cost
    p1 = p1['cost']
KeyError: 'cost'

any idea what that's about?

Comment 6 Jacob Lifshay 2023-04-13 03:42:21 BST

(In reply to Luke Kenneth Casson Leighton from comment #5)
> $ make ls012.pdf
> 
> python3 ls012_optable.py
> Traceback (most recent call last):
>   File "ls012_optable.py", line 191, in <module>
>     print_table("XO cost", header, areas, by_cost_then_priority_then_page)
>   File "ls012_optable.py", line 103, in print_table
>     areas = sortby(areas)
>   File "ls012_optable.py", line 70, in by_cost_then_priority_then_page
>     res = sorted(res, key=functools.cmp_to_key(sort_by_cost_priority_page))
>   File "ls012_optable.py", line 56, in sort_by_cost_priority_page
>     v = sort_by_cost(p1, p2)
>   File "ls012_optable.py", line 45, in sort_by_cost
>     p1 = p1['cost']
> KeyError: 'cost'
> 
> any idea what that's about?

idk, I didn't change that, it was broken before then.

Comment 7 Luke Kenneth Casson Leighton 2023-04-13 03:49:11 BST

(In reply to Luke Kenneth Casson Leighton from comment #5)
> $ make ls012.pdf
> any idea what that's about?

found it - extra space

Comment 8 Jacob Lifshay 2023-04-18 05:30:39 BST

I added shaddw to the wiki and simulator and tests.
I allocated PO 22 XO --10101110- since that was available.

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=0432032c8e1980b9a36ae4ed8f4d546ed16b9409

commit 0432032c8e1980b9a36ae4ed8f4d546ed16b9409
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Apr 17 21:26:00 2023 -0700

    add shaddw

commit ec5f8b03e0b65484fd2cea94a1935df843ba2649
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Apr 17 21:25:40 2023 -0700

    spelling fix

https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=5e7b3c2dd8b5aba121a3c1305a6a663efc322e6c

commit 5e7b3c2dd8b5aba121a3c1305a6a663efc322e6c
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Apr 17 21:26:43 2023 -0700

    add shaddw

Comment 9 Luke Kenneth Casson Leighton 2023-04-18 08:56:08 BST

(In reply to Jacob Lifshay from comment #8)
> I added shaddw to the wiki and simulator and tests.
> I allocated PO 22 XO --10101110- since that was available.

the table is here
https://libre-soc.org/openpower/sv/bitmanip/

NN	RT	RA	RB	sm0	sm1 1	0101 110	Rc	shaddw	X-Form

designing that table was extremely complex so please don't ignore
it or use a different allocation.

the table page included there is the canonical allocation table
for anything in EXT022. please do ensure that you remember that
at all times when adding instructions.

Comment 10 Luke Kenneth Casson Leighton 2023-04-18 08:58:46 BST

(In reply to Luke Kenneth Casson Leighton from comment #9)

> the table page included there is the canonical allocation table
> for anything in EXT022. please do ensure that you remember that
> at all times when adding instructions.

okaay you did.   yyyeah it looks good. wondering if it should
have been matched up with itype.

diff --git a/openpower/sv/draft_opcode_tables.mdwn b/openpower/sv/draft_opcode_tables.mdwn
index a1cdbd677f85f2816e1ff7908bd6069831bf8cd0..527962620e3dbb404230d7d49d47347d01287ec2 100644 (file)
--- a/openpower/sv/draft_opcode_tables.mdwn
+++ b/openpower/sv/draft_opcode_tables.mdwn
@@ -100,7 +100,7 @@ the [[sv/av_opcodes]])
 | NN |    |     |     |    |   00  | 0101 110 |0 | crfbinlog | {TODO}  |
 | NN |    |     |     |    |   00  | 0101 110 |1 | rsvd      |         |
 | NN |    |     |     |    |   10  | 0101 110 |Rc| rsvd      |         |
-| NN |    |     |     |    |   -1  | 0101 110 |Rc| rsvd      |         |
+| NN | RT | RA  | RB  | sm0| sm1 1 | 0101 110 |Rc| shaddw    | X-Form  |
 | NN | RT | RA  | RB  | 0  | itype | 1001 110 |Rc| av minmax | X-Form  |
 | NN | RT | RA  | RB  | 1  |   00  | 1001 110 |Rc| av abss   | X-Form  |
 | NN | RT | RA  | RB  | 1  |   01  | 1001 110 |Rc| av absu   | X-Form  |

Comment 11 Jacob Lifshay 2023-04-18 09:10:50 BST

(In reply to Luke Kenneth Casson Leighton from comment #10)
> wondering if it should
> have been matched up with itype.

I matched it to shadd[uw]:
| NN | RT | RA  | RB  | sm0| sm1 1 | 0101 110 |Rc| shaddw    | X-Form  |
| NN | RT | RA  | RB  | sm0| sm1 0 | 1101 110 |Rc| shadd     | X-Form  |
| NN | RT | RA  | RB  | sm0| sm1 1 | 1101 110 |Rc| shadduw   | X-Form  |

Comment 12 Luke Kenneth Casson Leighton 2023-04-18 09:28:47 BST

hum hum should add+1 be included?

(or, more like 1<<sm)

Comment 13 Luke Kenneth Casson Leighton 2023-04-18 09:39:49 BST

(In reply to Jacob Lifshay from comment #11)
> (In reply to Luke Kenneth Casson Leighton from comment #10)
> > wondering if it should
> > have been matched up with itype.
> 
> I matched it to shadd[uw]:
> | NN | RT | RA  | RB  | sm0| sm1 1 | 0101 110 |Rc| shaddw    | X-Form  |
> | NN | RT | RA  | RB  | sm0| sm1 0 | 1101 110 |Rc| shadd     | X-Form  |
> | NN | RT | RA  | RB  | sm0| sm1 1 | 1101 110 |Rc| shadduw   | X-Form  |

yyeh... was wondering if the whole group (sm) should have been
shifted along.

click.

this is the sandbox.

not actual allocation.  ignore me :)
IBM's job to allocate.

Comment 14 Jacob Lifshay 2023-04-18 09:45:31 BST

(In reply to Luke Kenneth Casson Leighton from comment #12)
> hum hum should add+1 be included?
> 
> (or, more like 1<<sm)

well, if we're doing that, just do:
RT = (RA << sh) + RB + imm

or similar...this is what x86 lea supports (except with up to full 32-bit immediate instead of a tiny immediate)

maybe 3-bit imm?

immediate not shifted because then sh is log2(struct_size) and imm is field offset for struct S[]

e.g.:
struct S {
    uint8_t a;
    uint16_t b;
}

uint16_t *f(struct S *p, int i) {
    return &p[i].b;
}

asm:
f:
    shaddiw r3, r3, r4, 3, 2  # r3 = r3 + (r4 << 3) + 2
    blr