1047 – SVP64 LD/ST Data-Dependent Fail-First providing linked-list walking

Status:	IN_PROGRESS

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Specification
Version:	unspecified
Hardware:	Other Linux

Importance:	High enhancement
Assignee:	Luke Kenneth Casson Leighton

URL:

Depends on:	1080
Blocks:	1045 1056

Reported:	2023-04-02 23:06 BST by Luke Kenneth Casson Leighton
Modified:	2023-11-20 20:11 GMT
CC List:	4 users

See Also:	1003 1056 1083 1150
NLnet milestone:	NLnet.2022-08-107.ongoing
total budget (EUR) for completion of task and all subtasks:	3000
budget (EUR) for this task, excluding subtasks' budget:	3000
parent task for budget allocation:	1027
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	lkcl=3000

Description Luke Kenneth Casson Leighton 2023-04-02 23:06:15 BST

it turns out that Vectorised linked-list walking is possible,
including NULL-termination checking. this is such a high
priority that it is worth adapting the specification at this
late stage.

binutils will also need updating as will Store operations.

TODO list:

* DONE: update power_insn.py
* DONE: write test_pysvp64dis.py unit test
* DONE: update power_decoder2.py
* DONE: implement ISACaller DD-FFirst LD/ST
* DONE: write simple test_caller_svp64_ldst.py DD-FF unit test
* TODO: write simple linked-list unit test
   - DONE LD-Immediate
   - TODO LD-Indexed including re-running sv_analysis after
          re-classifying appropriate instructions as EXTRA322
* TODO: update EXTRA area specification to allow RT RA RB EXTRA2/3
* TODO: update binutils (separate task advised)

Comment 1 Luke Kenneth Casson Leighton 2023-04-04 15:30:29 BST

commit f56024fb535ffb81958a1624d7f57e62047848f0 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Apr 4 15:26:49 2023 +0100

    https://bugs.libre-soc.org/show_bug.cgi?id=1047
    start sorting out power_insn.py to conform to new LD/ST spec.
    Data-Dependent Fail-First gets top priority, pred-result is dropped,
    saturation removed from LDST-IDX leaving space for "els" to be added
    with its own bit

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=f56024fb535ffb81958a1624d7f57e62047848f0

Comment 2 Luke Kenneth Casson Leighton 2023-04-04 16:10:34 BST

commit 8d3e5f183002f7327c4badd45ab12414297d832c (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Apr 4 16:10:02 2023 +0100

    add quick test_pysvp64dis.py of LD/ST data-dependent fail-first

Comment 3 Luke Kenneth Casson Leighton 2023-04-05 00:32:19 BST

from arm-sve-ieee-2017

for (p = &head; p != NULL; ) {
  for (i = 0; p != NULL && i < VL/64; p = p->next)
    p'[i++] = p; // collect up to VL/64 pointers
  for (j = 0; j < i; j++)
    res ^= p'[j]->val; // gather from pointer vector
}

the first loop is literally DD-FFirst with an immediate of
offsetof(p->next) and VLi=false, ff=RC1 (fail on equal to zero)
which will truncate VL to exclude the NULL.

    sv.ld/ff=RC1 *1, 8(*0)

the use of RT=RA+1 creates the dependency-chain.  sv.bc with CTR
mode can be used, terminating if there was truncation (the NULL),
which needs detecting. hmmm...
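
a minimal sketch of the first loop as described above, in Python. this is
a simplified model and not the ISACaller implementation: the function name
dd_ffirst_ld, the flat register list, the dict-based memory and the choice
to write the loaded value before testing it are all assumptions made for
illustration only.

```python
def dd_ffirst_ld(regs, mem, rt, ra, imm, vl):
    """Model sv.ld/ff=RC1 *rt, imm(*ra) with VLi=false (simplified)."""
    for i in range(vl):
        ea = regs[ra + i] + imm   # effective address for element i
        val = mem.get(ea, 0)      # assumption: unmapped memory reads as 0
        regs[rt + i] = val        # assumption: result written before the test
        if val == 0:              # RC1 test: fail on equal to zero
            return i              # VLi=false: new VL excludes the NULL
    return vl


# three-node list: each node has val at offset 0 and next at offset 8
mem = {0x100: 11, 0x108: 0x200,
       0x200: 22, 0x208: 0x300,
       0x300: 33, 0x308: 0}
regs = [0] * 16
regs[0] = 0x100                   # r0 = address of the first node
new_vl = dd_ffirst_ld(regs, mem, rt=1, ra=0, imm=8, vl=8)
pointers = regs[0:new_vl + 1]     # r0 plus the new_vl non-NULL loads
```

because element i reads the register that element i-1 wrote (RT=RA+1),
each load follows the previous node's next pointer, and the returned VL
counts only the non-NULL pointers that were loaded.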

Comment 4 Luke Kenneth Casson Leighton 2023-04-06 13:27:37 BST

commit 59f1579956f07efe2611f6892a923764a0778ab2 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Thu Apr 6 13:26:20 2023 +0100

    add power_decode_svp64_rm.py capability for new LD/ST format
    https://bugs.libre-soc.org/show_bug.cgi?id=1047

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=59f1579956f07efe2611f6892a923764a0778ab2

Comment 5 Luke Kenneth Casson Leighton 2023-05-08 23:44:30 BST

>     sv.ld/ff=RC1 *1, 8(*0)
> 
> the use of RT=RA+1 creates the dependency-chain.

... but relies on ldst-immediate.  if the data structure is large
enough LDST-indexed has to be used but it is EXTRA2.  a solution
is being discussed in bug #1080 to make room for 2 extra bits that
can be added to RT and RA to make them EXTRA3.

Comment 6 Luke Kenneth Casson Leighton 2023-05-15 12:55:56 BST

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=003959b7e760ed13fe5d58015fd63aa117cb9066

unit tests demoing LD-Immediate and with update working as expected.

Comment 7 Luke Kenneth Casson Leighton 2023-05-20 11:09:07 BST

see https://bugs.libre-soc.org/show_bug.cgi?id=1083#c5

saturation no longer works if bits 6-7 are unable to provide
separate src/dest override.

Comment 8 Luke Kenneth Casson Leighton 2023-05-20 11:39:39 BST

wrong:

 448         elif 'st' in insn_name and 'x' in insn_name:  # stwux
 449             res['Etype'] = 'EXTRA3'  # RM EXTRA2 type
 450             # RS: Rdest2_EXTRA2, RA: Rsrc1_EXTRA2 / Rdest
 451             res['0'] = "%s;s:RA;d:RA" % (sRS)
 452             res['1'] = 's:RB'  # RB: Rsrc2_EXTRA2

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/sv_analysis.py;hb=HEAD#l447

that is a bug, it actually *should* be EXTRA3 but where 2 of the bits
for RB and RS are taken from RM[6:7]

Comment 9 Luke Kenneth Casson Leighton 2023-05-22 14:11:07 BST

the next phase is to now modify the specification for the EXTRA
area, creating a new type - an EXTRA32 type is probably the
most sensible - which is of the form:

* EXTRA3 for RT/RS
* EXTRA3 for RA
* EXTRA2 for RB

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/LDSTRM-2P-2S1D.csv;hb=HEAD

  36 lfdux,LDST_IDX,,2P,EXTRA2,EN,d:FRT,s:RA;d:RA,s:RB,
  37 stdux,LDST_IDX,,2P,EXTRA3,EN,s:RS;s:RA;d:RA,s:RB

the bits have to come from:

* EXTRA3 for RT/RS [6,10,11]
* EXTRA3 for RA    [7,12,13]
* EXTRA2 for RB    [14,15]

although actually it may be better to do:

* EXTRA3 for RT/RS [10,11,12]
* EXTRA3 for RA    [13,14,15]
* EXTRA2 for RB    [16,17]
* MASK_SRC         [6,7,18]

and call this "2PM"
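
to make the two candidate bit allocations above easier to compare, here is
a small Python sketch that gathers each field from an RM value. it assumes
a 24-bit RM field numbered MSB0 (bit 0 is the most significant) and that
the first bit listed for a field is that field's most significant bit.
the names rm_field, decode, msb0, LAYOUT_A and LAYOUT_2PM are hypothetical
and do not come from openpower-isa.

```python
RM_WIDTH = 24  # assumption: 24-bit RM field, bit 0 is the most significant


def rm_field(rm, bits):
    """Gather the listed MSB0 bit positions of rm into one integer."""
    value = 0
    for b in bits:
        value = (value << 1) | ((rm >> (RM_WIDTH - 1 - b)) & 1)
    return value


# the two candidate layouts from the lists above
LAYOUT_A = {"RT/RS": [6, 10, 11], "RA": [7, 12, 13], "RB": [14, 15]}
LAYOUT_2PM = {"RT/RS": [10, 11, 12], "RA": [13, 14, 15],
              "RB": [16, 17], "MASK_SRC": [6, 7, 18]}


def decode(rm, layout):
    """Return a dict of field name to extracted value for one layout."""
    return {name: rm_field(rm, bits) for name, bits in layout.items()}


def msb0(*bits):
    """Build an RM value with the given MSB0 bit positions set."""
    return sum(1 << (RM_WIDTH - 1 - b) for b in bits)


rm = msb0(6, 13, 15)
as_layout_a = decode(rm, LAYOUT_A)
as_layout_2pm = decode(rm, LAYOUT_2PM)
```

the same RM bits decode differently under each layout, which is why the
choice between them has to be fixed in the EXTRA area specification before
sv_analysis and the unit tests can rely on it.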

Comment 10 Luke Kenneth Casson Leighton 2023-08-30 13:00:17 BST

i already started on this and the budgets are correspondingly not balanced,
given that i have already done the majority of the work
(see TODO list, 80% completed).

https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=refs/heads/extra322_ldst_idx

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=1197ecb6b5110b180ef28c267982907acb1797c2

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=070adb7a11f3721302149946f5f13c95a4c23613

again this is another task that i allocated a budget for based on
one person (me) doing it. i am not sure what to suggest here as it
has been too long since i looked at this but it should be straightforward
to examine how the LD/ST-Immediate unit tests work and get the
corresponding LD/ST-Indexed ones running, at which point doubly-linked
lists can easily be done as well (double-chasing by two-overlap
LD/ST-Update).

it is much more straightforward than the Vector-Immediate task,
and examination of the commit-diffs will show clearly what was done.

Comment 11 Luke Kenneth Casson Leighton 2023-08-30 20:36:14 BST

    vantosh=1500
    markos=1500

reminder as in comment #10 that i have *already done* 80% of this
task, and that there is an extra322_ldst_idx branch with the
progress on the remaining 20%.

budget from bug #1035 *may* potentially be reassignable to here
in order to get a good balance for the learning-curve to get
this done (including writing up the spec which involves adding
a new 322 format if i had not already added it), as well as ensuring
that i get paid for the LD/ST-Immediate DDFFirst work and linked-list
unit tests etc.

Comment 12 Luke Kenneth Casson Leighton 2023-09-04 00:22:38 BST

having done 80% of this task i am allocating a respective budget for that.
the rest is a reasonably-easy task with sufficient walkthrough guidance

vantosh=500
markos=500
lkcl=2000

Comment 13 Konstantinos Margaritis (markos) 2023-09-04 12:17:44 BST

I am sorry but if it's 80% done, there is little point in myself investing time and completing the rest of the 20%. It is not about the completion ratio, but about the effort involved. If the last 20% is going to take me -and for that matter Toshaan as well- 2 months to understand and implement and test, it is not unreasonable to expect to get paid accordingly. I am not interested in doing that for 500EUR so I am removing myself from the list. If it is not going to take 2 months, then it would really be simpler to just go ahead and finish it yourself.

Comment 14 Luke Kenneth Casson Leighton 2023-09-04 14:01:32 BST

(In reply to Konstantinos Margaritis (markos) from comment #13)
> I am sorry but if it's 80% done, there is little point in myself investing
> time and completing the rest of the 20%. It is not about the completion
> ratio, but about the effort involved.

the budget can in theory be increased so that it is big enough to
be worthwhile.

> If it is not going to take 2 months, then it would really be simpler to just
> go ahead and finish it yourself.

i can't.