Bug 1047 - SVP64 LD/ST Data-Dependent Fail-First providing linked-list walking
Summary: SVP64 LD/ST Data-Dependent Fail-First providing linked-list walking
Status: IN_PROGRESS
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Specification
Version: unspecified
Hardware: Other Linux
Importance: High enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on: 1080
Blocks: 1045 1056
Reported: 2023-04-02 23:06 BST by Luke Kenneth Casson Leighton
Modified: 2023-11-20 20:11 GMT

See Also:
NLnet milestone: NLnet.2022-08-107.ongoing
total budget (EUR) for completion of task and all subtasks: 3000
budget (EUR) for this task, excluding subtasks' budget: 3000
parent task for budget allocation: 1027
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
lkcl=3000


Description Luke Kenneth Casson Leighton 2023-04-02 23:06:15 BST
it turns out that Vectorised linked-list walking is possible
including NULL-termination checking, which is such a high
priority that it is worth adapting the specification at this
late stage.

binutils will also need updating as will Store operations.

TODO list:

* DONE: update power_insn.py
* DONE: write test_pysvp64dis.py unit test
* DONE: update power_decoder2.py
* DONE: implement ISACaller DD-FFirst LD/ST
* DONE: write simple test_caller_svp64_ldst.py DD-FF unit test
* TODO: write simple linked-list unit test
   - DONE LD-Immediate
   - TODO LD-Indexed including re-running sv_analysis after
          re-classifying appropriate instructions as EXTRA322
* TODO: update EXTRA area specification to allow RT RA RB EXTRA2/3
* TODO: update binutils (separate task advised)
Comment 1 Luke Kenneth Casson Leighton 2023-04-04 15:30:29 BST
commit f56024fb535ffb81958a1624d7f57e62047848f0 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Apr 4 15:26:49 2023 +0100

    https://bugs.libre-soc.org/show_bug.cgi?id=1047
    start sorting out power_insn.py to conform to new LD/ST spec.
    Data-Dependent Fail-First gets top priority, pred-result is dropped,
    saturation removed from LDST-IDX leaving space for "els" to be added
    with its own bit

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=f56024fb535ffb81958a1624d7f57e62047848f0
Comment 2 Luke Kenneth Casson Leighton 2023-04-04 16:10:34 BST
commit 8d3e5f183002f7327c4badd45ab12414297d832c (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Apr 4 16:10:02 2023 +0100

    add quick test_pysvp64dis.py of LD/ST data-dependent fail-first
Comment 3 Luke Kenneth Casson Leighton 2023-04-05 00:32:19 BST
from arm-sve-ieee-2017

for (p = &head; p != NULL; ) {
  for (i = 0; p != NULL && i < VL/64; p = p->next)
    p'[i++] = p; // collect up to VL/64 pointers
  for (j = 0; j < i; j++)
    res ^= p'[j]->val; // gather from pointer vector
}

the first loop is literally DD-FFirst with an immediate of
offsetof(p->next) and VLi=false, ff=RC1 (fail on equal to zero)
which will truncate VL to exclude the NULL.

    sv.ld/ff=RC1 *1, 8(*0)

the use of RT=RA+1 creates the dependency-chain.  sv.bc with CTR
mode can be used, terminating if there was truncation, which needs
detecting (the NULL). hmmm...
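
a minimal python sketch of the first loop, as a toy model only: it is
not ISACaller code. the function name, the dict-based memory and the
node layout (val at offset 0, next at offset 8) are all assumptions
made for illustration. it also assumes the failing (NULL) element is
not written to the register file, which this model does not attempt
to verify against the spec.

```python
def ddff_ld_walk(mem, gpr, vl, ra=0, rt=1, offset=8):
    """Toy model of `sv.ld/ff=RC1 *rt, offset(*ra)` with VLi=false.

    Loads element i from mem[gpr[ra+i] + offset] into gpr[rt+i].
    The first load that returns zero (NULL) stops the loop and VL is
    truncated to exclude that element. Returns the new VL.
    """
    for i in range(vl):
        ea = gpr[ra + i] + offset      # effective address of p->next
        value = mem.get(ea, 0)         # unmapped memory reads as 0 (assumption)
        if value == 0:                 # fail-first test: result equal to zero
            return i                   # truncate VL, excluding the NULL
        gpr[rt + i] = value            # RT=RA+1: feeds the next element's RA
    return vl


# hypothetical three-node list: val at +0, next at +8
mem = {
    0x100: 11, 0x108: 0x200,
    0x200: 22, 0x208: 0x300,
    0x300: 33, 0x308: 0,               # NULL terminator
}
gpr = [0] * 16
gpr[0] = 0x100                         # head pointer in register 0
new_vl = ddff_ld_walk(mem, gpr, vl=8)
pointers = gpr[:new_vl + 1]            # head plus the new_vl pointers loaded
```

in this model a truncated VL (new_vl smaller than the requested vl)
is the sign that the NULL was reached.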
Comment 4 Luke Kenneth Casson Leighton 2023-04-06 13:27:37 BST
commit 59f1579956f07efe2611f6892a923764a0778ab2 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Thu Apr 6 13:26:20 2023 +0100

    add power_decode_svp64_rm.py capability for new LD/ST format
    https://bugs.libre-soc.org/show_bug.cgi?id=1047

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=59f1579956f07efe2611f6892a923764a0778ab2
Comment 5 Luke Kenneth Casson Leighton 2023-05-08 23:44:30 BST
>     sv.ld/ff=RC1 *1, 8(*0)
> 
> the use of RT=RA+1 creates the dependency-chain.

... but relies on ldst-immediate.  if the data structure is large
enough LDST-indexed has to be used but it is EXTRA2.  a solution
is being discussed in bug #1080 to make room for 2 extra bits that
can be added to RT and RA to make them EXTRA3.
Comment 6 Luke Kenneth Casson Leighton 2023-05-15 12:55:56 BST
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=003959b7e760ed13fe5d58015fd63aa117cb9066

unit tests demoing LD-Immediate and with update working as expected.
Comment 7 Luke Kenneth Casson Leighton 2023-05-20 11:09:07 BST
see https://bugs.libre-soc.org/show_bug.cgi?id=1083#c5

saturation no longer works if bits 6-7 are unable to provide
separate src/dest override.
Comment 8 Luke Kenneth Casson Leighton 2023-05-20 11:39:39 BST
wrong:

 448         elif 'st' in insn_name and 'x' in insn_name:  # stwux
 449             res['Etype'] = 'EXTRA3'  # RM EXTRA2 type
 450             # RS: Rdest2_EXTRA2, RA: Rsrc1_EXTRA2 / Rdest
 451             res['0'] = "%s;s:RA;d:RA" % (sRS)
 452             res['1'] = 's:RB'  # RB: Rsrc2_EXTRA2

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/sv_analysis.py;hb=HEAD#l447

that is a bug, it actually *should* be EXTRA3 but where 2 of the bits
for RB and RS are taken from RM[6:7]
Comment 9 Luke Kenneth Casson Leighton 2023-05-22 14:11:07 BST
the next phase is to now modify the specification for the EXTRA
area, creating a new type - an EXTRA32 type is probably the
most sensible - which is of the form:

* EXTRA3 for RT/RS
* EXTRA3 for RA
* EXTRA2 for RB

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/LDSTRM-2P-2S1D.csv;hb=HEAD

  36 lfdux,LDST_IDX,,2P,EXTRA2,EN,d:FRT,s:RA;d:RA,s:RB,
  37 stdux,LDST_IDX,,2P,EXTRA3,EN,s:RS;s:RA;d:RA,s:RB

the bits have to come from:

* EXTRA3 for RT/RS [6,10,11]
* EXTRA3 for RA    [7,12,13]
* EXTRA2 for RB    [14,15]

although actually it may be better to do:

* EXTRA3 for RT/RS [10,11,12]
* EXTRA3 for RA    [13,14,15]
* EXTRA2 for RB    [16,17]
* MASK_SRC         [6,7,18]

and call this "2PM"
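
the two candidate bit allocations above can be compared with a short
python sketch. this is illustrative only and is not taken from
sv_analysis.py or power_insn.py: it assumes a 24-bit RM field numbered
MSB0 (bit 0 is the most significant), and it assumes that the listed
bit positions are concatenated in the order given, first position as
the most significant bit. the names rm_bits, LAYOUT_SPLIT, LAYOUT_2PM
and decode are hypothetical.

```python
RM_WIDTH = 24  # assumption: SVP64 RM is 24 bits, MSB0-numbered


def rm_bits(rm, positions):
    """Concatenate the MSB0-numbered bits of rm listed in positions.

    The first position becomes the most significant bit of the result.
    """
    value = 0
    for pos in positions:
        bit = (rm >> (RM_WIDTH - 1 - pos)) & 1
        value = (value << 1) | bit
    return value


# first proposal: the extra bits for RT/RS and RA come from RM bits 6 and 7
LAYOUT_SPLIT = {
    "RT_RS": (6, 10, 11),   # EXTRA3
    "RA": (7, 12, 13),      # EXTRA3
    "RB": (14, 15),         # EXTRA2
}

# second proposal ("2PM"): contiguous EXTRA fields, MASK_SRC moved out
LAYOUT_2PM = {
    "RT_RS": (10, 11, 12),  # EXTRA3
    "RA": (13, 14, 15),     # EXTRA3
    "RB": (16, 17),         # EXTRA2
    "MASK_SRC": (6, 7, 18),
}


def decode(rm, layout):
    """Return a dict of field name to extracted value for a given layout."""
    return {name: rm_bits(rm, pos) for name, pos in layout.items()}


# example: an RM value with only MSB0 bit 6 set
rm_example = 1 << (RM_WIDTH - 1 - 6)
split_fields = decode(rm_example, LAYOUT_SPLIT)
twopm_fields = decode(rm_example, LAYOUT_2PM)
```

with only bit 6 set, the first layout sees it as the top bit of RT/RS
and the "2PM" layout sees it as the top bit of MASK_SRC, which is the
difference between the two proposals.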
Comment 10 Luke Kenneth Casson Leighton 2023-08-30 13:00:17 BST
i already started on this, so the budgets are correspondingly not
balanced, given that i have already done the majority of the work
(see TODO list, 80% completed).

https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=refs/heads/extra322_ldst_idx

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=1197ecb6b5110b180ef28c267982907acb1797c2

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=070adb7a11f3721302149946f5f13c95a4c23613

again this is another task that i allocated a budget for based on
one person (me) doing it. i am not sure what to suggest here as it
has been too long since i looked at this but it should be straightforward
to examine how the LD/ST-Immediate unit tests work and get the
corresponding LD/ST-Indexed ones running, at which point doubly-linked
lists can easily be done as well (double-chasing by two-overlap
LD/ST-Update).

it is much more straightforward than the Vector-Immediate task,
and examination of the commit-diffs will show clearly what was done.
Comment 11 Luke Kenneth Casson Leighton 2023-08-30 20:36:14 BST
    vantosh=1500
    markos=1500

reminder as in comment #10 that i have *already done* 80% of this
task, and that there is an extra322_ldst_idx branch with the
progress on the remaining 20%.

budget from bug #1035 *may* potentially be reassignable to here
in order to get a good balance for the learning-curve to get
this done (including writing up the spec which involves adding
a new 322 format if i had not already added it), as well as ensuring
that i get paid for the LD/ST-Immediate DDFFirst work and linked-list
unit tests etc.
Comment 12 Luke Kenneth Casson Leighton 2023-09-04 00:22:38 BST
having done 80% of this task i am allocating a respective budget for that.
the rest is a reasonably-easy task with sufficient walkthrough guidance

vantosh=500
markos=500
lkcl=2000
Comment 13 Konstantinos Margaritis (markos) 2023-09-04 12:17:44 BST
I am sorry but if it's 80% done, there is little point in myself investing time and completing the rest of the 20%. It is not about the completion ratio, but about the effort involved. If the last 20% is going to take me -and for that matter Toshaan as well- 2 months to understand and implement and test, it is not unreasonable to expect to get paid accordingly. I am not interested in doing that for 500EUR so I am removing myself from the list. If it is not going to take 2 months, then it would really be simpler to just go ahead and finish it yourself.
Comment 14 Luke Kenneth Casson Leighton 2023-09-04 14:01:32 BST
(In reply to Konstantinos Margaritis (markos) from comment #13)
> I am sorry but if it's 80% done, there is little point in myself investing
> time and completing the rest of the 20%. It is not about the completion
> ratio, but about the effort involved.

the budget can in theory be increased so that it is big enough to
be worthwhile.

> If it is not going to take 2 months, then it would really be simpler to just
> go ahead and finish it yourself.

i can't.