it turns out that Vectorised linked-list walking is possible, including NULL-termination checking. this is such a high priority that it is worth adapting the specification at this late stage. binutils will also need updating, as will Store operations.

TODO list:

* DONE: update power_insn.py
* DONE: write test_pysvp64dis.py unit test
* DONE: update power_decoder2.py
* DONE: implement ISACaller DD-FFirst LD/ST
* DONE: write simple test_caller_svp64_ldst.py DD-FF unit test
* TODO: write simple linked-list unit test
  - DONE LD-Immediate
  - TODO LD-Indexed including re-running sv_analysis after
    re-classifying appropriate instructions as EXTRA322
* TODO: update EXTRA area specification to allow RT RA RB EXTRA2/3
* TODO: update binutils (separate task advised)
commit f56024fb535ffb81958a1624d7f57e62047848f0 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Apr 4 15:26:49 2023 +0100

    https://bugs.libre-soc.org/show_bug.cgi?id=1047
    start sorting out power_insn.py to conform to new LD/ST spec.
    Data-Dependent Fail-First gets top priority, pred-result is dropped,
    saturation removed from LDST-IDX leaving space for "els" to be added
    with its own bit

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=f56024fb535ffb81958a1624d7f57e62047848f0
commit 8d3e5f183002f7327c4badd45ab12414297d832c (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Apr 4 16:10:02 2023 +0100

    add quick test_pysvp64dis.py of LD/ST data-dependent fail-first
from arm-sve-ieee-2017:

    for (p = &head; p != NULL; ) {
        for (i = 0; p != NULL && i < VL/64; p = p->next)
            p'[i++] = p; // collect up to VL/64 pointers
        for (j = 0; j < i; j++)
            res ^= p'[j]->val; // gather from pointer vector
    }

the first loop is literally DD-FFirst with an immediate of offsetof(p->next) and VLi=false, ff=RC1 (fail on equal to zero), which will truncate VL to exclude the NULL.

    sv.ld/ff=RC1 *1, 8(*0)

the use of RT=RA+1 creates the dependency-chain.

sv.bc with CTR mode can be used, terminating if there was truncation, which needs detecting (the NULL). hmmm...
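to make the dependency-chain concrete, here is a minimal Python sketch of what the single sv.ld/ff=RC1 instruction is described as doing above. it is a toy model, not the ISACaller implementation: the function name ld_ffirst, the dict-based memory, and the node layout (val at offset 0, next at offset 8) are all assumptions made for illustration. the sketch also assumes that the failing (zero) result is still written to the register before VL is truncated.

```python
# Toy model of a data-dependent fail-first vector load.
# Hypothetical names and layout: this is NOT the ISACaller code.

NEXT_OFFSET = 8  # assumed node layout: val at +0, next at +8


def ld_ffirst(regs, mem, rt, ra, imm, vl):
    """Model `sv.ld/ff=RC1 *rt, imm(*ra)` with VLi=false.

    Element i loads mem[regs[ra+i] + imm] into regs[rt+i]. With rt = ra+1
    each element's address comes from the previous element's result.
    The first loaded value equal to zero truncates VL to exclude it.
    Returns the (possibly truncated) VL.
    """
    for i in range(vl):
        value = mem[regs[ra + i] + imm]
        # Assumption of this sketch: the failing result is still written.
        regs[rt + i] = value
        if value == 0:      # RC1 test: fail on equal to zero
            return i        # VLi=false: exclude the failing element
    return vl


# Three-node list at made-up addresses; the last next-pointer is NULL (0).
mem = {
    0x100: 11, 0x108: 0x200,
    0x200: 22, 0x208: 0x300,
    0x300: 33, 0x308: 0,
}
regs = [0] * 16
regs[0] = 0x100  # head pointer in r0

vl = ld_ffirst(regs, mem, rt=1, ra=0, imm=NEXT_OFFSET, vl=8)

# Second loop of the C example: XOR the val field of every collected node.
# r0 holds the head and r1..r(vl) hold the non-NULL next-pointers.
res = 0
for ptr in regs[0:vl + 1]:
    res ^= mem[ptr]

print(vl, [hex(r) for r in regs[:4]], res)
```

in this model the truncated VL counts the non-NULL next-pointers that were loaded, so a VL smaller than the requested one is what signals that the end of the list was reached.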
commit 59f1579956f07efe2611f6892a923764a0778ab2 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Thu Apr 6 13:26:20 2023 +0100

    add power_decode_svp64_rm.py capability for new LD/ST format
    https://bugs.libre-soc.org/show_bug.cgi?id=1047

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=59f1579956f07efe2611f6892a923764a0778ab2
> sv.ld/ff=RC1 *1, 8(*0)
>
> the use of RT=RA+1 creates the dependency-chain.

... but relies on ldst-immediate. if the data structure is large enough, LDST-indexed has to be used, but it is EXTRA2. a solution is being discussed in bug #1080 to make room for 2 extra bits that can be added to RT and RA to make them EXTRA3.
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=003959b7e760ed13fe5d58015fd63aa117cb9066

unit tests demoing LD-Immediate and with update working as expected.
see https://bugs.libre-soc.org/show_bug.cgi?id=1083#c5

saturation no longer works if bits 6-7 are unable to provide separate src/dest override.
wrong:

    elif 'st' in insn_name and 'x' in insn_name: # stwux
        res['Etype'] = 'EXTRA3' # RM EXTRA2 type
        # RS: Rdest2_EXTRA2, RA: Rsrc1_EXTRA2 / Rdest
        res['0'] = "%s;s:RA;d:RA" % (sRS)
        res['1'] = 's:RB' # RB: Rsrc2_EXTRA2

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/sv/sv_analysis.py;hb=HEAD#l447

that is a bug, it actually *should* be EXTRA3 but where 2 of the bits for RB and RS are taken from RM[6:7]
the next phase is to now modify the specification for the EXTRA area, creating a new type - an EXTRA32 type is probably the most sensible - which is of the form:

* EXTRA3 for RT/RS
* EXTRA3 for RA
* EXTRA2 for RB

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/LDSTRM-2P-2S1D.csv;hb=HEAD

    lfdux,LDST_IDX,,2P,EXTRA2,EN,d:FRT,s:RA;d:RA,s:RB,
    stdux,LDST_IDX,,2P,EXTRA3,EN,s:RS;s:RA;d:RA,s:RB

the bits have to come from:

* EXTRA3 for RT/RS [6,10,11]
* EXTRA3 for RA [7,12,13]
* EXTRA2 for RB [14,15]

although actually it may be better to do:

* EXTRA3 for RT/RS [10,11,12]
* EXTRA3 for RA [13,14,15]
* EXTRA2 for RB [16,17]
* MASK_SRC [6,7,18]

and call this "2PM"
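the two candidate bit allocations can be compared with a small Python sketch that pulls each field out of an RM value and checks that no bit is claimed twice. everything here is illustrative: the helper names (get_field, decode, no_overlap) and the field labels are hypothetical, and the sketch assumes a 24-bit RM with MSB0 numbering (bit 0 is the most significant bit), with the first listed bit of each field treated as that field's most significant bit.

```python
# Illustrative field extraction for the two proposed layouts.
# Assumptions: 24-bit RM, MSB0 bit numbering, hypothetical field names.

RM_WIDTH = 24

# First proposal: extra bits for RT/RS and RA taken from RM bits 6 and 7.
LAYOUT_FIRST = {
    "RT_RS_EXTRA3": [6, 10, 11],
    "RA_EXTRA3": [7, 12, 13],
    "RB_EXTRA2": [14, 15],
}

# Second proposal ("2PM"): contiguous register fields, MASK_SRC split up.
LAYOUT_2PM = {
    "RT_RS_EXTRA3": [10, 11, 12],
    "RA_EXTRA3": [13, 14, 15],
    "RB_EXTRA2": [16, 17],
    "MASK_SRC": [6, 7, 18],
}


def get_field(rm, bits):
    """Concatenate the listed RM bits, first listed bit most significant."""
    value = 0
    for b in bits:
        value = (value << 1) | ((rm >> (RM_WIDTH - 1 - b)) & 1)
    return value


def decode(rm, layout):
    """Return a dict of field name -> extracted value for one layout."""
    return {name: get_field(rm, bits) for name, bits in layout.items()}


def no_overlap(layout):
    """True if no RM bit is used by more than one field."""
    used = [b for bits in layout.values() for b in bits]
    return len(used) == len(set(used))


# Set MSB0 bits 6, 10 and 18 and decode under the "2PM" layout.
rm = (1 << (RM_WIDTH - 1 - 6)) | (1 << (RM_WIDTH - 1 - 10)) \
    | (1 << (RM_WIDTH - 1 - 18))
fields = decode(rm, LAYOUT_2PM)
print(fields, no_overlap(LAYOUT_FIRST), no_overlap(LAYOUT_2PM))
```

keeping the layouts as plain tables of bit indices makes it cheap to try a further allocation if neither of these two turns out to be the final one.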
i already started on this, and the budgets are correspondingly not balanced, given that i have already done the majority of the work (see TODO list, 80% completed).

https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=refs/heads/extra322_ldst_idx
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=1197ecb6b5110b180ef28c267982907acb1797c2
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=070adb7a11f3721302149946f5f13c95a4c23613

again, this is another task that i allocated a budget for based on one person (me) doing it. i am not sure what to suggest here, as it has been too long since i looked at this, but it should be straightforward to examine how the LD/ST-Immediate unit tests work and get the corresponding LD/ST-Indexed ones running, at which point doubly-linked lists can easily be done as well (double-chasing by two-overlap LD/ST-Update).

it is much more straightforward than the Vector-Immediate task, and examination of the commit-diffs will show clearly what was done.
vantosh=1500 markos=1500

reminder, as in comment #10, that i have *already done* 80% of this task, and that there is an extra322_ldst_idx branch with the progress on the remaining 20%.

budget from bug #1035 *may* potentially be reassignable to here in order to get a good balance for the learning-curve to get this done (including writing up the spec, which involves adding a new 322 format if i had not already added it), as well as ensuring that i get paid for the LD/ST-Immediate DDFFirst work and linked-list unit tests etc.
having done 80% of this task i am allocating a respective budget for that. the rest is a reasonably-easy task with sufficient walkthrough guidance.

vantosh=500 markos=500 lkcl=2000
I am sorry but if it's 80% done, there is little point in myself investing time and completing the rest of the 20%. It is not about the completion ratio, but about the effort involved. If the last 20% is going to take me -and for that matter Toshaan as well- 2 months to understand and implement and test, it is not unreasonable to expect to get paid accordingly. I am not interested in doing that for 500EUR so I am removing myself from the list. If it is not going to take 2 months, then it would really be simpler to just go ahead and finish it yourself.
(In reply to Konstantinos Margaritis (markos) from comment #13)
> I am sorry but if it's 80% done, there is little point in myself investing
> time and completing the rest of the 20%. It is not about the completion
> ratio, but about the effort involved.

the budget can in theory be increased so that it is big enough to be worthwhile.

> If it is not going to take 2 months, then it would really be simpler to just
> go ahead and finish it yourself.

i can't.