Bug 672 - create SVP64 demo / unit test "positional popcount"
Summary: create SVP64 demo / unit test "positional popcount"
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: shriya.sharma
URL: https://libre-soc.org/openpower/sv/co...
Depends on: 1221 1222 1225
Blocks: 952 953
Reported: 2021-08-22 12:14 BST by Luke Kenneth Casson Leighton
Modified: 2024-01-26 22:35 GMT
CC List: 3 users

See Also:
NLnet milestone: NLnet.2022-08-051.OPF
total budget (EUR) for completion of task and all subtasks: 2000
budget (EUR) for this task, excluding subtasks' budget: 2000
parent task for budget allocation: 953
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
red={amount=1000,submitted=2024-01-05,paid=2024-01-12} lkcl={amount=1000,submitted=2024-01-05,paid=2024-01-25}


Description Luke Kenneth Casson Leighton 2021-08-22 12:14:16 BST
https://www.reddit.com/r/programming/comments/p0yn45/three_fundamental_flaws_of_simd/h9n30n9/?utm_source=reddit&utm_medium=web2x&context=3
https://github.com/clausecker/pospop/blob/master/safe.go

to be based on work by https://www.reddit.com/user/FUZxxl/,
with credits / attribution
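For reference, the scalar loop in safe.go counts, for every bit position j, how many input bytes have bit j set. A minimal Python sketch of that behaviour could serve as the expected-result oracle for the unit test. The function name `pospopcount_ref` is hypothetical, and the `width` parameter (for the 16/32/64-bit variants listed as LATER) is an illustrative generalisation of the 8-bit case, not something taken from safe.go.

```python
def pospopcount_ref(buf, width=8):
    """Positional popcount: counts[j] = number of words in buf with bit j set.

    Mirrors the scalar loop in pospop's safe.go for width=8; other widths
    are a hypothetical generalisation for illustration only.
    """
    counts = [0] * width
    for word in buf:
        for j in range(width):
            counts[j] += (word >> j) & 1
    return counts


if __name__ == "__main__":
    print(pospopcount_ref([0b00000001, 0b10000001, 0xFF]))  # prints [3, 1, 1, 1, 1, 1, 1, 2]
```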

---

* DONE: shriya comment #14
* DONE: lkcl first working iteration
* DONE: refinements (brief ones)

* LATER: investigate 16/32/64-bit variants.
* LATER: investigate vectorisation of gbbd and popcntd
Comment 1 Jacob Lifshay 2021-08-25 18:03:51 BST
fix milestone to match parent task
Comment 2 Luke Kenneth Casson Leighton 2023-03-17 17:25:47 GMT
the algorithm counts by bit *positions*, therefore it makes sense to use
bitmatrix-flipping (bpermd) followed by popcount.
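The idea can be sketched as follows: treat 8 input bytes as the rows of an 8x8 bit matrix, transpose it so that byte j of the result collects bit j of every input byte, then a per-byte popcount of byte j is exactly the positional count for bit j. Note that the sketch uses a plain 8x8 bit-matrix transpose in the style of RISC-V bmatflip (which later comments in this bug settle on, via gbbd) rather than bpermd itself. The little-endian byte packing and LSB-0 bit numbering are assumptions made for the model; the function names are hypothetical.

```python
def bmatflip(x):
    """Transpose the 8x8 bit matrix held in a 64-bit integer.

    Bit (8*i + j) of the input becomes bit (8*j + i) of the result, i.e.
    byte i / bit j moves to byte j / bit i (LSB-0 numbering, assumed).
    """
    result = 0
    for i in range(8):
        for j in range(8):
            if (x >> (8 * i + j)) & 1:
                result |= 1 << (8 * j + i)
    return result


def pack_bytes(data):
    """Pack up to 8 bytes little-endian (assumed) into a 64-bit int; missing bytes are zero."""
    assert len(data) <= 8
    return int.from_bytes(bytes(data).ljust(8, b"\0"), "little")


def positional_counts_8(data):
    """Positional popcount of up to 8 bytes: transpose, then popcount each byte."""
    flipped = bmatflip(pack_bytes(data))
    return [bin((flipped >> (8 * j)) & 0xFF).count("1") for j in range(8)]
```

Because the transpose is its own inverse, applying `bmatflip` twice returns the original word, which is a cheap sanity check for any hardware or simulator implementation of the same operation.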
Comment 3 Luke Kenneth Casson Leighton 2023-11-21 12:03:51 GMT
# assume results are already zeroed on entry
# count in r3, input address r4, results in r16-r23, temps r8-r15, r32-r63

>    for i := range buf {
>        for j := 0; j < 8; j++ {
>            counts[j] += int(buf[i] >> j & 1)
>        }
>    }

#    use CTR mode
     mtspr CTR, r3
loop:
# clear out 8 temporaries, so that no stale bytes from a previous iteration remain
     setvli 8
     sv.addi *r8, r0, 0
# now copy *up to* the required number, the rest are zeros
     setvl. 0, 0, 8, 0, 1, 1
     sv.lbu/pi *r8, 0(r4) # post-increment r4
# vector of bpermds here, swaps up to 64 at once
     sv.bpermd *r8, *r8
# now do a vectorised 8-bit popcnt, elwidth override to expand to 64 bit
     setvli 8
     sv.popcntd/sw=8 *r32, *r8
# add to the accumulated results
     sv.add *r16, *r16, *r32
# decrement CTR and branch back to loop while CTR is non-zero
     sv.bc/all 16, *0, loop
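The loop above can be modelled in Python to work out what r16-r23 should contain at the end. This is a sketch under stated assumptions, not the ISACaller semantics: it assumes CTR starts at the byte count, that each iteration consumes VL = min(CTR, 8) bytes and that CTR is decremented by VL (how the setvl./sv.bc CTR-mode pair is intended to behave is itself still being examined in this bug), and it uses a bmatflip/gbbd-style transpose in place of the bpermd step. All names are hypothetical.

```python
def bmatflip(x):
    """8x8 bit-matrix transpose: bit (8*i + j) -> bit (8*j + i)."""
    result = 0
    for i in range(8):
        for j in range(8):
            if (x >> (8 * i + j)) & 1:
                result |= 1 << (8 * j + i)
    return result


def pospopcount_model(data):
    """Model of the SVP64 loop; returns the 8 accumulated counters (r16-r23).

    Assumption: CTR starts at len(data) and each iteration consumes
    VL = min(CTR, 8) bytes, decrementing CTR by VL.
    """
    counts = [0] * 8                        # r16-r23, assumed zero on entry
    ctr = len(data)                         # mtspr CTR, r3
    addr = 0                                # r4, post-incremented by sv.lbu/pi
    while True:
        temps = [0] * 8                     # sv.addi *r8, r0, 0 clears temporaries
        vl = min(ctr, 8)                    # setvl. limits VL to what is left (assumed)
        temps[:vl] = data[addr:addr + vl]   # sv.lbu/pi *r8, 0(r4)
        addr += vl
        flipped = bmatflip(int.from_bytes(bytes(temps), "little"))  # transpose step
        for j in range(8):                  # per-byte popcount, then sv.add accumulate
            counts[j] += bin((flipped >> (8 * j)) & 0xFF).count("1")
        ctr -= vl
        if ctr == 0:                        # bc 16: loop again only while CTR != 0
            break
    return counts
```

Input lengths that are not a multiple of 8 exercise the zero-padding of the final partial chunk, which is where comparing against a plain scalar count is most useful.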
Comment 4 Luke Kenneth Casson Leighton 2023-11-21 12:41:19 GMT
commit ee164c88cabe3d85ca1b1dc09c248dad87a25047 (HEAD -> master, origin/master, origin/HEAD)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Nov 21 12:39:28 2023 +0000

    add cut/paste copy of strncpy example as basis for pospopcount
Comment 5 shriya.sharma 2023-11-21 14:26:09 GMT
commit 088905b428547b53918d62b5256dab6392474375 (HEAD -> master, origin/master, origin/HEAD)
Author: Shriya Sharma <shriya@redsemiconductor.com>
Date:   Tue Nov 21 14:27:50 2023 +0000

    Added image for popcount
Comment 6 Luke Kenneth Casson Leighton 2023-11-21 17:41:03 GMT
commit ef542f4fa5b17ce3763b46849b14dfdbfb47d1f3 (HEAD -> master, origin/master, origin/HEAD)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Nov 21 17:40:27 2023 +0000

    starting on pospopcount assembler
Comment 7 Luke Kenneth Casson Leighton 2023-11-27 10:50:54 GMT
drat, it is actually an SFS version of vgather that is needed. have to look at grevluti.
Comment 8 Luke Kenneth Casson Leighton 2023-11-27 11:22:37 GMT
bmatflip
https://libre-soc.org/openpower/sv/bitmanip/
Comment 9 Luke Kenneth Casson Leighton 2023-11-27 13:14:58 GMT
commit 6c322ecc2b5867f5ef38a7c4194fe25b85325c8e (HEAD -> 672_pospopcount)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Mon Nov 27 13:14:43 2023 +0000

    add first gather instruction pseudocode
Comment 10 Luke Kenneth Casson Leighton 2023-11-27 13:30:14 GMT
commit 5a950e84cae06ed9a42fad4e07ad50c2da963799 (HEAD -> 672_pospopcount)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Mon Nov 27 13:29:18 2023 +0000

    add gbbd to minor_22.csv, add OP_BMAT to power_enums.py
Comment 11 Luke Kenneth Casson Leighton 2023-11-28 20:56:31 GMT
commit ade4c100a066e27dbbe1ccccf09fa2902bdefe0b (HEAD -> 672_pospopcount)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Nov 28 20:41:01 2023 +0000

    fix elwidth overrides when sw=8
    the way that XLEN works is it must be MAX(sw,dw) which is not what
    was happening, it was fixed at sw (source width)
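A minimal sketch of the rule described in this commit, under stated assumptions: elements are packed little-endian into 64-bit registers, source elements of width sw are zero-extended, the operation is evaluated at XLEN = max(sw, dw), and the result is truncated to dw on write-back. The helper names and the register-file layout are hypothetical illustrations, not ISACaller's actual code. The example uses an add, because a carry out of 8 bits shows the difference between XLEN = sw and XLEN = max(sw, dw).

```python
MASK64 = (1 << 64) - 1


def get_elem(regs, base, idx, width):
    """Read element idx of `width` bits, packed from register `base` (layout assumed).

    The value comes back as an unsigned int, i.e. already zero-extended."""
    per_reg = 64 // width
    shift = (idx % per_reg) * width
    return (regs[base + idx // per_reg] >> shift) & ((1 << width) - 1)


def set_elem(regs, base, idx, width, value):
    """Write the low `width` bits of value into element idx, leaving other bits alone."""
    per_reg = 64 // width
    r = base + idx // per_reg
    shift = (idx % per_reg) * width
    mask = ((1 << width) - 1) << shift
    regs[r] = (regs[r] & ~mask & MASK64) | ((value << shift) & mask)


def sv_elementwise(regs, op, rt, ra, rb, vl, sw, dw):
    """Hypothetical model: apply op elementwise with XLEN = max(sw, dw)."""
    xlen = max(sw, dw)
    for i in range(vl):
        a = get_elem(regs, ra, i, sw)       # zero-extended source element
        b = get_elem(regs, rb, i, sw)
        result = op(a, b) & ((1 << xlen) - 1)
        set_elem(regs, rt, i, dw, result)   # truncated to the destination width
```

With sw=8, dw=64, adding 0xFF and 0x01 gives 0x100; if XLEN were wrongly fixed at sw it would wrap to 0.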
Comment 12 Luke Kenneth Casson Leighton 2023-11-28 21:04:10 GMT
commit d66dc44e23da264fd43d4e7c5749af0890327cf2 (HEAD -> 672_pospopcount)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Nov 28 21:03:43 2023 +0000

    bug #672: fixing pospopcount assembler
    
    there is a lot going on here, this is pushing the boundaries of
    what ISAcaller can do (or hasnt been asked to do... until now)
    * gbbd (gather bits and bytes double) had to be added
    * sw=8,dw=64 had to be fixed (XLEN is actually 64 there
      but source elements have to be ZERO-EXTENDED...)
    * a bug in sv.addi/sw=8 was found
      https://bugs.libre-soc.org/show_bug.cgi?id=1221
    * some changes to setvl have to be made/written (!)
    * sv.bc in CTR-reduction mode needs to potentially be fixed
      or at least properly examined
Comment 13 Luke Kenneth Casson Leighton 2023-11-28 22:46:11 GMT
commit 90ee7b1f2d6e9aac870a23a2e821b25039689148 (HEAD -> 672_pospopcount, origin/672_pospopcount)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Nov 28 22:45:42 2023 +0000

    bug #672: pospopcount finally got the right answer
    
    forgot to add popcntd initially, lots of futzing around, still work to do
    but it gives a correct answer now
Comment 14 Luke Kenneth Casson Leighton 2023-11-29 09:20:16 GMT
RISC-V Bitmanip Extension, Document Version 0.94-draft, Editor: Claire Wolf, Symbiotic GmbH, claire@symbioticeda.com, January 20, 2021

shriya, can you look that up, find a link, and cross-reference section 2.8
(p35) in the cookbook wiki page? mention bmatflip and the equivalent
instructions that other ISAs have, but also mention that it is the same as
Power ISA VSX vgbbd, which we are adding to the SFS Draft as gbbd.

then copy the diagram for vgbbd from the public Power ISA document
(search the PDF for "Vector Gather"), include the colours, and do it
as an SVG like the last one.

there are lots of pieces here and it will make a great FOSDEM talk
Comment 15 Luke Kenneth Casson Leighton 2023-11-29 17:49:28 GMT
commit b427a6cc523dc5a277d0e379848e4bad90568592 (HEAD -> 672_pospopcount)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Nov 29 15:06:18 2023 +0000

    bug #672: shorter pospopcount but not fully working
    
    variant on pospopcount but when 241 array items instead of 240 are used
    it produces the wrong answer. under investigation

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=b427a6cc
Comment 16 Luke Kenneth Casson Leighton 2023-12-05 14:37:01 GMT
commit d96e724f9878250007b5c68b70879e420841f410 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Dec 5 14:36:42 2023 +0000

    prepare assembler for warm-words, pospopcount
Comment 17 Luke Kenneth Casson Leighton 2023-12-05 14:46:15 GMT
commit 1b1b3e73f9f737102780fc99341681feed65bf3d (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Dec 5 14:46:01 2023 +0000

    add words to describe first few instructions, bug #672 popspopc
Comment 18 Luke Kenneth Casson Leighton 2023-12-05 15:30:08 GMT
commit 4a3ebb8156b53109db6cae6847088906c54ed55a (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Dec 5 15:29:52 2023 +0000

    more instruction explanation on pospopcount, bug #672
Comment 19 Luke Kenneth Casson Leighton 2023-12-05 15:49:52 GMT
commit 0eee41dc14505b669d47c8e913d952888c7b7c94 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Dec 5 15:49:37 2023 +0000

    add sv.bc warm words for pospopcount bug #672
Comment 20 Luke Kenneth Casson Leighton 2023-12-05 15:56:37 GMT
commit b421f0cc6b35bf138235ca2d3f505611c2dc5a2d (HEAD -> master, origin/master, origin/HEAD)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Dec 5 15:55:47 2023 +0000

    add improvements section
Comment 21 Luke Kenneth Casson Leighton 2023-12-06 14:40:02 GMT
commit d90f8ee6210d3595fd6b747dc258b1067bc1f548 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Dec 6 14:39:47 2023 +0000

    add pospopcount conclusion bug #672
Comment 22 Luke Kenneth Casson Leighton 2023-12-06 16:31:46 GMT
ok this is done, it reads well, has two sections describing the
algorithm, one with a sequence of images and the other doing
a walk-through of the assembler.