https://www.reddit.com/r/programming/comments/p0yn45/three_fundamental_flaws_of_simd/h9n30n9/?utm_source=reddit&utm_medium=web2x&context=3
https://github.com/clausecker/pospop/blob/master/safe.go

to be based on work by https://www.reddit.com/user/FUZxxl/, with credits / attribution

---

* DONE: shriya comment #14
* DONE: lkcl first working iteration
* DONE: refinements (brief ones)
* LATER: investigate 16/32/64.
* LATER: investigate vectorisation of gbbd and popcntd
fix milestone to match parent task
the algorithm counts by bit *positions*, therefore it makes sense to use bitmatrix-flipping (bpermd) followed by popcount.
# assume result input all zero already
# count in r3, input address r4, results in r16-r23, temp r8-r16, r32-r63

> for i := range buf
>     for j := 0; j < 8; j++
>         counts[j] += int(buf[i] >> j & 1)

    # use CTR mode
    mtspr CTR, r3
loop:
    # clear out 8 temporaries, so that there is no extraneous
    setvli 8
    sv.addi *r8, r0, 0
    # now copy *up to* the required number, the rest are zeros
    setvl. 0, 0, 8, 0, 1, 1
    sv.lbu/pi *r8, 0(r4) # post-increment r4
    # vector of bpermds here, swaps up to 64 at once
    sv.bpermd *r8, *r8
    # now do a vectorised 8-bit popcnt, elwidth override to expand to 64 bit
    setvli 8
    sv.popcnt/sw=8 *r8, *r32
    # add to accumulation of results, the
    sv.add *r16, *r16, *r32
    # and branch back if CTR ended
    sv.bc/all 16, *0, loop
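for cross-checking the assembler, here is a minimal Go sketch of the same idea: the scalar loop quoted above, plus a block version that zero-pads up to 8 bytes, transposes the 8x8 bit matrix, then popcounts each resulting byte. the names `pospopcntRef`, `pospopcntBlocks` and `transpose8x8` are made up for illustration, and the transpose uses LSB0 bit numbering as an assumption (the Power ISA numbers bits MSB0), so treat it as a model of the technique rather than of the exact gbbd semantics.

```go
package main

import (
	"fmt"
	"math/bits"
)

// pospopcntRef is the scalar reference loop quoted in the comment above:
// counts[j] is the number of input bytes that have bit j set.
func pospopcntRef(buf []byte) [8]int {
	var counts [8]int
	for i := range buf {
		for j := 0; j < 8; j++ {
			counts[j] += int(buf[i] >> j & 1)
		}
	}
	return counts
}

// transpose8x8 treats 8 bytes as an 8x8 bit matrix and flips it:
// bit i of out[j] is bit j of in[i] (LSB0 numbering, an assumption).
func transpose8x8(in [8]byte) [8]byte {
	var out [8]byte
	for i := 0; i < 8; i++ {
		for j := 0; j < 8; j++ {
			out[j] |= (in[i] >> j & 1) << i
		}
	}
	return out
}

// pospopcntBlocks processes up to 8 bytes at a time: zero-pad the block,
// transpose it, then popcount each transposed byte and accumulate.
func pospopcntBlocks(buf []byte) [8]int {
	var counts [8]int
	for len(buf) > 0 {
		var block [8]byte        // starts zeroed, so a short tail is zero-padded
		n := copy(block[:], buf) // copies *up to* 8 bytes
		buf = buf[n:]
		t := transpose8x8(block)
		for j := 0; j < 8; j++ {
			counts[j] += bits.OnesCount8(t[j])
		}
	}
	return counts
}

func main() {
	// 9 bytes on purpose, so the last block is a 1-byte tail.
	buf := []byte{0x01, 0x03, 0x07, 0x0f, 0x1f, 0x3f, 0x7f, 0xff, 0x80}
	fmt.Println(pospopcntRef(buf))
	fmt.Println(pospopcntBlocks(buf))
}
```

the zeroed `block` plays the role of clearing the 8 temporaries before the partial load: padding bytes contribute nothing to any position count.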
commit ee164c88cabe3d85ca1b1dc09c248dad87a25047 (HEAD -> master, origin/master, origin/HEAD) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Tue Nov 21 12:39:28 2023 +0000 add cut/paste copy of strncpy example as basis for pospopcount
commit 088905b428547b53918d62b5256dab6392474375 (HEAD -> master, origin/master, origin/HEAD) Author: Shriya Sharma <shriya@redsemiconductor.com> Date: Tue Nov 21 14:27:50 2023 +0000 Added image for popcount
commit ef542f4fa5b17ce3763b46849b14dfdbfb47d1f3 (HEAD -> master, origin/master, origin/HEAD) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Tue Nov 21 17:40:27 2023 +0000 starting on pospopcount assembler
drat, it is actually the SFS version of vgather that is needed. have to look at grevluti.
bmatflip https://libre-soc.org/openpower/sv/bitmanip/
commit 6c322ecc2b5867f5ef38a7c4194fe25b85325c8e (HEAD -> 672_pospopcount) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Mon Nov 27 13:14:43 2023 +0000 add first gather instruction pseudocode
commit 5a950e84cae06ed9a42fad4e07ad50c2da963799 (HEAD -> 672_pospopcount) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Mon Nov 27 13:29:18 2023 +0000 add gbbd to minor_22.csv, add OP_BMAT to power_enums.py
commit ade4c100a066e27dbbe1ccccf09fa2902bdefe0b (HEAD -> 672_pospopcount)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Nov 28 20:41:01 2023 +0000

    fix elwidth overrides when sw=8

    the way that XLEN works is it must be MAX(sw,dw) which is not
    what was happening, it was fixed at sw (source width)
commit d66dc44e23da264fd43d4e7c5749af0890327cf2 (HEAD -> 672_pospopcount)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Nov 28 21:03:43 2023 +0000

    bug #672: fixing pospopcount assembler

    there is a lot going on here, this is pushing the boundaries of
    what ISAcaller can do (or hasnt been asked to do... until now)

    * gbbd (gather bits and bytes double) had to be added
    * sw=8,dw=64 had to be fixed (XLEN is actually 64 there but
      source elements have to be ZERO-EXTENDED...)
    * a bug in sv.addi/sw=8 was found
      https://bugs.libre-soc.org/show_bug.cgi?id=1221
    * some changes to setvl have to be made/written (!)
    * sv.bc in CTR-reduction mode needs to potentially be fixed
      or at least properly examined
commit 90ee7b1f2d6e9aac870a23a2e821b25039689148 (HEAD -> 672_pospopcount, origin/672_pospopcount)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Tue Nov 28 22:45:42 2023 +0000

    bug #672: pospopcount finally got the right answer

    forgot to add popcntd initially, lots of futzing around,
    still work to do but it gives a correct answer now
RISC-V Bitmanip Extension
Document Version 0.94-draft
Editor: Claire Wolf, Symbiotic GmbH, claire@symbioticeda.com
January 20, 2021

shriya can you look that up, find a link, and crossref to section 2.8 p35 in the cookbook wiki page? mention bmatflip and the other instructions that other ISAs have, but also mention that it is the same as Power ISA VSX vgbbd, which we are adding to SFS Draft as gbbd?

then copy the diagram from vgbbd in the Power Public ISA document, you find it under vgbbd (search the PDF for "Vector Gather"), include the colours ok and do it as an SVG like the last one.

there are lots of pieces here and it will make a great FOSDEM talk
commit b427a6cc523dc5a277d0e379848e4bad90568592 (HEAD -> 672_pospopcount)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Nov 29 15:06:18 2023 +0000

    bug #672: shorter pospopcount but not fully working

    variant on pospopcount but when 241 array items instead of 240
    are used it produces the wrong answer. under investigation

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=b427a6cc
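one way to pin down a length-dependent failure like 240 vs 241 is to sweep the input length across the boundary and compare against the scalar reference. a minimal Go sketch of such a harness follows; `firstMismatch` and `brokenTail` are hypothetical names, and `brokenTail` is a deliberately buggy stand-in (it drops any bytes past the last multiple of 8) used only to show the harness firing. it is not a diagnosis of the actual problem in the shorter assembler variant, which is still under investigation.

```go
package main

import "fmt"

// pospopcntRef is the scalar reference: counts[j] is the number of
// input bytes that have bit j set.
func pospopcntRef(buf []byte) [8]int {
	var counts [8]int
	for i := range buf {
		for j := 0; j < 8; j++ {
			counts[j] += int(buf[i] >> j & 1)
		}
	}
	return counts
}

// firstMismatch runs impl against the reference for every length in
// lo..hi and returns the first length that disagrees, or -1 if none.
func firstMismatch(impl func([]byte) [8]int, lo, hi int) int {
	for n := lo; n <= hi; n++ {
		buf := make([]byte, n)
		for i := range buf {
			buf[i] = byte(i*37 + 11) // deterministic filler pattern
		}
		if impl(buf) != pospopcntRef(buf) {
			return n
		}
	}
	return -1
}

// brokenTail is a made-up faulty implementation for demonstration only:
// it ignores any trailing bytes beyond the last full group of 8.
func brokenTail(buf []byte) [8]int {
	return pospopcntRef(buf[:len(buf)/8*8])
}

func main() {
	fmt.Println(firstMismatch(pospopcntRef, 240, 244))
	fmt.Println(firstMismatch(brokenTail, 240, 244))
}
```

the same sweep could be pointed at results captured from the simulator instead of a Go function, by adapting `impl` to return those counts.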
commit d96e724f9878250007b5c68b70879e420841f410 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Tue Dec 5 14:36:42 2023 +0000 prepare assembler for warm-words, pospopcount
commit 1b1b3e73f9f737102780fc99341681feed65bf3d (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Tue Dec 5 14:46:01 2023 +0000 add words to describe first few instructions, bug #672 popspopc
commit 4a3ebb8156b53109db6cae6847088906c54ed55a (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Tue Dec 5 15:29:52 2023 +0000 more instruction explanation on pospopcount, bug #672
commit 0eee41dc14505b669d47c8e913d952888c7b7c94 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Tue Dec 5 15:49:37 2023 +0000 add sv.bc warm words for pospopcount bug #672
commit b421f0cc6b35bf138235ca2d3f505611c2dc5a2d (HEAD -> master, origin/master, origin/HEAD) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Tue Dec 5 15:55:47 2023 +0000 add improvements section
commit d90f8ee6210d3595fd6b747dc258b1067bc1f548 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Wed Dec 6 14:39:47 2023 +0000 add pospopcount conclusion bug #672
ok this is done, it reads well, has two sections describing the algorithm, one with a sequence of images and the other doing a walk-through of the assembler.