see bug #858

two key pages:
https://libre-soc.org/openpower/sv/executive_summary/
https://libre-soc.org/openpower/sv/comparison_table/

combined into:
https://ftp.libre-soc.org/simple_v_spec.pdf
start transferring comments over from #858

Comparison table headings

* Number of instructions
* Scalable yes/no
* Predication masks yes/no
* Explicit vector registers
* 128-Bit
* biginteger capability
* Load/Store Fail/First
* Twin predication
* Data-dependent fail-first
* Predicate-result

https://libre-soc.org/openpower/sv/comparison_table/
SVP64 does not modify, harm or corrupt the existing Power ISA, and does not interfere with an existing system. it needs only a small allocation of opcodes (5) to implement, whereas any other Vector implementation would require an intrusive fundamental overhaul of the Power ISA. we invented Simple-V to be simple because we don't like complicated.
preamble highlights on comparison table:

* Significantly fewer opcodes than other Predicated SIMD and Scalable Vector ISAs, and a lot less intrusive
* Provides all features of all modern Scalable Vector ISAs and some innovative ones as well
* Can be added without complication to Power ISA systems
make sure to spell out what SVP64 is not:

* not RVV. not based on RVV. based on the original Cray concept.
* not based on any other known ISA; it is its own intuitive concept.

need to think in 2 dimensions: instructions vertical, registers horizontal.

GPUs should have a massively wide SIMD backend. the frontend is still the exact same SVP64 ISA. makes life much easier because it is uniform.

executive summary (2 pages) is book zero. use refs to show how it can be done (source code, unit tests).
primer is too technical (book 1).
merge into the same document.
"please contact if questions"
add revision history with version numbers.
put together quickly, will be updated.
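on the "think in 2 dimensions" point, a minimal Python sketch may help. it is only an illustration, not the simulator's code: `sv_add`, the flat `regs` list and the register numbers are all hypothetical, and it ignores predication, element width and scalar/vector register marking. the instruction stream stays the same (vertical), and the Vector Length decides how many consecutive registers one instruction walks across (horizontal).

```python
def sv_add(regs, rt, ra, rb, vl):
    """Hypothetical model: one scalar 'add' repeated over VL consecutive registers.

    Simplified on purpose: no predication, no element-width overrides,
    and every operand is treated as a vector.
    """
    for i in range(vl):
        regs[rt + i] = regs[ra + i] + regs[rb + i]
    return regs


# 32 registers, register n initially holds the value n
regs = list(range(32))
sv_add(regs, rt=16, ra=0, rb=8, vl=4)
print(regs[16:20])  # [8, 10, 12, 14]
```

with vl=1 the same function degenerates into an ordinary scalar add, which is the sense in which the frontend ISA stays uniform.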
https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/svp64-primer/svp64-proposal.tex;h=eb714dbf720fc05685a57a1ef8df75e0986ccc8c;h=aa6cb4dfd8f91dc6e60168cce3bbdec8c31c17fa;hb=HEAD

imho the Authors field needs to be changed to just "The Libre-SOC Project" or something, since e.g. I (and others) also put a lot of work into the SVP64 spec.
(In reply to Jacob Lifshay from comment #5)
> https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/svp64-primer/svp64-proposal.tex;h=eb714dbf720fc05685a57a1ef8df75e0986ccc8c;h=aa6cb4dfd8f91dc6e60168cce3bbdec8c31c17fa;hb=HEAD
>
> imho the Authors field needs to be changed to just "The Libre-SOC Project"
> or something, since e.g. I (and others) also put a lot of work into the
> SVP64 spec.

nm, mistook that for embedding other parts of the svp64 spec into the output pdf.
(In reply to Jacob Lifshay from comment #5)
> https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/svp64-primer/svp64-proposal.tex;h=eb714dbf720fc05685a57a1ef8df75e0986ccc8c;h=aa6cb4dfd8f91dc6e60168cce3bbdec8c31c17fa;hb=HEAD
>
> imho the Authors field needs to be changed to just "The Libre-SOC Project"
> or something, since e.g. I (and others) also put a lot of work into the
> SVP64 spec.

Is it Author of the document or Author of SVP64? I would assume Author of the document in this case.
(In reply to Luke Kenneth Casson Leighton from comment #3) > preamble highlights on comparison table: imho the comparison table needs a User-selectable Vector Length (VL reg) column, since we want to emphasize that we're avoiding ARM SVE's trap of not allowing the user to pick specific vector lengths -- mostly because the trap isn't obvious unless explicitly pointed out and we want OpenPower to not fall into that trap.
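a rough sketch of what a user-selectable Vector Length means for software, in Python. `set_vl`, `strip_mined_add` and `maxvl` are hypothetical names for illustration only and do not claim to reproduce the real instruction semantics: the assumption is simply that the program asks for the number of elements it wants and gets that number back, capped at a maximum.

```python
def set_vl(requested, maxvl):
    """Hypothetical stand-in for a VL-setting instruction: grant what was asked, capped."""
    return min(requested, maxvl)


def strip_mined_add(a, b, maxvl):
    """Add two equal-length lists in chunks of at most maxvl elements."""
    assert len(a) == len(b) and maxvl > 0
    out, i = [], 0
    while i < len(a):
        vl = set_vl(len(a) - i, maxvl)  # the last chunk may be shorter
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return out


# a 3-element (XYZ) add asks for exactly 3 elements, whatever maxvl is
print(strip_mined_add([1, 2, 3], [10, 20, 30], maxvl=8))  # [11, 22, 33]
```

the same loop handles a 3-element vector and a long array without any per-width special-casing, which is the property the proposed table column would make visible.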
unit tests and simulator for Power ISA and SVP64
https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/openpower/decoder/isa;hb=HEAD

several thousand more ISA unit tests
https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/openpower/test;hb=HEAD

demo, showing 4.5x reduction in program size for MP3 decode, greatly simplifies assembler development
https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=media/audio/mp3;hb=HEAD

development of binutils support for SVP64
https://git.libre-soc.org/?p=binutils-gdb.git;a=shortlog;h=refs/heads/svp64-ng
(In reply to Jacob Lifshay from comment #8)
> imho the comparison table needs a User-selectable Vector Length (VL reg)
> column, since we want to emphasize that we're avoiding ARM SVE's trap of not
> allowing the user to pick specific vector lengths -- mostly because the trap
> isn't obvious unless explicitly pointed out and we want OpenPower to not
> fall into that trap.

i'm putting that in the summary and also spelling it out in the footnotes. if it's really not obvious then yes, sigh, have to work out if another column can be added. table's very cramped even landscape. v. small font.
(In reply to Luke Kenneth Casson Leighton from comment #10)
> i'm putting that in the summary and also spelling it out in the footnotes.
> if it's really not obvious then yes, sigh, have to work out if another column
> can be added.

it can.
konstantinos can i ask you the favour of dropping some rough numbers of intrinsics for NEON, SVE2, and AVX512? or better the number of instructions
Intel: 7256 SIMD intrinsics (SSE*, AVX & AVX2, all AVX512 variants)
Arm NEON: 2754 (32-bit) and 4344 (64-bit) intrinsics
Arm SVE: 4140 intrinsics
Arm SVE2: 1900 intrinsics

All numbers were taken from the Intel Intrinsics Guide and the Arm Developer online resource.
(In reply to Konstantinos Margaritis from comment #13)
> Intel: 7256 SIMD intrinsics (SSE*, AVX & AVX2, all AVX512 variants)
> Arm NEON: 2754 (32-bit) and 4344 (64-bit) intrinsics
> Arm SVE: 4140 intrinsics
> Arm SVE2: 1900 intrinsics
>
> All numbers were taken from Intel intrinsics Guide and Arm Developer online
> resource.

https://atomicquote.com/author/iain-m-banks/quote/two-hundred-and-thirty-three-thousand-times-the-speed-of-light-dear-holy-fucking-shit-the-yawning-angel-thought-there-was-something-almost-vulgar-about-such-a-velocity-where-the-hell-was-it-h
found it. https://gist.github.com/zingaburga/805669eb891c820bd220418ee3f0d6bd#file-sve2-md but in the worst case, you may be stuck with only using the bottom 128 bits of the vector, or need to code specifically for each width you wish to support.
found ARM's SVE2 Matrix Extension.
https://developer.arm.com/documentation/ddi0602/2022-06/SME-Instructions/SMOPA--Signed-integer-sum-of-outer-products-and-accumulate-?lang=en

i cannot tell by looking at the linked pseudocode if it is power-2 or non-power-2

    CheckStreamingSVEAndZAEnabled();
    constant integer VL = CurrentVL;
    constant integer PL = VL DIV 8;
    constant integer dim = VL DIV esize;

there's a hyperlink to CurrentVL
https://developer.arm.com/documentation/ddi0602/2022-06/Shared-Pseudocode/AArch64-Functions?lang=en#impl-aarch64.CurrentVL.read.none
but i get totally lost in the ref-to-this, ref-to-that, if-optional-thing-enabled use this, if-not-use-that.

urr.... https://www.realworldtech.com/forum/?threadid=202688&curpostid=202688
might as well ask
https://www.realworldtech.com/forum/?threadid=202688&curpostid=207731
as mentioned on irc: I changed the comparison table to use footnotes and changed texmunge.py to make the tex work with multiple references to the same footnote. I managed to cram it all on one page by reducing the font size of the footnotes somewhat... https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=93b706bca64112a29a75cbb8e0f1a3fea80e5e2a lkcl last I checked I don't have ftp write access (was lost at some point apparently), so I can't upload the new pdf version.
(In reply to Jacob Lifshay from comment #17)
> as mentioned on irc:
> I changed the comparison table to use footnotes and changed texmunge.py to
> make the tex work with multiple references to the same footnote. I managed
> to cram it all on one page by reducing the font size of the footnotes
> somewhat...
> https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=93b706bca64112a29a75cbb8e0f1a3fea80e5e2a

brilliant. i removed the names back down to numbers, reducing the xterm width from 380 characters to a "mere" 250. it now again fits 50% of a 3840x2160 resolution LCD so is (barely) manageable.

the use of footnotes is itself brilliant, saves a lot of hassle.

> lkcl last I checked I don't have ftp write access (was lost at some point
> apparently), so I can't upload the new pdf version.

sftp. i am doing "make upload" approx 15-20x a day at the moment.
i also added MyISA 66000 which as can be seen on the latest libre-soc-dev discussion can take between 1 week and 2 YEARS to comprehend. it is an unlimited-scalable "Hardware Auto-Vectorisation" ISA limited to LD/ST as Vector start/end points and to a 1D LOOP Construct. it warrants addition simply because the bang-per-buck is so high within the capability it gives. it is however not a panacea.
(In reply to Luke Kenneth Casson Leighton from comment #16)
> urr.... https://www.realworldtech.com/forum/?threadid=202688&curpostid=202688
> might as well ask
> https://www.realworldtech.com/forum/?threadid=202688&curpostid=207731

someone called dmcq very kindly replied, prompting me to investigate the pseudocode, which magically a day later is making sense to me.
https://www.realworldtech.com/forum/?threadid=202688&curpostid=207774

the outer-product instructions are definitely on power-of-two boundaries, and the "tiles" must be squares. a 128-bit silicon-partner choice would result in a 2x2 64-bit outer-product or a 4x4 32-bit outer-product.

there is no way to stop overwriting of destinations on non-power-two boundaries, but there *is* a way to stop wasting CPU cycles on multiply-by-zero-and-adds: pre-run some zero-detection instructions and put the result of that detection into predicate masks. there are *two* predicate source masks for that purpose: one for N, one for M.
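to make the tile arithmetic and the two-mask idea concrete, here is a small Python sketch. it is a simplified model under stated assumptions and is not ARM's pseudocode: `tile_dim` only mirrors the quoted line "dim = VL DIV esize", and `masked_outer_product_acc`, `mask_n` and `mask_m` are hypothetical names. which mask applies to rows and which to columns is an assumption made here for illustration.

```python
def tile_dim(vl_bits, esize_bits):
    """Edge length of the square tile, mirroring 'dim = VL DIV esize'."""
    return vl_bits // esize_bits


def masked_outer_product_acc(acc, a, b, mask_n, mask_m):
    """acc[i][j] += a[i] * b[j], skipping rows/columns whose predicate bit is clear."""
    dim = len(a)
    for i in range(dim):
        if not mask_n[i]:      # assumption: mask_n gates rows
            continue
        for j in range(dim):
            if mask_m[j]:      # assumption: mask_m gates columns
                acc[i][j] += a[i] * b[j]
    return acc


print(tile_dim(128, 64), tile_dim(128, 32))  # 2 4

# "zero-detection" pass: build the predicate masks from the source vectors
a, b = [1, 2, 0, 4], [5, 0, 7, 8]
mask_n = [x != 0 for x in a]
mask_m = [x != 0 for x in b]
acc = [[0] * 4 for _ in range(4)]
masked_outer_product_acc(acc, a, b, mask_n, mask_m)
print(acc)
```

because every skipped product is zero anyway, the result matches the unmasked outer product; only the amount of multiply-add work changes.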