Bug 893 - SVP64 proposal to OPF
Summary: SVP64 proposal to OPF
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Specification
Version: unspecified
Hardware: Other Linux
: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL: https://ftp.libre-soc.org/simple_v_sp...
Depends on:
Blocks:
 
Reported: 2022-07-22 11:48 BST by Luke Kenneth Casson Leighton
Modified: 2022-09-10 12:38 BST
7 users

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Comment 1 Luke Kenneth Casson Leighton 2022-07-22 13:20:43 BST
start transferring comments over from #858

Comparison table headings

* Number of instructions
* Scalable yes/no
* Predication masks yes/no
* Explicit vector registers
* 128-Bit
* biginteger capability
* Load/Store Fail/First
* Twin predication
* Data-dependent fail-first
* Predicate-result

https://libre-soc.org/openpower/sv/comparison_table/
Comment 2 Luke Kenneth Casson Leighton 2022-07-22 18:23:29 BST
svp64 does not modify, harm or corrupt the existing Power ISA, and does not
interfere with an existing system.  it needs only a small allocation of opcodes (5) to implement.

whereas any other Vector implementation would require an intrusive, fundamental
overhaul of the Power ISA.

we invented Simple-V to be simple because we don't like complicated.
Comment 3 Luke Kenneth Casson Leighton 2022-07-22 18:29:58 BST
preamble highlights on comparison table:

* Significantly fewer opcodes than other Predicated SIMD and Scalable Vector ISAs, and a lot less intrusive
* Provides all features of all modern Scalable Vector ISAs and some innovative ones as well
* Can be added without complication to Power ISA systems
Comment 4 Luke Kenneth Casson Leighton 2022-07-22 22:48:23 BST
make sure to spell out what SVP64 is not:

* not RVV. not based on RVV. based on original Cray concept.
* not based on any other known ISA; it is its own intuitive concept

need to think in 2 dimensions: instructions vertical, registers horizontal.
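A minimal Python sketch of the 2-dimensional view, assuming a simplified model of Simple-V with non-zeroing predication: the scalar instruction (here an add) is the vertical axis, and the vector length VL walks it horizontally across consecutive registers, skipping any element whose predicate bit is clear. `sv_add`, `NUM_GPRS` and the integer-mask convention are hypothetical names chosen for illustration, not the SVP64 specification's pseudocode.

```python
# Simplified model of a vector-prefixed scalar add: the same scalar
# operation is repeated VL times over consecutive registers.
# Hypothetical names; not taken from the SVP64 specification.

NUM_GPRS = 128  # assumption: illustrative register-file size


def sv_add(gpr, rt, ra, rb, vl, mask=None):
    """Element i computes gpr[rt+i] = gpr[ra+i] + gpr[rb+i].

    mask: optional integer predicate; if bit i is clear, element i is
    skipped and its destination register is left unchanged.
    """
    for i in range(vl):
        if mask is not None and not (mask >> i) & 1:
            continue  # predicated-out element: destination untouched
        # wrap to 64 bits, like a 64-bit GPR would
        gpr[rt + i] = (gpr[ra + i] + gpr[rb + i]) & 0xFFFF_FFFF_FFFF_FFFF


gpr = [0] * NUM_GPRS
gpr[8:12] = [1, 2, 3, 4]
gpr[16:20] = [10, 20, 30, 40]
sv_add(gpr, rt=0, ra=8, rb=16, vl=4, mask=0b1011)
print(gpr[0:4])  # [11, 22, 0, 44]
```

With VL=1 and no mask the loop degenerates to the ordinary scalar add, which is why the scalar instruction itself does not need to change in this model.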


GPUs should have a massively wide SIMD back-end. the front-end is still the exact same SVP64 ISA. makes life much easier because it is uniform.

executive summary (2 pages), book zero. use refs to show how it can be done (source code,
unit tests)

primer is too technical (book 1)

merge into same document.

"please contact if questions"

add revision history with version numbers
put together quickly, will be updated.
Comment 5 Jacob Lifshay 2022-07-23 00:47:16 BST
https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/svp64-primer/svp64-proposal.tex;h=eb714dbf720fc05685a57a1ef8df75e0986ccc8c;h=aa6cb4dfd8f91dc6e60168cce3bbdec8c31c17fa;hb=HEAD

imho the Authors field needs to be changed to just "The Libre-SOC Project" or something, since e.g. I (and others) also put a lot of work into the SVP64 spec.
Comment 6 Jacob Lifshay 2022-07-23 00:54:02 BST
(In reply to Jacob Lifshay from comment #5)
> https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/svp64-primer/
> svp64-proposal.tex;h=eb714dbf720fc05685a57a1ef8df75e0986ccc8c;
> h=aa6cb4dfd8f91dc6e60168cce3bbdec8c31c17fa;hb=HEAD
> 
> imho the Authors field needs to be changed to just "The Libre-SOC Project"
> or something, since e.g. I (and others) also put a lot of work into the
> SVP64 spec.

nm, mistook that for embedding other parts of the svp64 spec into the output pdf.
Comment 7 djac 2022-07-23 00:55:28 BST
(In reply to Jacob Lifshay from comment #5)
> https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/svp64-primer/
> svp64-proposal.tex;h=eb714dbf720fc05685a57a1ef8df75e0986ccc8c;
> h=aa6cb4dfd8f91dc6e60168cce3bbdec8c31c17fa;hb=HEAD
> 
> imho the Authors field needs to be changed to just "The Libre-SOC Project"
> or something, since e.g. I (and others) also put a lot of work into the
> SVP64 spec.

Is it Author of the document or Author of SVP64?  I would assume Author of the document in this case.
Comment 8 Jacob Lifshay 2022-07-23 01:01:34 BST
(In reply to Luke Kenneth Casson Leighton from comment #3)
> preamble highlights on comparison table:

imho the comparison table needs a User-selectable Vector Length (VL reg) column, since we want to emphasize that we're avoiding ARM SVE's trap of not allowing the user to pick specific vector lengths -- mostly because the trap isn't obvious unless explicitly pointed out and we want OpenPower to not fall into that trap.
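A sketch of what a user-selectable VL buys, assuming a `setvl`-style operation that returns min(requested, MAXVL), with MAXVL picked by the program rather than fixed by the hardware. `set_vl` and `vector_add` are hypothetical helpers loosely modelled on that behaviour; they are not SVP64 or SVE APIs.

```python
# Strip-mined loop driven by a user-selectable vector length.
# set_vl() is a hypothetical model: the program asks for `requested`
# elements and gets min(requested, maxvl), where maxvl is a software choice.


def set_vl(requested, maxvl):
    return min(requested, maxvl)


def vector_add(a, b, maxvl):
    """Add two equal-length lists in chunks of at most maxvl elements.

    Returns (result, list of VL values used per iteration).
    """
    if maxvl < 1:
        raise ValueError("maxvl must be at least 1")
    out, vls = [], []
    i = 0
    while i < len(a):
        vl = set_vl(len(a) - i, maxvl)  # last chunk may be shorter
        vls.append(vl)
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return out, vls


result, chunks = vector_add(list(range(10)), [1] * 10, maxvl=4)
print(chunks)  # [4, 4, 2]
```

Because `maxvl` is a software parameter in this model, code handling RGB triples could pass `maxvl=3` and get exactly three elements per step, instead of being tied to whatever width the silicon implements.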
Comment 9 Luke Kenneth Casson Leighton 2022-07-23 01:21:27 BST
unit tests and simulator for Power ISA and SVP64
https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/openpower/decoder/isa;hb=HEAD

several thousand more ISA unit tests
https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/openpower/test;hb=HEAD

demo, showing 4.5x reduction in program size for MP3 decode; greatly simplifies
assembler development
https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=media/audio/mp3;hb=HEAD

development of binutils support for SVP64
https://git.libre-soc.org/?p=binutils-gdb.git;a=shortlog;h=refs/heads/svp64-ng
Comment 10 Luke Kenneth Casson Leighton 2022-07-23 10:42:44 BST
(In reply to Jacob Lifshay from comment #8)
 
> imho the comparison table needs a User-selectable Vector Length (VL reg)
> column, since we want to emphasize that we're avoiding ARM SVE's trap of not
> allowing the user to pick specific vector lengths -- mostly because the trap
> isn't obvious unless explicitly pointed out and we want OpenPower to not
> fall into that trap.

i'm putting that in the summary and also spelling it out in the footnotes.
if it's really not obvious then yes, sigh, have to work out if another column
can be added. table's very cramped even in landscape. v. small font.
Comment 11 Luke Kenneth Casson Leighton 2022-07-23 11:39:38 BST
(In reply to Luke Kenneth Casson Leighton from comment #10)

> i'm putting that in the summary and also spelling it out in the footnotes.
> if it's really not obvious then yes, sigh, have to work out if another column
> can be added. 

it can.
Comment 12 Luke Kenneth Casson Leighton 2022-07-27 00:04:33 BST
konstantinos can i ask you the favour of dropping some rough numbers of
intrinsics for NEON, SVE2, and AVX512? or better, the number of instructions?
Comment 13 Konstantinos Margaritis (markos) 2022-07-27 23:01:10 BST
Intel: 7256 SIMD intrinsics (SSE*, AVX & AVX2, all AVX512 variants)
Arm NEON: 2754 (32-bit) and 4344 (64-bit) intrinsics
Arm SVE: 4140 intrinsics
Arm SVE2: 1900 intrinsics

All numbers were taken from Intel intrinsics Guide and Arm Developer online resource.
Comment 14 Luke Kenneth Casson Leighton 2022-07-28 01:44:58 BST
(In reply to Konstantinos Margaritis from comment #13)
> Intel: 7256 SIMD intrinsics (SSE*, AVX & AVX2, all AVX512 variants)
> Arm NEON: 2754 (32-bit) and 4344 (64-bit) intrinsics
> Arm SVE: 4140 intrinsics
> Arm SVE2: 1900 intrinsics
> 
> All numbers were taken from Intel intrinsics Guide and Arm Developer online
> resource.

https://atomicquote.com/author/iain-m-banks/quote/two-hundred-and-thirty-three-thousand-times-the-speed-of-light-dear-holy-fucking-shit-the-yawning-angel-thought-there-was-something-almost-vulgar-about-such-a-velocity-where-the-hell-was-it-h
Comment 15 Luke Kenneth Casson Leighton 2022-07-28 22:28:43 BST
found it.

https://gist.github.com/zingaburga/805669eb891c820bd220418ee3f0d6bd#file-sve2-md

     but in the worst case, you may be stuck with only using the bottom
     128 bits of the vector, or need to code specifically for each width
     you wish to support.
Comment 16 Luke Kenneth Casson Leighton 2022-07-28 23:27:27 BST
found ARM's Scalable Matrix Extension (SME), which builds on SVE2.
https://developer.arm.com/documentation/ddi0602/2022-06/SME-Instructions/SMOPA--Signed-integer-sum-of-outer-products-and-accumulate-?lang=en

i cannot tell by looking at the linked pseudocode whether it is power-of-2 or
non-power-of-2

    CheckStreamingSVEAndZAEnabled();
    constant integer VL = CurrentVL;
    constant integer PL = VL DIV 8;
    constant integer dim = VL DIV esize;
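The three quoted assignments rendered as a small Python sketch, assuming `CurrentVL` is the streaming vector length in bits and `esize` the element size in bits. PL counts one predicate bit per vector byte, and dim is the number of elements per vector, which also bounds the tile's row and column loops in the SMOPA pseudocode. `sme_params` is a hypothetical helper, not ARM code.

```python
# Python rendering of the quoted ARM pseudocode (DIV -> integer division).
# Assumes vl and esize are both in bits; the function name is illustrative.


def sme_params(vl, esize):
    pl = vl // 8       # predicate length in bits: one predicate bit per vector byte
    dim = vl // esize  # elements per vector, i.e. tile side length
    return pl, dim


for vl in (128, 256, 512):
    for esize in (32, 64):
        pl, dim = sme_params(vl, esize)
        print(f"VL={vl:4} esize={esize}: PL={pl:3} bits, tile {dim}x{dim}")
```

For a 128-bit VL this gives a 4x4 tile of 32-bit elements and a 2x2 tile of 64-bit elements.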

there's a hyperlink to CurrentVL

https://developer.arm.com/documentation/ddi0602/2022-06/Shared-Pseudocode/AArch64-Functions?lang=en#impl-aarch64.CurrentVL.read.none

but i get totally lost in the ref-to-this, ref-to-that,
if-optional-thing-enabled use this, if-not-use-that.

urr.... https://www.realworldtech.com/forum/?threadid=202688&curpostid=202688
might as well ask
https://www.realworldtech.com/forum/?threadid=202688&curpostid=207731
Comment 17 Jacob Lifshay 2022-07-29 10:33:55 BST
as mentioned on irc:
I changed the comparison table to use footnotes and changed texmunge.py to make the tex work with multiple references to the same footnote. I managed to cram it all on one page by reducing the font size of the footnotes somewhat...
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=93b706bca64112a29a75cbb8e0f1a3fea80e5e2a

lkcl last I checked I don't have ftp write access (was lost at some point apparently), so I can't upload the new pdf version.
Comment 18 Luke Kenneth Casson Leighton 2022-07-29 13:28:54 BST
(In reply to Jacob Lifshay from comment #17)
> as mentioned on irc:
> I changed the comparison table to use footnotes and changed texmunge.py to
> make the tex work with multiple references to the same footnote. I managed
> to cram it all on one page by reducing the font size of the footnotes
> somewhat...
> https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;
> h=93b706bca64112a29a75cbb8e0f1a3fea80e5e2a

brilliant.  i reduced the names back down to numbers, cutting the
xterm width from 380 characters to a "mere" 250.  it now again fits 50%
of a 3840x2160 resolution LCD so is (barely) manageable.

the use of footnotes itself is brilliant, saves a lot of hassle.

> lkcl last I checked I don't have ftp write access (was lost at some point
> apparently), so I can't upload the new pdf version.

sftp.  i am doing "make upload" approx. 15-20x a day at the moment.
Comment 19 Luke Kenneth Casson Leighton 2022-07-29 13:33:17 BST
i also added MyISA 66000 which as can be seen on the latest libre-soc-dev
discussion can take between 1 week and 2 YEARS to comprehend.  it is an
unlimited-scalable "Hardware Auto-Vectorisation" ISA limited to LD/ST
as Vector start/end points and to a 1D LOOP construct.  it warrants inclusion
simply because the bang-per-buck within the capability it gives is so high.
it is however not a panacea.
Comment 20 Luke Kenneth Casson Leighton 2022-07-29 23:49:44 BST
(In reply to Luke Kenneth Casson Leighton from comment #16)

> urr.... https://www.realworldtech.com/forum/?threadid=202688&curpostid=202688
> might as well ask
> https://www.realworldtech.com/forum/?threadid=202688&curpostid=207731

someone called dmcq very kindly replied, prompting me to investigate
the pseudocode, which magically a day later is making sense to me.

https://www.realworldtech.com/forum/?threadid=202688&curpostid=207774

the outer-product instructions are definitely power-of-two boundaried:
the "tiles" must be squares. a 128-bit silicon-partner choice would
result in a 2x2 64-bit outer-product or a 4x4 32-bit outer-product.

there is no way to stop overwriting of destinations on non-power-two
boundaries, but there *is* a way to stop wasting of CPU cycles on
multiply-by-zero-and-adds, by pre-running some zero-detection instructions
and putting the result of that detection into predicate masks.
there are *two* predicate source masks for that purpose: one for N,
one for M.
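A simplified Python sketch of the zero-skipping idea, assuming a plain (non-widening) outer product `za[i][j] += zn[i] * zm[j]`; the real SMOPA instruction sums several narrower products per tile element, which is omitted here. The row predicate `pn` and column predicate `pm` come from a zero-detection pass, and inactive elements keep their old value (merging predication). All names and the multiply counter are illustrative, not ARM's pseudocode.

```python
# Predicated outer-product-accumulate with zero detection (simplified model).


def zero_mask(v):
    """Predicate list: True where the element is non-zero."""
    return [x != 0 for x in v]


def outer_product_accumulate(za, zn, zm, pn, pm):
    """za[i][j] += zn[i] * zm[j] only where pn[i] and pm[j] are both set.

    Inactive elements keep their old value (merging predication).
    Returns the number of multiplies actually performed.
    """
    multiplies = 0
    dim = len(za)
    for i in range(dim):
        if not pn[i]:
            continue  # whole row skipped: zn[i] is zero
        for j in range(dim):
            if pm[j]:
                za[i][j] += zn[i] * zm[j]
                multiplies += 1
    return multiplies


zn = [3, 0, 2, 0]
zm = [1, 5, 0, 4]
za = [[0] * 4 for _ in range(4)]
ref = [[a * b for b in zm] for a in zn]  # unpredicated result
count = outer_product_accumulate(za, zn, zm, zero_mask(zn), zero_mask(zm))
print(count)  # 6
```

Both tiles end up identical; the predicated version just skips the 10 multiplies whose inputs include a zero.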