Bug 858 - SVP64 Primer Documentation
Summary: SVP64 Primer Documentation
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Documentation
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: Andrey Miroshnikov
URL:
Depends on:
Blocks: 243
Reported: 2022-06-15 23:25 BST by Andrey Miroshnikov
Modified: 2022-07-24 16:06 BST (History)

See Also:
NLnet milestone: NLNet.2019.10.046.Standards
total budget (EUR) for completion of task and all subtasks: 3000
budget (EUR) for this task, excluding subtasks' budget: 3000
parent task for budget allocation: 243
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
lkcl={amount=1500, submitted=2022-07-04, paid=2022-07-08} # originally submitted by email 2022-06-27
andrey={amount=1500, submitted=2022-07-19, paid=2022-07-21}


Description Andrey Miroshnikov 2022-06-15 23:25:53 BST
Bug for the SVP64 primer document outlining the use cases and advantages of the SVP64 specification.
Comment 1 Luke Kenneth Casson Leighton 2022-06-16 09:37:01 BST
andrey if you need help with any SVG images Veera can help
just draw them quick by hand.
Comment 2 djac 2022-06-16 09:59:49 BST
Andrey, in summary, the purpose of the sequence of documents is to communicate, explain and persuade.  Starting with an overview where an individual who is new to the topic can quickly grasp the principles, features and benefits of SVP64, it will be the prologue to the next section, which will contain greater detail and will form the basis of future programming/application/training guides.  If you need help to review as you go along just ask.
Comment 3 djac 2022-06-17 09:08:27 BST
Input from Paul "Something usable within a couple of weeks would be good.  I have an opportunity to set up a meeting with the people inside IBM who are looking at scalable vector architectures and this would be really useful for that"
Comment 4 Luke Kenneth Casson Leighton 2022-06-17 10:23:47 BST
(In reply to djac from comment #3)
> Input from Paul "Something usable within a couple of weeks would be good.  I
> have an opportunity to set up a meeting with the people inside IBM who are
> looking at scalable vector architectures and this would be really useful for
> that"

ahh that's very interesting.  so actually, if they're evaluating *all*
available scalable vector architectures, then that's very different.
firstly it means they'll want to know what those are, secondly it's
reasonable i feel to be able to assume they're extremely intelligent
and know what they're looking at, and thirdly that they'll be doing
a "this vs that" so anything we can help them with there, to know
what the differences are, would i feel be beneficial.  thoughts?
Comment 5 djac 2022-06-17 10:45:55 BST
The KISS maxim applies.  

Comprehensive but simple.  We should have a 3-way Zoom on Monday.
Comment 6 Luke Kenneth Casson Leighton 2022-06-17 11:16:25 BST
(In reply to djac from comment #5)
> The KISS maxim applies.  
> 
> Comprehensive but simple.  We should have a 3-way Zoom on Monday.

ack.  nothing like this then: :)

https://ftp.libre-soc.org/20220617_110034.jpg
https://youtu.be/1SsMVP1CTFI
Comment 7 Andrey Miroshnikov 2022-06-17 14:06:26 BST
(In reply to Luke Kenneth Casson Leighton from comment #4)
> ahh that's very interesting.  so actually, if they're evaluating *all*
> available scalable vector architectures, then that's very different.
Interesting, so we'll need a nice little comparison page of all those arch's. No pressure :D

> firstly it means they'll want to know what those are, secondly it's
> reasonable i feel to be able to assume they're extremely intelligent
That means we'll need brief, yet deep content for them to look over?

> what the differences are, would i feel be beneficial.  thoughts?
A comparison sounds like an easy way to demonstrate the power of SV.

(In reply to djac from comment #5)
> The KISS maxim applies.  
> 
> Comprehensive but simple.  We should have a 3-way Zoom on Monday.
Will be there.

(In reply to Luke Kenneth Casson Leighton from comment #6)
> ack.  nothing like this then: :)
> 
> https://ftp.libre-soc.org/20220617_110034.jpg
> https://youtu.be/1SsMVP1CTFI
Actually the diagram is a good idea. Just scale down to fit on a page.
The video is still processing, so haven't seen it yet.

I submitted my latest content here:
https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=svp64-primer/summary.tex;h=b1b47f8754efc13699df898941cf55002fc823cb;hb=c622226176550788c3cf447db36f4fde07bff16f

The summary.tex file is a secondary file, you'll also find the main one, bibliography, and acronyms (figured a good idea as we have so many, even if we don't use in the primer).

The text is a bit of a mess, and I haven't added any examples yet (wasn't sure if you wanted to use the code from the sigarch article, or use our own). Currently I have a brief summary of SIMD and Vector Processing, and have started to add SV. The text as it stands is too big, but I'll leave it for now. We can cut it down for the primer next week.


I will now be heading off on my weekend camping trip, but can check bugs/logs, David can reach me if needed.
Comment 8 Luke Kenneth Casson Leighton 2022-06-17 18:13:42 BST
another thing we need to establish: what is IBM looking for? as in: what "features", are they looking for high performance, low power, compiler toolchain support, capabilities: we don't know yet.
knowing these things would radically alter what we write.
Comment 9 Luke Kenneth Casson Leighton 2022-06-17 19:06:20 BST
(In reply to Andrey Miroshnikov from comment #7)
> Interesting, so we'll need a nice little comparison page of all those
> arch's. No pressure :D

number of instructions says it all, really.

> > firstly it means they'll want to know what those are, secondly it's
> > reasonable i feel to be able to assume they're extremely intelligent
> That means we'll need brief, yet deep content for them to look over?

David suggested a 2-3 page document with features only that would leave them
wishing/wanting to ask more questions.

> Actually the diagram is a good idea. Just scale down to fit on a page.
> The video is still processing, so haven't seen it yet.

still uploading *quail*

> I submitted my latest content here:
> https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=svp64-primer/summary.tex;h=b1b47f8754efc13699df898941cf55002fc823cb;hb=c622226176550788c3cf447db36f4fde07bff16f

great. made some clarifications.
 
> I will now be heading off on my weekend camping trip, but can check
> bugs/logs, David can reach me if needed.

brilliant. enjoy.
Comment 10 Luke Kenneth Casson Leighton 2022-06-18 11:21:04 BST
important features and benefits to mention:

* The v3.1 Specification is not altered in any way.
* Specifically designed to be easily implemented
  on top of an existing Micro-architecture (especially
  Superscalar Out-of-Order Multi-issue) without
  disruptive full architectural redesigns.
* Divided into Compliancy Levels to suit differing needs.
* At the highest Compliancy Level only requires four instructions
  (SVE2 requires approx. 9,000, AVX-512 around 10,000, RVV around
  300).
* Predication, an often-requested feature, is added cleanly to the
  Power ISA (without modifying the v3.1 Power ISA)
* In-registers arbitrary-sized Matrix Multiply is achieved in three
  instructions (without adding any v3.1 Power ISA instructions)
* Full DCT and FFT RADIX2 Triple-loops are achieved with dramatically
  reduced instruction count, and power consumption expected to greatly
  reduce. Normally found only in high-end VLIW DSPs (TI MSP, Qualcomm
  Hexagon)
* Fail-First Load/Store allows strncpy to be implemented in around 14
  instructions (Optimised VSX assembler is 240).
* Inner loop of MP3 implemented in under 100 instructions
  (gcc produces 450 for the same function)

All areas investigated so far consistently showed reductions in executable
size, which as outlined in {SIMD_HARM} has an indirect reduction in
power consumption due both to less I-Cache/TLB pressure and Issue remaining
idle.
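
For a rough sense of scale, the instruction counts quoted in this comment can be turned into ratios. This is only arithmetic on the approximate figures given above ("around 14", "under 100"), so the results are ballpark numbers, not measurements. The dictionary and function names are made up for illustration.

```python
# Approximate instruction counts as quoted in this comment: (SVP64, baseline).
# "around 14" and "under 100" are treated as 14 and 100 for illustration only.
QUOTED_COUNTS = {
    "strncpy (vs optimised VSX assembler)": (14, 240),
    "MP3 inner loop (vs gcc output)": (100, 450),
}


def reduction_ratio(svp64_count, baseline_count):
    """Return how many times larger the baseline instruction count is."""
    return baseline_count / svp64_count


for name, (svp64, baseline) in QUOTED_COUNTS.items():
    ratio = reduction_ratio(svp64, baseline)
    print(f"{name}: roughly {ratio:.1f}x fewer instructions")
```

Because the MP3 figure is an upper bound ("under 100"), the real ratio for that case would be at least the value computed here.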
Comment 11 Luke Kenneth Casson Leighton 2022-06-18 13:45:33 BST
added, please review / critique
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=57ac938a4074d54c86267a74fda14ecbb1a7b086
Comment 12 Luke Kenneth Casson Leighton 2022-06-18 18:31:34 BST
hmmm.. this image, which is how RISC-V works, i don't think
helps us.  i totally get that it's based on an SRAM of a fixed
size: it just isn't how SV works.

https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=svp64-primer/img/vl_reg_n.jpg;hb=HEAD

SV is actually much more like how MMX works:

register r0:

    bytes    0  1  2  3  4  5  6  7
    64-bit  |<-------------------->|
    32-bit  |<--------->|<-------->|
    16-bit  |<--->|<--->|<--->|<-->|
    8-bit   |<->|    etc        |<>| 

register r1:

    bytes    0  1  2  3  4  5  6  7
    64-bit  |<-------------------->|
    32-bit  |<--------->|<-------->|
    16-bit  |<--->|<--->|<--->|<-->|
    8-bit   |<->|    etc        |<>| 

r2,3,4.......
.....r126

register r127:

    bytes    0  1  2  3  4  5  6  7
    64-bit  |<-------------------->|
    32-bit  |<--------->|<-------->|
    16-bit  |<--->|<--->|<--->|<-->|
    8-bit   |<->|    etc        |<>|
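
The diagrams above can be read as "one 64-bit register, viewed at several element widths". A minimal Python sketch of that view follows. It assumes byte 0 is the least significant byte (little-endian element order), which the diagram does not state, and the function name is hypothetical.

```python
REG_BITS = 64  # each scalar register in the diagram is 8 bytes wide


def split_register(value, elwidth):
    """View one 64-bit register value as a list of elwidth-bit elements.

    Assumption: element 0 occupies the least significant bits.
    """
    if elwidth not in (8, 16, 32, 64):
        raise ValueError("element width must be 8, 16, 32 or 64 bits")
    mask = (1 << elwidth) - 1
    count = REG_BITS // elwidth
    return [(value >> (i * elwidth)) & mask for i in range(count)]


r0 = 0x0807060504030201
for width in (64, 32, 16, 8):
    elements = split_register(r0, width)
    print(width, [hex(e) for e in elements])
```

The same 8 bytes yield 1, 2, 4 or 8 elements depending on the width chosen, which is the point of the four rows in each diagram.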
Comment 13 Luke Kenneth Casson Leighton 2022-06-18 18:58:28 BST
this is how SVP64 registers work:

   https://ftp.libre-soc.org/20220618_184935.jpg

you get a *rollover* effect in the ***SCALAR***
register file.
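
One way to picture the rollover: a vector that starts in one scalar register simply continues into the next register once the first is full. The sketch below is my reading of that idea, not the specification. It assumes 128 registers of 64 bits (r0 to r127, as in comment 12) and elements packed back to back; the function name is hypothetical.

```python
NUM_REGS = 128  # r0..r127, as drawn in comment 12
REG_BITS = 64


def element_location(start_reg, index, elwidth):
    """Return (register number, byte offset) holding element `index`
    of a vector that begins at `start_reg`.

    Assumption: elements are packed contiguously, so a vector that
    overflows one register rolls over into the next one.
    """
    if elwidth not in (8, 16, 32, 64):
        raise ValueError("element width must be 8, 16, 32 or 64 bits")
    per_reg = REG_BITS // elwidth
    reg = start_reg + index // per_reg
    if reg >= NUM_REGS:
        raise IndexError("vector runs off the end of the register file")
    byte_offset = (index % per_reg) * (elwidth // 8)
    return reg, byte_offset


# Ten 8-bit elements starting at r0: the last two land in r1.
for i in range(10):
    print(i, element_location(0, i, 8))
```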
Comment 14 Luke Kenneth Casson Leighton 2022-06-18 19:45:53 BST
(In reply to Luke Kenneth Casson Leighton from comment #12)
> hmmm.. this image, which is how RISC-V works, i don't think
> helps us.

because it's too complex to explain. the Cray vector regfile is way simpler


     0  .....   63 elements
  
v0
v1
v2
..
v7

registers
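
For contrast with the scalar-regfile view, the Cray-style layout sketched above (v0 to v7, each holding elements 0 to 63) can be modelled as a plain two-dimensional array. This is an illustrative model only; the names are hypothetical and the element-wise add stands in for any vector operation.

```python
NUM_VREGS = 8         # v0..v7, as in the sketch above
ELEMS_PER_VREG = 64   # elements 0..63


def make_vector_regfile():
    """Create a dedicated vector register file: 8 rows of 64 elements."""
    return [[0] * ELEMS_PER_VREG for _ in range(NUM_VREGS)]


def vadd(regfile, dest, src_a, src_b, vl):
    """Element-wise add of the first `vl` elements of two vector registers."""
    if not 0 <= vl <= ELEMS_PER_VREG:
        raise ValueError("vector length must be between 0 and 64")
    for i in range(vl):
        regfile[dest][i] = regfile[src_a][i] + regfile[src_b][i]


rf = make_vector_regfile()
rf[1][:4] = [1, 2, 3, 4]
rf[2][:4] = [10, 20, 30, 40]
vadd(rf, dest=0, src_a=1, src_b=2, vl=4)
print(rf[0][:4])
```

Here each vector register is a separate row with a fixed number of elements, so there is nothing to roll over into.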
Comment 15 Luke Kenneth Casson Leighton 2022-06-18 22:32:41 BST
(In reply to Luke Kenneth Casson Leighton from comment #14)

> because it's too complex to explain. the Cray vector regfile is way simpler

https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=4c9273dcdfea9ec7ce4f955846280972431239f6
Comment 16 djac 2022-06-20 12:49:07 BST
Zoom at 3pm
Comment 17 Luke Kenneth Casson Leighton 2022-06-20 15:23:14 BST
https://ftp.libre-soc.org/20220620_151109.jpg
Comment 18 Luke Kenneth Casson Leighton 2022-06-20 16:11:56 BST
(In reply to Luke Kenneth Casson Leighton from comment #17)
> https://ftp.libre-soc.org/20220620_151109.jpg

added, B&W

https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=e36b59c1e3f13b3732a19b517c999f441c66ad73
Comment 19 Luke Kenneth Casson Leighton 2022-06-20 16:25:30 BST
now includes URLs in the bibliography
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=147c7e52eabba2449fe1a9fccd5f5846aff70bc1
Comment 20 Luke Kenneth Casson Leighton 2022-06-20 20:40:58 BST
took these out, if they go back in they should be part of
(merged into) the advantages subsubsection of SV.  where they were,
they were repetition so getting annoying

-\subsubsection{Prefix 64 - SVP64}
-
-SVP64, is a specification designed to solve the problems caused by
-SIMD implementations by:
-\begin{itemize}
-       \item Simplifying the hardware design
-       \item Reducing maintenance overhead
-       \item Reducing code size and power consumption
-       \item Easier for compilers, coders, documentation
-       \item Time to support platform is a fraction of conventional SIMD
-             (Less money on R\&D, faster to deliver)
-\end{itemize}
Comment 21 Andrey Miroshnikov 2022-06-20 21:50:54 BST
Sent the draft primer to Paul.
Comment 22 Andrey Miroshnikov 2022-06-20 21:56:06 BST
I apologise, must've selected the pulldown by mistake. Changing back to "documentation"
Comment 24 Andrey Miroshnikov 2022-06-27 15:00:08 BST
(In reply to Luke Kenneth Casson Leighton from comment #23)
> these images need converting to SVG

As per today's earlier conversation with Luke: https://libre-soc.org/irclog/latest.log.html#t2022-06-27T11:11:54

In addition I converted the SIMD diagram (I apologise if this was unnecessary)

Each one has to be exported to PNG (as LaTeX doesn't support SVG by default)

See the files here:
https://git.libre-soc.org/?p=libreriscv.git;a=tree;f=svp64-primer/img;h=fc0680f7b9ea21a5fce6c1bf502258400af0e9c7;hb=HEAD

Once Luke's had a look and is happy with them, I can delete the JPGs from the repo.
Comment 25 Luke Kenneth Casson Leighton 2022-06-27 15:02:51 BST
these look great Andrey let's close this after removing jpgs.
nice work
Comment 26 Andrey Miroshnikov 2022-06-27 19:43:42 BST
Old jpg's deleted, closing this bug.
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=c5fd0af78789bb34709227bc8b4e850a72349cae
Comment 27 Luke Kenneth Casson Leighton 2022-07-19 14:55:59 BST
andrey i'm re-submitting this via the secret URL system
as a single RFP combined with bug #858 and bug #875
altering the submission date accordingly
Comment 28 Toshaan Bharvani 2022-07-20 08:38:15 BST
Is there a way we can compare SVP64 to other scalar vector systems?
As SVP64 does not use (fixed or predicated) SIMD, but a pure scalar vector, is a comparison even possible?
I am thinking of comparing it with AVX(2/512), SVE2, RVV.
If possible, we should have a short section on that.
Comment 29 Luke Kenneth Casson Leighton 2022-07-22 11:10:05 BST
(In reply to Toshaan Bharvani from comment #28)
> Is there a way we can compare SVP64 to other scalar vector systems.
> As SVP64 does not use (fixed or predicated) SIMD, but a pure scalar vector,
> is a comparison even possible?

yes, because you use fixed (or, better, predicated) SIMD at the back-end.
i just added a diagram (by Veera) which helps explain.

https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=svp64-primer/img/sv_multi_issue.svg;hb=HEAD
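
As a rough illustration of "predicated SIMD at the back-end": a run of VL element operations can be grouped into batches as wide as the SIMD unit, with a predicate mask switching off the unused lanes in the final batch. This is a sketch of my reading of the remark above, not a description of any actual implementation; the function name and the width of 4 are assumptions.

```python
def backend_batches(vl, simd_width):
    """Group element indices 0..vl-1 into SIMD-width batches.

    Returns a list of (base index, lane mask) pairs; a False lane
    means that lane is predicated off for that batch.
    """
    if vl < 0 or simd_width <= 0:
        raise ValueError("vl must be >= 0 and simd_width must be > 0")
    batches = []
    for base in range(0, vl, simd_width):
        mask = [base + lane < vl for lane in range(simd_width)]
        batches.append((base, mask))
    return batches


# Ten elements on a hypothetical 4-lane back-end: two full batches
# and one batch with only the first two lanes enabled.
for base, mask in backend_batches(10, 4):
    print(base, mask)
```

With a fixed (unpredicated) SIMD back-end the tail batch would need separate handling, which is one reason predication is the better fit.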

> I am thinking of comparing it with AVX(2/512), SVE2, RVV.
> If possible, we should have a short section on that.

sure. i did a summary
Comment 31 Luke Kenneth Casson Leighton 2022-07-22 11:24:16 BST
make sure to spell out what SVP64 is not:

* not RVV. not based on RVV. based on original Cray concept.
* not based on any known other ISA, is its own intuitive concept
Comment 32 Luke Kenneth Casson Leighton 2022-07-22 11:31:06 BST
Comparison table  headings

* Number of instructions
* Scalable yes/no
* Predication masks yes/no
* Explicit vector registers
* 128-Bit
* biginteger capability
* Load/Store Fail-First
* Twin predication
* Data-dependent fail-first
* Predicate-result

need to think in 2 Dimensions.  instructions, vertical, registers horizontal.


GPUs should have backend massive wide SIMD.  frontend is still exact same SVP64 ISA.  makes life much easier because uniform
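
The comparison table headings above could be held in a small data structure while the cells are being researched. The sketch below fills in only the approximate instruction counts quoted in comment 10 and leaves every other cell as unknown ("?") rather than guessing. The choice of ISAs as columns and all names are assumptions for illustration.

```python
HEADINGS = [
    "Number of instructions",
    "Scalable",
    "Predication masks",
    "Explicit vector registers",
    "128-bit",
    "Biginteger capability",
    "Load/Store Fail-First",
    "Twin predication",
    "Data-dependent fail-first",
    "Predicate-result",
]

# Approximate figures from comment 10; SVP64 is at the highest Compliancy Level.
INSTRUCTION_COUNTS = {"SVP64": 4, "RVV": 300, "SVE2": 9000, "AVX-512": 10000}


def empty_table(isas, headings):
    """Create a table with every cell unknown (None)."""
    return {isa: {h: None for h in headings} for isa in isas}


def render(table, headings):
    """Render headings as rows and ISAs as columns, '?' for unknown cells."""
    isas = list(table)
    width = max(len(h) for h in headings)
    lines = [" " * width + " | " + " | ".join(isas)]
    for h in headings:
        cells = []
        for isa in isas:
            value = table[isa][h]
            text = "?" if value is None else str(value)
            cells.append(text.ljust(len(isa)))
        lines.append(h.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)


table = empty_table(INSTRUCTION_COUNTS, HEADINGS)
for isa, count in INSTRUCTION_COUNTS.items():
    table[isa]["Number of instructions"] = count

print(render(table, HEADINGS))
```

Keeping unknown cells explicit makes it obvious which comparisons still need a source before the table goes into the primer.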
Comment 33 Luke Kenneth Casson Leighton 2022-07-22 11:45:27 BST
executive summary (2 page) book zero. use arefs how can be done (source code,
unit tests)

primer is too technical (book 1)

merge into same document.

"please contact if questions"

add revision history with version numbers
put together quickly, will be updated,
Comment 34 Luke Kenneth Casson Leighton 2022-07-22 12:01:06 BST
do a first page title logo etc.
who composed it.
version number
license? skip it

end: these are contact details.