Bug 1229 - fosdem2024 llvm simple-v
Summary: fosdem2024 llvm simple-v
Status: DEFERRED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Conferences
Version: unspecified
Hardware: Other Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL: https://pretalx.fosdem.org/fosdem-202...
Depends on:
Blocks: 1070
Reported: 2023-12-02 21:09 GMT by Luke Kenneth Casson Leighton
Modified: 2023-12-21 13:41 GMT

See Also:
NLnet milestone: Future
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Luke Kenneth Casson Leighton 2023-12-02 21:09:41 GMT
Your proposal: Simple-V not-vectorisation-but-looping: a Vector ISA without Vector registers

review here
https://pretalx.fosdem.org/fosdem-2024/talk/review/EQYWDUTBQXCS87RVXZDK7JECVXUXS8NL

see and edit your proposal at
https://pretalx.fosdem.org/fosdem-2024/me/submissions/9GXYMY/.

reviewers rejected (good reasons)
https://lists.libre-soc.org/pipermail/libre-soc-dev/2023-December/005896.html

TODO: discuss and plan a "compiler" grant request
Comment 1 Jacob Lifshay 2023-12-03 02:33:05 GMT
some points:
the register keyword won't be going away completely since the register keyword has a non-deprecated use around inline assembly (this isn't part of the C or C++ standards, this is a GCC extension that Clang also implements):
// tells the compiler to put a in r3 (clang does this only for
// inline assembly so a could be stored elsewhere outside of
// any inline assembly blocks, gcc does more than that, idk
// how much)
register long a asm("r3");
asm("addi r3, r3, 1" : "+r"(a));  // increments a

for LLVM IR, we will be having standard existing vector IR translate to SimpleV ops in the backend, not something like having only SimpleV prefixed IR and never any vector ops or something like that.
Comment 2 Jacob Lifshay 2023-12-03 02:35:17 GMT
(In reply to Jacob Lifshay from comment #1)
> for LLVM IR, we will be having standard existing vector IR translate to
> SimpleV ops in the backend, not something like having only SimpleV prefixed
> IR and never any vector ops or something like that.

to be clear, I expect us to eventually have both standard vector and probably-SV-specific IR translate to SV.
Comment 3 Jacob Lifshay 2023-12-03 02:48:26 GMT
also, we should make clear that any C syntax we have for declaring/using SV ops is marked as draft/wip/etc. since I don't necessarily think register int my_vec[MAXVL]; is the best choice. e.g., it's missing SUBVL and I expect the compiler devs will complain about it not having any SV-specific keywords/attributes thereby using a part of the syntax-space that they might want for something else.
Comment 4 Luke Kenneth Casson Leighton 2023-12-03 04:38:02 GMT
(In reply to Jacob Lifshay from comment #1)
> some points:
> the register keyword won't be going away completely

whew

>
> register long a asm("r3");
> asm("addi r3, r3, 1" : "+r"(a));  // increments a

what i envision is that this:
   register long a[10] asm ("*r3")

literally translates to
   setvl maxvl=10

and marks r3 as vector, such that sv.addi works as expected.
following on from that...

> for LLVM IR, we will be having standard existing vector IR translate to
> SimpleV ops in the backend, not something like having only SimpleV prefixed
> IR and never any vector ops or something like that.

... i then envisage an IR primitive that LITERALLY and precisely
without fail without exception represents the FULL and complete
capability of SV Prefixing. exactly precisely fully absolute
100% without fail absolute total and full representation of
the full absolute SV concept.

no vector instructions =>
no vector IR.

instead:

prefix isa =>
prefix hardware =>
prefix IR =>
prefix assembler.

we are ****NOT**** doing 1.5 million intrinsics. i am not
having it. no, caches are not ok. no, JIT-intrinsic-generation
is not ok.

if we have to invent a new IR syntax to support loop-prefixing
on top of (existing, small) scalar IR that is perfectly fine
with me.
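
The "prefix is a loop around a scalar op" idea in the comment above can be modelled in plain C. This is only a rough sketch under stated assumptions, not the SVP64 specification: `sv_model`, `model_setvl`, `scalar_addi`, `sv_addi` and the 128-entry register file are hypothetical names and sizes picked for illustration, and `setvl` is assumed to clamp the requested length to MAXVL.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 128  /* assumed register-file size for this model only */

/* Hypothetical model state: a flat integer register file plus MAXVL/VL. */
typedef struct {
    int64_t  gpr[NUM_REGS];
    unsigned maxvl;
    unsigned vl;
} sv_model;

/* Model of "setvl": assumed to clamp the requested length to MAXVL. */
static unsigned model_setvl(sv_model *m, unsigned maxvl, unsigned requested)
{
    m->maxvl = maxvl;
    m->vl = requested < maxvl ? requested : maxvl;
    return m->vl;
}

/* The unmodified scalar operation: rt = ra + imm. */
static void scalar_addi(sv_model *m, unsigned rt, unsigned ra, int64_t imm)
{
    m->gpr[rt] = m->gpr[ra] + imm;
}

/* Model of a prefixed "sv.addi" with both registers marked as vectors:
 * the prefix is just a loop that re-issues the same scalar operation
 * on consecutive register numbers, VL times. */
static void sv_addi(sv_model *m, unsigned rt, unsigned ra, int64_t imm)
{
    assert(rt + m->vl <= NUM_REGS && ra + m->vl <= NUM_REGS);
    for (unsigned i = 0; i < m->vl; i++)
        scalar_addi(m, rt + i, ra + i, imm);
}
```

With `model_setvl(&m, 10, 10)` followed by `sv_addi(&m, 3, 3, 1)`, the model increments r3 through r12, which is the behaviour the `register long a[10]` idea is aiming for. No separate vector instruction or vector register file appears anywhere in the model.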
Comment 5 Luke Kenneth Casson Leighton 2023-12-03 04:46:59 GMT
(In reply to Jacob Lifshay from comment #3)
> also, we should make clear that any C syntax we have for declaring/using SV
> ops is marked as draft/wip/etc. 

of course.

> since I don't necessarily think register int
> my_vec[MAXVL]; is the best choice. e.g., it's missing SUBVL

struct vec2 register[NNN]. sorted.

>  and I expect the
> compiler devs will complain about it not having any SV-specific
> keywords/attributes 

attributes will make programmers' lives hell (unreadable crap).
plus deviation from standard c will make code-conversion that
much more work. to be severely avoided regardless of potential
complaints.

> thereby using a part of the syntax-space that they might
> want for something else.

for EUR 50,000 which disappears on embecosm's budget in under
3 months, even talking with other developers during the *design*
phase will be firmly out of scope.

if however other devs help out rather than complain then i have
absolutely no problem at all.
Comment 6 Jacob Lifshay 2023-12-03 04:52:53 GMT
(In reply to Luke Kenneth Casson Leighton from comment #4)
> > for LLVM IR, we will be having standard existing vector IR translate to
> > SimpleV ops in the backend, not something like having only SimpleV prefixed
> > IR and never any vector ops or something like that.
> 
> ... i then envisage an IR primitive that LITERALLY and precisely
> without fail without exception represents the FULL and complete
> capability of SV Prefixing. exactly precisely fully absolute
> 100% without fail absolute total and full representation of
> the full absolute SV concept.

fine with me, though I imagine you will have a tough time convincing gcc/llvm maintainers to redo their IR...

> no vector instructions =>
> no vector IR.

we *need* vector IR support (supporting *already existing LLVM IR*) because it allows existing vector code to compile to SimpleV without having to rewrite all software everywhere. Also, all of LLVM's existing optimization pipelines, and everything else, use vector IR. Trying to force everyone to use only SimpleV IR to get anything beyond scalar performance is going to make SimpleV fail, because we're currently the little dog: most programmers don't care about SimpleV and will just use stuff that makes vector IR regardless.

Remember, supporting existing vector LLVM IR is similar to supporting existing PowerPC software or to supporting existing C software: we care about backward compatibility because we *can't* rewrite the world.
Comment 7 Jacob Lifshay 2023-12-03 05:02:32 GMT
(In reply to Luke Kenneth Casson Leighton from comment #4)
> (In reply to Jacob Lifshay from comment #1)
> > some points:
> > the register keyword won't be going away completely
> 
> whew
> 
> >
> > register long a asm("r3");
> > asm("addi r3, r3, 1" : "+r"(a));  // increments a
> 
> what i envision is that this:
>    register long a[10] asm ("*r3")

I think we should focus on not requiring specifying exact registers, since imo that's like half the benefit of compilers -- you don't have to figure out where everything goes, since that's really hard for humans. e.g. it took me *hours/days* to figure out where all the registers should go for divmod, and that's a short function!

> 
> literally translates to
>    setvl maxvl=10

I think we should avoid requiring the programmer to specify where all the setvls go; instead we just rely on vector types to track that information, and the compiler can then insert setvl instructions where necessary. additionally, compiler optimizations tend to work much better when they don't have to worry about a bunch of global state (VL), and can just reintroduce the global state (by inserting setvl ops) at the end after doing the optimizations.

> 
> and marks r3 as vector, such that sv.addi works as expected.
> following on from that...

I think it should just mark `a` as a vector, not r3, since a has the type `register long[10]` which tells the compiler that it's a vector with MAXVL=10, so the compiler can assign it to any convenient spot or even optimize it out completely rather than being forced to keep it in r3 and key all scalar/vector decisions off of whether r3 is mentioned or not.
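
The "let the types carry the vector length and have the compiler insert setvl afterwards" approach described in this comment could look roughly like the following. It is a deliberately tiny sketch for straight-line code only: `ir_op`, `mach_op` and `insert_setvl` are hypothetical names, not LLVM or GCC APIs, and a real pass would also need to handle control flow and anything else that changes VL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical typed IR op: the vector length comes from the operand type.
 * vl == 0 marks a scalar op that does not depend on VL. */
typedef struct {
    const char *name;
    unsigned    vl;
} ir_op;

/* Hypothetical machine-level op after VL state has been made explicit. */
typedef struct {
    const char *name;
    unsigned    vl;
    int         is_setvl;  /* 1 for ops inserted by this pass */
} mach_op;

/* Walk the ops in order and emit a setvl only when the VL an op needs
 * differs from the VL currently in effect.
 * Returns the number of ops written to out. */
static size_t insert_setvl(const ir_op *in, size_t n,
                           mach_op *out, size_t out_cap)
{
    unsigned current_vl = 0;  /* 0 = nothing set yet */
    size_t w = 0;

    for (size_t i = 0; i < n; i++) {
        if (in[i].vl != 0 && in[i].vl != current_vl) {
            assert(w < out_cap);
            out[w++] = (mach_op){ "setvl", in[i].vl, 1 };
            current_vl = in[i].vl;
        }
        assert(w < out_cap);
        out[w++] = (mach_op){ in[i].name, in[i].vl, 0 };
    }
    return w;
}
```

For a sequence of vector ops needing VL 10, 10, (scalar), 10, 4, this emits two setvl ops rather than four. Earlier optimization passes never see VL as global state; it is reintroduced only at this final step.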
Comment 8 Jacob Lifshay 2023-12-03 05:04:28 GMT
(In reply to Jacob Lifshay from comment #7)
> (In reply to Luke Kenneth Casson Leighton from comment #4)
> > what i envision is that this:
> >    register long a[10] asm ("*r3")
> 
> I think we should focus on not requiring specifying exact registers

to be clear, this means you *can* specify asm("r3") when you need it to be in r3 for some inline asm, you're just not required to when you want to let the compiler pick.
Comment 9 Jacob Lifshay 2023-12-03 05:12:52 GMT
(In reply to Luke Kenneth Casson Leighton from comment #5)
> attributes will make programmers' lives hell (unreadable crap).

actually, attributes have gotten a whole lot better recently, since C/C++ introduced a new attribute syntax:
e.g.:
[[noreturn]] void f() {
    abort();
}

instead of:
void f() __attribute__((noreturn));
void f() {
    abort();
}

so I think something like this could be good:

say you want a vector with MAXVL=12, VL=vl, SUBVL=3, and element type float:

[[sv_vec(12)]] float my_vector[vl][3];

or a fixed-length vector with MAXVL=VL=21, SUBVL=1, and element type uint16_t:

[[sv_vec]] uint16_t my_vector2[21];

the [[sv_vec]] could also go at the end if you prefer that (C/C++ lets you put it a few different places):

uint16_t my_vector2[21] [[sv_vec]];
Comment 10 Luke Kenneth Casson Leighton 2023-12-03 09:26:38 GMT
(In reply to Jacob Lifshay from comment #9)

> actually, attributes have gotten a whole lot better recently, since C/C++
> introduced a new attribute syntax:

please please i know you love "latest and greatest" but it severely
interferes with simplicity and practical "get-it-done-in-scope-on-budget"

> e.g.:
> [[noreturn]] 

**NO**. absolutely not. end of this line of reasoning.
i said ANY deviation from standard c is unacceptable.

it interferes with the addition of autovectorisation
passes, later [alexandre oliva is the leading expert
in gcc autovectorisation]

plus can you imagine the nightmare of porting? hundreds
of millions of lines of code, stupid "attributes" all
over the place whilst we raise the USD 50 million to have
a go at proper autovectorisation in both gcc and llvm?

we have to be *smart* about this jacob.

attributes are an extremely bad idea in multiple ways.

> [[sv_vec]] uint16_t my_vector2[21];

i said NO on attributes! please do listen!
Comment 11 Luke Kenneth Casson Leighton 2023-12-03 09:32:33 GMT
(In reply to Jacob Lifshay from comment #7)
> (In reply to Luke Kenneth Casson Leighton from comment #4)
> > (In reply to Jacob Lifshay from comment #1)
> > > some points:
> > > the register keyword won't be going away completely
> > 
> > whew
> > 
> > >
> > > register long a asm("r3");
> > > asm("addi r3, r3, 1" : "+r"(a));  // increments a
> > 
> > what i envision is that this:
> >    register long a[10] asm ("*r3")
> 
> I think we should focus on not requiring specifying exact registers, since
> imo that's like half the benefit of compilers -- you don't have to figure
> out where everything goes, since that's really hard for humans. e.g. it took
> me *hours/days* to figure out where all the registers should go for divmod,
> and that's a short function!
> 
> > 
> > > literally translates to
> >    setvl maxvl=10
> 
> I think we should avoid requiring the programmer to specify where all the
> setvls go, instead we just rely on vector types to track that information,
> and the compiler can then insert setvl instructions where necessary.

yes.  i missed out several compiler and assembly-level peephole optimisation
passes there for simplicity (at 5am)

> additionally, compiler optimizations tend to work much better when they
> don't have to worry about a bunch of global state (VL), and can just
> reintroduce (insert setvl ops) the global state at the end after doing the
> optimizations.

... which is where the correct design of the IR-prefix-representing-SV
comes into play, but ultimately it should define pretty much exactly the
current SVP64 SPRs.

> 
> > 
> > and marks r3 as vector, such that sv.addi works as expected.
> > following on from that...
> 
> I think it should just mark `a` as a vector, 

yes, sorry, wasn't clear, yes absolutely.

> not r3, since a has the type
> `register long[10]` which tells the compiler that it's a vector with
> MAXVL=10, 

you got it in 1.

> so the compiler can assign it to any convenient spot or even
> optimize it out completely rather than being forced to keep it in r3 and key
> all scalar/vector decisions off of whether r3 is mentioned or not.

yyep.

then any loops can easily be autovectorized. vertical-first is going to
be astonishingly laughably simple to implement.  horizontal-first (HF) is
a little harder but doable with the right IR passes and checking that
element variables within the loop are all 100% independent.
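
The difference between the two loop orders mentioned above, and why horizontal-first needs the independence check, can be sketched in C. This is an informal model rather than spec text: the two-instruction loop body and the function names `horizontal_first` and `vertical_first` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Loop body with two scalar "instructions" per element i:
 *   t[i] = a[i] + b[i];
 *   r[i] = t[i] * c[i];
 */

/* Horizontal-first: each instruction runs across all elements
 * before the next instruction starts. */
static void horizontal_first(const int32_t *a, const int32_t *b,
                             const int32_t *c, int32_t *t, int32_t *r,
                             unsigned vl)
{
    for (unsigned i = 0; i < vl; i++)
        t[i] = a[i] + b[i];
    for (unsigned i = 0; i < vl; i++)
        r[i] = t[i] * c[i];
}

/* Vertical-first: all instructions run for element 0, then for
 * element 1, and so on. This is the original scalar loop order. */
static void vertical_first(const int32_t *a, const int32_t *b,
                           const int32_t *c, int32_t *t, int32_t *r,
                           unsigned vl)
{
    for (unsigned i = 0; i < vl; i++) {
        t[i] = a[i] + b[i];
        r[i] = t[i] * c[i];
    }
}
```

When `a`, `b`, `c`, `t` and `r` do not overlap, both orders give the same `r`. If `r` is made to alias `a + 1`, so that element `i + 1` reads what element `i` wrote, the two orders produce different results. Vertical-first keeps the scalar loop's order, so it is always a safe translation, whereas horizontal-first is only valid once the elements have been shown to be independent.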
Comment 12 Jacob Lifshay 2023-12-03 10:04:11 GMT
(In reply to Luke Kenneth Casson Leighton from comment #10)
> (In reply to Jacob Lifshay from comment #9)
> 
> > actually, attributes have gotten a whole lot better recently, since C/C++
> > introduced a new attribute syntax:
> 
> please please i know you love "latest and greatest" but it severely
> interferes with simplicity and practical "get-it-done-in-scope-on-budget"
> 
> > e.g.:
> > [[noreturn]] 
> 
> **NO**. absolutely not. end of this line of reasoning.
> i said ANY deviation from standard c is unacceptable.

well, too bad, *any* syntax we use to express SV operations is not standard C (including the `register` array syntax, the C standard doesn't really allow you to use register on arrays).

why not use the syntax *standard* C explicitly provides for extensions instead of inventing our own in a way that seems like a syntax land-grab (which is what just using `register` like that seems to me and probably to other compiler devs)?

> 
> it interferes with the addition of autovectorisation
> passes, later [alexandre oliva is the leading expert
> in gcc autovectorisation]

this is C *syntax*, by the time any autovectorization passes are run, *everything* is in compiler IR, and essentially *none* of the source syntax is left. compiler IR is *not* C. if we pick `register` or if we pick `[[sv_vec]]` they will both translate to the same compiler IR.

> plus can you imagine the nightmare of porting? hundreds
> of millions of lines of code, stupid "attributes" all
> over the place whilst we raise the USD 50 million to have
> a go at proper autovectorisation in both gcc and llvm?

autovectorization is independent of which C frontend syntax we pick.

the whole point of autovectorization is to automatically convert *scalar code* to vectorized code, so any way we express SV operations, that's manual vectorization, so the autovectorizer doesn't generally do anything with the manually vectorized code.

> we have to be *smart* about this jacob.

yes, we have to make arguments that hold water if we're going to try to argue for or against attributes.
Comment 13 Jacob Lifshay 2023-12-03 10:10:10 GMT
(In reply to Jacob Lifshay from comment #12)
> the whole point of autovectorization is to automatically convert *scalar
> code* to vectorized code, so any way we express SV operations, that's manual
> vectorization, so the autovectorizer doesn't generally do anything with the
> manually vectorized code.

autovectorization is what converts loops like:
for(int i = 0; i < 8; i++)
    a[i] = b[i] + c[i];

to:
*(vec8 *)a = *(vec8 *)b + *(vec8 *)c;

if you give it already vectorized code (aka. basically any code using SV vector syntax), it will just not change it.
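
For reference, a self-contained version of that before/after pair might look like this. It assumes the GCC/Clang `vector_size` extension for the `vec8` type (the comment above does not say how `vec8` is defined), and uses `memcpy` rather than the pointer casts so that the sketch does not depend on alignment or aliasing. `add_scalar` and `add_vector` are illustrative names.

```c
#include <assert.h>
#include <string.h>

/* Assumed definition: GCC/Clang vector extension, 8 ints as one value. */
typedef int vec8 __attribute__((vector_size(8 * sizeof(int))));

/* The scalar loop an autovectorizer starts from. */
static void add_scalar(int *a, const int *b, const int *c)
{
    for (int i = 0; i < 8; i++)
        a[i] = b[i] + c[i];
}

/* Roughly what it is turned into: one element-wise vector add. */
static void add_vector(int *a, const int *b, const int *c)
{
    vec8 vb, vc, va;
    memcpy(&vb, b, sizeof vb);   /* load 8 ints */
    memcpy(&vc, c, sizeof vc);
    va = vb + vc;                /* element-wise add on the vector type */
    memcpy(a, &va, sizeof va);   /* store 8 ints */
}
```

Both functions produce the same eight sums. Code that is already written in the second form is manually vectorized, so the autovectorizer has nothing left to do with it.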
Comment 14 Jacob Lifshay 2023-12-03 10:43:52 GMT
ah, so you meant something like (for some fixed-vector type vec_t):

void attempt_autovec_mul_add(vec_t *a, vec_t *b, vec_t *c, vec_t *r, int n) {
    for(int o = 0; o < n; o++) {
        for(int i = 0; i < sizeof(vec_t) / sizeof(float); i++) {
            r[o][i] = a[o][i] + b[o][i] * c[o][i];
        }
    }
}

LLVM can vectorize that (when adding __restrict to the pointers) to be effectively:
void attempt_autovec_mul_add(vec_t *a, vec_t *b, vec_t *c, vec_t *r, int n) {
    for(int o = 0; o < n; o++) {
        r[o] = a[o] + b[o] * c[o];
    }
}

https://gcc.godbolt.org/z/8x4av199K

I don't expect that to behave differently for SimpleV no matter which C syntax we pick.