Your proposal "Simple-V not-vectorisation-but-looping: a Vector ISA without Vector registers"
review here: https://pretalx.fosdem.org/fosdem-2024/talk/review/EQYWDUTBQXCS87RVXZDK7JECVXUXS8NL
see and edit your proposal at: https://pretalx.fosdem.org/fosdem-2024/me/submissions/9GXYMY/

reviewers rejected it (for good reasons): https://lists.libre-soc.org/pipermail/libre-soc-dev/2023-December/005896.html

TODO: discuss and plan a "compiler" grant request
some points:

the register keyword won't be going away completely, since it has a non-deprecated use around inline assembly (this isn't part of the C or C++ standards; it is a GCC extension that Clang also implements):

    // tells the compiler to put a in r3 (clang does this only for
    // inline assembly, so a could be stored elsewhere outside of
    // any inline assembly blocks; gcc does more than that, I don't
    // know exactly how much)
    register long a asm("r3");
    asm("addi r3, r3, 1" : "+r"(a)); // increments a

for LLVM IR, we will be having standard existing vector IR translate to SimpleV ops in the backend, not something like having only SimpleV-prefixed IR and never any vector ops.
(In reply to Jacob Lifshay from comment #1)
> for LLVM IR, we will be having standard existing vector IR translate to
> SimpleV ops in the backend, not something like having only SimpleV prefixed
> IR and never any vector ops or something like that.

to be clear, I expect us to eventually have both standard vector IR and probably-SV-specific IR translate to SV.
also, we should make clear that any C syntax we have for declaring/using SV ops is marked as draft/wip/etc., since I don't necessarily think `register int my_vec[MAXVL];` is the best choice. e.g., it's missing SUBVL, and I expect the compiler devs will complain about it not having any SV-specific keywords/attributes, thereby using a part of the syntax space that they might want for something else.
(In reply to Jacob Lifshay from comment #1)
> some points:
> the register keyword won't be going away completely

whew

> 
> register long a asm("r3");
> asm("addi r3, r3, 1" : "+r"(a)); // increments a

what i envision is that this:

    register long a[10] asm ("*r3")

literally translates to

    setvl maxvl=10

and marks r3 as vector, such that sv.addi works as expected. following on from that...

> for LLVM IR, we will be having standard existing vector IR translate to
> SimpleV ops in the backend, not something like having only SimpleV prefixed
> IR and never any vector ops or something like that.

... i then envisage an IR primitive that LITERALLY and precisely without fail without exception represents the FULL and complete capability of SV Prefixing. exactly precisely fully absolute 100% without fail absolute total and full representation of the full absolute SV concept.

no vector instructions => no vector IR.

instead: prefix isa => prefix hardware => prefix IR => prefix assembler.

we are ****NOT**** doing 1.5 million intrinsics. i am not having it. no, caches are not ok. no, JIT-intrinsic-generation is not ok.

if we have to invent a new IR syntax to support loop-prefixing on top of (existing, small) scalar IR, that is perfectly fine with me.
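To make the "prefix over a scalar op" idea concrete, here is a deliberately simplified C model of what `setvl maxvl=10` followed by an `sv.addi` on a vector starting at r3 amounts to: the existing scalar `addi` repeated VL times with the register numbers stepped. Everything in it (`sv_state`, `model_setvl`, `model_sv_addi`, the 128-entry register file, and the rule VL = min(requested, MAXVL)) is an illustrative assumption, not the SVP64 specification; it ignores predication, element widths, SUBVL and the real vector register numbering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, heavily simplified model for illustration only:
 * NOT the SVP64 specification (no predication, element widths,
 * SUBVL, or real vector register numbering). */
#define NUM_GPRS 128

typedef struct {
    int64_t gpr[NUM_GPRS]; /* integer register file */
    unsigned maxvl;        /* MAXVL, set by setvl */
    unsigned vl;           /* VL: the loop count used by prefixed ops */
} sv_state;

/* Assumed behaviour of "setvl": VL = min(requested, MAXVL). */
static void model_setvl(sv_state *s, unsigned maxvl, unsigned requested_vl)
{
    s->maxvl = maxvl;
    s->vl = requested_vl < maxvl ? requested_vl : maxvl;
}

/* The ordinary scalar instruction: RT = RA + SI. */
static void scalar_addi(sv_state *s, unsigned rt, unsigned ra, int64_t si)
{
    s->gpr[rt] = s->gpr[ra] + si;
}

/* "sv.addi" with RT and RA both marked as vectors: no new vector
 * arithmetic, just the scalar addi repeated VL times while the
 * register numbers step through consecutive registers. */
static void model_sv_addi(sv_state *s, unsigned rt, unsigned ra, int64_t si)
{
    assert(rt + s->vl <= NUM_GPRS && ra + s->vl <= NUM_GPRS);
    for (unsigned i = 0; i < s->vl; i++)
        scalar_addi(s, rt + i, ra + i, si);
}
```

For example, after `model_setvl(&s, 10, 10)`, calling `model_sv_addi(&s, 3, 3, 1)` increments r3 through r12 and leaves r13 alone, which is roughly what `register long a[10] asm ("*r3")` plus an `sv.addi` is meant to express. The only arithmetic anywhere in the model is `scalar_addi`.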
(In reply to Jacob Lifshay from comment #3)
> also, we should make clear that any C syntax we have for declaring/using SV
> ops is marked as draft/wip/etc.

of course.

> since I don't necessarily think register int
> my_vec[MAXVL]; is the best choice. e.g., it's missing SUBVL

`register struct vec2 v[NNN];` sorted.

> and I expect the
> compiler devs will complain about it not having any SV-specific
> keywords/attributes

attributes will make programmers' lives hell (unreadable crap). plus, deviation from standard c will make code conversion that much more work. to be severely avoided regardless of potential complaints.

> thereby using a part of the syntax-space that they might
> want for something else.

for EUR 50,000, which disappears on Embecosm's budget in under 3 months, even talking with other developers during the *design* phase will be firmly out of scope. if, however, other devs help out rather than complain, then i have absolutely no problem at all.
(In reply to Luke Kenneth Casson Leighton from comment #4)
> > for LLVM IR, we will be having standard existing vector IR translate to
> > SimpleV ops in the backend, not something like having only SimpleV prefixed
> > IR and never any vector ops or something like that.
> 
> ... i then envisage an IR primitive that LITERALLY and precisely
> without fail without exception represents the FULL and complete
> capability of SV Prefixing. exactly precisely fully absolute
> 100% without fail absolute total and full representation of
> the full absolute SV concept.

fine with me, though I imagine you will have a tough time convincing gcc/llvm maintainers to redo their IR...

> no vector instructions =>
> no vector IR.

we *need* vector IR support (supporting *already existing LLVM IR*) because it allows existing vector code to compile to SimpleV without having to rewrite all software everywhere. Also, all of LLVM's existing optimization pipelines, and everything else, use vector IR; trying to force everyone to use only SimpleV IR to get anything beyond scalar performance is going to make SimpleV fail, because we're currently the little dog, and most programmers don't care about SimpleV and will just use tools that produce vector IR regardless. Remember, supporting existing vector LLVM IR is similar to supporting existing PowerPC software or existing C software: we care about backward compatibility because we *can't* rewrite the world.
(In reply to Luke Kenneth Casson Leighton from comment #4)
> (In reply to Jacob Lifshay from comment #1)
> > some points:
> > the register keyword won't be going away completely
> 
> whew
> 
> > 
> > register long a asm("r3");
> > asm("addi r3, r3, 1" : "+r"(a)); // increments a
> 
> what i envision is that this:
> register long a[10] asm ("*r3")

I think we should focus on not requiring specifying exact registers, since imo that's like half the benefit of compilers -- you don't have to figure out where everything goes, which is really hard for humans. e.g. it took me *hours/days* to figure out where all the registers should go for divmod, and that's a short function!

> 
> literally translates to
> setvl maxvl=10

I think we should avoid requiring the programmer to specify where all the setvls go; instead we just rely on vector types to track that information, and the compiler can then insert setvl instructions where necessary.

additionally, compiler optimizations tend to work much better when they don't have to worry about a bunch of global state (VL), and can just reintroduce the global state (insert setvl ops) at the end, after doing the optimizations.

> 
> and marks r3 as vector, such that sv.addi works as expected.
> following on from that...

I think it should just mark `a` as a vector, not r3, since `a` has the type `register long[10]`, which tells the compiler that it's a vector with MAXVL=10, so the compiler can assign it to any convenient spot, or even optimize it out completely, rather than being forced to keep it in r3 and key all scalar/vector decisions off of whether r3 is mentioned or not.
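A rough sketch of the "let the compiler insert setvl" approach: vector operations carry the VL implied by their operand types, and a late pass walks the code emitting a `setvl` only where the required VL changes. The types and names (`op`, `op_kind`, `insert_setvls`) are hypothetical, and the sketch assumes a single basic block of VL-free ops with no calls or control flow that could clobber VL; a real pass would need dataflow across blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compiler-IR sketch: each op records the VL its operand
 * types imply. Scalar ops do not care about VL. */
typedef enum { OP_SETVL, OP_VECTOR, OP_SCALAR } op_kind;

typedef struct {
    op_kind kind;
    unsigned vl; /* OP_VECTOR: VL from operand types; OP_SETVL: VL being set */
} op;

/* Late pass over straight-line, VL-free code: emit a setvl only when the
 * next vector op needs a VL different from the one last established.
 * Writes at most out_cap ops to out and returns how many were written. */
static size_t insert_setvls(const op *in, size_t n, op *out, size_t out_cap)
{
    size_t len = 0;
    unsigned current_vl = 0; /* 0 = VL unknown on entry */
    for (size_t i = 0; i < n; i++) {
        if (in[i].kind == OP_VECTOR && in[i].vl != current_vl) {
            assert(len < out_cap);
            out[len++] = (op){ OP_SETVL, in[i].vl };
            current_vl = in[i].vl;
        }
        assert(len < out_cap);
        out[len++] = in[i]; /* original op, order unchanged */
    }
    return len;
}
```

For instance, the sequence vector(VL=10), vector(VL=10), scalar, vector(VL=4), vector(VL=10) comes out with three setvls (before the first op, before the VL=4 op, and before the last op), and the two back-to-back VL=10 ops share one.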
(In reply to Jacob Lifshay from comment #7)
> (In reply to Luke Kenneth Casson Leighton from comment #4)
> > what i envision is that this:
> > register long a[10] asm ("*r3")
> 
> I think we should focus on not requiring specifying exact registers

to be clear, this means you *can* specify asm("r3") when you need it to be in r3 for some inline asm; you're just not required to when you want to let the compiler pick.
(In reply to Luke Kenneth Casson Leighton from comment #5)
> attributes will make programmer's lives hell (unreadable crap).

actually, attributes have gotten a whole lot better recently, since C (as of C23, following C++11) introduced a new standard attribute syntax, e.g.:

    [[noreturn]] void f() { abort(); }

instead of:

    void f() __attribute__((noreturn));
    void f() { abort(); }

so I think something like this could be good:

say you want a vector with MAXVL=12, VL=vl, SUBVL=3, and element type float:

    [[sv_vec(12)]] float my_vector[vl][3];

or a fixed-length vector with MAXVL=VL=21, SUBVL=1, and element type uint16_t:

    [[sv_vec]] uint16_t my_vector2[21];

the [[sv_vec]] could also go at the end if you prefer that (C/C++ lets you put attributes in a few different places, though the position decides whether the attribute applies to the declared object or to its type):

    uint16_t my_vector2[21] [[sv_vec]];
(In reply to Jacob Lifshay from comment #9)
> actually, attributes have gotten a whole lot better recently, since C/C++
> introduced a new attribute syntax:

please please, i know you love "latest and greatest", but it severely interferes with simplicity and practical "get-it-done-in-scope-on-budget".

> e.g.:
> [[noreturn]]

**NO**. absolutely not. end of this line of reasoning. i said ANY deviation from standard c is unacceptable.

it interferes with the addition of autovectorisation passes later [Alexandre Oliva is the leading expert in gcc autovectorisation].

plus, can you imagine the nightmare of porting? hundreds of millions of lines of code, stupid "attributes" all over the place, whilst we raise the USD 50 million to have a go at proper autovectorisation in both gcc and llvm?

we have to be *smart* about this jacob. attributes are an extremely bad idea in multiple ways.

> [[sv_vec]] uint16_t my_vector2[21];

i said NO on attributes! please do listen!
(In reply to Jacob Lifshay from comment #7)
> (In reply to Luke Kenneth Casson Leighton from comment #4)
> > (In reply to Jacob Lifshay from comment #1)
> > > some points:
> > > the register keyword won't be going away completely
> > 
> > whew
> > 
> > > 
> > > register long a asm("r3");
> > > asm("addi r3, r3, 1" : "+r"(a)); // increments a
> > 
> > what i envision is that this:
> > register long a[10] asm ("*r3")
> 
> I think we should focus on not requiring specifying exact registers, since
> imo that's like half the benefit of compilers -- you don't have to figure
> out where everything goes, since that's really hard for humans. e.g. it took
> me *hours/days* to figure out where all the registers should go for divmod,
> and that's a short function!
> 
> > 
> > literally translates to
> > setvl maxvl=10
> 
> I think we should avoid requiring the programmer to specify where all the
> setvls go, instead we just rely on vector types to track that information,
> and the compiler can then insert setvl instructions where necessary.

yes. i missed out several compiler and assembly-level peephole optimisation passes there for simplicity (at 5am).

> additionally, compiler optimizations tend to work much better when it
> doesn't have to worry about a bunch of global state (VL), and can just
> reintroduce (insert setvl ops) the global state at the end after doing the
> optimizations.

... which is where the correct design of the IR-prefix-representing-SV comes into play, but it ultimately should define pretty much exactly the current SVP64 SPRs.

> > 
> > and marks r3 as vector, such that sv.addi works as expected.
> > following on from that...
> 
> I think it should just mark `a` as a vector,

yes, sorry, wasn't clear. yes, absolutely.

> not r3, since a has the type
> `register long[10]` which tells the compiler that it's a vector with
> MAXVL=10,

you got it in 1.

> so the compiler can assign it to any convenient spot or even
> optimize it out completely rather than being forced to keep it in r3 and key
> all scalar/vector decisions off of whether r3 is mentioned or not.

yyep. then any loops can easily be autovectorized: vertical-first is going to be astonishingly laughably simple to implement. horizontal-first (HF) is a little harder, but doable with the right IR passes and checking that element variables within the loop are all 100% independent.
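The vertical-first vs horizontal-first distinction, and why HF needs an independence check, can be illustrated with a small C model. The two-instruction loop bodies and function names are invented for this sketch: with fully independent elements both execution orders give identical results, but once element i+1 reads a value written by element i, the orders diverge.

```c
#include <assert.h>
#include <stddef.h>

/* Independent body (hypothetical), two instructions per element:
 *   t[i] = a[i] * 2;      instruction 1
 *   r[i] = t[i] + b[i];   instruction 2 */

/* Horizontal-first: one instruction across all elements, then the next. */
static void indep_horizontal_first(const int *a, const int *b, int *t, int *r, size_t n)
{
    for (size_t i = 0; i < n; i++) t[i] = a[i] * 2;
    for (size_t i = 0; i < n; i++) r[i] = t[i] + b[i];
}

/* Vertical-first: all instructions for one element, then the next
 * element (the same order as the original scalar loop). */
static void indep_vertical_first(const int *a, const int *b, int *t, int *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        t[i] = a[i] * 2;
        r[i] = t[i] + b[i];
    }
}

/* Dependent body: element i+1 reads what element i wrote.
 * s must have n + 1 entries.
 *   t[i]   = s[i] + a[i];  instruction 1
 *   s[i+1] = t[i];         instruction 2 */
static void dep_horizontal_first(const int *a, int *s, int *t, size_t n)
{
    for (size_t i = 0; i < n; i++) t[i] = s[i] + a[i];
    for (size_t i = 0; i < n; i++) s[i + 1] = t[i];
}

static void dep_vertical_first(const int *a, int *s, int *t, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        t[i] = s[i] + a[i];
        s[i + 1] = t[i];
    }
}
```

With `a = {1, 2, 3, 4}` and `s` zeroed, the vertical-first dependent version leaves `t[3] == 10` (a running sum), while the horizontal-first one leaves `t[3] == 4`, so converting such a loop to HF without the independence check would silently change what it computes.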
(In reply to Luke Kenneth Casson Leighton from comment #10)
> (In reply to Jacob Lifshay from comment #9)
> 
> > actually, attributes have gotten a whole lot better recently, since C/C++
> > introduced a new attribute syntax:
> 
> please please i know you love "latest and greatest" but it severely
> interferes with simplicity and practical "get-it-done-in-scope-on-budget"
> 
> > e.g.:
> > [[noreturn]]
> 
> **NO**. absolutely not. end of this line of reasoning.
> i said ANY deviation from standard c is unacceptable.

well, too bad: *any* syntax we use to express SV operations is not standard C, including the `register` array syntax (the C standard lets you declare a `register` array, but indexing it is undefined behaviour, since that requires converting the array to a pointer to its first element). why not use the syntax *standard* C explicitly provides for extensions, instead of inventing our own in a way that looks like a syntax land-grab (which is how just using `register` like that looks to me, and probably to other compiler devs)?

> 
> it interferes with the addition of autovectorisation
> passes, later [alexandre oliva is the leading expert
> in gcc autovectorisation]

this is C *syntax*: by the time any autovectorization passes are run, *everything* is in compiler IR, and essentially *none* of the source syntax is left. compiler IR is *not* C. whether we pick `register` or `[[sv_vec]]`, both will translate to the same compiler IR.

> plus can you imagine the nightmare of porting? hundreds
> of millions of lines of code, stupid "attributes" all
> over the place whilst we raise the USD 50 million to have
> a go at proper autovectorisation in both gcc and llvm?

autovectorization is independent of which C frontend syntax we pick. the whole point of autovectorization is to automatically convert *scalar code* to vectorized code; any way we express SV operations is manual vectorization, so the autovectorizer doesn't generally do anything with the manually vectorized code.

> we have to be *smart* about this jacob.
yes, we have to make arguments that hold water if we're going to try to argue for or against attributes.
(In reply to Jacob Lifshay from comment #12)
> the whole point of autovectorization is to automatically convert *scalar
> code* to vectorized code, so any way we express SV operations, that's manual
> vectorization, so the autovectorizer doesn't generally do anything with the
> manually vectorized code.

autovectorization is what converts loops like:

    for(int i = 0; i < 8; i++)
        a[i] = b[i] + c[i];

to:

    *(vec8 *)a = *(vec8 *)b + *(vec8 *)c;

if you give it already-vectorized code (aka basically any code using SV vector syntax), it will just not change it.
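A runnable version of that before/after pair, written with the GCC/Clang `vector_size` extension. The `vec8` typedef (eight `int`s) and the function names are assumptions made for this sketch; `memcpy` stands in for the `*(vec8 *)a` casts so the example does not depend on the arrays being 32-byte aligned or on the compiler's aliasing rules for vector types.

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang vector extension: 8 x int. The name vec8 is assumed here. */
typedef int vec8 __attribute__((vector_size(8 * sizeof(int))));

/* The scalar loop an autovectorizer starts from. */
static void add_scalar_loop(int *a, const int *b, const int *c)
{
    for (int i = 0; i < 8; i++)
        a[i] = b[i] + c[i];
}

/* What the autovectorizer effectively produces: one whole-vector add.
 * memcpy replaces the pointer casts to avoid alignment/aliasing issues. */
static void add_vector(int *a, const int *b, const int *c)
{
    vec8 vb, vc, va;
    memcpy(&vb, b, sizeof vb);
    memcpy(&vc, c, sizeof vc);
    va = vb + vc; /* element-wise add of all 8 lanes */
    memcpy(a, &va, sizeof va);
}
```

Both functions produce the same array; the point in the comment above is that the second form is already vectorized, so an autovectorizer has nothing left to do with it.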
ah, so you meant something like (for some fixed-vector type vec_t):

    void attempt_autovec_mul_add(vec_t *a, vec_t *b, vec_t *c, vec_t *r, int n) {
        for(int o = 0; o < n; o++) {
            for(int i = 0; i < sizeof(vec_t) / sizeof(float); i++) {
                r[o][i] = a[o][i] + b[o][i] * c[o][i];
            }
        }
    }

LLVM can vectorize that (when adding __restrict to the pointers) to be effectively:

    void attempt_autovec_mul_add(vec_t *a, vec_t *b, vec_t *c, vec_t *r, int n) {
        for(int o = 0; o < n; o++) {
            r[o] = a[o] + b[o] * c[o];
        }
    }

https://gcc.godbolt.org/z/8x4av199K

I don't expect that to behave differently for SimpleV no matter which C syntax we pick.
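A self-contained version of the two functions, for checking that they agree, again using the GCC/Clang vector extension (which also allows `v[i]` indexing of vector values). The definition of `vec_t` as four `float`s and the function names are assumptions; `restrict` plays the role of `__restrict`, and the test values are small integers so the results are exact whether or not the compiler fuses the multiply-add into an FMA.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed fixed-vector type: 4 floats via the GCC/Clang vector extension. */
typedef float vec_t __attribute__((vector_size(4 * sizeof(float))));

/* Element-by-element form: what the autovectorizer is handed. */
static void mul_add_elementwise(const vec_t *restrict a, const vec_t *restrict b,
                                const vec_t *restrict c, vec_t *restrict r, int n)
{
    for (int o = 0; o < n; o++)
        for (size_t i = 0; i < sizeof(vec_t) / sizeof(float); i++)
            r[o][i] = a[o][i] + b[o][i] * c[o][i];
}

/* Whole-vector form: roughly what the vectorized output amounts to. */
static void mul_add_whole_vector(const vec_t *restrict a, const vec_t *restrict b,
                                 const vec_t *restrict c, vec_t *restrict r, int n)
{
    for (int o = 0; o < n; o++)
        r[o] = a[o] + b[o] * c[o];
}
```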