this would be for llvm and gcc and probably cranelift

imho adding support in cranelift will be much easier because we don't have to deal with making the maintainers of existing powerisa backends happy with our major modifications, which i expect to be a long painful process.

because of that imho we should start with cranelift if we're going to work on cranelift, otherwise start with llvm, because cranelift is probably best but llvm also works well as a gpu driver backend, gcc is less well suited for that.

additionally cranelift, being a new powerisa backend rather than modifying an existing one, allows us to completely svp64-ify all vector support (with prefix split from suffix) and not have to deal with vsx/vmx at first.

cranelift can be used as a backend for rustc and for wasmtime, but i'm not aware of any c/c++ frontends for cranelift (though there are projects that translate c to rust that might kinda work)
the idea is that once we have a working implementation in one compiler it should be much easier to add support in another compiler by translating from the first compiler's implementation.
i'm planning on this grant request including designing C/C++/Rust intrinsics (C++/Rust can use generics/templates on types and intrinsic functions, C we probably want something like the _Alignas keyword for types where it's a modifier on an existing type that also has const-expr arguments, intrinsic functions can be generic like tgmath.h)

it will also include designing IR, which may be based on IR intrinsics -- not to be confused with C intrinsics, they will almost certainly be different since C/IR have different goals (C aims for usability, IR aims for canonicalization, optimizability, and SSA-form)

for operations with multiple outputs (e.g. fail-first load has both the output vector and the new VL as outputs), imho the Rust intrinsics should return a struct rather than have out-parameters.

IR intrinsics must not have out-parameters through memory, since that greatly limits optimizations since it prevents IR from expressing putting that output directly in a register (this is currently an issue with llvm which only has compress-store and no compress-reg2reg-copy).
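a rough sketch of the "return a struct instead of out-parameters" shape, written in C as a plain scalar emulation. everything here is hypothetical and only for illustration: the names `DEMO_MAXVL`, `struct ffirst_load_result` and `demo_ffirst_load_u32` are not proposed intrinsic names, and the `readable` parameter merely stands in for "how many elements can be read before a fault would occur" (the trap on a faulting element 0 is not modelled).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAXVL 8 /* hypothetical MAXVL, chosen only for this sketch */

/* Both outputs of the operation are returned together by value,
 * so neither has to be written through a pointer. */
struct ffirst_load_result {
    uint32_t elems[DEMO_MAXVL]; /* loaded elements; entries >= vl are zero */
    size_t vl;                  /* new (possibly truncated) vector length */
};

/* Scalar emulation of a fail-first load: copy up to `vl` elements, but stop
 * early at the first element that is not readable, and report how many
 * elements were actually loaded as the new VL.
 * `readable` is an assumption of this sketch, standing in for a fault. */
static struct ffirst_load_result
demo_ffirst_load_u32(const uint32_t *src, size_t readable, size_t vl)
{
    struct ffirst_load_result r = { {0}, 0 };
    assert(vl <= DEMO_MAXVL);
    while (r.vl < vl && r.vl < readable) {
        r.elems[r.vl] = src[r.vl];
        r.vl++;
    }
    return r;
}
```

a caller just reads `r.elems` and `r.vl` from the returned value, which is the property that lets a compiler keep both outputs in registers.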
(In reply to Jacob Lifshay from comment #3)
> i'm planning on this grant request including designing C/C++/Rust intrinsics
> (C++/Rust can use generics/templates on types and intrinsic functions, C we
> probably want something like the _Alignas keyword for types where it's a
> modifier on an existing type that also has const-expr arguments, intrinsic
> functions can be generic like tgmath.h)

for C types we discussed on irc (around linked location):
https://libre-soc.org/irclog/%23libre-soc.2023-02-21.log.html#t2023-02-21T10:32:16

that it would be good to have a header that defines ease-of-use macros:

// _SVP64_vec(MAXVL, [SUBVL]) changes a int, float, or pointer type into
// a vector with that type as the element type, and with the provided MAXVL
// and SUBVL (SUBVL defaults to 1) -- MAXVL and SUBVL are const expressions.

#ifndef NOSHORTMACROS // user can define if short macros conflict with their code
// naming scheme follows rust types plus -x since they're short and to-the-point
#define u64x(MAXVL, ...) uint64_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define i64x(MAXVL, ...) int64_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define f64x(MAXVL, ...) double _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define u32x(MAXVL, ...) uint32_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define i32x(MAXVL, ...) int32_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define f32x(MAXVL, ...) float _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
// ...
#define u8x(MAXVL, ...) uint8_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define i8x(MAXVL, ...) int8_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define usizex(MAXVL, ...) size_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define isizex(MAXVL, ...) ptrdiff_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
// no macros for pointer element types, just use the _SVP64_vec keyword, e.g.:
// const struct node *_SVP64_vec(16, 2) my_node_ptr_vec;

// predicate mask vector type, compiles to integer bitmask and/or cr vectors
// masks don't have SUBVL
#define maskx(MAXVL) bool _SVP64_vec(MAXVL)
#endif

// long macros that won't conflict
#define _SVP64_u64x(MAXVL, ...) uint64_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define _SVP64_i64x(MAXVL, ...) int64_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define _SVP64_f64x(MAXVL, ...) double _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define _SVP64_u32x(MAXVL, ...) uint32_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define _SVP64_i32x(MAXVL, ...) int32_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define _SVP64_f32x(MAXVL, ...) float _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
// ...
#define _SVP64_u8x(MAXVL, ...) uint8_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define _SVP64_i8x(MAXVL, ...) int8_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define _SVP64_usizex(MAXVL, ...) size_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define _SVP64_isizex(MAXVL, ...) ptrdiff_t _SVP64_vec(MAXVL __VA_OPT__(,) __VA_ARGS__)
#define _SVP64_maskx(MAXVL) bool _SVP64_vec(MAXVL)

there may also be additional types for a RVV/SVE-style vector compatibility layer, where the compiler determines MAXVL -- imho this should be left for later as it will be much more complex to implement due to needing to guess register usage
I think we should choose the following sizes and alignments for our vector types:
this assumes all non-mask element types have a size that is a power of two

all non-mask types:

sizeof(Elt _SVP64_vec(MAXVL, SUBVL)) = sizeof(Elt) * MAXVL * SUBVL
-- so the same size as the corresponding array with no padding

alignof(Elt _SVP64_vec(MAXVL, SUBVL)) = gcd(
    sizeof(Elt) * MAXVL * SUBVL,
    next_power_of_two(sizeof(Elt) * MAXVL * SUBVL),
    ALIGN_LIMIT)

where ALIGN_LIMIT is some global constant >= alignof(Elt) for all non-mask element types we'll ever support (so imho 16 is probably good because then you can use malloc for vectors instead of some alloc_aligned function, though we could go bigger to match expected average cache line size -- it can't depend on the target cpu though since that makes libraries a pain. this should be <= largest alignment supported by linker/loader)

this choice of size/align for non-mask types gives two nice properties:
* vector types never have padding, so we can type-pun an appropriately aligned section of any arbitrary array into a vector type and it will work properly. (if vector types were more aligned than this, writing a vector type to that array could fill array trailing elements with the vector's padding which compilers treat as `undef` -- prohibited in Rust unless the original array uses MaybeUninit)
* vector types are always sufficiently aligned that type-punning vector types works correctly with some reasonable assumptions. (described at end)

mask types:

llvm likes i1 x N vectors to be as small as possible, which can cause problems, so imho: just make it easy and define all mask vector types with MAXVL <= 64 to be uint64_t underneath, so:

sizeof(maskx(MAXVL)) = 8
alignof(maskx(MAXVL)) = 8

this means when generating llvm ir we will need to convert to i64 before storing/loading to/from memory (this doesn't include spills/fills since llvm handles that invisibly to the programmer)

vector type to vector type punning details:

well, conveniently every type combination that is valid to type pun (so doesn't try to e.g. type pun `i32x3` to `i64x2` where the last `i64` is half `undef`) already satisfies the alignment requirements if we decide the alignment is `gcd(sizeof(element) * length, next_power_of_2(sizeof(element) * length), some_global_constant_limit)` (compatible with what i proposed for portable-simd) because:
* assuming `some_global_constant_limit >= sizeof(element)` for all possible element types (handles wanting vector alignment to not increase without bound)
* assuming element types are power-of-2 sized which afaict is true for SVP64
* assuming `sizeof(target_element) * target_length <= sizeof(source_element) * source_length` aka. that the type pun is valid because the target vector's elements are completely within the source elements and not taking bytes from after the end of the source elements
* every type pun where `sizeof(target_element) <= sizeof(source_element)` the alignment works out so the target vector type needs the same or less alignment than the source (always the same alignment if `sizeof(target_element) * target_length == sizeof(source_element) * source_length`)
* every type pun where `sizeof(target_element) > sizeof(source_element)` the assumption that the type pun is valid means that the source vector's length is a multiple of `sizeof(target_element) / sizeof(source_element)` which is a power of two therefore the alignment again works out so the target vector type needs the same or less alignment than the source (always the same alignment if `sizeof(target_element) * target_length == sizeof(source_element) * source_length`)

As part of Rust's project portable-simd I recommended basically the same size/align scheme for non-mask vectors to both portable-simd and RISC-V:
https://github.com/rust-lang/portable-simd/issues/319#issuecomment-1334515524
https://github.com/riscv-non-isa/riscv-elf-psabi-doc/issues/347#issuecomment-1442584902
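The size and alignment rule proposed above for non-mask vector types can be worked through with a small C sketch. The function names and the `SVP64_ALIGN_LIMIT` macro are hypothetical, and the value 16 is only the suggestion made above, not a settled constant. The sketch also assumes ALIGN_LIMIT is a power of two, in which case `gcd(size, next_power_of_two(size), ALIGN_LIMIT)` reduces to "the lowest set bit of size, capped at ALIGN_LIMIT".

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value of the global ALIGN_LIMIT constant (16 is only suggested
 * in the discussion; it must be a power of two for this sketch). */
#define SVP64_ALIGN_LIMIT ((size_t)16)

/* Size of a non-mask vector type: identical to the corresponding array,
 * with no padding. */
static size_t svp64_vec_size(size_t elt_size, size_t maxvl, size_t subvl)
{
    /* element sizes are assumed to be non-zero powers of two */
    assert(elt_size != 0 && (elt_size & (elt_size - 1)) == 0);
    assert(maxvl != 0 && subvl != 0);
    return elt_size * maxvl * subvl;
}

/* Alignment of a non-mask vector type.
 * gcd(size, next_power_of_two(size)) is the largest power of two that
 * divides size, i.e. its lowest set bit; taking the gcd with a
 * power-of-two ALIGN_LIMIT is then just a minimum. */
static size_t svp64_vec_align(size_t elt_size, size_t maxvl, size_t subvl)
{
    size_t size = svp64_vec_size(elt_size, maxvl, subvl);
    size_t low_bit = size & (~size + 1); /* lowest set bit of size */
    return low_bit < SVP64_ALIGN_LIMIT ? low_bit : SVP64_ALIGN_LIMIT;
}
```

For example, a uint32_t vector with MAXVL=3 is 12 bytes with alignment 4, and a uint32_t vector with MAXVL=14 and SUBVL=3 is 168 bytes with alignment 8. Since the alignment depends only on the total byte size, two vector types of equal total size always get the same alignment, which is the equal-size case of the punning argument above.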
(In reply to Jacob Lifshay from comment #4)
> there may also be additional types for a RVV/SVE-style vector compatibility
> layer, where the compiler determines MAXVL -- imho this should be left for
> later as it will be much more complex to implement due to needing to guess
> register usage

no guessing required.

int register [NN];
(In reply to Luke Kenneth Casson Leighton from comment #6)
> (In reply to Jacob Lifshay from comment #4)
>
> > there may also be additional types for a RVV/SVE-style vector compatibility
> > layer, where the compiler determines MAXVL -- imho this should be left for
> > later as it will be much more complex to implement due to needing to guess
> > register usage
>
> no guessing required.
>
> int register [NN];

the whole point of those RVV/SVE-style vector types is the programmer does *not* provide MAXVL, the compiler figures out the most efficient value.

if the programmer wants to provide MAXVL, they should just use the SVP64 vector types (e.g. u32x(14, 3)), since providing MAXVL or not is the only difference between them, they are otherwise identical types.
another reason to include cranelift is that cranelift is likely better than llvm for use as a gpu driver shader compiler backend
(In reply to Jacob Lifshay from comment #8)
> other reasons to include cranelift is that cranelift is likely better than
> llvm for use as a gpu driver shader compiler backend

Unless Luke or David know otherwise, I don't see us dealing with gpu shaders any time soon. So not critically important (at least this year).