A software-only (SwiftShader-style) 3D MESA driver is needed, portable (x86 etc.), using non-accelerated LLVM
the idea here is that we need something that is cross-platform portable (working primarily on x86 and POWER9) and is identical in effect to google "SwiftShader". however, as you are no doubt aware, SwiftShader is targeted *only* at software-rendering (using SIMD: NEON, SSE, AVX etc.), and because of that, what they've done is *discard* all the predication, vec3/4 and swizzle information in the SPIR-V IR as quickly as possible when translating to LLVM IR. having lost that information during the translation, it is impossible to get it back, and even if you tried it would result in massive CPU load and heavy latency.

so we therefore need a completely new SPIR-V to LLVM-IR translator: one that **preserves** the predication and other intrinsics right up until the very last minute, just like AMDVLK and RADV, probably using NIR. now, for the *first* version, that information will be "lost" through handing over to standard scalar (non-vectorised, non-3D-accelerated) general-purpose LLVM IR, on x86 and POWER9. this we call the "soft-shader". https://bugs.libre-soc.org/show_bug.cgi?id=251

after that is successful we will design and add accelerated hardware opcodes to the processor (and to the simulator), then add support for those in the LLVM-IR back-end. at that point, the predication and vector intrinsics (preserved in NIR format, rather than thrown away as they are in SwiftShader) can be passed directly to the now vector-capable LLVM IR.
> I need to break it down to small tasks, also I need to
> understand what part we need to write and what can be
> use/understood from SwiftShader?

nothing at all can be _used_ from it. as i said: SwiftShader has been specifically designed with a complete lack of consideration for 3D-hardware-accelerated instruction sets.

we can _understand_ it, conceptually: the first goal (#251) is to make something *like* SwiftShader...

... except very very specifically targeted at preserving predication, vec2/3/4, and vectors "per se", right the way through into LLVM-IR. SwiftShader *does not do that*. it destroys that information as quickly as possible.

the 3D MESA driver therefore needs to be based on one of the following:

* the AMDVLK driver (unlikely, because it's a rather rocky/kludged port of a Win32 driver)
* the MESA RADV driver (which involves removing the "thunk" layer between MESA and LLVM)
* the original Intel NIR driver, ported, i.e. following the path blazed by RADV many years ago (RADV was a port of the *Intel* MESA driver to AMDGPU).

there are downsides and benefits to each: these all need to be discussed.
one of the important things to understand about the hardware - and therefore the driver - is that the GPU opcodes and the Vectorisation are part of the *native* CPU instruction set.

here is what swiftshader "looks" like:

* SPIR-V IR
* total removal of all predication, vectorisation "pass"
* handing over to LLVM-IR (predication, vectorisation *LOST* at this point)
* compilation of LLVM-IR to *native* CPU binary (NEON/SSE SIMD etc.)
* execution of the resultant binary in the *CPU* space

here is what AMDVLK / Intel GPU / NVidia GPU looks like (in summary):

* SPIR-V IR
* preservation of predication, vectorisation
* handing over to LLVM-IR (or other vector-preserving IR)
* compilation of IR to **GPU** binary instruction set
* massively-complex inter-processor communication system (via kernel drivers) ... ...
* finally, at long last, execution of the resultant vectorised binary on the GPU

here is what *we* need (ultimately - as part of issue #140):

* SPIR-V IR
* preservation of predication, vectorisation
* handing over to LLVM-IR (or other vector-preserving IR)
* compilation of IR to *NATIVE CPU BINARY WHICH HAPPENS TO HAVE GPU OPCODES*
* execution of the resultant vector-capable binary in the >>***CPU***<< space

however what we decided to do is to break this down into separate phases. the first phase is to *not* try to add GPU opcodes... yet. the first phase is to make a driver that is ***like*** google SwiftShader but is *not* google SwiftShader, i.e. it is this:

* SPIR-V IR
* preservation of predication, vectorisation
* handing over to LLVM-IR (or other vector-preserving IR)
* compilation of IR to NATIVE CPU BINARY which does **NOT** have GPU OPCODES   <- this is the only difference
* execution of the resultant vector-capable binary in the >>***CPU***<< space

the only difference between the "software-only" version and the "final" version is that we may use *upstream* stock-and-standard LLVM for x86, ARM, POWER9, and consequently have far less work to do. zero optimisations of any kind. zero work on LLVM itself. however - and this is the important bit - the initial "software-only" driver preserves the predication and vectorisation information *right* up until it is handed to that stock-and-standard LLVM (see the illustrative sketch below).

*later* - under a separate bugreport - we will replace that stock-and-standard LLVM with a special version of LLVM that has been augmented to understand the GPU opcodes, vectorisation and predication that we will be adding. i anticipate and expect that, if correctly designed (the predication and vectorisation correctly preserved right up until hand-over to LLVM-IR), when we do this later task there should be zero (or negligible, or trivial) modifications required to the 3D MESA driver developed under this bugreport.

the reason why we decided already not to go with SwiftShader is because the entire codebase is specifically targeted at removing the SPIR-V predication and vectorisation (including vec2/3/4) as quickly as possible. modifying SwiftShader or trying to use it would result in constant fights with google's engineers along the lines of "why are you submitting patches for a hardware 3D GPU, this is a software GPU driver, please go away", and we would have to do a full hard fork of it and become the only maintainers. the intention here is that this code be *fully* upstream in MESA.
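to make the distinction concrete, here is a minimal, purely illustrative C sketch - the type and function names are invented, not taken from any existing driver - of the semantics that have to survive all the way to the IR: a predicated vec4 operation, where a per-element mask decides whether each lane is written. SwiftShader lowers this information away almost immediately; the soft-shader must keep the mask and the vec4 grouping visible right up to the hand-over to LLVM.

#include <stdbool.h>

/* illustrative only: a vec4 and a per-element predicate mask */
typedef struct { float v[4]; } vec4;

/* predicated vec4 add: dst[i] = a[i] + b[i] only where pred[i] is set.
 * it is this mask + vector-grouping information that must be preserved
 * in the IR, not flattened into scalar branches early on. */
static void vec4_add_predicated(vec4 *dst, const vec4 *a, const vec4 *b,
                                const bool pred[4])
{
    for (int i = 0; i < 4; i++)
        if (pred[i])
            dst->v[i] = a->v[i] + b->v[i];
}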
Is it okay to use a fork of https://github.com/KhronosGroup/SPIRV-LLVM-Translator for SPIR-V to LLVM translation? Or do we want to write a similar library from scratch?
(In reply to vivekvpandya from comment #4)
> Is it okay to use fork of
> https://github.com/KhronosGroup/SPIRV-LLVM-Translator ?
> for SPIRV to LLVM translation ?

ha, wouldn't that be hilarious if 80 to 90% of the work needed was already done by that library. you would still receive a (very large!!) share of the EUR 12,000 donation from NLNet :)

> Or we want to write similar library from
> scratch?

i am very much against the "duplicate code because then we pwn it" way of thinking. the only reason for writing a similar library would be if this one was totally unsuited. given that it seems to have an "extension" architecture, it should be adaptable.

the main question we need to find out is: does this library "preserve":

1. predication intrinsics from SPIR-V
2. vec2/3/4
3. swizzle information (vec2-4 XYZW reordering - see the short sketch after this comment)
4. vector length

WITHOUT reducing down to SIMD. the reason we need these preserved in LLVM-IR is because the Vector customisation of the POWER9 ISA will *have* predication, vec234, swizzle and ATAN2, COS, SIN etc. added to it. number 4 (variable vector length) *might* be a bit much to expect, because vector intrinsics are in development by Simon and Robin (jacob has been tracking this).

one way to find out would be to actually do some quick tests doing exactly that, and i am more than happy to raise an R&D sub-task (with small budget) for that. or it may turn out there are unit tests already in llvm-spirv which tell us the answer?

i _was_ kinda expecting that we would use Intel NIR format (just like in the RADV and Intel MESA drivers), however if this library does everything needed, GREAT. i am adding jacob as cc on this one as he will have some valuable insights.
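for reference, a tiny illustrative C sketch of item 3 above (the names are invented, not taken from any of the libraries under discussion): a swizzle is simply a per-lane reordering of a vec2/3/4, e.g. ".wzyx", and it is exactly this reordering pattern that must not be flattened away before it reaches the IR.

/* illustrative only: apply a swizzle such as .wzyx to a vec4.
 * sel[i] names the source lane for destination lane i,
 * e.g. {3,2,1,0} is .wzyx and {0,0,0,0} is .xxxx */
static void vec4_swizzle(float dst[4], const float src[4],
                         const unsigned sel[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = src[sel[i]];
}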
(In reply to vivekvpandya from comment #4)
> Is it okay to use fork of
> https://github.com/KhronosGroup/SPIRV-LLVM-Translator ?
> for SPIRV to LLVM translation ? Or we want to write similar library from
> scratch?

That particular library isn't currently suitable for Vulkan, it is based on the OpenCL SPIR-V dialect, which is somewhat different than the Vulkan SPIR-V dialect. Additionally, it is based on a really old version of LLVM last I checked.

If you're going to base everything on Mesa, you should use the NIR code, since there is already a Vulkan SPIR-V to NIR translator. You would need to write the NIR to vectorized LLVM translator.
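to give a feel for what "NIR to vectorized LLVM translator" means in practice, here is a heavily simplified, hypothetical C sketch using the public NIR iteration macros and the LLVM-C API; the get_llvm_src/set_llvm_def helpers are invented placeholders for the SSA-value bookkeeping a real translator needs. it is emphatically not the real thing (intrinsics, control flow, phis, inputs/outputs are all missing) - it only shows the basic shape: walking NIR ALU instructions and emitting the corresponding LLVM IR, with vec2/3/4 values kept as LLVM vector types rather than scalarised.

#include "nir.h"                 /* mesa's in-tree NIR header, assumed available */
#include <llvm-c/Core.h>

/* hypothetical helpers: in a real translator these would look up /
 * record the LLVMValueRef associated with each NIR SSA def */
LLVMValueRef get_llvm_src(nir_alu_instr *alu, unsigned i);
void         set_llvm_def(nir_alu_instr *alu, LLVMValueRef v);

/* walk every ALU instruction in the shader and emit LLVM IR for it,
 * keeping vec2/3/4 values as LLVM vector types (no scalarisation) */
static void emit_alu_ops(nir_shader *shader, LLVMBuilderRef b)
{
    nir_foreach_function(function, shader) {
        if (!function->impl)
            continue;
        nir_foreach_block(block, function->impl) {
            nir_foreach_instr(instr, block) {
                if (instr->type != nir_instr_type_alu)
                    continue;
                nir_alu_instr *alu = nir_instr_as_alu(instr);
                switch (alu->op) {
                case nir_op_fadd:
                    set_llvm_def(alu, LLVMBuildFAdd(b,
                                 get_llvm_src(alu, 0),
                                 get_llvm_src(alu, 1), "fadd"));
                    break;
                /* ... hundreds of other opcodes ... */
                default:
                    break;
                }
            }
        }
    }
}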
(In reply to Jacob Lifshay from comment #6)
> (In reply to vivekvpandya from comment #4)
> > Is it okay to use fork of
> > https://github.com/KhronosGroup/SPIRV-LLVM-Translator ?
> > for SPIRV to LLVM translation ? Or we want to write similar library from
> > scratch?
> 
> That particular library isn't currently suitable for Vulkan, it is based on
> the OpenCL SPIR-V dialect, which is somewhat different than the Vulkan
> SPIR-V dialect. Additionally, it is based on a really old version of LLVM
> last I checked.
> 
> If you're going to base everything on Mesa, you should use the NIR code,
> since there is already a Vulkan SPIR-V to NIR translator. You would need to
> write the NIR to vectorized LLVM translator.

I see this commit related to LLVM 12:
https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/501a277a0b18e6d709f4bec5e6557c9099207af4

Also, I could not find anything saying that it supports only the OpenCL dialect.
(In reply to vivekvpandya from comment #7) > (In reply to Jacob Lifshay from comment #6) > > (In reply to vivekvpandya from comment #4) > > > Is it okay to use fork of > > > https://github.com/KhronosGroup/SPIRV-LLVM-Translator ? > > > for SPIRV to LLVM translation ? Or we want to write similar library from > > > scratch? > > > > That particular library isn't currently suitable for Vulkan, it is based on > > the OpenCL SPIR-V dialect, which is somewhat different than the Vulkan > > SPIR-V dialect. Additionally, it is based on a really old version of LLVM > > last I checked. > > > > If you're going to base everything on Mesa, you should use the NIR code, > > since there is already a Vulkan SPIR-V to NIR translator. You would need to > > write the NIR to vectorized LLVM translator. > > I see this commit related to LLVM 12. > https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/ > 501a277a0b18e6d709f4bec5e6557c9099207af4 > > Also I could not find that it supports only OpenCL dialect. Yes it does not yet support Vulkan SPIR-V https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/30
(In reply to vivekvpandya from comment #7) > (In reply to Jacob Lifshay from comment #6) > > (In reply to vivekvpandya from comment #4) > > > Is it okay to use fork of > > > https://github.com/KhronosGroup/SPIRV-LLVM-Translator ? > > > for SPIRV to LLVM translation ? Or we want to write similar library from > > > scratch? > > > > That particular library isn't currently suitable for Vulkan, it is based on > > the OpenCL SPIR-V dialect, which is somewhat different than the Vulkan > > SPIR-V dialect. Additionally, it is based on a really old version of LLVM > > last I checked. > > > > If you're going to base everything on Mesa, you should use the NIR code, > > since there is already a Vulkan SPIR-V to NIR translator. You would need to > > write the NIR to vectorized LLVM translator. > > I see this commit related to LLVM 12. > https://github.com/KhronosGroup/SPIRV-LLVM-Translator/commit/ > 501a277a0b18e6d709f4bec5e6557c9099207af4 Ok, so they updated it since I last checked. > Also I could not find that it supports only OpenCL dialect. They don't explicitly say so, but it becomes apparent from looking around some: it doesn't support OpKill which is only used in fragment shaders (required for Vulkan), it also doesn't support any of the glsl standard library (only the OpenCL standard library), which is required for Vulkan. https://www.khronos.org/registry/spir-v/specs/unified1/GLSL.std.450.html
Just to note:
There is a translator based on https://github.com/KhronosGroup/SPIRV-LLVM-Translator which generates LLVM IR with AMDGPU-specific intrinsics in it:
https://github.com/GPUOpen-Drivers/llpc/tree/dev/llpc/translator
We should also take a look at https://mlir.llvm.org/docs/SPIRVToLLVMDialectConversion/
(In reply to vivekvpandya from comment #10)
> Just to note:
> There is a translator based on
> https://github.com/KhronosGroup/SPIRV-LLVM-Translator and generates LLVM IR
> which has AMDGPU specific intrinsic in it.
> https://github.com/GPUOpen-Drivers/llpc/tree/dev/llpc/translator

interesting! and that's work done in the last 6 months. so the question becomes, here: are those intrinsics supported only by a special version of LLVM? (i don't believe so: AMDGPU LLVM has been upstream for some time.) or: is it the case that, if we tried handing LLVM-IR containing AMDGPU-specific intrinsics over to be compiled to standard x86 assembler, LLVM would throw a compile-time error *because* of those AMDGPU intrinsics?

i.e. have they made them *general-purpose* LLVM IR intrinsics, or are they *really* actually AMDGPU-only intrinsics?

if they are specific to AMDGPU, then could we (hypothetically) create a (temporary) mapping translator? or: would it be better to see what general-purpose vector intrinsics exist in LLVM IR that are properly supported for general-purpose use?
(In reply to Jacob Lifshay from comment #9) > They don't explicitly say so, but it becomes apparent from looking around > some: it doesn't support OpKill which is only used in fragment shaders > (required for Vulkan), it also doesn't support any of the glsl standard > library (only the OpenCL standard library), which is required for Vulkan. > > https://www.khronos.org/registry/spir-v/specs/unified1/GLSL.std.450.html i was also kinda expecting to use NIR, on the assumption that it would be less work, given that (i think?) now 3 MESA drivers use it. however to be honest it really doesn't matter: if for example it's easier to add OpKill (and other ops) to llvm-spirv, that's great.
(In reply to vivekvpandya from comment #11) > We should also take a look at > https://mlir.llvm.org/docs/SPIRVToLLVMDialectConversion/ https://mlir.llvm.org/docs/SPIRVToLLVMDialectConversion/#special-cases-1 ah ok so we will want to actually generate a rsqrt llvm instruction, because we will actually _have_ an rsqrt assembly-level instruction in the hardware. also, slightly puzzled: i'm not seeing anything there about predication, vec2/3/4, or swizzle?
(In reply to Luke Kenneth Casson Leighton from comment #14) > also, slightly puzzled: i'm not seeing anything there about predication, > vec2/3/4, or swizzle? https://mlir.llvm.org/docs/SPIRVToLLVMDialectConversion/#vector-types i'm guessing that vec2/3/4 would be "vector types"?
For now I don't see any reason to work on the MLIR dialect. Based on this presentation it seems to support compute shaders only; work is in progress:
https://drive.google.com/drive/u/0/folders/1jTTzQkBj8gq7gavtfuWFpv-dc0pH_VEn

I think we can start with https://github.com/GPUOpen-Drivers/llpc/tree/dev/llpc/translator and see how we can make it work on x86 instructions, instead of AMDGPU-specific instructions.
(In reply to vivekvpandya from comment #16) > For now I don't see any reason to work on MLIR dialect. > Based on this presentation it seems to support compute shaders only, work is > in progress > https://drive.google.com/drive/u/0/folders/1jTTzQkBj8gq7gavtfuWFpv-dc0pH_VEn > > I think we can start with > https://github.com/GPUOpen-Drivers/llpc/tree/dev/llpc/translator and see how > we can make it working on x86 instructions, instead of AMDGPU specific > instructions. I really think a better way to go is to use NIR, since it is already built into MESA instead of using AMDVLK's pipeline compiler. Also, since several GPU drivers are built using it, it is less likely to have things that only work on AMD gpus. NIR is the de-facto MESA way to write a Vulkan driver.
(In reply to Jacob Lifshay from comment #17)
> (In reply to vivekvpandya from comment #16)
> > For now I don't see any reason to work on MLIR dialect.
> > Based on this presentation it seems to support compute shaders only, work is
> > in progress
> > https://drive.google.com/drive/u/0/folders/1jTTzQkBj8gq7gavtfuWFpv-dc0pH_VEn
> > 
> > I think we can start with
> > https://github.com/GPUOpen-Drivers/llpc/tree/dev/llpc/translator and see how
> > we can make it working on x86 instructions, instead of AMDGPU specific
> > instructions.
> 
> I really think a better way to go is to use NIR, since it is already built
> into MESA instead of using AMDVLK's pipeline compiler. Also, since several
> GPU drivers are built using it, it is less likely to have things that only
> work on AMD gpus.
> 
> NIR is the de-facto MESA way to write a Vulkan driver.

Why is NIR required? As per comment 3 we need to support SPIR-V IR, and that can be through any API, right? Also, can we use something like
https://github.com/mesa3d/mesa/commits/master/src/gallium/drivers/llvmpipe ?
I have not checked it yet, but it seems to have one more abstraction before LLVM.
(In reply to vivekvpandya from comment #18) > Why NIR is required? As per comment 3 we need to support a SPIR-V IR and > that can be through any API right? Also can we use something like > https://github.com/mesa3d/mesa/commits/master/src/gallium/drivers/llvmpipe > I have not checked it yet but that seems to have one more abstraction before > LLVM. Because NIR has lots of GPU-specific optimization passes that LLVM doesn't, such as cross-shader optimizations. The rest of MESA is also already built to interact with NIR when it needs to read properties of the shaders. It is not built to do that with LLVM or with LLPC. Luke was oversimplifying somewhat.
(In reply to Jacob Lifshay from comment #19)
> (In reply to vivekvpandya from comment #18)
> > Why NIR is required?

it is not "required", however the amount of work to do something that does not use NIR, when NIR is integrated into MESA, looks like it would be a far longer project.

> > As per comment 3 we need to support a SPIR-V IR and
> > that can be through any API right? Also can we use something like
> > https://github.com/mesa3d/mesa/commits/master/src/gallium/drivers/llvmpipe
> > I have not checked it yet but that seems to have one more abstraction before
> > LLVM.

llvmpipe has severe design limitations (jacob is aware of the details): it is hardcoded to single-threaded in a critical area. if we use it we will only be able to use one single core; all others will run idle.

> Because NIR has lots of GPU-specific optimization passes that LLVM doesn't,
> such as cross-shader optimizations. The rest of MESA is also already built
> to interact with NIR when it needs to read properties of the shaders.

so there basically already exists a full NIR parser, built in to MESA. what you are saying is: if we use llvm-spirv it would be necessary to do work to *remove* the existing NIR support from MESA, and it would then be necessary to re-add the missing shader optimisation passes not present in llvm-spirv. this tends to point in favour of starting from RADV, or the original intel 3D driver.

> It is
> not built to do that with LLVM or with LLPC. Luke was oversimplifying
> somewhat.

yes, because i don't have a handle on the full details in the way that you do, jacob.
Do we have a NIR reader/parser code already in open source? Also where can I find some shader dumps for NIR?
https://github.com/mesa3d/mesa/tree/master/src/amd/vulkan https://github.com/mesa3d/mesa/tree/master/src/compiler/nir
(In reply to vivekvpandya from comment #21)
> Do we have a NIR reader/parser code already in open source?

as part of MESA, yes. bit of a crossover: i was just investigating. https://github.com/mesa3d/mesa/blob/master/src/panfrost/midgard/midgard_compile.c even the new MALI midgard driver uses NIR.

> Also where can I find some shader dumps for NIR?

honestly don't know. on a quick treewalk i did find this:
https://github.com/mesa3d/mesa/blob/master/src/compiler/spirv/spirv2nir.c
whether that's a binary dump or not i don't know. if there really is no nir assembly dump tool or documentation that would be rather strange and be "self-foot-shooting".

https://people.freedesktop.org/~cwabbott0/nir-docs/intro.html

ah, here we go. there is a way:

  NIR includes a function called nir_print_shader() for printing the contents
  of a shader to a given FILE *, which can be useful for debugging. In
  addition, nir_print_instr() is exposed, which can be useful for examining
  instructions in the debugger.
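a minimal sketch of how that debug hook can be called from inside a driver, assuming the in-tree mesa nir.h declarations (the wrapper function name is invented for illustration):

#include <stdio.h>
#include "nir.h"   /* mesa's in-tree NIR header, assumed on the include path */

/* dump the NIR of a shader to stderr at whatever point in the
 * compile pipeline needs inspecting */
static void debug_dump_nir(nir_shader *shader)
{
    nir_print_shader(shader, stderr);
}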
https://github.com/mesa3d/mesa/blob/master/src/amd/llvm/ac_llvm_build.c#L4083

that's a big file. went through it briefly. i "get" the conversion of e.g. nir add to llvm add (etc). more here:

https://github.com/mesa3d/mesa/blob/master/src/amd/llvm/ac_nir_to_llvm.c#L859

again, more conversion. interesting that for reciprocal sqrt a special llvm.amdgcn.rsqr is needed. we will add rsqrt as an actual opcode to the instruction set, however obviously not for x86/POWER9 scalar. this tends to suggest that llvm is the best place for "llvm.rsqrt", and for x86 (etc.) it would be llvm that had a pass converting that to 1/sqrt(x).
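to illustrate the fallback being suggested: on a scalar target with no rsqrt opcode, a generic reciprocal-square-root operation would simply be lowered - whether by an LLVM pass or by the translator itself - into the equivalent of the following C, which is what the soft-shader would end up executing on plain x86/POWER9. the function name is made up for illustration.

#include <math.h>

/* scalar fallback for a reciprocal-square-root operation:
 * what rsqrt(x) becomes when no hardware opcode exists */
static inline float rsqrt_fallback(float x)
{
    return 1.0f / sqrtf(x);
}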
Just to note, I tried dumping NIR with debug mesa drivers. My command:

vivek@vivek-VirtualBox:~/Downloads/mesa-demos/build/src/demos$ GALLIUM_DEBGU=ir LD_LIBRARY_PATH=/home/vivek/install/lib/x86_64-linux-gnu/ LIBGL_DRIVERS_PATH=/home/vivek/install/lib/x86_64-linux-gnu/dri NIR_PRINT=1 MESA_DEBUG=1 LIBGL_ALWAYS_SOFTWARE=true GALLIUM_DRIVER=llvmpipe LIBGL_DEBUG=verbose VC4_DEBUG=nir NIR_DEBUG=1 MIDGARD_MESA_DEBUG=shaders ./gears

I did not get anything. I asked a few people for help but nothing led to success. Asking anything on the mesa mailing list (even on IRC) requires you to subscribe first :(
Writing a complete MESA driver which uses LLVM for codegen seems a very large effort. It is only worthwhile if this driver is not just for experimental use.

if we already agree that mesa llvmpipe has limitations then we may want to fix those.

On a side thought, I would really like to evaluate whether we really need all the benefits of NIR optimizations (if that is the only reason to bring NIR into the flow). Our anticipated flow:

Vulkan/OGL -> spirv -> (someIR)* -> LLVM -> LLVM Optimization and Codegen.

so, without solid numbers on performance showing that we require NIR optimization, I would really like to go with an AMDVLK fork that can generate ppc assembly.
(In reply to vivekvpandya from comment #25)
> Just to note I tried dumping NIR with debug mesa drivers:
> My command:
> vivek@vivek-VirtualBox:~/Downloads/mesa-demos/build/src/demos$
> GALLIUM_DEBGU=ir

misspelled here.

> LD_LIBRARY_PATH=/home/vivek/install/lib/x86_64-linux-gnu/
> LIBGL_DRIVERS_PATH=/home/vivek/install/lib/x86_64-linux-gnu/dri NIR_PRINT=1
> MESA_DEBUG=1 LIBGL_ALWAYS_SOFTWARE=true GALLIUM_DRIVER=llvmpipe
> LIBGL_DEBUG=verbose VC4_DEBUG=nir NIR_DEBUG=1 MIDGARD_MESA_DEBUG=shaders
> ./gears
> 
> I did not get anything.

gears doesn't actually use any opengl shaders AFAIK, so it uses a fallback pipeline or something like that -- not exactly sure how the OpenGL API is wired up to Gallium. Try using a more complex example program that uses shaders, or just use a Vulkan example program (assuming you have a GPU in your VM), since Vulkan doesn't have the fallback pipeline and will always use shaders if it draws any triangles.

The other part that might be causing problems is the virtualbox guest additions, which might be overriding the opengl libraries.
(In reply to vivekvpandya from comment #26)
> Writing complete MESA driver which uses LLVM to codegen seems very large
> effort. It is only worth if this driver is not just for experimental use.
> 
> if we already agree that mesa llvmpipe has limitation then we may want to
> fix that.

that could be as complicated as writing a whole new Gallium driver, since a lot of llvmpipe assumes that some parts are single threaded AFAIK.

> On side thought I would really like to evaluate that we really need all
> benefits from NIR optimizations. (if that is the only reason to bring NIR in
> flow)
> 
> Our enticipated flow:
> 
> Vulkan/OGL -> spirv -> (someIR)* -> LLVM -> LLVM Optimization and Codegen.

A future flow that could provide some major benefits (like the ACO backend for RADV, which can provide >10% performance increase):

Vulkan/OpenGL -> NIR -> NIR optimizations -> Cranelift or similar (like ACO)

Cranelift and ACO are both waay faster than LLVM since they are designed more for generating pretty good code quickly, rather than LLVM's approach of taking waay longer to generate the best code. In this case, NIR optimizations are critical for good performance since neither Cranelift nor ACO do any of the higher-level optimizations.

> so without having solid numbers on performance that we required NIR
> optimization I really would like to go AMDVLK fork that can generate ppc
> assembly.

There are other reasons not to go with an AMDVLK fork, such as Mesa being well known to accept contributions, being friendly to Libre-software, having a good community, generally being developed in the open (transparency), and being installed on most Linux distros in the default install.

AMDVLK, by contrast, has effectively zero contribution other than AMD employees, is developed in a closed manner (they publish new code every once in a while, but the actual development process is mostly private), and it is unknown if they will accept a port to a non-AMD GPU upstream (I'm guessing not).

There is the other factor that RADV, when combined with ACO, is usually faster than AMDVLK.
(In reply to Jacob Lifshay from comment #28)
> > assembly.
> 
> There are other reasons not to go with an AMDVLK fork, such as Mesa being
> well known to accept contributions, being friendly to Libre-software, having
> a good community, generally being developed in the open (transparency), and
> being installed on most Linux distros in the default install.

what we do not want is to end up in a situation where LibreSOC is the sole exclusive location where libraries are found, and we really, *really* do not wish to hard fork and become the sole maintainer of massive codebases such as LLVM, AMDVLK, MESA and so on.

attempting such *will* run into difficulties with distros, because nobody will accept a replacement package from an arbitrary source other than the distro itself. even if we tried that, the problems it causes for users are immense. an example is the deb-multimedia archive, which provides "alternative" versions of debian packages. the archive is maintained by only one person, who does not have sufficient resources to keep fully up-to-date, recompiling packages at the same rate and with the same options as the entire debian team of a thousand people. if we are not "upstream" we create the exact same problem, due to being a hard fork of critical low-level software.

not only that: any security issues and security patches in the hard-forked software become *our problem to drop everything and provide*. the patching process to try to keep up to date will consume all resources, leaving no time for actual maintenance and development. consequently, leveraging upstream is critically important.

> AMDVLK, by contrast, has effectively zero contribution other than AMD
> employees, is developed in a closed manner (they publish new code every once
> in a while, but the actual development process is mostly private), and it is
> unknown if they will accept a port to a non-AMD GPU upstream (I'm guessing
> not).

similarly, we decided not to go with SwiftShader because google is the only controlling contributor. we would therefore become the de-facto exclusive maintainer of a hard fork of AMDVLK, which is too much of a burden and, as explained above, has detrimental ramifications.

leveraging the collaboration inherent in MESA is in many ways far more important than whether the code is, in its initial release, optimally performing or even feature-complete.
(In reply to vivekvpandya from comment #26)
> Writing complete MESA driver which uses LLVM to codegen seems very large
> effort. It is only worth if this driver is not just for experimental use.

can you expand on this insight a little? the reason i ask is twofold.

firstly, we have to be comprehensive in our analysis of existing software and ecosystems (taking into consideration long-term maintenance). therefore, even if the answer turns out to be obvious, due to the audit and transparency requirements we need to go over all options.

secondly, the reason why we picked doing a nonaccelerated, nonvectorised, general-purpose software-only shader is because, apart from the lack of vectorisation and custom 3D opcodes, such a driver is relatively close to what is needed for a hybrid CPU-GPU (a lot closer than dual-ISA architectures, that is). consequently, whilst experimental, jacob and i, when evaluating this driver development strategy, considered it a reasonable incremental stepping stone.

what we do not want to have to do is to develop the ISA *and* an ISA simulator *and* the HDL *and* the driver *and* the LLVM IR backend all at the same time. if instead we can get a working 3D nonaccelerated MESA driver that runs on stable hardware, we have an incremental code-morphing path where portions may be independently developed and brought in one at a time:

* first target: MESA driver on x86
* second target: MESA driver on POWER9
* next: add 3D opcodes to the POWER9 simulator
* next: augment LLVM to use the new 3D opcodes
* next: alter the MESA driver to generate LLVM intrinsics that will use the new opcodes.

if we attempt a nonincremental strategy the chances of success are extremely remote: a bug occurs and we have no idea whether it is in LLVM, or the MESA driver, or the POWER9 simulator, or the HDL, or even the 3D opcode.

if therefore it requires extra passes/steps, or requires adding translation layers to e.g. LLVM to get the AMDGPU vectorised intrinsics translated to scalar x86 (or POWER9), and this code is later discarded, then, annoying as that may be, so be it. however i suspect that there will be quite some interest in the nonaccelerated software vulkan-compatible MESA driver in its own right, particularly given that it could effectively supplant SwiftShader if picked up by another maintainer and suitably optimised to use SSE/AVX/NEON/VSX.
(In reply to Luke Kenneth Casson Leighton from comment #30)
> (In reply to vivekvpandya from comment #26)
> > Writing complete MESA driver which uses LLVM to codegen seems very large
> > effort. It is only worth if this driver is not just for experimental use.
> 
> can you expand on this insight a little? the reason i ask is twofold

This is due to my very limited knowledge of the Mesa sw structure. When I said "complete MESA driver", what I meant was everything from scratch. But I tried to get some information about the MESA system.

Now it seems to me that if we want a MESA-based driver, we must use the open source code that gets up to NIR (or Gallium). After that we have a few options:

Option 1: LLVMPipe (note the official documentation claims it is multi-threaded and fastest, and also that it can generate code for ppc):

"The Gallium llvmpipe driver is a software rasterizer that uses LLVM to do runtime code generation. Shaders, point/line/triangle rasterization and vertex processing are implemented with LLVM IR which is translated to x86, x86-64, or ppc64le machine code. Also, the driver is multi-threaded to take advantage of multiple CPU cores (up to 8 at this time). It’s the fastest software rasterizer for Mesa."

Option 2: OpenSWR. Its official FAQ page claims that getting it to generate PPC code requires a non-trivial amount of work.

Option 3: Write a NIR to LLVM IR translator (or we can use code from the above two components) and use LLVM's JIT to run that code on x86/PPC.

So we must note the limitations of LLVMPipe, so that we have a record of why we did not choose it.
(In reply to vivekvpandya from comment #31)
> (In reply to Luke Kenneth Casson Leighton from comment #30)
> > (In reply to vivekvpandya from comment #26)
> > > Writing complete MESA driver which uses LLVM to codegen seems very large
> > > effort. It is only worth if this driver is not just for experimental use.
> > 
> > can you expand on this insight a little? the reason i ask is twofold
> 
> This is due to my very little knowledge about Mesa sw structure.
> When I said complete MESA driver what I mean is everything from scratch.
> But I tried to get some information about MESA system.

it provides a common "meeting place" for disparate ideas, presenting a uniform top-level API and redirecting to different subsystems with translators that again share common APIs.

> Now to me it seems that if we want to use MESA based driver, we must use
> open source code that gets upto NIR (or Gallium).

a better way to put it is that MESA *provides* that code, already written, so that we do not have to. (in addition, for example, there is someone else working on a library that converts OpenGL to Vulkan. it is, i believe, a MESA plugin.)

> After that we have few options:
> 
> LLVMPipe (note official document claims to be multi-threaded and fastest,
> also that it can generate code for ppc):
> x86-64, or ppc64le machine code. Also, the driver is multi-threaded to take
> advantage of multiple CPU cores (up to 8 at this time).

hmmm so jacob, where is the limitation that you noted in llvmpipe?

> OpenSWR:
> For OpenSWR its official FAQ page claims that to get it generate PPC code
> non trivial amount of work is required.

it also says that it is an OpenGL driver, and we definitely want to do Vulkan.

> Option3:
> Write a NIR to LLVM IR translator (or we can use code from above two
> components)
> and use LLVM's JIT to run that code on x86/PPC.

option 4: start from an existing NIR to LLVM IR translator (RADV), take out the amdgpu-specific intrinsics and replace them.

option 5: keep the existing RADV NIR to LLVM IR translator as-is as much as possible, and augment LLVM IR with translation passes that turn the special amdgpu vector intrinsics into generic scalar ones. later (Phase 2) replace those special translation passes with POWER9-SimpleV-augmented ones.

> So we must note limitations of LLVMPipe so that we have record for why not
> choosing that.

indeed. in particular we need to establish (and this is highly likely) whether the same design assumptions were made in llvmpipe that were also made in SwiftShader. namely: the hypothesis is that, because the target was a software-only renderer where all target hardware was known to be incapable of supporting vectors, vec2/3/4, predication, swizzle, texturisation operations and more, llvmpipe was designed to eliminate these capabilities as quickly as possible and to "fall" onto standard SIMD and scalar capability.

jacob has established that NIR preserves these types of 3D specialist intrinsics, and as a gallium driver i would be very surprised if llvmpipe also preserved them.
(In reply to Luke Kenneth Casson Leighton from comment #32) > (In reply to vivekvpandya from comment #31) > > After that we have few options: > > > > LLVMPipe (note official document claims to be multi-threaded and fastest, > > also that it can generate code for ppc): > > > x86-64, or ppc64le machine code. Also, the driver is multi-threaded to take > > advantage of multiple CPU cores (up to 8 at this time). > > hmmm so jacob, where is the limitation that you noted in llvmpipe? So, for any OpenGL/Vulkan implementation, there are 2 parts (oversimplifying somewhat) where lots of shader processing happens: the per-vertex/per-triangle processing and the per-pixel/per-fragment processing. Last I checked, llvmpipe does parallelize the per-pixel/per-fragment processing but doesn't parallelize the per-vertex/per-triangle processing. Both are important for performance, though per-pixel/per-fragment processing is about an order-of-magnitude more important if just counting the number of times shaders are run, though that can vary greatly. A good implementation (such as AMD GPUs, and what is planned for Kazan) will parallelize both processing stages using both SIMD (or equivalent) and multiple processors.
>> llvmpipe does parallelize the
> per-pixel/per-fragment processing but doesn't parallelize the
> per-vertex/per-triangle processing.

Just to note:
I see use of SSE capabilities on Intel, and similar stuff for POWER8, in the following files:

https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/gallium/drivers/llvmpipe/lp_rast_tri.c
https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/gallium/drivers/llvmpipe/lp_setup_tri.c

but not in the following:

https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/gallium/drivers/llvmpipe/lp_setup_line.c
(In reply to vivekvpandya from comment #34)
> >> llvmpipe does parallelize the
> > per-pixel/per-fragment processing but doesn't parallelize the
> > per-vertex/per-triangle processing.
> 
> Just to note:
> I see use of SSE capabilities on Intel and similar stuffs for POWER8 in
> following files:

https://phoronix.com/scan.php?page=news_item&px=Gallium3D-Vulkan-ST-Possible

although it is from 2016, the article points out that implementing vulkan on top of gallium is not really practical.
I studied the mesa source code for vulkan drivers and I have the following very high-level plan (Note: we may face difficulties executing it).

Step 1: Testing setup.
This includes a Linux machine on which we can run vulkan tests through the mesa-built libvulkan_intel.so.

Step 2:
Start a simple project under the mesa source tree which can create a simple libvulkan_ppc.so. However this lib will not actually do anything, and on pipeline creation it should just return some error code as a VkResult. In this task we may start by just copying the code from https://gitlab.freedesktop.org/mesa/mesa/-/tree/master/src/intel/vulkan and then remove all files from meson.build, keeping only the minimal files required to create a broken pipeline. Test this setup by forcing an application to use this driver. Need to figure out how to force it - maybe through VK_ICD_FILENAMES.

Step 3:
Once we have the above broken driver ready we can start enabling the code required to process "COMPUTE" shaders. We should be able to dump the SPIR-V and fail (a small illustrative sketch follows this comment).

Step 4:
Use the SPIR-V to NIR converter available in https://gitlab.freedesktop.org/mesa/mesa/-/tree/master/src/compiler/spirv and dump NIR and fail.

Step 5:
Copy the NIR to LLVM IR conversion from https://gitlab.freedesktop.org/mesa/mesa/-/tree/master/src/amd/vulkan and use it to generate LLVM IR that suits our need. Dump LLVM IR and fail. If possible we can create some mechanism such that we can move https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/amd/vulkan/radv_nir_to_llvm.c to a common compiler folder. One possible approach is to have a simple file that maps NIR operations to target-specific intrinsics (for AMDGPU, PPC) and use that in the above code.

Step 6:
Use LLVM's JIT capabilities to execute the IR generated in Step 5.

After these steps we have a simple mesa driver working for COMPUTE shaders.
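to illustrate the "dump the SPIR-V and fail" part of Step 3, a minimal hedged sketch (the function name is invented; the VkShaderModuleCreateInfo fields are standard Vulkan): the driver's shader-module path can simply write the incoming SPIR-V words out to a file, where they can be inspected with spirv-dis, before deliberately returning an error.

#include <stdio.h>
#include <vulkan/vulkan.h>

/* illustrative only: dump the raw SPIR-V binary handed to the driver
 * so it can be inspected with spirv-dis, then deliberately fail */
static VkResult dump_spirv_and_fail(const VkShaderModuleCreateInfo *info,
                                    const char *path)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return VK_ERROR_INITIALIZATION_FAILED;
    /* codeSize is in bytes, pCode is an array of 32-bit words */
    fwrite(info->pCode, 1, info->codeSize, f);
    fclose(f);
    return VK_ERROR_UNKNOWN;   /* step 3: dump and fail */
}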
(In reply to vivekvpandya from comment #36)
> I studies mesa source code for vulkan drivers and I have following very
> high-level plan (Note: we may face difficulties executing it)

that's ok. evaluating and adjusting accordingly is fine. this is a research project.

> Step1: Testing Setup
> This includes a Linux machine in which we can run vulkan tests through
> mesa built libvulkan_intel.so

we have sponsored access thanks to Raptor Engineering to a TALOS-II workstation (24-core POWER9). however being able to run generically - not depending on ppc llvm - is partly a "nice side-goal" and partly "if there's something not working on llvm ppc we don't want that to be a show-stopper".

> Step 2:
> Start a simple project under mesa source tree which can create a simple
> libvulkan_ppc.so. However this lib will not actually do anything and on
> pipeline creation
> it should just return some error code as VkResult. In this task we may
> start by just copying code for
> https://gitlab.freedesktop.org/mesa/mesa/-/tree/master/src/intel/vulkan abd
> then remove all files from meson.build and just keep minimal files which are
> required to create a broken pipeline.

interesting. i believe this was the approach taken originally on RADV, and the Intel team - who designed NIR - are known to be very helpful and approachable.

> Here if possible we can create some mechanism such that we can move
> https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/amd/vulkan/
> radv_nir_to_llvm.c to common compiler folder.

this is a good idea, easily justified by the code-duplication that would otherwise result. a "temporary" solution which would help convince the mesa developers of the need for that would be a file named libresoc/vulkan/radv_nir_llvm.c containing this one line:

"#include ../../amd/vulkan/radv_nir_llvm.c"

> One possible thing is to have a simple file that maps NIR operations to
> target specific intrinsics for (AMDGPU, PPC) and use that in above code.

oh, that would be interesting: trying AMDGPU as one of the (generic) targets of libresoc-mesa. it would involve adding the pipe system in (the system which transfers LLVM-compiled binaries over to the radeon GPU), which is not a high priority. interesting all the same.

> Step6: Use LLVM's JIT capabilities to execute IR generated in Step5
> 
> After these steps we have a simple mesa driver working for COMPUTE shader.

fantastic. one thing to note (for the record): unlike RADV (and AMDVLK), the resultant binary, generated by Step 6, is *locally* executed, on the main processor core (x86, ppc64). RADV and AMDVLK use some sort of "pipe" library that farms the (foreign-architecture) binary over to the (separate) GPU.

jacob, what are your thoughts?
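for Step 6 / the "locally executed" point above, a heavily simplified, hedged sketch of what JIT execution on the host CPU can look like using the LLVM-C MCJIT API (the module contents and the "main_cs" entry-point name are invented; a real driver would cache compiled code, handle errors properly, and pass real descriptor/buffer arguments):

#include <stdio.h>
#include <stdint.h>
#include <llvm-c/Core.h>
#include <llvm-c/ExecutionEngine.h>
#include <llvm-c/Target.h>

/* JIT-compile an LLVM module produced from NIR and run one shader
 * entry point directly on the host CPU (x86 / ppc64) */
static int run_shader_jit(LLVMModuleRef module)
{
    char *error = NULL;
    LLVMExecutionEngineRef engine;

    LLVMLinkInMCJIT();
    LLVMInitializeNativeTarget();
    LLVMInitializeNativeAsmPrinter();

    if (LLVMCreateExecutionEngineForModule(&engine, module, &error)) {
        fprintf(stderr, "JIT creation failed: %s\n", error);
        LLVMDisposeMessage(error);
        return -1;
    }

    /* "main_cs" is a made-up entry-point name for illustration */
    void (*shader_fn)(void) =
        (void (*)(void))(uintptr_t)LLVMGetFunctionAddress(engine, "main_cs");
    if (shader_fn)
        shader_fn();

    LLVMDisposeExecutionEngine(engine);
    return 0;
}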
btw this has been quite a comprehensive evaluation and i think it only reasonable to allocate a subtask and budget for it.
(In reply to vivekvpandya from comment #36) > Test this setup with by forcing application to use this driver. Need to > figure out way how to force it. May be through VK_ICD_FILENAMES. if i remember correctly jacob has worked this out already in kazan and documented it.
(In reply to Luke Kenneth Casson Leighton from comment #39)
> (In reply to vivekvpandya from comment #36)
> 
> > Test this setup with by forcing application to use this driver. Need to
> > figure out way how to force it. May be through VK_ICD_FILENAMES.
> 
> if i remember correctly jacob has worked this out already in kazan and
> documented it.

yes, see run.sh and run-cts.sh in:
https://salsa.debian.org/Kazan-team/kazan/-/tree/master

You need to create the json file pointing to the compiled .so file. example contents:

{
    "ICD": {
        "api_version": "1.2.128",
        "library_path": "/usr/lib/x86_64-linux-gnu/libvulkan_radeon.so"
    },
    "file_format_version": "1.0.0"
}
> we have sponsored access thanks to Raptor Engineering to a TALOS-II
> workstation (24-core POWER9).

I see there is a vulkan-loader binary available for PPC. (I am not sure what else may be required to run a vulkan program.) From Mesa, RADV has a ppc build:
https://packages.ubuntu.com/xenial-updates/powerpc/mesa-vulkan-drivers/filelist

So I can try to start building the simple broken pipeline on this machine; once sufficient progress is made we can test things on an Intel machine.
(In reply to vivekvpandya from comment #41) > > we have sponsored access thanks to Raptor Engineering to a TALOS-II > > workstation (24-core POWER9). > > I see there is vulkan-loader binary available fro PPC. ppc debian packages have been around for a *long* time. currently at 94% https://buildd.debian.org/stats/
(In reply to vivekvpandya from comment #36) > After these steps we have a simple mesa driver working for COMPUTE shader. vivek one nice thing about these steps is it is very easy and clear to allocate budgets for each. regarding that: if you are not in europe can i recommend applying for a Transferwise online bank account? this only requires uploading a form of ID, and the conversion rates (if you want to) use "middle rate" without extortionate fees. also you will not have bank transfer fees subtracted when NLNet sends from their EU Bank.
(In reply to vivekvpandya from comment #36) > I studies mesa source code for vulkan drivers and I have following very > high-level plan (Note: we may face difficulties executing it) That plan sounds good to me!
(In reply to Jacob Lifshay from comment #44) > (In reply to vivekvpandya from comment #36) > > I studies mesa source code for vulkan drivers and I have following very > > high-level plan (Note: we may face difficulties executing it) > > That plan sounds good to me! excellent, i'll close #466 then.
see: https://gitlab.freedesktop.org/apinheiro/mesa/-/commit/07d01ebf6aae2f9ae71a8bea13a5d8acccb6280e http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-August/000209.html
Just to note: I am using https://github.com/GameTechDev/IntroductionToVulkan simple programs to test driver skeleton.
Currently the driver skeleton fails at:

VkResult libresoc_CreateGraphicsPipelines(
   VkDevice _device,
   VkPipelineCache pipelineCache,
   uint32_t count,
   const VkGraphicsPipelineCreateInfo* pCreateInfos,
   const VkAllocationCallbacks* pAllocator,
   VkPipeline* pPipelines)
{
   return VK_ERROR_UNKNOWN;
   //FIXME: stub
}

which is my original intention in Step 2, as per comment 36.

https://gitlab.freedesktop.org/vivekvpandya/mesa/-/tree/libresoc_dev/src/libre-soc/vulkan

However code quality is not good, with lots of TODOs and stubs. Next target is to get SPIR-V dumps of shaders.
(In reply to vivekvpandya from comment #48) > Currently driver skeleton fails at > <snip> > which is my original intention to in Step 2: as per comment 36. Yay!
(In reply to Jacob Lifshay from comment #49) > (In reply to vivekvpandya from comment #48) > > Currently driver skeleton fails at > > <snip> > > which is my original intention to in Step 2: as per comment 36. > > Yay! fantastic, vivek. what would you consider step 2 to be a reasonable percentage completed so far, in terms of complexity / productivity?
(In reply to Luke Kenneth Casson Leighton from comment #50)
> (In reply to Jacob Lifshay from comment #49)
> > (In reply to vivekvpandya from comment #48)
> > > Currently driver skeleton fails at
> > > <snip>
> > > which is my original intention to in Step 2: as per comment 36.
> > 
> > Yay!
> 
> fantastic, vivek. what would you consider step 2 to be a reasonable
> percentage completed so far, in terms of complexity / productivity?

I am not sure, frankly it was lots of labor but we are still far from the interesting point where we can run something.

I think all these things are lots of labor work, mainly because I don't want to put code which is added without testing, so I am adding stubs for things that are not really required for shader creation.

at this point the driver dumps spirv and nir.
https://gitlab.freedesktop.org/vivekvpandya/mesa/-/commit/14e40cd8e605423e8b3d4d7cb7a7589149b4ff50

Next step is to get LLVM generated from NIR.
(In reply to vivekvpandya from comment #51)
> (In reply to Luke Kenneth Casson Leighton from comment #50)
> > (In reply to Jacob Lifshay from comment #49)
> > > (In reply to vivekvpandya from comment #48)
> > > > Currently driver skeleton fails at
> > > > <snip>
> > > > which is my original intention to in Step 2: as per comment 36.
> > > 
> > > Yay!
> > 
> > fantastic, vivek. what would you consider step 2 to be a reasonable
> > percentage completed so far, in terms of complexity / productivity?
> 
> I am not sure, frankly it was lots of labor but we still far from
> interesting point where we can run some thing.

the reason i ask is because we can create a milestone for you and assign a percentage of the budget to it, and you can receive a donation from NLnet.

it does have to be a "100% completed task", which is why we are retrospectively creating "subtasks that happen to reflect 100% completed work" :)

basically if we do not have these sub-milestones we have to wait for the ENTIRE driver to be completed 100%, and only then can the donation (EUR 11500) be paid to you.

if that's actually ok with you then that's fine with me.

> I think all these things are lots of labor work, mainly because I don't want
> to put code which is added without testing, so I am adding stubs for things
> that are not really required for shader creation.

absolutely fine. btw we really do need to collate all the tests (into a separate repo) so that anyone at any time can join and get up to speed immediately without delay or needing to wait for an email "what am i missing".
https://git.libre-soc.org/?p=mesa.git;a=commitdiff;h=c5211837ec6469b2b9fba592845237050b5b2e9d

vivek, Cole pushed the addition of the vk* files to the libresoc_dev branch on git.libre-soc.org. you will need to git pull from there BEFORE adding any other files, otherwise we end up with a conflict.

please see the HDL_workflow page, and note that the ssh server is on port 922, NOT port 22. using the ssh key you sent me last week you can test it with:

ssh -v -p922 gitolite3@git.libre-soc.org

for goodness sake get that right first time, and under no circumstances type a password if prompted to. i got sufficiently fed up with continuous scanning and DDOS attacks against the server to set up a very draconian fail2ban policy which instantly bans any IP address from which password failures occur on ssh.

once you have that working you want this URL:

git clone ssh://gitolite3@git.libre-soc.org:922/mesa.git

you can if you want hand-edit the .git/config file as i mentioned in an earlier message, including adding two url= lines or a separate section if you prefer.
(In reply to Luke Kenneth Casson Leighton from comment #53)
> https://git.libre-soc.org/?p=mesa.git;a=commitdiff;
> h=c5211837ec6469b2b9fba592845237050b5b2e9d
> 
> vivek, Cole pushed the addotion of the vk* files to the libresoc_dev branch
> on git.libre-soc.org
> 
> you will need to git pull from there BEFORE adding any other files otherwise
> we end up with a conflict.
> 
> please see the HDL_workflow page, note that the ssh server is on port 922
> NOT port 22. using the ssh key you sent me last week you can test it with
> "ssh -v -p922 gitolite3@git.libre-soc.org"
> 
> for goodness sake get that right first time, and under no circumstances type
> a password if prompted to. i got sufficiently fed up with continuous
> scanning and DDOS attacks against the server to set up a very draconian
> fail2ban policy which instantly bans any IP address from which password
> failures occur on ssh.

exactly this happened: due to the prompt I got confused. Can you please help?

> once you have that working you want this URL:
> 
> git clone ssh://gitolite3@git.libre-soc.org:922/mesa.git
> 
> you can if you want hand-edit the .git/config file as i mentioned in an
> earlier message, including adding two url= lines or a separate section if
> you prefer.
(In reply to Luke Kenneth Casson Leighton from comment #52)
> (In reply to vivekvpandya from comment #51)
> > I am not sure, frankly it was lots of labor but we still far from
> > interesting point where we can run some thing.
> 
> the reason i ask is because we can create a milestone for you and assign a
> percentage of the budget to it, and you can receive a donation from NLnet.
> 
> it does have to be a "100% completed task" which is why we are
> retrospectively creating "subtasks that happen to reflect 100% completed
> work" :)
> 
> basically if we do not have these sub-milestones we have to wait for the
> ENTIRE driver to be completed 100% and only then can the donation (EUR
> 11500) be paid to you.
> 
> if that's actually ok with you then that's fine with me.

I don't think I myself can judge the work I have done till now. If you can, please do it. It may turn out that we need more work than we thought initially.

> > I think all these things are lots of labor work, mainly because I don't want
> > to put code which is added without testing, so I am adding stubs for things
> > that are not really required for shader creation.
> 
> absolutely fine, btw we really do need to collate all the tests (into a
> separate repo) so that anyone at amy time can join and get up to speed
> immediately without delay or needing to wait for an email "what am i missing"
(In reply to vivekvpandya from comment #54)
> any IP address from which password
> failures occur on ssh.
> exactly this happened, due to prompt I got confuse. Can you please help?

i've unbanned the ip address starting 103. ok so try again: remember to use the id_rsa key matching the pubkey you sent me 10 days ago. do not try to type a password if prompted for password authorization. if you reach that stage it means you presented the wrong ssh key, and password auth (100% guaranteed to fail) is the fallback.
(In reply to vivekvpandya from comment #55) > I don't think I myself can judge the work I have done till now. If you can > please do it. It may turned out that we need more work than we thought > initially. that's absolutely fine, we simply need to know, work out how much needs doing, and plan accordingly. let's discuss more when you are a bit further along.
Instead of writing an LLVM-based rasterizer from scratch I tried experimenting with lavapipe (gallium + llvmpipe for vulkan). It can be built with:

meson configure -Dvulkan-drivers=swrast -Dgallium-drivers=swrast

Set VK_ICD_FILENAMES to the lvp_icd.x86_64.json file and vulkaninfo should show it.

I tried running the demos from https://software.intel.com/content/www/us/en/develop/articles/api-without-secrets-introduction-to-vulkan-part-1.html and all 7 run fine with lavapipe.

shaders can be dumped with export GALLIVM_DEBUG="ir"

This seems more promising than writing everything from scratch. It allows us to focus on the core part of this bug.
(In reply to vivekvpandya from comment #58) > Instead of writing LLVM based rasterizer from scratch I tried experimenting > with lavapipe (gallium + llvmpipe for vulkan). that's a good intermediary (incremental) stage > I tried running demos from > https://software.intel.com/content/www/us/en/develop/articles/api-without- > secrets-introduction-to-vulkan-part-1.html and all 7 runs fine with lavapipe. that's fantastic! that's definitely a milestone worth getting you some $ for :) > > shaders can be downloaded with export GALLIVM_DEBUG="ir" > > This seems more promising than writing everything from scratch. This allows > us to focus core part of this bug. it's a brilliant idea, to be able to progress other areas, having something that works allows other areas of development / experimentation. the only thing we found about gallium is: it has major design flaws, a single-threaded internal bottleneck, which was why we eliminated it right at the start. for a low-performance *software* renderer (as a poor substitute so that people at least have "something working") gallium is perfect. for a high-performance *hybrid* renderer where we want to have a minimum of 4 cores, working up over time to 64 cores, it will be a show-stopper. that said if you've managed to get as far as you have, and opened up a new development path with it, that's fantastic, and a good call.
(In reply to Luke Kenneth Casson Leighton from comment #59) > (In reply to vivekvpandya from comment #58) > > Instead of writing an LLVM-based rasterizer from scratch I tried experimenting > > with lavapipe (gallium + llvmpipe for vulkan). > > that's a good intermediary (incremental) stage > > > I tried running the demos from > > https://software.intel.com/content/www/us/en/develop/articles/api-without- > > secrets-introduction-to-vulkan-part-1.html and all 7 run fine with lavapipe. > > that's fantastic! that's definitely a milestone worth getting you some $ > for :) > > > > > > shaders can be dumped (as LLVM IR) with export GALLIVM_DEBUG="ir" > > > > This seems more promising than writing everything from scratch. This allows > > us to focus on the core part of this bug. > > it's a brilliant idea: having something > that works allows other areas of development / experimentation to progress. > > the only thing we found about gallium is: it has major design flaws, notably a > single-threaded internal bottleneck, which was why we eliminated it right > at the start. Agreed, but that is a separate issue and can be improved if there are some low-hanging fruits. > > for a low-performance *software* renderer (as a poor substitute so that > people at least have "something working") gallium is perfect. > > for a high-performance *hybrid* renderer where we want to have a minimum > of 4 cores, working up over time to 64 cores, it will be a show-stopper. > Again, maybe there are some low-hanging fruits like https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4385 > that said, given that you've managed to get as far as you have, and opened up > a new development path with it, that's fantastic, and a good call. This particular path can get us started on the LLVM side. I am thinking of modifying the LLVM PowerPC backend to run a simple vectorizer pass (just before GlobalISel), creating libre-soc-specific LLVM intrinsics (which can be added with TableGen) and then updating GlobalISel to generate libre-soc's textual assembly for the newly added LLVM intrinsics. On the other side it will take some time for someone like me (who doesn't have much experience in mesa), and most of the code inspiration will come from llvmpipe. So that can be explored later.
(In reply to vivekvpandya from comment #60) > Again, maybe there are some low-hanging fruits like > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4385 interesting, a parallel addition. > > that said, given that you've managed to get as far as you have, and opened up > > a new development path with it, that's fantastic, and a good call. > > This particular path can get us started on the LLVM side. I am thinking of > modifying the LLVM PowerPC backend to run a simple vectorizer pass (just > before GlobalISel), creating libre-soc-specific LLVM intrinsics (which can be > added with TableGen) and then updating GlobalISel to generate libre-soc's textual > assembly for the newly added LLVM intrinsics. great idea. if that can be done as stand-alone programs (not needing the entirety of mesa) that would be particularly good, or, more to the point, if the *assembly* can be generated stand-alone that's really good. then it can be run through the python-based simulator, which can do around 2,500 instructions per second on high-end hardware, and that's perfect for running short programs. in parallel with that, a c-based simulator can be written, which can do 100,000 instructions per second even on low-end hardware. btw a word of caution: if expanded out to a single 1D linear sequence there are over a QUARTER OF A MILLION, possibly even HALF A MILLION, instructions in SVP64. let the implications sink in for a minute. LLVM Vector ISA support will have assumed that there is a limited number of instructions, possibly as many as 1,000, maybe even 10,000 when all intrinsics are permuted out, so it is "perfectly fine" to have programs which auto-generate all possible IR combinations. for SVP64 this approach would create multi-megabyte IR files. this is down to the fact that there is a 32-bit opcode space *MULTIPLIED* by a 24-bit prefix. it would be much, much better if LLVM's IR was designed around the 2D {prefix}{suffix} concept rather than the 1D {prefix * suffix} space. but, we work with what we've got.
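to make the 1D-versus-2D point concrete, here is a tiny sketch with purely illustrative numbers (assumptions for illustration only, not the real SVP64 counts): flattening the prefix and suffix spaces into one list multiplies their sizes, whereas keeping them as a 2D {prefix}{suffix} pair only adds them.

    #include <stdio.h>

    int main(void) {
        /* both counts are made up, purely to show the shape of the problem */
        long suffix_ops   = 1000;  /* assumed number of distinct 32-bit suffix ops   */
        long prefix_modes = 400;   /* assumed number of distinct 24-bit prefix modes */

        /* 1D: every (prefix mode, suffix op) pair becomes its own "instruction" */
        printf("1D flattened list:  %ld entries\n", suffix_ops * prefix_modes);

        /* 2D: describe each space once and combine them at encode time */
        printf("2D prefix + suffix: %ld entries\n", suffix_ops + prefix_modes);
        return 0;
    }

with those made-up counts the flattened list is 400,000 entries (the same order of magnitude as the quarter-to-half-million figure above) while the 2D description stays at 1,400.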
(editing to remove unnecessary context, please do this in future) (In reply to Luke Kenneth Casson Leighton from comment #61) > great idea. if that can be done as stand-alone programs (not needing > the entirety of mesa) that would be particularly good, or, more to the > point, if the *assembly* can be generated stand-alone that's really > good. More info on building llvm and using it with mesa is in this gist: https://gist.github.com/Venemo/a9483106565df3a83fc67a411191edbd#building-llvm-and-using-it-with-mesa
IRC logs questions: https://libre-soc.org/irclog/latest.log.html#t2021-07-02T13:15:39
(In reply to vivekvpandya from comment #60) > This particular path can get us started on the LLVM side. I am thinking of > modifying the LLVM PowerPC backend to run a simple vectorizer pass (just > before GlobalISel), creating libre-soc-specific LLVM intrinsics (which can be > added with TableGen) and then updating GlobalISel to generate libre-soc's textual > assembly for the newly added LLVM intrinsics. we will eventually want to use llvm-ir's built-in support for vector types and use the common intrinsics/instructions rather than libre-soc-specific intrinsics, allowing waay more code to generate SV instructions without needing to be modified.
(In reply to Jacob Lifshay from comment #64) > we will eventually want to use llvm-ir's built-in support for vector types > and use the common intrinsics/instructions rather than libre-soc-specific > intrinsics, allowing waay more code to generate SV instructions without > needing to be modified. i need to review the LLVM-IR vector type format. as long as they've kept the type separate from the operation, and don't expand them out via all possible permutations into one single massive IR list, it'll be fine.
(In reply to Jacob Lifshay from comment #64) > (In reply to vivekvpandya from comment #60) > > This particular path can get us started on the LLVM side. I am thinking of > > modifying the LLVM PowerPC backend to run a simple vectorizer pass (just > > before GlobalISel), creating libre-soc-specific LLVM intrinsics (which can be > > added with TableGen) and then updating GlobalISel to generate libre-soc's textual > > assembly for the newly added LLVM intrinsics. > > we will eventually want to use llvm-ir's built-in support for vector types > and use the common intrinsics/instructions rather than libre-soc-specific > intrinsics, allowing waay more code to generate SV instructions without > needing to be modified. Target-specific intrinsics are common in LLVM (for example NVPTX), and in this case it is just for one phase (CodeGen Prepare -> GlobalISel): adding those intrinsics and doing a vectorization in CodeGen Prepare. Again, this just makes the codegen process easier. As another option, I was thinking that if, say, an add instruction is fed with the output of extract_vector (an LLVM instruction), then in GlobalISel we can match that pattern and generate code.
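to make that concrete, here is a minimal sketch (not the driver's actual code; the function names and build command are illustrative assumptions) which uses the LLVM C-API to build IR in which the vec4 type and the predicate stay visible as first-class values, together with the "add fed by an element extracted from a vector" pattern just described, using LLVM's extractelement instruction as the stand-in for extract_vector:

    /* build roughly (details vary by distro/LLVM install):
     *   cc sketch.c $(llvm-config --cflags --ldflags --libs core --system-libs) -lstdc++ -o sketch */
    #include <stdio.h>
    #include <llvm-c/Core.h>

    int main(void) {
        LLVMContextRef ctx = LLVMContextCreate();
        LLVMModuleRef  mod = LLVMModuleCreateWithNameInContext("sketch", ctx);
        LLVMBuilderRef bld = LLVMCreateBuilderInContext(ctx);

        LLVMTypeRef f32   = LLVMFloatTypeInContext(ctx);
        LLVMTypeRef i32   = LLVMInt32TypeInContext(ctx);
        LLVMTypeRef vec4  = LLVMVectorType(f32, 4);                        /* <4 x float> */
        LLVMTypeRef mask4 = LLVMVectorType(LLVMInt1TypeInContext(ctx), 4); /* <4 x i1>    */

        /* <4 x float> madd_masked(<4 x float> a, <4 x float> b,
         *                         <4 x float> c, <4 x i1> mask)           */
        LLVMTypeRef params[] = { vec4, vec4, vec4, mask4 };
        LLVMValueRef fn = LLVMAddFunction(mod, "madd_masked",
                                          LLVMFunctionType(vec4, params, 4, 0));
        LLVMPositionBuilderAtEnd(bld, LLVMAppendBasicBlockInContext(ctx, fn, "entry"));
        LLVMValueRef a = LLVMGetParam(fn, 0), b = LLVMGetParam(fn, 1);
        LLVMValueRef c = LLVMGetParam(fn, 2), m = LLVMGetParam(fn, 3);
        LLVMValueRef mul = LLVMBuildFMul(bld, a, b, "mul");   /* stays <4 x float> */
        LLVMValueRef add = LLVMBuildFAdd(bld, mul, c, "add");
        /* predication kept as data: inactive lanes fall back to c */
        LLVMBuildRet(bld, LLVMBuildSelect(bld, m, add, c, "res"));

        /* i32 lane0_plus(<4 x i32> v, i32 x): the extract-feeds-add pattern */
        LLVMTypeRef vi4 = LLVMVectorType(i32, 4);
        LLVMTypeRef p2[] = { vi4, i32 };
        LLVMValueRef fn2 = LLVMAddFunction(mod, "lane0_plus",
                                           LLVMFunctionType(i32, p2, 2, 0));
        LLVMPositionBuilderAtEnd(bld, LLVMAppendBasicBlockInContext(ctx, fn2, "entry"));
        LLVMValueRef lane = LLVMBuildExtractElement(bld, LLVMGetParam(fn2, 0),
                                                    LLVMConstInt(i32, 0, 0), "lane");
        LLVMBuildRet(bld, LLVMBuildAdd(bld, lane, LLVMGetParam(fn2, 1), "sum"));

        char *ir = LLVMPrintModuleToString(mod);  /* dump the textual IR */
        puts(ir);
        LLVMDisposeMessage(ir);
        LLVMDisposeBuilder(bld);
        LLVMDisposeModule(mod);
        LLVMContextDispose(ctx);
        return 0;
    }

because the vector type and the i1 mask survive in the IR as ordinary values, a later GlobalISel rule or target-specific lowering can still see them; nothing has been scalarised early.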
(edited to remove irrelevant context, please do this in future) (In reply to Luke Kenneth Casson Leighton from comment #61) > (In reply to vivekvpandya from comment #60) > > > Again, maybe there are some low-hanging fruits like > > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4385 > > interesting, a parallel addition. LP_NUM_THREADS=1 this env variable indicates that llvmpipe rasterization is threaded.
(In reply to vivekvpandya from comment #67) > LP_NUM_THREADS=1 this env variable indicates that llvmpipe rasterization is > threaded. jacob can you remember which bit of llvmpipe / gallium3d is incapable of using threads no matter what this definition is? i recall vaguely from 3 years ago it was something to do with the shader engine. of course, this *may* have changed in the intervening time, in which case its use can be reevaluated.
(In reply to Luke Kenneth Casson Leighton from comment #68) > (In reply to vivekvpandya from comment #67) > > > LP_NUM_THREADS=1 this env variable indicates that llvmpipe rasterization is > > threaded. > > jacob can you remember which bit of llvmpipe / gallium3d is incapable of > using threads no matter what this definition is? i recall vaguely from 3 > years ago it was something to do with the shader engine. of course, this > *may* have changed in the intervening time, in which case its use can be > reevaluated. iirc the part which isn't both multithreaded and vectorized is the vertex shader
(In reply to Jacob Lifshay from comment #69) > (In reply to Luke Kenneth Casson Leighton from comment #68) > > (In reply to vivekvpandya from comment #67) > > > > > LP_NUM_THREADS=1 this env variable indicates that llvmpipe rasterization is > > > threaded. > > > > jacob can you remember which bit of llvmpipe / gallium3d is incapable of > > using threads no matter what this definition is? i recall vaguely from 3 > > years ago it was something to do with the shader engine. of course, this > > *may* have changed in the intervening time, in which case its use can be > > reevaluated. > > iirc the part which isn't both multithreaded and vectorized is the vertex > shader that applies to llvmpipe, not gallium3d as a whole.
(In reply to vivekvpandya from comment #58) > I tried running the demos from > https://software.intel.com/content/www/us/en/develop/articles/api-without- > secrets-introduction-to-vulkan-part-1.html and all 7 run fine with lavapipe. vivek did you remember to "git push" to git.libre-soc.org? i checked, last update was 3 months ago, is that correct?
(In reply to Luke Kenneth Casson Leighton from comment #71) > (In reply to vivekvpandya from comment #58) > > > I tried running the demos from > > https://software.intel.com/content/www/us/en/develop/articles/api-without- > > secrets-introduction-to-vulkan-part-1.html and all 7 run fine with lavapipe. > > vivek did you remember to "git push" to git.libre-soc.org? > i checked, last update was 3 months ago, is that correct? Yes, that is correct. No recent updates to the code.
From the LLVM Discord I got the following information on Power ISA 3.1 status: "The back end supports ISA 3.1 with -mcpu=pwr10. the scheduling model is not updated yet. The scheduling model will be updated once the chips are generally available."
(In reply to vivekvpandya from comment #72) > Yes, that is correct. No recent updates to the code. ok no problem, it's nice that it works with lavapipe, that's significant. (In reply to vivekvpandya from comment #73) > From the LLVM Discord I got the following information on Power ISA 3.1 status: > > "The back end supports ISA 3.1 with -mcpu=pwr10. the scheduling model is not > updated yet. The scheduling model will be updated once the chips are > generally available." ah excellent. ok, so the concept of "prefixes" is in. they've been running on the IBM proprietary Power ISA simulator up until now. with 64-bit "prefixes" in LLVM, adding SVP64 will be slightly easier.
(In reply to vivekvpandya from comment #73) > From the LLVM Discord I got the following information on Power ISA 3.1 status: > > "The back end supports ISA 3.1 with -mcpu=pwr10. the scheduling model is not > updated yet. The scheduling model will be updated once the chips are > generally available." we're not using Power ISA 3.1, so this is not a problem, and is not a blocker to progress at all.
this needs justification -- iirc the driver that was worked on for this is far from complete.
with these additions the driver is able to run basic shader compilations to LLVM and passes some mesa unit tests on the native (host) target. it therefore meets the defined goal of being able to create native (host) non-accelerated LLVM. https://git.libre-soc.org/?p=mesa.git;a=shortlog;h=refs/heads/libresoc_dev
2021-03-20 Vivek Pandya Updated code generation so that for vertex shader outpu...
2021-03-14 Vivek Pandya Implement RenderPass, CommandBuffers, Buffers, GPUState,
2021-02-22 Vivek Pandya Update libresoc_CmdClearColorImage to get color in...
2021-02-21 Vivek Pandya Add pointer in libresoc_device_memory to hold bytes...
2021-02-06 Vivek Pandya Added code to process NIR shared load/store intrinsic.
2020-12-21 Vivek Pandya Fixing load input and store output related issues.
2020-12-21 Vivek Pandya Add processing if and loops in nir_to_llvm translation
2020-12-19 Vivek Pandya At this commit driver is able to generate broken LLVM IR
2020-09-19 Vivek Pandya Add code to LLVM from its C-API.
2020-09-10 Vivek Pandya Add missing vk_format* files.
2020-09-07 Vivek Pandya Added few more stubs so that control reaches to Destroy...