Bug 365 - ROCM/Libre-SOC GPU Opcode interoperability
Summary: ROCM/Libre-SOC GPU Opcode interoperability
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: PC Mac OS
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks:
 
Reported: 2020-06-07 00:57 BST by Yehowshua
Modified: 2020-06-12 19:24 BST
3 users

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Yehowshua 2020-06-07 00:57:35 BST
It recently occurred to me that the entire ROCM stack is open.
ROCm also has OpenGL, Vulkan, OpenCL, and CUDA -> HIP
frontends.

At the very bottom of ROCm sits LLVM, which emits AMD GPU opcodes.

It should be possible to modify this emitter for our GPU. Then we'd
have very little work left to do.

I'm actively investigating whether this is possible and am trying to start a conversation
with some ROCm engineers.
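The retargeting idea above can be sketched with LLVM's existing AMDGPU backend. The commands below are an illustrative assumption, not a tested recipe: they presume a clang/LLVM build with the AMDGPU target enabled, and the triple, `-mcpu` value, and `kernel.cl` filename are placeholders.

```
# Check that this LLVM build registered the AMDGPU backend:
llc --version | grep amdgcn

# Cross-compile a trivial OpenCL kernel to AMD GPU assembly
# (gfx900 is an arbitrary example target; -nogpulib skips the
# AMD device libraries, which we would not have anyway):
clang -target amdgcn-amd-amdhsa -mcpu=gfx900 -nogpulib -S -o kernel.s kernel.cl
```

Retargeting for Libre-SOC would mean adding (or forking) a backend so that a new triple emits our opcodes instead, while the frontends and middle-end optimizations are reused unchanged.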
Comment 2 Cole Poirier 2020-06-07 03:50:16 BST
I watched this LinuxConf talk sometime in the last year and rewatched it today; I think it's salient to this discussion: "But Mummy I don't want to use CUDA - Open source GPU compute", https://m.youtube.com/watch?v=ZTq8wKnVUZ8. It seems like Dave Airlie at Red Hat Melbourne would be someone worth talking to about this.


Not quite as related thoughts:

In the future perhaps the standardization of an open compute stack could be undertaken by the OpenPower Foundation. Just spitballing here, I have no idea of practicality or feasibility.

A related (OPF Linux conf) talk about booting faster (https://m.youtube.com/watch?v=fTLsS_QZ8us) reminded me of our discussion of using our single-core iteration as a first-stage boot processor. I'm posting it here as it's somewhat related, and I couldn't find the boot-stage-processor discussion searching bugzilla. I'd like to migrate this to the Systèmes Librés Bugzilla once we have that discussion and set up what we decide on.
Comment 3 Cole Poirier 2020-06-07 04:03:20 BST
Dave Airlie

His graphics blog: https://airlied.blogspot.com/?m=1

Looks like he’s still at Red Hat Melbourne: https://au.linkedin.com/in/dave-airlie-a2b07a1
Comment 4 Luke Kenneth Casson Leighton 2020-06-07 04:31:57 BST
(In reply to Cole Poirier from comment #3)
> Dave Airlie

caused massive problems for luc verhaegen, setting him back years in his reverse engineering efforts of AMDGPU.

airlied's disruptive interference and
blatant disregard for luc's efforts caused AMD to terminate release of key strategic internal documentation that luc had spent several YEARS carefully negotiating to get public access to for the wider benefit of the entire libre community *and* ultimately of AMD.
Comment 5 Cole Poirier 2020-06-07 04:47:26 BST
(In reply to Luke Kenneth Casson Leighton from comment #4)
> (In reply to Cole Poirier from comment #3)
> > Dave Airlie
> 
> caused massive problems for luc verhaegen, setting him back years in his
> reverse engineering efforts of AMDGPU.
> 
> airlied's disruptive interference and
> blatant disregard for luc's efforts caused AMD to terminate release of key
> strategic internal documentation that luc had spent several YEARS carefully
> negotiating to get public access to for the wider benefit of the entire
> libre community *and* ultimately of AMD.

YIKES!! I read about that a year ago from links you posted, and I was very upset by Airlie's actions. I clearly forgot that the name of the villain in that story was Airlie...

In light of this, I think it would be vastly more valuable to solicit Luc Verhaegen's input on this bug specifically. I believe I remember him commenting on list a few times, but I can't recall with enough clarity to comment.
Comment 6 Luke Kenneth Casson Leighton 2020-06-07 05:30:04 BST
(In reply to Cole Poirier from comment #5)


> In light of this, I think what would be vastly more valuable would be to
> solicit Luc Verhaegen’s input on this bug specifically. I believe I remember
> him commenting on list a few times, however I can’t recall with enough
> clarity to comment.

sadly, luc (who is named explicitly in ARM NDAs that under no circumstances is he to be contacted for any reason by the signatories) has been under such constant attack for over a decade, for his work, that he has given up working on graphics drivers entirely.

a friend of his found him some work to do and he now has enough money to pay for his family.

some microsoft employees at one point gave serious consideration to engaging in a similar style of concerted attack against me, because of the reverse engineering that i did back in 1996-2000.  they even called my employer, ISS, to arrange to have me fired or silenced.

several senior employees inside microsoft, people who had been with the company since its beginning, had to explain to them in very blunt and clear terms that if they pissed me off, the knowledge and expertise that i had on the security vulnerabilities within the NT Operating System (of which those senior employees were keenly aware) could, if i focussed on revealing those vulnerabilities day after day, week after week, literally have brought their billion dollar company to its knees.

they left me alone.

luc verhaegen was not in a similar position because MALI and GPUs in general are not exactly critical components (unlike spectre, meltdown etc), and there is not a monopoly situation like there was with microsoft.


now you know a little bit more about the background, why i started this project, and also why full transparency is so very important.  it's because with full transparency there *is* no opportunity to exploit, blackmail or undermine software libre developers, and there *is* no need for people to frivolously have their time and expertise wasted on reverse engineering.

it just so happens that this results in things being far easier for customers: RTOSes such as the Amazon IoT one get *direct* access to GPU capabilities, debugging is easier, extensibility is easier, development costs are dramatically reduced, and so on.
Comment 7 Jacob Lifshay 2020-06-07 18:41:08 BST
From what I understand, ROCm is only for OpenCL/CUDA-style compute, it doesn't implement the Vulkan or OpenGL APIs:
https://github.com/RadeonOpenCompute/ROCm/issues/706
https://github.com/RadeonOpenCompute/ROCm/issues/131
Comment 8 Luke Kenneth Casson Leighton 2020-06-07 18:52:27 BST
https://gpuopen.com/gpuperfapi/

that's interesting. Version 3.5 (dec 2019):
"Remove ROCm/HSA support."
Comment 9 Cole Poirier 2020-06-07 21:21:27 BST
(In reply to Luke Kenneth Casson Leighton from comment #6)
> sadly, luc (who is named explicitly in ARM NDAs that under no circumstances
> is he to be contacted for any reason by the signatories) has been under such
> constant attack for over a decade, for his work, that he has given up
> working on graphics drivers entirely.

That's devastatingly sad, what a loss for the libre community! Just so incredibly upsetting.
 
> a friend of his found him some work to do and he now has enough money to pay
> for his family.

I'm thankful that he was able to survive, but it's still such a tragedy.

> some microsoft employees at one point gave serious consideration to engaging
> in a similar style of concerted attack against me, because of the reverse
> engineering that i did back in 1996-2000.  they even called my employer,
> ISS, to arrange to have me fired or silenced.
> 
> several senior employees inside microsoft, people who had been with the
> company since its beginning, had to explain to them in very blunt and clear
> terms that if they pissed me off, the knowledge and expertise that i had on
> the security vulnerabilities within the NT Operating System (of which those
> senior employees were keenly aware) could, if i focussed on revealing those
> vulnerabilities day after day, week after week, could literally have
> brought their billion dollar company to its knees.
> 
> they left me alone.

This, in contrast to what happened to Luc, is pretty darn cool. I'm very glad you posed a great enough threat that you were left alone.
 
> luc verhaegen was not in a similar position because MALI and GPUs in general
> are not exactly critical components (unlike spectre, meltdown etc), and
> there is not a monopoly situation like there was with microsoft.

That's interesting, learning more about these market dynamics would probably help me write better pitches. Not antagonistic, just knowing the state of the markets we are trying to get into.
 
> now you know a little bit more about the background, why i started this
> project, and also why full transparency is so very important.  it's because
> with full transparency there *is* no opportunity to exploit, blackmail or
> undermine software libre developers, and there *is* no need for people to
> frivolously have their time and expertise wasted on reverse engineering.

This is why I joined the project and am trying to take a more active role in the company as well. You, Luke, are the reason I joined the project: your awesome capabilities, gained through very smart "laziness" in your reverse engineering and learning, but far more importantly your commitment to the Titanian principles. I want to help you succeed in your mission to bring genuine ethics to technology.

> it just so happens that this results in things being far easier for
> customers (like the fact that RTOSes such as the Amazon IOT one) get
> *direct* access to GPU capabilities, debugging is easier, extensibility is
> easier, development costs are dramatically reduced and so on.

I'm focusing heavily on this in the pitch. I really liked Cesar's comments from a few weeks ago about the difference our open stack would make to his productivity and capabilities when it comes to scientific data acquisition. I think this definitely applies across the board from hobbyists right up through mega corporations. With open standards and documentation, from the top of the software stack, down through firmware and to the level of hardware schematic diagrams, "debugging is easier, extensibility is easier, development costs are dramatically reduced and so on."
Comment 10 Cole Poirier 2020-06-07 21:29:52 BST
(In reply to Jacob Lifshay from comment #7)
> From what I understand, ROCm is only for OpenCL/CUDA-style compute, it
> doesn't implement the Vulkan or OpenGL APIs:
> https://github.com/RadeonOpenCompute/ROCm/issues/706
> https://github.com/RadeonOpenCompute/ROCm/issues/131

(In reply to Luke Kenneth Casson Leighton from comment #8)
> https://gpuopen.com/gpuperfapi/
> 
> that's interesting. Version 3.5 (dec 2019):
> "Remove ROCm/HSA support."

Can you clarify the purpose of the present investigation for me? 

Are we trying to find out if we can use the many man-years of development put into the Radeon Open Compute Stack by making minimal modifications to our external graphics api to make it compatible with HSA/ROCM?

Is HSA general across all Radeon GPUs? I remember reading that HSA was an abstraction specific to only two of AMD's APUs.

Are we looking not so much for OpenCL support, but instead the graphics stack for OpenGL, Vulkan, DirectX, etc?

Or are we looking for both Compute and Graphics?

Would we take the route of making a fork of clang/llvm specific to Libre-SOC, and trying to have these changes upstreamed to clang/llvm trunk?

All of these questions are asked in near total ignorance.
Comment 11 Yehowshua 2020-06-07 22:23:25 BST
(In reply to Jacob Lifshay from comment #7)
> From what I understand, ROCm is only for OpenCL/CUDA-style compute, it
> doesn't implement the Vulkan or OpenGL APIs:
> https://github.com/RadeonOpenCompute/ROCm/issues/706
> https://github.com/RadeonOpenCompute/ROCm/issues/131

Hi Jacob, you are correct - however - Mesa has a Vulkan implementation that can sit on top of ROCM.
Comment 12 Luke Kenneth Casson Leighton 2020-06-07 22:39:08 BST
(In reply to Yehowshua from comment #11)
> (In reply to Jacob Lifshay from comment #7)
> > From what I understand, ROCm is only for OpenCL/CUDA-style compute, it
> > doesn't implement the Vulkan or OpenGL APIs:
> > https://github.com/RadeonOpenCompute/ROCm/issues/706
> > https://github.com/RadeonOpenCompute/ROCm/issues/131
> 
> Hi Jacob, you are correct - however - Mesa has a Vulkan implementation that
> can sit on top of ROCM.

from 706:

"ROCm is AMD's compute stack. If you want Vulkan support, you have two options. 1) the Mesa community RADV driver or 2) AMD's AMDVLK driver (which should be also part of their AMDGPU-PRO driver)."

my interpretation of that is that he hasn't explicitly said "no" (which is where the confusion comes about), he's said, "if you want vulkan, you need RADV or AMDVLK" from which we *deduce* that there is no connection between the two.

now, given that AMDGPU llvm support is now upstream, it *might* be the case
that two out of three of these projects *happen* to use the exact same
AMDGPU llvm compiler support.

my understanding of AMDVLK - jacob will be able to clarify - is that it's
a radically different approach, basically a sort-of effort to port and
libre-license what was formerly a proprietary Win32 driver to linux, where
the team working on it were isolated from what went into the upstream
AMDGPU llvm.
Comment 13 Jacob Lifshay 2020-06-09 18:04:18 BST
(In reply to Luke Kenneth Casson Leighton from comment #12)
> (In reply to Yehowshua from comment #11)
> > (In reply to Jacob Lifshay from comment #7)
> > > From what I understand, ROCm is only for OpenCL/CUDA-style compute, it
> > > doesn't implement the Vulkan or OpenGL APIs:
> > > https://github.com/RadeonOpenCompute/ROCm/issues/706
> > > https://github.com/RadeonOpenCompute/ROCm/issues/131
> > 
> > Hi Jacob, you are correct - however - Mesa has a Vulkan implementation that
> > can sit on top of ROCM.

If you're talking about RADV, it does not require ROCm; it doesn't even require LLVM (though it may require some refactoring to build without LLVM), since it works with two different compiler backends: ACO and LLVM.

AMDVLK also doesn't require ROCm, though it does require LLVM.

> 
> from 706:
> 
> "ROCm is AMD's compute stack. If you want Vulkan support, you have two
> options. 1) the Mesa community RADV driver or 2) AMD's AMDVLK driver (which
> should be also part of their AMDGPU-PRO driver)."
> 
> my interpretation of that is that he hasn't explicitly said "no" (which is
> where the confusion comes about), he's said, "if you want vulkan, you need
> RADV or AMDVLK" from which we *deduce* that there is no connection between
> the two.
> 
> now, given that AMDGPU llvm support is now upstream, it *might* be the case
> that two out of three of these projects *happen* to use the exact same
> AMDGPU llvm compiler support.

All of RADV, AMDVLK, RadeonSI (Mesa Gallium driver), and ROCM can use LLVM as their compiler backend.

> 
> my understanding of AMDVLK - jacob will be able to clarify - is that it's
> a radically different approach, basically a sort-of effort to port and
> libre-license what was formerly a proprietary Win32 driver to linux, where
> the team working on it were isolated from what went into the upstream
> AMDGPU llvm.

From what I understand, AMDVLK was/is the shared codebase between AMD's proprietary Windows Vulkan driver and their open-source Linux Vulkan driver. On Windows and previously on Linux, it uses their proprietary shader compiler stack, which they have not open-sourced. AMDVLK was refactored to be able to use LLVM before their initial Linux open-source release.