To add video acceleration to Libre-SoC, upstream, for ffmpeg, gstreamer, libswscale, libh264, libh265 and other libraries.

https://libre-soc.org/nlnet_2019_video/
https://libre-soc.org/vpu/

Audio
* bug #218, MP3
* bug #219, AC3
* bug #220, Vorbis
* bug #221, Opus

Video
* bug #222, MJPEG (JPEG)
* bug #223, MPEG1/2
* bug #224, MPEG4 ASP (xvid)
* bug #225, H.264
* bug #226, H.265
* bug #227, VP8
* bug #228, VP9
* bug #229, AV1

Opcodes: bug #234 implement opcodes in hardware
* rgb/bgr24 (TBD in 3D GPU or in this one?)
* rgbx/bgrx/xrgb/xbgr32 (TBD in 3D GPU or in this one?)
* nv12 (TBD in 3D GPU or in this one?)
* nv21 (TBD in 3D GPU or in this one?)

Simulator
* bug #230 discuss and add opcode(s) proposed by lauri
* bug #233 set up unit tests for opcodes under simulator

Standards Documentation: bug #231
* write up all opcodes (related to #230) as formal standards

note: this is where the iterative loop comes in. there will be several rounds adding different opcodes to try out

FPGA - bug #235
* run unit tests under FPGA
* run full OS (VLC?) demo under FPGA

todo, edit this comment and list a series of tasks to assign budgets to. then, create bugreports for each. see bug #48 for a template

TODO, subdivide these down into smaller tasks (discuss below) so that reasonably accurate budgetary amounts can be assigned to them. slight overestimation (10 to 15% or so) is recommended (and acceptable).
https://libre-soc.org/vpu/

Audio
* 2 weeks MP3 EUR 750
* 2 weeks AC3 EUR 750
* 2 weeks Vorbis EUR 750
* 2 weeks Opus EUR 750

Video
* 4 weeks MJPEG (JPEG) EUR 1500
* 4 weeks MPEG1/2 EUR 1500
* 5 weeks MPEG4 ASP (xvid) EUR 2000
* 8 weeks H.264 EUR 3000
* 10 weeks H.265 EUR 4000
* 8 weeks VP8 EUR 3000
* 8 weeks VP9 EUR 3000
* 10 weeks AV1 EUR 4000

Total EUR 25000

* Opcodes development and discussion: EUR 4000
* Opcodes Standards writeup: EUR 2000
* Implementation of opcodes in simulator: EUR 5000
* Unit tests in simulator: EUR 3000
* Hardware implementation: EUR 9000
* FPGA tests: EUR 2000
Each codec then has these phases:
- research
- for each hotspot, implementation
- for each target library, upstreaming

HW implementations of new instructions would be later, once the instructions are known.
(In reply to cand from comment #2)
> Each codec then has these phases:
> - research
> - for each hotspot, implementation
> - for each target library, upstreaming

ok great, do you have an estimation of time (and budget you'd like to receive) for each? 1 week research, 2 week impl, 3 day upstream coordination, that sort of thing?

we can subdivide later (3 subbugs per each top bug) if you would like part-payment however that is for later. the focus now is to identify toplevel and assign budgets.

> HW implementations of new instructions would be later, once the instructions
> are known.

yes. or, more to the point, you advise us what you would like, then we implement them in a simulator (which we have to budget how to run under that, btw - it may be that we only run a subset of the code, say, only the algorithm or a unit test rather than full VLC or something) then after the cycles/sec is confirmed *then* we implement that opcode in hw and finally actually run under an FPGA.

this will be much later, at the end of the process.
Each codec is of different complexity. The audio codecs usually only have a single hotspot, while at the other end AV1 has several dozen. I'll do a quick pass later, to get rough figures on those. I thought the simulator would be part of the implementation loop?
(In reply to cand from comment #4)
> Each codec is of different complexity. The audio codecs usually only have a
> single hotspot, while at the other end AV1 has several dozen.

thought so.

> I'll do a
> quick pass later, to get rough figures on those.

great.

> I thought the simulator would be part of the implementation loop?

hmmm yes, however think about it: several CODECs will share the same opcodes. you don't make a YUV2RGB opcode for VP9 and a different one for MPEG :)

so i was kinda leaning towards them being on their own (aggregated) iterative cycle, if you know what i mean.

if we can get a rough idea in advance of the sorts of opcodes needed, bear in mind that for the most part they need to be "scalar" in nature because the Vector System adds that hardware-for-loop on top *of* scalar operations, it would be very handy.

then those can also be analysed as to a simulation implementation timescale and hw timescale and budget as well.

we are not going to be able to predict exactly everything here, that is what the iterations are for. we just need a start.
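To illustrate the "scalar opcode plus hardware for-loop" idea from the comment above, here is a minimal Python sketch. It is only a conceptual model: the saturating-add operation, the function names and the `vl` (vector length) parameter are assumptions chosen for illustration, not actual Libre-SoC opcodes or simulator APIs.

```python
def sat_add_u8(a: int, b: int) -> int:
    """Hypothetical scalar operation: unsigned 8-bit saturating add."""
    return min(a + b, 255)


def vector_apply(scalar_op, vl, src_a, src_b):
    """Model of a 'hardware for-loop': repeat one scalar op over vl elements."""
    return [scalar_op(src_a[i], src_b[i]) for i in range(vl)]


# The same scalar op is reused for every element, so only the scalar
# behaviour has to be specified, simulated and later built in hardware.
pixels = [10, 200, 250, 128]
offsets = [5, 100, 10, 127]
result = vector_apply(sat_add_u8, 4, pixels, offsets)
print(result)  # [15, 255, 255, 255]
```

Under this model, a codec hotspot that needs a new operation only has to define the scalar behaviour once, which is why several codecs could plausibly share the same opcode.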
Weren't the colorspace conversions part of the GPU milestone? That's what I understood from the ML earlier.
(In reply to cand from comment #6)
> Weren't the colorspace conversions part of the GPU milestone? That's what I
> understood from the ML earlier.

yes good point, so we need to make sure not to double-allocate budget.
Rough relative complexities:

Codec               Weight  Share
MP3                      1     1%
AC3                      1     1%
Vorbis                   1     1%
Opus                     1     1%
MJPEG (JPEG)             2     2%
MPEG1/2                  2     2%
MPEG4 ASP (xvid)         4     5%
H.264                   10    11%
H.265                   20    23%
VP8                      8     9%
VP9                     10    11%
AV1                     28    32%

This doesn't translate well to budget though, no sense in spending a third on AV1. Perhaps a more sensible goal would be to target the largest hot spots of each, with only smaller budget differences due to complexity.

Another point to consider is that while ffmpeg is the prime lib, parts of accel code made for ffmpeg aren't really usable in the various standalone libs. Different structures, etc. In order to not write things twice, some decisions need to be made on which upstreams particularly matter.
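For reference, the percentage column above follows from dividing each relative weight by the sum of all weights. A small Python sketch of that arithmetic, using only the weights listed in the comment (the variable names are illustrative):

```python
# Relative complexity weights as listed in the comment above.
weights = {
    "MP3": 1, "AC3": 1, "Vorbis": 1, "Opus": 1,
    "MJPEG (JPEG)": 2, "MPEG1/2": 2, "MPEG4 ASP (xvid)": 4,
    "H.264": 10, "H.265": 20, "VP8": 8, "VP9": 10, "AV1": 28,
}

total = sum(weights.values())

# Share of the total, in percent, before rounding.
shares = {name: 100 * w / total for name, w in weights.items()}

for name, pct in shares.items():
    print(f"{name:18s} {weights[name]:3d} {pct:5.1f}%")
```

AV1 alone comes out at roughly a third of the total weight, which is the figure the comment argues should not be carried over directly into the budget.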
(In reply to cand from comment #8)
> This doesn't translate well to budget though, no sense in spending a third
> on AV1. Perhaps a more sensible goal would be to target the largest hot
> spots of each, with only smaller budget differences due to complexity.

yes. and, during later iterations, do some more.

> Another point to consider is that while ffmpeg is the prime lib, parts of
> accel code made for ffmpeg aren't really usable in the various standalone
> libs. Different structures, etc. In order to not write things twice, some
> decisions need to be made on which upstreams particularly matter.

well, ultimately, gstreamer has an ffmpeg plugin, ffmpeg has a gstreamer plugin, vdpau has a vaapi plugin, vaapi has a vdpau plugin, it's all circular [1] and up its own backside [2], so whichever we pick is good :)

whichever route would be easiest for you, let's go with that.

[1] yes i managed to install both vdpau and vaapi recursively, once, whoops...
[2] the beatles "yellow submarine" film demonstrates this well
Okay, then I'd say ffmpeg for everything else except av1 (dav1d) and jpeg (libjpeg-turbo). Time and budget, your earlier comment on 1 week research, 2 week impl, 3 day upstream coordination is fairly on point, for one hotspot (or a couple smaller ones). For the later iterations only the impl phase would be budgeted. I'd say 400e/wk, so 400 for research, 800 for one impl iteration, and 240 for the upstream part. I don't know how difficult the fpga side is, how much should be budgeted for that; IIRC you also said the entire amount should be used this year, or it'd be lost. Starting point for discussion anyway.
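The per-hotspot figures above can be reproduced with a few lines of Python. This is a sketch under one assumption not stated in the comment: a 5-day working week, which is what makes 3 days of upstreaming come to EUR 240 at EUR 400 per week. The function and constant names are illustrative.

```python
RATE_PER_WEEK = 400   # EUR per week, as proposed in the comment above
DAYS_PER_WEEK = 5     # assumption: 5 working days per week


def phase_cost(weeks: int = 0, days: int = 0) -> int:
    """Cost in EUR of a phase lasting the given weeks plus extra days."""
    return RATE_PER_WEEK * weeks + RATE_PER_WEEK * days // DAYS_PER_WEEK


research = phase_cost(weeks=1)   # 400
impl = phase_cost(weeks=2)       # 800
upstream = phase_cost(days=3)    # 240

# First pass over one hotspot covers all three phases;
# later iterations only budget the implementation phase.
first_pass = research + impl + upstream
later_iteration = impl

print(first_pass, later_iteration)  # 1440 800
```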
> should be budgeted for that; IIRC you also said the entire amount should be
> used this year, or it'd be lost. Starting point for discussion anyway.

we have until mid 2021 so not as heavy there.
lauri do the budgets look reasonable? l.
Sorry, which ones?
Oh, you edited comment #1, emails don't go out for edits so didn't see it at first. Yes, they look ok, other than being off by 1k (51k total).
adjusted thx. alain this one needs a writeup too when the time is right. similar to http://bugs.libre-riscv.org/show_bug.cgi?id=158#c4 except we need to create the individual bugreports first (all 17 of them)
# Schedule A to be attached to MoU

List of tasks, plus description, bugtracker URL and budget

# MP3 optimizations

Optimizing MP3 code in ffmpeg with new instructions.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=218
Budget: EUR 750

# AC3 optimizations

Optimizing AC3 code in ffmpeg with new instructions.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=219
Budget: EUR 750

# Vorbis optimizations

Optimizing Vorbis code in ffmpeg with new instructions.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=220
Budget: EUR 750

# Opus optimizations

Optimizing Opus code in ffmpeg with new instructions.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=221
Budget: EUR 750

# JPEG optimizations

Optimizing JPEG code in libjpeg-turbo with new instructions.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=222
Budget: EUR 1500

# MPEG1/2 optimizations

Optimizing MPEG1/2 code in ffmpeg with new instructions.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=223
Budget: EUR 1500

# MPEG4 ASP optimizations

Optimizing MPEG4 ASP (xvid) code in ffmpeg with new instructions.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=224
Budget: EUR 2000

# H.264 optimizations

Optimizing H.264 code in ffmpeg with new instructions.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=225
Budget: EUR 3000

# H.265 optimizations

Optimizing H.265 code in ffmpeg with new instructions.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=226
Budget: EUR 4000

# VP8 optimizations

Optimizing VP8 code in ffmpeg with new instructions.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=227
Budget: EUR 3000

# VP9 optimizations

Optimizing VP9 code in ffmpeg with new instructions.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=228
Budget: EUR 3000

# AV1 optimizations

Optimizing AV1 code in dav1d with new instructions.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=229
Budget: EUR 4000

# Video opcode development and discussion

Video opcode development and discussion is needed, as well as research and informal write-up.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=230
Budget: EUR 4000

# Video Opcodes Standards "Formal" writeup

Video Opcodes Standards writeup is required, to a level that is acceptable for formal proposal to the OpenPOWER Foundation

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=231
Budget: EUR 2000

# Implementation of video opcodes in simulator

Implementation of video opcodes in simulator is needed, so that the effectiveness of the opcodes can be tested prior to implementing them in hardware (which simulates 10,000 to 100,000 times slower)

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=232
Budget: EUR 5000

# Audio and Video unit tests in simulator

Audio and Video unit tests are needed, to be run in the simulator. These are not the full GUI, just the core algorithm.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=233
Budget: EUR 3000

# Hardware implementation of video opcodes

Hardware implementation of video opcodes is needed, implementing the instructions that were demonstrated to be effective from earlier (software) simulations.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=234
Budget: EUR 9000

# Video opcode FPGA tests

Video opcode FPGA tests are needed, demonstrating the correctness of the hardware implementation of the opcodes.

URL: http://bugs.libre-riscv.org/show_bug.cgi?id=235
Budget: EUR 2000
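As a cross-check on the Schedule A figures, the per-task budgets can be summed by bug number. The following Python sketch uses only the bug numbers and amounts listed in the schedule; splitting at bug 229 to separate codec work from opcode work is a convenience chosen here for illustration, not something the schedule itself defines.

```python
# Budget in EUR per bug number, copied from the Schedule A entries.
schedule_a = {
    218: 750, 219: 750, 220: 750, 221: 750,          # audio codecs
    222: 1500, 223: 1500, 224: 2000, 225: 3000,      # video codecs
    226: 4000, 227: 3000, 228: 3000, 229: 4000,
    230: 4000, 231: 2000, 232: 5000,                 # opcode work
    233: 3000, 234: 9000, 235: 2000,
}

codec_total = sum(v for bug, v in schedule_a.items() if bug <= 229)
opcode_total = sum(v for bug, v in schedule_a.items() if bug >= 230)
grand_total = codec_total + opcode_total

print(codec_total, opcode_total, grand_total)  # 25000 25000 50000
```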
Summary sentence for MoU

Video acceleration is a necessary component for any modern CPU and GPU. Given the large amount of time the typical user spends on video and on applications such as videoconferencing, a performant and power-efficient implementation is necessary for wide adoption. With Zoom (etc.) now being critical to our modern life, and full of security holes, full transparency in the video encode/decode algorithms is more important than ever.
status: MoU signed, sent to NLNet. bob to countersign.
I see that you already sent this to NLNet, but I hope it's not too late to add AAC to the list of Audio Formats to accelerate. The reason is that it is one of the primary audio formats used on YouTube and is also in common use as a Bluetooth audio format. I would even argue that AAC is more widely used than AC3, Vorbis and Opus, at least in my experience as an average user.

Regarding the video formats, what is the legality of putting hardware in place to help decode MPEG2, MPEG4, H.264, and H.265? I know these are at least covered by the patent pool of MPEG LA ( https://en.wikipedia.org/wiki/MPEG_LA ). Some of the audio codecs may or may not also be tangled in legalities (not sure, I haven't really looked into it for the audio formats).

Would it be prudent to get legal advice first? Or do you feel it is safe doing this?
I believe it's too late to add AAC, however due to how ffmpeg is structured, some of the work on the other codecs will speed up AAC too. Not as much as focusing on it, but more than plain sw.

MPEG2, MP3, AC3 are all free, patents expired. Vorbis and Opus were explicitly designed for that.

For the newer video codecs, it's possible any implementation would infringe patents. So does 90% of ffmpeg code. IANAL, but only the sw patents I believe, as the hw blocks we will have will not be specialized to any codec. Unlike a modern GPU that has "H.264 frame in, RGB out" blocks, we will have sub-operations such as "calculate transform XYZ for this data". Those are not patentable in general, some specific algorithm in hw may be.

So my conclusion is that the hw is safe, but if a commercial entity wishes to ship our software in the US, they will need to disable the newer video codecs or to license patents. I.e. the exact same situation as ffmpeg code, if they want to ship that in a product with new stuff enabled.
(In reply to cand from comment #20)
> I believe it's too late to add AAC, however due to how ffmpeg is structured,
> some of the work on the other codecs will speed up AAC too. Not as much as
> focusing on it, but more than plain sw.
>

Ah ok, at least it will get some treatment.

> MPEG2, MP3, AC3 are all free, patents expired. Vorbis and Opus were
> explicitly designed for that.
>

Ok, good to hear the other audio formats are safe!

> For the newer video codecs, it's possible any implementation would infringe
> patents. So does 90% of ffmpeg code. IANAL, but only the sw patents I
> believe, as the hw blocks we will have will not be specialized to any codec.
> Unlike a modern GPU that has "H.264 frame in, RGB out" blocks, we will have
> sub-operations such as "calculate transform XYZ for this data". Those are
> not patentable in general, some specific algorithm in hw may be.
>
> So my conclusion is that the hw is safe, but if a commercial entity wishes
> to ship our software in the US, they will need to disable the newer video
> codecs or to license patents. I.e. the exact same situation as ffmpeg code,
> if they want to ship that in a product with new stuff enabled.

I'm not a lawyer either, though your conclusion sounds reasonable to me. :)

Patents are a minefield to navigate (sigh)