231 – Video Opcodes Standards writeup

Summary: Video Opcodes Standards writeup

Status:	CONFIRMED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code
Version:	unspecified
Hardware:	PC Linux

Importance:	--- enhancement
Assignee:	Alain D D Williams

URL:	https://libre-soc.org/openpower/sv/av...

Depends on:	230
Blocks:	137

Reported:	2020-03-13 10:00 GMT by cand
Modified:	2022-09-01 20:07 BST
CC List:	4 users

See Also:	230
NLnet milestone:	NLNet.2019.10.031.Video
total budget (EUR) for completion of task and all subtasks:	2000
budget (EUR) for this task, excluding subtasks' budget:	2000
parent task for budget allocation:	137
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	red = { amount = 2000, submitted = 2022-08-26, paid = 2022-08-31 }

Description cand 2020-03-13 10:00:22 GMT

A Video Opcodes Standards writeup is required, to a level that is acceptable
for a formal proposal to the OpenPOWER Foundation.

Comment 1 Alain D D Williams 2020-09-02 16:36:36 BST

Please point me to where I can find the information needed to do the writeup.

Comment 2 cand 2020-09-02 16:50:56 BST

It doesn't exist yet, only after development is complete.

Comment 3 Luke Kenneth Casson Leighton 2020-09-02 17:47:52 BST

(In reply to cand from comment #2)
> It doesn't exist yet, only after development is complete.

at some point we will need to do some sort of reasonable educated
evaluation of what _should_ be (in general) added, and go from there.

Comment 4 Cole Poirier 2020-09-02 18:56:14 BST

(In reply to Luke Kenneth Casson Leighton from comment #3)
> (In reply to cand from comment #2)
> > It doesn't exist yet, only after development is complete.
> 
> at some point we will need to do some sort of reasonable educated
> evaluation of what _should_ be (in general) added, and go from there.

I know Jacob has an idea for at least one instruction, something to do with triangle vertex extraction into a vector(?). Jacob, please correct my flawed memory of what you relayed to me last week.

I think using pia (power-instruction-analyzer) to find the most common operations executed by popular video compression algorithms would be one of the analyses we should do, so that it can inform our evaluation of which instructions to add. I may be severely misunderstanding some things here, but hey, that'll be good for both me and future readers of the bugs ;)

Comment 5 Jacob Lifshay 2020-09-02 20:42:43 BST

(In reply to Cole Poirier from comment #4)
> I know Jacob has an idea for at least one instruction, something to do with
> triangle vertex extraction into a vector(?), Jacob, please correct my flawed
> memory of what you relayed to me last week.

I was thinking of instructions to help with computing which pixels are inside a triangle for rendering purposes. That's generally conceptually split into two categories that overlap somewhat: triangle setup and rasterization.

> I think using pia to see what the most common operations executed by popular
> video compression algorithms would be one of the analyses we should do so
> that it can help inform our instruction addition evaluation. I may be
> severely misunderstanding some things here, but hey, that'll be good for
> both me and future readers of the bugs ;)

power-instruction-analyzer, as it currently stands, is not suitable for that: it has instruction models for single instructions and checks whether they match the behavior of the inline-assembler versions of those instructions. It does not allow running arbitrary programs.

Something like gdb (on a power processor) or qemu (on any supported host ISA) would be better for that, since those are designed to actually run a program given to them. It would have to be set up to step instruction by instruction and disassemble at each step, incrementing counters for each instruction kind. Other good options would be something like a profiler, since profilers are specifically designed to measure how often instructions or pieces of code are run.

objdump or just compiling the program's source to assembly would also be useful (to a lesser extent), since you can see which instructions the program contains.
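
A minimal sketch of the static (objdump-style) count described above, assuming the disassembly has already been captured as text with tab-separated fields (address, raw bytes, then mnemonic and operands). The function name `count_mnemonics` and the `SAMPLE` listing are hypothetical and only illustrate the idea; this counts which instructions a program contains, not how often they execute.

```python
from collections import Counter


def count_mnemonics(disassembly: str) -> Counter:
    """Count instruction mnemonics in objdump-style disassembly text.

    Assumes instruction lines look like:
        "<address>:\t<raw bytes>\t<mnemonic> <operands>"
    Any line that does not match that shape (symbol labels, blank
    lines, section headers) is skipped.
    """
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        words = fields[2].split()
        if words:
            counts[words[0]] += 1
    return counts


# Hypothetical listing, for illustration only.
SAMPLE = (
    "0000000010000600 <sum>:\n"
    "    10000600:\t7c 63 22 14 \tadd     r3,r3,r4\n"
    "    10000604:\t7c 63 21 d6 \tmullw   r3,r3,r4\n"
    "    10000608:\t7c 63 2a 14 \tadd     r3,r3,r5\n"
    "    1000060c:\t4e 80 00 20 \tblr\n"
)

if __name__ == "__main__":
    for mnemonic, n in count_mnemonics(SAMPLE).most_common():
        print(f"{mnemonic:8s} {n}")
```

A dynamic count (via single-stepping or a profiler) would need per-execution data rather than a static listing, so the same tallying loop would have to be fed one line per executed instruction.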

Comment 6 Luke Kenneth Casson Leighton 2020-09-02 22:27:10 BST

folks, just to clarify: this bugreport is for discussion of audio/video encode/decode specific instructions, and for their documentation and to track their submission to an OpenPOWER Foundation ISA Working Group. 

3D GPU instructions, such as triangle detection, unless they can be demonstrated as needed for a video CODEC, are out of scope for this bugreport.  there will be another bugreport for discussion of 3D opcodes to be documented and submitted to OPF ISA WG.

also, Cole: the technique used by Jeff Bush in his Nyuzi paper is the most effective technique (calculating the pixels per clock achieved for a given enhancement).   measuring the effectiveness of an *existing* instruction set merely gives us a comparative baseline.  as we are doing a software-accelerated augmented ISA then just as in Jeff's Nyuzi work we can "profile" different areas of a CODEC and target those areas where alternative opcodes would give the highest bang per buck.

Lauri is correct in that we need at least binutils support (gnu as) for SV even to begin to start that assessment.  or at the very least a really bad hack which pre-processes assembler.

that in turn means that we need realistically to do the SV-i-fication of POWER9 before being able to start this work in earnest.

Comment 7 cand 2020-09-03 07:43:30 BST

Instruction level counting won't be useful for this. You'll discover that additions and multiplies are used a lot, sometimes in vector forms.

The iteration loop is covered in other bugs in detail, but basically it's "find hotspot at C function level, imagine if new instructions would speed it up, write in SV asm, measure".
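
As a rough aid for the "imagine if new instructions would speed it up" step, here is a minimal sketch that ranks function-level hotspots by the overall gain a local speedup could give. It uses Amdahl's law, which is a framing added here and not something this bug prescribes; the function names, time fractions, and the 4x local speedup are all hypothetical. It only helps choose where to look first: the real answer still comes from writing the SV asm and measuring.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when a part taking `fraction`
    of the run time is made `local_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def rank_candidates(profile: dict, local_speedup: float) -> list:
    """Return (function, estimated overall speedup) pairs, best first."""
    estimates = [
        (name, overall_speedup(fraction, local_speedup))
        for name, fraction in profile.items()
    ]
    return sorted(estimates, key=lambda item: item[1], reverse=True)


# Hypothetical profile: fraction of total run time per C function.
PROFILE = {
    "idct_8x8": 0.40,
    "motion_comp": 0.25,
    "entropy_decode": 0.20,
    "other": 0.15,
}

if __name__ == "__main__":
    for name, estimate in rank_candidates(PROFILE, local_speedup=4.0):
        print(f"{name:16s} {estimate:.2f}x")
```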

Comment 8 Cole Poirier 2020-09-07 01:28:12 BST

(In reply to Luke Kenneth Casson Leighton from comment #6)
> folks, just to clarify: this bugreport is for discussion of audio/video
> encode/decode specific instructions, and for their documentation and to
> track their submission to an OpenPOWER Foundation ISA Working Group. 

Yes sorry, that's my bad, I accidentally conflated the two when writing my above comment.

> 3D GPU instructions, such as triangle detection, unless they can be
> demonstrated as needed for a video CODEC, are out of scope for this
> bugreport.  there will be another bugreport for discussion of 3D opcodes to
> be documented and submitted to OPF ISA WG.

Makes sense.

> also, Cole: the technique used by Jeff Bush in his Nyuzi paper is the most
> effective technique (calculating the pixels per clock achieved for a given
> enhancement).   measuring the effectiveness of an *existing* instruction set
> merely gives us a comparative baseline.  as we are doing a
> software-accelerated augmented ISA then just as in Jeff's Nyuzi work we can
> "profile" different areas of a CODEC and target those areas where
> alternative opcodes would give the highest bang per buck.

Pretty amazing work!! I finally finished reading his Nyuzi rasterizer paper last night and there are at least three (!!) mind-blowing revelations in the last page alone. Definitely looking forward to chatting with you about your conversations with Jeff Bush circa last year or the year before (I forget when you wrote about having many discussions with him, in a Crowd Supply update or somewhere else...). Thanks for the tip to check out his published literature :)
The method he outlines is very clever, and I'm very glad to have my misapprehensions about the effectiveness of instruction counting corrected. There is genuinely no better feeling than learning that there is a far better, more efficient, and more elegant way of going about something, and having it laid out for free and in public in very great detail so that I can really study it.

> Lauri is correct in that we need at least binutils support (gnu as) for SV
> even to begin to start that assessment.  or at the very least a really bad
> hack which pre-processes assembler
> 
> that in turn means that we need realistically to do the SV-i-fication of
> POWER9 before being able to start this work in earnest.

With you. I mentioned that lu_zero at gentoo is working on a lot of media/video encoding stuff and specifically working on the power side as well, with a good amount of work on - if I'm not misremembering - binutils for power!! I'm planning on reaching out to him this week. Should I send my draft email seeking technical and correctness input on what I've written to the libre-soc-dev or libre-soc-org mailing list?

Comment 9 Luke Kenneth Casson Leighton 2022-08-02 16:50:21 BST

this is now effectively completed: some of the instructions have ended up
in the bitmanip page, whilst others are in the int_fp_mv page.  additionally,
features such as Horizontal-Sum Schedules (etc) all ended up in the main SVP64
specification.

closing this one as complete although the URL above is more of a
"central hub" for how to *find* the work that's been done.