Bug 541 - implement nmigen HDL IEEE754 and Khronos FP transcendentals needed for 3D (CORDIC ones)
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: Other Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks: 53
Reported: 2020-12-06 18:38 GMT by Luke Kenneth Casson Leighton
Modified: 2021-05-10 06:38 BST (History)

See Also:
NLnet milestone: NLnet.2019.02
total budget (EUR) for completion of task and all subtasks: 1000
budget (EUR) for this task, excluding subtasks' budget: 1000
parent task for budget allocation: 53
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Luke Kenneth Casson Leighton 2020-12-06 18:38:54 GMT
transcendental functions are needed for the GPU, including:

* exp
* log
* log1p
* expm1
* exp2
* log2
* atan2
* acos
* asin
* atan
* sin
* cos
* sincos
* cosh
* sinh
* tanh
* asinh
* acosh
* atanh

and appropriate pi-multiplied versions of the above (usually sin/cos/tan/asin/acos/atan/atan2)
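
To illustrate what a "pi-multiplied version" means, here is a minimal software sketch of sinpi(x) = sin(pi * x) in plain Python. The function name `sinpi` and the folding steps are illustrative assumptions, not the project's HDL. The point it shows is that range reduction can be done exactly on x before the multiplication by pi, which is why these variants are attractive.

```python
import math

def sinpi(x: float) -> float:
    """Hypothetical sketch: sin(pi * x) with range reduction done on x itself."""
    # sin(pi * x) has period 2 in x, and fmod is exact for floats.
    r = math.fmod(x, 2.0)      # r is in (-2, 2)
    if r > 1.0:
        r -= 2.0
    elif r < -1.0:
        r += 2.0               # r is now in [-1, 1]
    # Fold onto [-0.5, 0.5] using sin(pi - t) = sin(t).
    if r > 0.5:
        r = 1.0 - r
    elif r < -0.5:
        r = -1.0 - r
    # Only now multiply by pi; the argument is at most pi/2 in magnitude.
    return math.sin(math.pi * r)
```

With this reduction, sinpi(1.0) is exactly 0.0, whereas math.sin(math.pi * 1.0) returns a small non-zero value because math.pi is itself rounded.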

sin/cos are already partially implemented

list to be reviewed:
https://libre-soc.org/ztrans_proposal/

we also would like different accuracies, as well as different FP sizes.  specs to be identified.  also to use Dynamic PartitionedSignal at some point (although probably as a separate bugreport)

others to be added by editing this comment

unit tests to be included.  test methodology to be discussed.  formal proofs out of scope (separate bugreport)

https://dspguru.com/dsp/faqs/cordic/

implementation by dan gisselquist:
https://github.com/ZipCPU/cordic
Comment 1 Luke Kenneth Casson Leighton 2020-12-06 18:47:15 GMT
these are all easiest done with different types of CORDIC. they may be more *efficiently* done with advanced algorithms such as 3rd-order polynomials and Newton-Raphson (NR) with huge lookup tables; however, CORDIC is pretty straightforward, so we go with that as an initial base and see how far that gets us.

CORDIC has many different modes, including linear, polar, and so on, that cover pretty much everything based around Euler's formula e^(ix) = cos(x) + i sin(x), of which e^(i pi) = -1 is the special case x = pi
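
For reference, a minimal software model of one CORDIC variant (circular, rotation mode) computing sin and cos with only shifts, adds and a small arctangent table. This is a plain-Python sketch, not the nmigen implementation; the function name `cordic_sincos` and the parameters `iterations` and `frac_bits` are assumptions chosen for illustration.

```python
import math

def cordic_sincos(angle: float, iterations: int = 24, frac_bits: int = 30):
    """Hypothetical sketch: (sin, cos) for angle in [-pi/2, pi/2], fixed point."""
    one = 1 << frac_bits
    # Table of atan(2^-i) in fixed point; in hardware this would be a small ROM.
    atan_table = [round(math.atan(2.0 ** -i) * one) for i in range(iterations)]
    # Each micro-rotation scales the vector by sqrt(1 + 2^-2i); compute the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x = round(one / gain)    # pre-scaled so the final vector has unit length
    y = 0
    z = round(angle * one)   # residual angle still to be rotated through
    for i in range(iterations):
        # Rotate by +/- atan(2^-i), driving the residual angle z towards zero.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - atan_table[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + atan_table[i]
    return y / one, x / one
```

Roughly one bit of result is gained per iteration, so in this sketch the `iterations` parameter is the natural knob for trading accuracy against latency: `cordic_sincos(0.3, iterations=8)` is visibly coarser than the 24-iteration default.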
Comment 2 Jacob Lifshay 2021-05-09 19:40:32 BST
accuracy of sin GLSL function on Intel/AMD/NVidia GPUs:
https://community.khronos.org/t/builtin-math-function-execution-cost-issues-with-accuracy-of-builtins/75130/4

Both AMD and NVidia GPUs are waay more accurate than is required by Vulkan, another reason I think we shouldn't implement horribly inaccurate functions just because they technically meet the Vulkan spec.
Comment 3 Luke Kenneth Casson Leighton 2021-05-09 19:51:52 BST
my initial thoughts are to give users control over the accuracy level in some way.  some developers clearly want high accuracy, others will want speed.
Comment 4 Jacob Lifshay 2021-05-09 21:07:35 BST
(In reply to Luke Kenneth Casson Leighton from comment #3)
> my initial thoughts are to give users control over the accuracy level in
> some way.  some developers clearly want high accuracy, others will want
> speed.

https://libre-soc.org/irclog/%23libre-soc.2021-05-09.log.html#t2021-05-09T21:03:59

ok, except iirc the programs that run on amd gpus (which have the highest accuracy) aren't any different (they don't have an option saying give me high/low accuracy) than the ones that run on e.g. intel gpus (lowest accuracy out of amd, nvidia, intel). so having options is fine, but we'd have to always just pick the high accuracy one to meet the expectations of developers who are used to gpus that greatly exceed khronos's junk-tier minimum requirements

meaning it takes extra silicon to implement the low-accuracy variant that we can't use anyway
Comment 5 Luke Kenneth Casson Leighton 2021-05-09 22:49:26 BST
(In reply to Jacob Lifshay from comment #4)

> meaning it takes extra silicon to implement the low-accuracy variant that we
> can't use anyway

the cost of the gates is irrelevant [except for leakage current],
it doesn't come into the equation of satisfying both end-user
requirements.

that post clearly indicates that there are different user requirements,
where different hardware fails to meet both.

i will state it again: we may solve this by providing the *end users*
the option to choose the level of accuracy that they require.

what the Khronos Group demands can take a back seat.
Comment 6 Jacob Lifshay 2021-05-10 06:38:56 BST
(In reply to Luke Kenneth Casson Leighton from comment #5)
> (In reply to Jacob Lifshay from comment #4)
> 
> > meaning it takes extra silicon to implement the low-accuracy variant that we
> > can't use anyway
> 
> the cost of the gates is irrelevant [except for leakage current],
> it doesn't come into the equation of satisfying both end-user
> requirements.
> 
> that post clearly indicates that there are different user requirements,
> where different hardware fails to meet both.
> 
> i will state it again: we may solve this by providing the *end users*
> the option to choose the level of accuracy that they require.
> 
> what the Khronos Group demands can take a back seat.

Sounds good to me as long as we have a decently performant implementation with higher accuracy that is good enough to use by default.

The Vulkan spec's accuracy requirements are honestly on the level I'd expect for machine learning or f16, not f32. I think we should have the accuracy requirements we expect for a GPU-class implementation be closer to <= 4 ULP (iirc the OpenCL spec).

For CPU stuff, I'd expect <= 1 ULP (since absolutely correct outputs are sooo hard to get right)
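
As a possible starting point for the test methodology, here is a minimal sketch of how an f32 result could be checked against a ULP budget in plain Python. The helper names are hypothetical. It counts representable f32 steps between the result and the f32-rounded reference, which is a simplification of the formal ULP definitions in the OpenCL and Vulkan specs, and it does not handle NaN, infinities, or values outside the f32 range.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def f32_ordinal(x: float) -> int:
    """Map an f32 value to an integer so that adjacent floats differ by exactly 1."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Negative floats have the sign bit set; mirror them below zero.
    return bits if bits < 0x80000000 else 0x80000000 - bits

def ulp_distance_f32(result: float, reference: float) -> int:
    """Count f32 steps between result and the f32-rounded reference value."""
    return abs(f32_ordinal(result) - f32_ordinal(to_f32(reference)))
```

A unit test could then assert `ulp_distance_f32(hw_result, math.sin(x)) <= 4` for the GPU-class target or `<= 1` for the CPU-class target, where `hw_result` stands for whatever the simulated hardware produced.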