transcendental functions are needed for the GPU, including:

* exp
* log
* log1p
* expm1
* exp2
* log2
* atan2
* acos
* asin
* atan
* sin
* cos
* sincos
* cosh
* sinh
* tanh
* asinh
* acosh
* atanh

and appropriate pi-multiplied versions of the above (usually sin/cos/tan/asin/acos/atan/atan2)

sin/cos are already partially implemented

list to be reviewed: https://libre-soc.org/ztrans_proposal/

we also would like different accuracies, as well as different FP sizes. specs to be identified.

also to use Dynamic PartitionedSignal at some point (although probably as a separate bugreport)

others to be added by editing this comment

unit tests to be included. test methodology to be discussed. formal proofs out of scope (separate bugreport)

https://dspguru.com/dsp/faqs/cordic/

implementation by dan gisselquist: https://github.com/ZipCPU/cordic
these are all easiest done with different types of CORDIC. they may be more *efficiently* done with more advanced algorithms, such as 3rd order polynomials or Newton-Raphson (NR) with huge lookup tables. however CORDIC is pretty straightforward, so we go with that as an initial base and see how far that gets us.

CORDIC has several modes (circular, linear and hyperbolic, each in rotation or vectoring form) that between them cover pretty much everything. the circular mode is based around Euler's formula e^(ix) = cos(x) + i sin(x), of which e^(i pi) = -1 is the special case x = pi.
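to make the CORDIC idea concrete, here is a minimal sketch of the circular rotation mode computing sin and cos. it is an illustration only, not the implementation in the repository: it uses Python floats where hardware would use fixed-point shifts and adds, the name `cordic_sincos` and the `iterations` parameter are hypothetical, and the input is assumed to be already range-reduced to |angle| <= pi/2.

```python
import math


def cordic_sincos(angle, iterations=24):
    """Circular rotation-mode CORDIC: return (sin, cos) for |angle| <= pi/2.

    Floating-point sketch; a hardware version would replace the
    multiplications by 2**-i with arithmetic right shifts.
    """
    # Elementary rotation angles atan(2^-i); in hardware this is a small ROM.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i).
    # Pre-dividing the start vector by the total gain removes that scaling.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        # Rotate towards z == 0: the sign of the residual angle picks the direction.
        d = 1.0 if z >= 0.0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x  # (sin, cos)
```

each iteration adds roughly one bit of accuracy, so in this sketch the iteration count doubles as a speed/accuracy knob: `cordic_sincos(0.5, iterations=8)` is cheap and coarse, while the default of 24 is in the region of f32 mantissa width.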
accuracy of sin GLSL function on Intel/AMD/NVidia GPUs:
https://community.khronos.org/t/builtin-math-function-execution-cost-issues-with-accuracy-of-builtins/75130/4

Both AMD and NVidia GPUs are way more accurate than is required by Vulkan, which is another reason I think we shouldn't implement horribly inaccurate functions just because they technically meet the Vulkan spec.
my initial thoughts are to give users control over the accuracy level in some way. some developers clearly want high accuracy, others will want speed.
(In reply to Luke Kenneth Casson Leighton from comment #3)
> my initial thoughts are to give users control over the accuracy level in
> some way. some developers clearly want high accuracy, others will want
> speed.

https://libre-soc.org/irclog/%23libre-soc.2021-05-09.log.html#t2021-05-09T21:03:59

ok, except iirc the programs that run on amd gpus (which have the highest accuracy) aren't any different from the ones that run on e.g. intel gpus (lowest accuracy out of amd, nvidia, intel): they don't have an option saying "give me high/low accuracy". so having options is fine, but we'd always have to pick the high accuracy one to meet the expectations of developers who are used to gpus that greatly exceed khronos's junk-tier minimum requirements.

meaning it takes extra silicon to implement the low-accuracy variant that we can't use anyway
(In reply to Jacob Lifshay from comment #4)
> meaning it takes extra silicon to implement the low-accuracy variant that we
> can't use anyway

the cost of the gates is irrelevant [except for leakage current], it doesn't come into the equation of satisfying both end-user requirements.

that post clearly indicates that there are different user requirements, where different hardware fails to meet both.

i will state it again: we may solve this by providing the *end users* the option to choose the level of accuracy that they require.

what the Khronos Group demands can take a back seat.
(In reply to Luke Kenneth Casson Leighton from comment #5)
> (In reply to Jacob Lifshay from comment #4)
>
> > meaning it takes extra silicon to implement the low-accuracy variant that we
> > can't use anyway
>
> the cost of the gates is irrelevant [except for leakage current],
> it doesn't come into the equation of satisfying both end-user
> requirements.
>
> that post clearly indicates that there are different user requirements,
> where different hardware fails to meet both.
>
> i will state it again: we may solve this by providing the *end users*
> the option to choose the level of accuracy that they require.
>
> what the Khronos Group demands can take a back seat.

Sounds good to me as long as we have a decently performant implementation with higher accuracy that is good enough to use by default.

The Vulkan spec's accuracy requirements are honestly on the level I'd expect for machine learning or f16, not f32.

I think the accuracy requirements we expect for a GPU-class implementation should be closer to <= 4 ULP (iirc that's what the OpenCL spec asks for). For CPU stuff, I'd expect <= 1 ULP (since correctly rounded outputs, i.e. <= 0.5 ULP, are sooo hard to get right)
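since the test methodology is still to be discussed, here is one possible sketch of how an ULP budget like the ones above could be checked for f32. the helper names (`f32_ordered`, `ulp_distance`, `max_ulp_error`) are hypothetical, and the reference is assumed to be a double-precision libm result rounded to f32, which only approximates the "infinitely precise" reference that formal ULP definitions use. NaN and infinities are not handled.

```python
import math
import struct


def f32_ordered(value):
    """Round value to f32 and map its bits to an integer that is monotonic in the float."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    # Negative floats have the sign bit set; mirror them below zero so that
    # adjacent floats always differ by exactly 1 (and -0.0 == +0.0 == 0).
    return bits if bits < 0x80000000 else 0x80000000 - bits


def ulp_distance(result, reference):
    """Number of representable f32 steps between result and reference."""
    return abs(f32_ordered(result) - f32_ordered(reference))


def max_ulp_error(candidate, reference, inputs):
    """Worst-case ULP distance of candidate(x) from reference(x) over the inputs."""
    return max(ulp_distance(candidate(x), reference(x)) for x in inputs)
```

a unit test could then sample the input range and assert, say, `max_ulp_error(hw_model_sin, math.sin, samples) <= 4`, where `hw_model_sin` stands for whatever software model of the hardware pipeline ends up being tested.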
I found what looks like a hw transcendental function generator written in python: https://github.com/metalibm/metalibm/tree/main/metalibm_hw_blocks (metalibm is the successor to crlibm, a correctly-rounded libm)
(In reply to Jacob Lifshay from comment #7)
> I found what looks like a hw transcendental function generator written in
> python:
> https://github.com/metalibm/metalibm/tree/main/metalibm_hw_blocks
> (metalibm is the successor to crlibm, a correctly-rounded libm)

fer real? woowww, respect. there's even a VHDL backend. holy cow.

https://github.com/metalibm/metalibm/blob/main/metalibm_core/code_generation/vhdl_code_generator.py

am i seeing this right?? it would be possible to simply... auto-generate an entire swathe of HDL algorithms automatically??