reading through the OpenCL list of ops, I realized we forgot to add some fp ops:

fmax
fmin
fmod
maxmag
minmag
remainder

out of those, imho we need fmin/fmax (all of the several variants); it would be quite nice to have fmod/remainder and minmag/maxmag (all of the several variants).

we also forgot erf/erfc/lgamma, but they are uncommon enough that they should just be left to software implementations.

the min/max/minmag/maxmag variants:
* the minNum/maxNum functions from IEEE 754-2008 match the behavior of the VSX xsmindp operation, AVX512 vrangess (maybe removed?) and ARMv8.2 fminnm. They are used in RISC-V fmin.s for <= v2.1 of the F extension. They are unspecified for comparing signed zeros; we will want to treat +0 as greater than -0 to match a lot of other implementations.
* the minimumNumber/maximumNumber functions from IEEE 754-2019 match the behavior of Java and the VMX vminfp operation. They are used in RISC-V fmin.s for >= v2.2 of the F extension.
* the minimum/maximum functions from IEEE 754-2019 are basically the recommended default going forward, but a lot of programming languages don't use them yet for backward compatibility reasons.

An explanation of why IEEE 754 replaced minNum/maxNum with minimum/maximum/minimumNumber/maximumNumber:
https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf
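To make the NaN and signed-zero differences between the variants concrete, here is a minimal C sketch of the min side. The function names `min_num_like` and `minimum_2019` are made up for illustration and are not proposed mnemonics. Signalling NaNs are not modelled; for quiet NaNs, minNum and minimumNumber agree, so one function stands in for both. The -0 < +0 ordering is the choice proposed in the comment above.

```c
#include <math.h>

/* Hypothetical model of minNum (IEEE 754-2008) / minimumNumber (2019)
 * for quiet NaNs only: a NaN operand is treated as missing data, so the
 * other operand is returned. Signed zeros are ordered -0 < +0, which
 * IEEE 754-2008 leaves open and this thread proposes to fix. */
float min_num_like(float a, float b) {
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    if (a == 0.0f && b == 0.0f) return signbit(a) ? a : b;
    return a < b ? a : b;
}

/* Hypothetical model of minimum (IEEE 754-2019): a NaN in either
 * operand propagates to the result; signed zeros are ordered -0 < +0. */
float minimum_2019(float a, float b) {
    if (isnan(a) || isnan(b)) return NAN;
    if (a == 0.0f && b == 0.0f) return signbit(a) ? a : b;
    return a < b ? a : b;
}
```

With one NaN operand, `min_num_like` returns the other operand while `minimum_2019` returns NaN; for all non-NaN inputs the two sketches agree.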
technically there's also x86's minss/maxss operations; minss implements the C function (maxss is the same with > in place of <):

float f(float a, float b) {
    return a < b ? a : b;
}

specifically, if either input is a NaN, or if both inputs are equal, or if both inputs are zero of either sign, they always return b. They never convert a signalling NaN to a quiet NaN.

If we also add that, it would fill out the min/max/minmag/maxmag variants to 8, fitting nicely in a 3-bit mode field. Or, if we decide we don't want minmag/maxmag, it would fill out the variants to 4, fitting in a 2-bit mode field.
(In reply to Jacob Lifshay from comment #1)
> technically there's also x86's maxss operations, they implement the C
> function:
> float f(float a, float b) {
> return a < b ? a : b;
> }

as best i can tell that's fsel - p168 v3.0B 4.6.9

    fsel  FRT,FRA,FRC,FRB (Rc=0)
    fsel. FRT,FRA,FRC,FRB (Rc=1)

    if (FRA) >= 0.0 then FRT <- (FRC)
    else FRT <- (FRB)

The floating-point operand in register FRA is compared to the value zero. If the operand is greater than or equal to zero, register FRT is set to the contents of register FRC. If the operand is less than zero or is a NaN, register FRT is set to the contents of register FRB. The comparison ignores the sign of zero (i.e., regards +0 as equal to -0).
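For comparison with the x86-style function, the quoted fsel description can be sketched in C as follows. `fsel_model` is a made-up name, and the sketch ignores the Rc=1 condition-register update.

```c
#include <math.h>

/* Hypothetical scalar model of fsel as quoted above: FRA is compared
 * against zero, not against another operand. */
double fsel_model(double fra, double frc, double frb) {
    if (isnan(fra)) return frb;      /* NaN selects FRB */
    return fra >= 0.0 ? frc : frb;   /* -0 compares equal to +0, so it selects FRC */
}
```

Only FRA takes part in the comparison; FRC and FRB are passed through untouched, so fsel on its own does not compare two values with each other.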
(In reply to Luke Kenneth Casson Leighton from comment #2)
> (In reply to Jacob Lifshay from comment #1)
> > technically there's also x86's maxss operations, they implement the C
> > function:
> > float f(float a, float b) {
> > return a < b ? a : b;
> > }
>
> as best i can tell that's fsel - p168 v3.0B 4.6.9

it isn't actually, x86 minss/maxss compare the inputs with each other, not against zero.
(In reply to Jacob Lifshay from comment #0)
> reading through the opencl list of ops, I realized we forgot to add some fp
> ops:
> fmax
> fmin
> fmod
> maxmag
> minmag
> remainder

i'd like to add these new ops as part of doing the initial implementation of fptrans, there's space.

what do you think?
(In reply to Jacob Lifshay from comment #3)
> (In reply to Luke Kenneth Casson Leighton from comment #2)
> > (In reply to Jacob Lifshay from comment #1)
> > > technically there's also x86's maxss operations, they implement the C
> > > function:
> > > float f(float a, float b) {
> > > return a < b ? a : b;
> > > }
> >
> > as best i can tell that's fsel - p168 v3.0B 4.6.9
>
> it isn't actually, x86 minss/maxss compare the inputs with each other, not
> against zero.

ahh yes. sigh. ok. let's take a look and see if the others are there.

(In reply to Jacob Lifshay from comment #4)
> (In reply to Jacob Lifshay from comment #0)
> > reading through the opencl list of ops, I realized we forgot to add some fp
> > ops:
> > fmax
> > fmin
> > fmod
> > maxmag
> > minmag
> > remainder
>
> i'd like to add these new ops as part of doing the initial implementation of
> fptrans, there's space.
>
> what do you think?

yes good idea, search for them first though, and in VSX as well, the section has to say "this is in VSX as {vxxxxxx} but not in scalar"
hang on... p181 v3.1 there's notes

    if a >= b then x <- y        fsub fs,fa,fb
    else x <- z                  fsel fx,fs,fy,fz

so no, we can't add fmaxss or fminss, they'll get rejected because of the macro-fusion advice.

the only reason to add fminss/fmaxss/fmin/fmax would be to update to IEEE 754-2019, which is probably good enough.

----

https://stackoverflow.com/questions/30618991/simd-minmag-and-maxmag

    minmag(a,b) = |a|<|b| ? a : b
    maxmag(a,b) = |a|>|b| ? a : b

not seeing anything like this - good idea to add them.

----

fmod

https://codebrowser.dev/glibc/glibc/sysdeps/ieee754/flt-32/e_fmodf.c.html

blerk! that's awful! (i mean, the software). yep, good call.

----

remainder

https://stackoverflow.com/questions/26671975/why-do-we-need-ieee-754-remainder

blerk. i don't get it. but i can understand other people do :)

----

yep all good here.
I added all the ops to the spec, I'll leave opcode allocation to #899:
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=3e363081ee0142d6948d5b6c66523d833d0a7711

I found x86-style min/max ops in VSX (xsmincdp), so I'll take that as sufficient justification to add scalar ops. I named them fminc/fmaxc to mirror the VSX ops and to avoid sticking x86 in the opcode mnemonics.