Bug 923 - we missed some important fp ops
Summary: we missed some important fp ops
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Specification
Version: unspecified
Hardware: PC Linux
Importance: --- major
Assignee: Jacob Lifshay
URL: https://libre-soc.org/openpower/trans...
Depends on:
Blocks: 1027 899
Reported: 2022-09-08 08:50 BST by Jacob Lifshay
Modified: 2023-04-06 10:17 BST
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
Description Jacob Lifshay 2022-09-08 08:50:21 BST
reading through the opencl list of ops, I realized we forgot to add some fp ops:
fmax
fmin
fmod
maxmag
minmag
remainder

out of those, imho we need fmin/fmax (all of the several variants); it would also be quite nice to have fmod/remainder and minmag/maxmag (all of the several variants).

we also forgot erf/erfc/lgamma but they are uncommon enough that they should just be left to software implementations.

the min/max/minmag/maxmag variants:
* the minNum/maxNum functions from ieee 754-2008 match the behavior of the VSX xsmindp operation and avx512 vrangess (maybe removed?) and armv8.2 fminnm. They are used in risc-v fmin.s for <= v2.1 of the f extension. They are unspecified for comparing signed zeros; we will want to treat +0 as greater than -0 to match a lot of other implementations.

* the minimumNumber/maximumNumber functions from ieee 754-2019 match the behavior of java and the VMX vminfp operation. They are used in risc-v fmin.s for >= v2.2 of the f extension.

* the minimum/maximum functions from ieee 754-2019 are basically the recommended default going forward, but a lot of programming languages don't use them yet for backward compatibility reasons.

An explanation of why ieee 754 replaced minNum/maxNum with minimum/maximum/minimumNumber/maximumNumber:
https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/background/minNum_maxNum_Removal_Demotion_v3.pdf
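
As a rough illustration of how the variants above differ, here is a minimal C sketch that only covers quiet NaNs and signed zeros. The names `sketch_minimum` and `sketch_minimum_number` are made up for this example. Signalling NaNs and exception flags are ignored (which, as far as I understand, is where minNum and minimumNumber actually part ways), and -0 is ordered below +0 as suggested above.

```c
#include <assert.h>
#include <math.h>

/* "less than" that also orders -0 below +0 (assumption taken from the text above) */
static int lt_with_zeros(double a, double b) {
    if (a == 0.0 && b == 0.0)
        return signbit(a) && !signbit(b);
    return a < b;
}

/* ieee 754-2019 minimum style: a quiet NaN in either input propagates */
double sketch_minimum(double a, double b) {
    if (isnan(a)) return a;
    if (isnan(b)) return b;
    return lt_with_zeros(a, b) ? a : b;
}

/* minNum / minimumNumber style: a quiet NaN is treated as missing data */
double sketch_minimum_number(double a, double b) {
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    return lt_with_zeros(a, b) ? a : b;
}

/* illustrative checks, not a conformance test */
void sketch_minimum_checks(void) {
    assert(isnan(sketch_minimum(NAN, 1.0)));        /* NaN wins */
    assert(sketch_minimum_number(NAN, 1.0) == 1.0); /* number wins */
    assert(signbit(sketch_minimum(0.0, -0.0)));     /* -0 is the smaller zero */
}
```

The only difference between the two functions is which operand is returned when exactly one input is a NaN; the max versions would flip the comparison the same way.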
Comment 1 Jacob Lifshay 2022-09-08 08:58:40 BST
technically there's also x86's minss/maxss operations; minss implements the C function:
float f(float a, float b) {
    return a < b ? a : b;
}
specifically, if either input is a NaN, or if both inputs are equal, or if both inputs are zero of either sign, they always return b. They never convert a signalling NaN to a quiet NaN.

If we also add that, it would fill out the min/max/minmag/maxmag variants to 8, fitting nicely in a 3-bit mode field. or if we decide we don't want minmag/maxmag, it would fill out the variants to 4, fitting in a 2-bit mode field.
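
The x86-style behaviour described above can be sketched in plain C. The function names are hypothetical, and the max version is assumed to be the same expression with the comparison flipped. Quieting of signalling NaNs and exception flags are not modelled.

```c
#include <assert.h>
#include <math.h>

/* x86 minss style: returns b unless "a < b" is true */
float sketch_x86_min(float a, float b) { return a < b ? a : b; }

/* x86 maxss style: returns b unless "a > b" is true */
float sketch_x86_max(float a, float b) { return a > b ? a : b; }

/* illustrative checks of the "always return b" cases */
void sketch_x86_checks(void) {
    assert(isnan(sketch_x86_min(1.0f, NAN)));       /* NaN in b: b returned */
    assert(sketch_x86_min(NAN, 1.0f) == 1.0f);      /* NaN in a: b returned */
    assert(signbit(sketch_x86_min(0.0f, -0.0f)));   /* zeros: b (-0) returned */
    assert(!signbit(sketch_x86_min(-0.0f, 0.0f)));  /* zeros: b (+0) returned */
    assert(sketch_x86_max(2.0f, 1.0f) == 2.0f);
}
```

Note that the result depends on operand order whenever a NaN or two zeros are involved, which is what separates this variant from the ieee 754 ones.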
Comment 2 Luke Kenneth Casson Leighton 2022-09-08 13:00:17 BST
(In reply to Jacob Lifshay from comment #1)
> technically there's also x86's maxss operations, they implement the C
> function:
> float f(float a, float b) {
>     return a < b ? a : b;
> }

as best i can tell that's fsel - p168 v3.0B 4.6.9

fsel  FRT,FRA,FRC,FRB (Rc=0)
fsel. FRT,FRA,FRC,FRB (Rc=1)

if (FRA) >= 0.0 then FRT <- (FRC)
else FRT <- (FRB)

The floating-point operand in register FRA is compared
to the value zero. If the operand is greater than or equal
to zero, register FRT is set to the contents of register
FRC. If the operand is less than zero or is a NaN, regis-
ter FRT is set to the contents of register FRB. The com-
parison ignores the sign of zero (i.e., regards +0 as
equal to -0).
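
For comparison, a minimal C sketch of the fsel semantics quoted above (the function name is made up, and Rc=1 / CR updates are not modelled):

```c
#include <assert.h>
#include <math.h>

/* fsel FRT,FRA,FRC,FRB: FRT <- (FRA >= 0.0) ? FRC : FRB */
double sketch_fsel(double fra, double frc, double frb) {
    return fra >= 0.0 ? frc : frb;
}

/* illustrative checks */
void sketch_fsel_checks(void) {
    assert(sketch_fsel(-0.0, 1.0, 2.0) == 1.0); /* -0 counts as >= 0 */
    assert(sketch_fsel(NAN, 1.0, 2.0) == 2.0);  /* NaN selects FRB */
    assert(sketch_fsel(-1.0, 1.0, 2.0) == 2.0); /* negative selects FRB */
}
```

The comparison is always against zero, so the two data inputs FRC and FRB are never compared with each other.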
Comment 3 Jacob Lifshay 2022-09-08 13:16:17 BST
(In reply to Luke Kenneth Casson Leighton from comment #2)
> (In reply to Jacob Lifshay from comment #1)
> > technically there's also x86's maxss operations, they implement the C
> > function:
> > float f(float a, float b) {
> >     return a < b ? a : b;
> > }
> 
> as best i can tell that's fsel - p168 v3.0B 4.6.9

it isn't actually, x86 minss/maxss compare the inputs with each other, not against zero.
Comment 4 Jacob Lifshay 2022-09-08 13:18:22 BST
(In reply to Jacob Lifshay from comment #0)
> reading through the opencl list of ops, I realized we forgot to add some fp
> ops:
> fmax
> fmin
> fmod
> maxmag
> minmag
> remainder

i'd like to add these new ops as part of doing the initial implementation of fptrans, there's space.

what do you think?
Comment 5 Luke Kenneth Casson Leighton 2022-09-08 22:08:46 BST
(In reply to Jacob Lifshay from comment #3)
> (In reply to Luke Kenneth Casson Leighton from comment #2)
> > (In reply to Jacob Lifshay from comment #1)
> > > technically there's also x86's maxss operations, they implement the C
> > > function:
> > > float f(float a, float b) {
> > >     return a < b ? a : b;
> > > }
> > 
> > as best i can tell that's fsel - p168 v3.0B 4.6.9
> 
> it isn't actually, x86 minss/maxss compare the inputs with each other, not
> against zero.

ahh yes.  sigh.  ok. let's take a look and see if the others are there.

(In reply to Jacob Lifshay from comment #4)
> (In reply to Jacob Lifshay from comment #0)
> > reading through the opencl list of ops, I realized we forgot to add some fp
> > ops:
> > fmax
> > fmin
> > fmod
> > maxmag
> > minmag
> > remainder
> 
> i'd like to add these new ops as part of doing the initial implementation of
> fptrans, there's space.
> 
> what do you think?

yes good idea, search for them first though, and in VSX as well,
the section has to say "this is in VSX as {vxxxxxx} but not in scalar"
Comment 6 Luke Kenneth Casson Leighton 2022-09-08 22:19:14 BST
hang on...
p181 v3.1 there's notes

if a >= b then x <- y  fsub fs,fa,fb
else x <- z            fsel fx,fs,fy,fz

so no, we can't add fmaxss or fminss,
they'll get rejected because of the
macro-fusion advice.

the only reason to add fminss/fmaxss/fmin/fmax
would be to update to IEEE754-2019, which
is probably good enough.
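
A rough C model of the fsub+fsel idiom from the notes above, with hypothetical function names. It is only a sketch: it also shows one case (both inputs infinite) where the subtract-then-select sequence does not agree with a direct `a >= b` compare, because inf - inf is a NaN.

```c
#include <assert.h>
#include <math.h>

/* fsel model: result is frc if fra >= 0.0, else frb (NaN selects frb) */
static double sketch_fsel2(double fra, double frc, double frb) {
    return fra >= 0.0 ? frc : frb;
}

/* "if a >= b then x <- y else x <- z" done as fsub followed by fsel */
double sketch_select_ge(double a, double b, double y, double z) {
    double fs = a - b;               /* fsub fs,fa,fb */
    return sketch_fsel2(fs, y, z);   /* fsel fx,fs,fy,fz */
}

/* min built from that idiom: a >= b ? b : a */
double sketch_min_via_fsel(double a, double b) {
    return sketch_select_ge(a, b, b, a);
}

/* illustrative checks */
void sketch_fsub_fsel_checks(void) {
    assert(sketch_min_via_fsel(3.0, 1.0) == 1.0);
    assert(sketch_min_via_fsel(1.0, 3.0) == 1.0);
    /* inf >= inf is true, but inf - inf is NaN, so the "else" value is picked */
    assert(sketch_select_ge(INFINITY, INFINITY, 1.0, 2.0) == 2.0);
}
```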


----

https://stackoverflow.com/questions/30618991/simd-minmag-and-maxmag

minmag(a,b) = |a|<|b| ? a : b
maxmag(a,b) = |a|>|b| ? a : b

not seeing anything like this - good idea to add them.
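
The two formulas above, written out as a minimal C sketch (function names are made up). This follows the formulas literally, so on a magnitude tie or a NaN it just returns b; as far as I know, OpenCL's minmag/maxmag fall back to fmin/fmax in that case, which this sketch does not do.

```c
#include <assert.h>
#include <math.h>

/* minmag(a,b) = |a| < |b| ? a : b */
double sketch_minmag(double a, double b) { return fabs(a) < fabs(b) ? a : b; }

/* maxmag(a,b) = |a| > |b| ? a : b */
double sketch_maxmag(double a, double b) { return fabs(a) > fabs(b) ? a : b; }

/* illustrative checks */
void sketch_mag_checks(void) {
    assert(sketch_minmag(-1.0, 2.0) == -1.0); /* sign of the input is kept */
    assert(sketch_maxmag(-3.0, 2.0) == -3.0);
    assert(sketch_minmag(2.0, -2.0) == -2.0); /* tie: the formula returns b */
}
```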

---

fmod

https://codebrowser.dev/glibc/glibc/sysdeps/ieee754/flt-32/e_fmodf.c.html

blerk!  that's awful! (i mean, the software).  yep, good call.

----

remainder

https://stackoverflow.com/questions/26671975/why-do-we-need-ieee-754-remainder

blerk. i don't get it. but i can understand other people do :)

----

yep all good here.
Comment 7 Jacob Lifshay 2022-09-09 09:25:08 BST
I added all the ops to the spec; I'll leave opcode allocation to #899:

https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=3e363081ee0142d6948d5b6c66523d833d0a7711

I found x86-style min/max ops in VSX (xsmincdp), so I'll take that as sufficient justification to add scalar ops. I named them fminc/fmaxc to mirror the VSX ops and to avoid sticking x86 in the opcode mnemonics.