I have an idea for how to handle XLEN for FP: currently, for FP, XLEN is poorly-defined for BFP16 and BFloat16, since we need XLEN to both match the element size and distinguish between BFP16 and BFloat16, which are conflicting requirements since both are 16-bit types.

Therefore, I think we should introduce two new XLEN-like variables that cleanly specify which formats are used for FP and FP Single instructions, leaving XLEN to convey only the type size. FTYPE specifies which format is used for FP instructions, as well as the in-register format for FP Single instructions. FSTYPE specifies which format is used for computations in FP Single instructions.

| SVP64 elwid | Int XLEN | FP XLEN | FTYPE    | FSTYPE | Notes          |
|-------------|----------|---------|----------|--------|----------------|
| 00          | 64       | 64      | BFP64    | BFP32  | DEFAULT values |
| 01          | 32       | 32      | BFP32    | BFP16  |                |
| 10          | 16       | 16      | BFP16    | -      |                |
| 11          | 8        | 16      | BFloat16 | -      |                |

Additionally, instead of calling the bfp64_* or bfp32_* pseudocode functions directly, the pseudocode will use new f_* and fs_* functions that switch on FTYPE/FSTYPE and call the appropriate format-specific functions. e.g. fatan2s pseudocode becomes:

    FRT <- DOUBLE(fs_ATAN2(SINGLE(FRA), SINGLE(FRB)))

and fatan2 pseudocode becomes:

    FRT <- f_ATAN2(FRA, FRB)

This still leaves the issue of what to set XLEN to for instructions like ctfpr that are both integer and FP operations; I had proposed having FLEN for FP and XLEN for integer, but that was rejected (maybe reconsider?).
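for concreteness, here is a minimal Python sketch (not spec pseudocode or simulator code) of how the f_* / fs_* dispatch could look; the bfp64_ATAN2 etc. names are stand-ins I invented for the per-format pseudocode primitives, and the stub bodies do no format-specific rounding -- only the FTYPE/FSTYPE dispatch structure is the point:

    import math

    # stand-in per-format primitives: the real bfp*/bfloat16 pseudocode
    # functions would do correctly-rounded arithmetic in each format;
    # these stubs exist only so the dispatch below runs
    def bfp64_ATAN2(a, b):    return math.atan2(a, b)
    def bfp32_ATAN2(a, b):    return math.atan2(a, b)
    def bfp16_ATAN2(a, b):    return math.atan2(a, b)
    def bfloat16_ATAN2(a, b): return math.atan2(a, b)

    def f_ATAN2(a, b, FTYPE="BFP64"):
        # full-width FP ops: format selected purely by FTYPE
        return {"BFP64": bfp64_ATAN2, "BFP32": bfp32_ATAN2,
                "BFP16": bfp16_ATAN2, "BFloat16": bfloat16_ATAN2}[FTYPE](a, b)

    def fs_ATAN2(a, b, FSTYPE="BFP32"):
        # FP Single ops: computation format selected by FSTYPE
        # (FSTYPE is left undefined for the 16-bit element widths in the table)
        return {"BFP32": bfp32_ATAN2, "BFP16": bfp16_ATAN2}[FSTYPE](a, b)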
(In reply to Jacob Lifshay from comment #0)
> I have an idea for how to handle XLEN for FP:
> currently, for FP, XLEN is poorly-defined for BFP16 and BFloat16, since we
> currently need XLEN to both match element size and allow distinguishing
> between BFP16 and BFloat16, which are conflicting requirements since they're
> both 16-bit types.

a function helps there. or, a global variable inserted into the
namespace (and spec).

> This still leaves the issue of what to set XLEN to for instructions like
> ctfpr that are both integer and fp operations, I had proposed having FLEN
> for FP and XLEN for integer, but that was rejected (maybe reconsider?).

it's the one crossover point that makes the different elwidths a bit
hairy. INT ops can get away with overcalculating then truncating
(dropping bits) but FP ops can't. i really wanted to avoid two XLENs.
fp-int converts i think make them unavoidable, but i am sure there is
a workaround.

good to record: can we leave detailed discussions until much later.
another closely related issue: fp ops with different src and dest elwidths end up double-rounding according to the current spec, which is bad.
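for concreteness, a small numpy-based demonstration (illustrative only, assumes numpy is available) that rounding f64 -> f32 -> f16 can give a different result than rounding f64 -> f16 directly:

    import numpy as np

    # value just above the midpoint between two adjacent f16 values,
    # exactly representable in f64
    x = 1.0 + 2.0**-11 + 2.0**-26

    direct         = np.float16(x)              # f64 -> f16 in one step
    double_rounded = np.float16(np.float32(x))  # f64 -> f32 -> f16

    print(direct)          # 1.001  (x is above the midpoint, rounds up)
    print(double_rounded)  # 1.0    (the f32 step lands exactly on the
                           #         midpoint, then ties-to-even rounds down)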
(In reply to Jacob Lifshay from comment #2)
> another closely related issue, fp ops with different src and dest elwidth
> end up double-rounding according to the current spec, which is bad.

good reason for programmers to avoid doing that by not using
different widths, then, isn't it?

we are not here to "nanny" people [making hardware more complex
in order to "protect" them from shooting themselves in the foot]

one to think through in the future. not now.
(In reply to Luke Kenneth Casson Leighton from comment #3)
> (In reply to Jacob Lifshay from comment #2)
> > another closely related issue, fp ops with different src and dest elwidth
> > end up double-rounding according to the current spec, which is bad.
>
> good reason for programmers to avoid doing that by not using
> different widths, then, isn't it?
>
> we are not here to "nanny" people [making hardware more complex
> in order to "protect" them from shooting themselves in the foot]

well, now that I think of it, we may be making the hardware more complex
by *not* avoiding double-rounding. e.g. sv.fadds/sw=f64/dw=f32 has to do:

* convert f64-in-f32 sources to internal format
* add sources
* round result to f32 (as expensive as converting to f32 due to denormals)
* convert f32 to internal format
* round to f16 (as expensive as converting to f16)
* convert f16 to f32

if we avoided double rounding, it would be:

* convert f64-in-f32 sources to internal format
* add sources
* round result to f16 (as expensive as converting to f16)
* convert f16 to f32

Note that when the inputs are the same type as, or strictly smaller than,
the outputs, there isn't a problem, because the extra conversions on the
inputs are exact, so we can just convert straight to the internal format
instead of doing two conversions.

So, what I think we should do about it: define as undefined behavior (or
trap) all FP operations where the output type is not the same as the
intermediate type, or where the input conversion is not exact. This leaves
us free to define better semantics later as another ISA extension without
it being a breaking change for SW. e.g. (a sketch of this check follows
the list):

* sv.fadd/sw=f32/dw=f64 is defined, since both the output and intermediate
  types are f64.
* sv.fadd/sw=f64/dw=f32 is UB or trap, since the output type (f32) isn't
  the intermediate type (f64).
* sv.ctfpr/sw=64/dw=f16 is defined, since the output defines the
  intermediate type (the input isn't FP).
* sv.fadds/sw=f32/dw=f64 is defined, since both the output and intermediate
  types are f64.
* sv.fadd/sw=f64/dw=f16 is UB or trap, since the output type (f16) isn't
  the intermediate type (f64).
* sv.fadd/sw=f16/dw=bf16 is UB or trap, since for the intermediate type
  being:
  * f16 -- the output type doesn't match the intermediate type
  * bf16 -- the input conversion isn't exact (f16 has more mantissa bits)
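here is a rough Python sketch of that "defined vs UB/trap" rule, checked against the examples above. the function names, the (exponent, mantissa) format model, and the widens_exactly helper are all mine, purely illustrative, not spec text:

    # formats modelled as (exponent_bits, mantissa_bits)
    FORMATS = {
        "f16":  (5, 10),
        "bf16": (8, 7),
        "f32":  (8, 23),
        "f64":  (11, 52),
    }

    def widens_exactly(src, dst):
        """True if every src value converts to dst without rounding."""
        se, sm = FORMATS[src]
        de, dm = FORMATS[dst]
        return de >= se and dm >= sm

    def fp_op_is_defined(src_fmt, intermediate_fmt, dst_fmt):
        # rule from the comment above:
        #  * the destination type must equal the intermediate (computation) type
        #  * the source-to-intermediate conversion must be exact
        #  (src_fmt is None for non-FP inputs, e.g. the integer source of ctfpr)
        if dst_fmt != intermediate_fmt:
            return False
        if src_fmt is not None and not widens_exactly(src_fmt, intermediate_fmt):
            return False
        return True

    print(fp_op_is_defined("f32", "f64", "f64"))    # True:  sv.fadd/sw=f32/dw=f64
    print(fp_op_is_defined("f64", "f64", "f32"))    # False: sv.fadd/sw=f64/dw=f32
    print(fp_op_is_defined(None,  "f16", "f16"))    # True:  sv.ctfpr/sw=64/dw=f16
    print(fp_op_is_defined("f16", "bf16", "bf16"))  # False: f16 -> bf16 not exact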