Bug 564 - add SV variant of fcvt to deal with elwidth differences in OpenPOWER FP scalar formats
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Specification
Version: unspecified
Hardware: Other Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
Depends on:
Blocks: 213
Reported: 2020-12-31 16:54 GMT by Luke Kenneth Casson Leighton
Modified: 2020-12-31 19:11 GMT
See Also:
NLnet milestone: NLNet.2019.10.Standards
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation: 213
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Luke Kenneth Casson Leighton 2020-12-31 16:54:36 GMT

this one is... annoying/tedious/necessary.

elwidth overrides where srcwid != destwid are a serious performance killer because of lane crossing. it is better to perform an "in-advance conversion" that makes the bitwidth the same across src and dest than it is to do lane-crossing.
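to illustrate why mismatched widths force lane crossing, here is a minimal sketch. it assumes elements are packed contiguously and that the hardware works in 64-bit lanes; `lane_of` and `lane_crossings` are hypothetical helper names, not part of any spec.

```python
def lane_of(index, elwidth, lane_bits=64):
    """Lane that element `index` falls in, assuming contiguous packing
    of elwidth-bit elements into lane_bits-wide lanes (an assumption)."""
    return (index * elwidth) // lane_bits

def lane_crossings(n_elements, srcwid, destwid):
    """Indices of elements whose source lane differs from their dest lane."""
    return [i for i in range(n_elements)
            if lane_of(i, srcwid) != lane_of(i, destwid)]

# srcwid == destwid: every element stays in its own lane
print(lane_crossings(4, 32, 32))
# srcwid=64, destwid=32: most elements have to move to a different lane
print(lane_crossings(4, 64, 32))
```

with matching widths the list is empty. with 64-bit sources and 32-bit destinations only element 0 stays in place, which is the cost that an in-advance conversion pays once rather than on every operation.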

in addition, an OpenPOWER scalar FP32 value is spread across the bits of an FP64 register, so that it looks as if it were actually an FP64.
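a small sketch of what "spread across the bits of an FP64" looks like, using Python's `struct` module to model the two IEEE 754 layouts. the helper names are hypothetical, and this only models the bit patterns, not any hardware.

```python
import struct

def fp32_bits(x):
    """IEEE 754 binary32 bit pattern of x (x is rounded to single)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def fp64_bits(x):
    """IEEE 754 binary64 bit pattern of x."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def single_in_fp64_layout(x):
    """Round x to single precision, then express it in the FP64 layout,
    which is how a scalar single value sits in a 64-bit FP register."""
    rounded = struct.unpack(">f", struct.pack(">f", x))[0]
    return fp64_bits(rounded)

print(hex(fp32_bits(1.5)))              # 0x3fc00000
print(hex(single_in_fp64_layout(1.5)))  # 0x3ff8000000000000
```

the same value has two different bit patterns: the exponent is re-biased and widened from 8 to 11 bits, and the 23-bit mantissa sits at the top of the 52-bit field, leaving the low 29 bits zero for normal values.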

both RVV and VSX perform fcvt conversion such that packed FP32 is easy and routine.

fcvt capability is therefore required somehow.  the most sensible method is adding an explicit opcode, although there are other methods.

one interesting option for fcvt is to also combine it with fclass (storing the analysis bits in CR1) when Rc=1
Comment 1 Luke Kenneth Casson Leighton 2020-12-31 17:02:46 GMT

Power ISA v3.0B, p429

format of instructions around p548.

jacob, this is where the special-casing should be performed: not as part of the SV loop, but by this particular operation being explicitly encoded and defined as follows:

* input formats (src) are DEFINED as being OpenPOWER bit-spread at the src elwidth (defaults to FP64)
* output formats (dest) are DEFINED as being "compacted and in the sensible sane way".

and vice-versa.

both of these would be a null-operation (a plain register move, fmr) when srcwid == destwid.

it *might* be possible, with some careful analysis, to allow fmr itself to perform this conversion process.
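the two directions defined above can be sketched as follows, for the one case of srcwid=64 and destwid=32 (and back). this is only a model under assumptions: elements are packed two per 64-bit register with element 0 in the low half, and the names `fcvt_compact` and `fcvt_spread` are made up for illustration.

```python
import struct

def f64_bits_to_f32_bits(bits64):
    """Reinterpret a binary64 pattern as a value, re-encode as binary32.
    struct raises OverflowError if the value does not fit in binary32."""
    value = struct.unpack("<d", struct.pack("<Q", bits64))[0]
    return struct.unpack("<I", struct.pack("<f", value))[0]

def f32_bits_to_f64_bits(bits32):
    """Reinterpret a binary32 pattern as a value, re-encode as binary64."""
    value = struct.unpack("<f", struct.pack("<I", bits32))[0]
    return struct.unpack("<Q", struct.pack("<d", value))[0]

def fcvt_compact(src_regs):
    """src: one bit-spread element per 64-bit register.
    dest: two packed binary32 elements per 64-bit register."""
    dest = []
    for i, bits64 in enumerate(src_regs):
        bits32 = f64_bits_to_f32_bits(bits64)
        if i % 2 == 0:
            dest.append(bits32)           # element goes in the low half
        else:
            dest[-1] |= bits32 << 32      # element goes in the high half
    return dest

def fcvt_spread(src_regs, n_elements):
    """The reverse direction: unpack binary32 elements into FP64 layout."""
    dest = []
    for i in range(n_elements):
        bits32 = (src_regs[i // 2] >> (32 * (i % 2))) & 0xFFFFFFFF
        dest.append(f32_bits_to_f64_bits(bits32))
    return dest

spread = [0x3FF8000000000000, 0xC000000000000000]  # 1.5 and -2.0, FP64 layout
packed = fcvt_compact(spread)
print([hex(r) for r in packed])                    # ['0xc00000003fc00000']
print(fcvt_spread(packed, 2) == spread)            # True
```

two source registers collapse into one destination register, after which packed FP32 operations can run with srcwid == destwid and no lane crossing.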
Comment 2 Luke Kenneth Casson Leighton 2020-12-31 19:11:08 GMT
fmr: Power ISA v3.0B section 4.6.5, p150

my feeling is that it would be reasonable to have these perform fcvt between the src elwidth and dest elwidth, so that follow-up FP operations that are also elwidth-overridden see ("understand") the exact same FP format.

which brings us to an interesting point: what the heck does running single-precision FP ops on elwidth=32 even mean??

single precision FP Ops on elwidth=default means "do the op @ FP32 but distribute the bits across FP64"

my feeling is that this behaviour should be preserved at lower elwidth.

i.e. "do the op @ FP16 but distribute the bits across FP32".

i.e. single-precision ops are redefined to be "do the op at half the precision"

fascinatingly, if this is followed and the dest elwidth is *also* FP16, then there is a way to get faster computation even when the src elwidth is FP32-formatted.
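the "do the op at half the precision" rule above can be modelled roughly like this. it is a sketch of the proposed semantics only: `single_op_add` is a hypothetical name, the add is done in Python's double and then rounded (so double-rounding corner cases are ignored), and `struct`'s `"e"` format stands in for FP16.

```python
import struct

def round_to(fmt, x):
    """Round x to the precision of a struct float format ('e', 'f' or 'd')."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

# elwidth -> (format the op is computed at, format the result is stored in)
FORMATS = {
    64: ("<f", "<d"),  # default: do the op @ FP32, spread across FP64
    32: ("<e", "<f"),  # proposed: do the op @ FP16, spread across FP32
}

def single_op_add(a, b, elwidth=64):
    """Model of a 'single-precision' add under an elwidth override."""
    op_fmt, store_fmt = FORMATS[elwidth]
    result = round_to(op_fmt, a + b)    # precision of the operation
    return round_to(store_fmt, result)  # container holding the result

tiny = 2.0 ** -12
print(single_op_add(1.0, tiny, elwidth=64))  # FP32 keeps the small term
print(single_op_add(1.0, tiny, elwidth=32))  # FP16 cannot represent it
```

at elwidth=64 the sum 1 + 2^-12 survives, because FP32 has 23 mantissa bits. at elwidth=32 it rounds back to 1.0, because FP16 has only 10, even though the result is stored in an FP32-sized container.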