564 – add SV variant of fcvt to deal with elwidth differences in OpenPOWER FP scalar formats

Status:	CONFIRMED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Specification
Version:	unspecified
Hardware:	Other Linux

Importance:	--- enhancement
Assignee:	Luke Kenneth Casson Leighton

URL:

Depends on:
Blocks:	213

Reported:	2020-12-31 16:54 GMT by Luke Kenneth Casson Leighton
Modified:	2022-09-30 21:02 BST
CC List:	2 users

See Also:	560
NLnet milestone:	---
total budget (EUR) for completion of task and all subtasks:	0
budget (EUR) for this task, excluding subtasks' budget:	0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Luke Kenneth Casson Leighton 2020-12-31 16:54:36 GMT

https://bugs.libre-soc.org/show_bug.cgi?id=560#c32

this one is... annoying/tedious/necessary.

elwidth overrides with srcwid != destwid are such a performance killer, due to lane crossing, that it is better to perform an "in-advance conversion" that makes the bitwidth the same across src and dest than it is to do the lane crossing.

in addition, an OpenPOWER scalar FP32 value is spread across the bits of an FP64 so that it looks as if it were actually an FP64.
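
A minimal sketch of that bit-spreading, assuming it means "re-encode the FP32 value in FP64 format" (which is how scalar single-precision values sit in OpenPOWER FP registers). The function name `fp32_bits_to_fp64_bits` is hypothetical and only for illustration; NaN payloads and signalling NaNs are not modelled.

```python
import struct

def fp32_bits_to_fp64_bits(bits32: int) -> int:
    """Return the FP64 bit pattern that holds the same value as an FP32 pattern."""
    # Decode the 32-bit pattern as an IEEE 754 binary32 value.
    (value,) = struct.unpack("<f", struct.pack("<I", bits32))
    # Re-encode the same value as binary64: the "spread" 64-bit form.
    (bits64,) = struct.unpack("<Q", struct.pack("<d", value))
    return bits64

# FP32 1.0 (0x3F800000) becomes FP64 1.0.
print(hex(fp32_bits_to_fp64_bits(0x3F800000)))  # 0x3ff0000000000000
```

For normal FP32 inputs the low 29 mantissa bits of the 64-bit form are always zero, which is why the value only "looks like" an FP64.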

both RVV and VSX perform fcvt conversion such that packed FP32 is easy and routine.

fcvt capability is therefore required somehow.  the most sensible method is adding an explicit opcode, although there are other methods.

one interesting option for fcvt is to also combine it with fclass (storing the analysis bits in CR1) when Rc=1

Comment 1 Luke Kenneth Casson Leighton 2020-12-31 17:02:46 GMT

xscvdphp

p429 v3.0B

format of instructions around p548.

this is where the special-casing should be performed, jacob: not as part of the SV loop, but by this particular operation being explicitly encoded and defined as:

* input formats (src) are DEFINED as being OpenPOWER bit-spread at the src elwidth (defaults to FP64)
* output formats (dest) are DEFINED as being "compacted and in the sensible sane way".

and vice-versa.

both would be a null-operation (fmv) when srcwid == destwid.
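
A rough sketch of the behaviour described above, under the assumption that "bit-spread at srcwid" and "compacted at destwid" can both be modelled as plain IEEE 754 encodings at the respective widths. `sv_fcvt_sketch` and `_FMT` are hypothetical names, not part of any spec or codebase, and out-of-range narrowing (which makes `struct.pack` raise `OverflowError`) is deliberately left unhandled.

```python
import struct

# struct codes for IEEE 754 binary16/32/64 and same-sized unsigned ints,
# keyed by element width in bits.
_FMT = {16: ("<e", "<H"), 32: ("<f", "<I"), 64: ("<d", "<Q")}

def sv_fcvt_sketch(src_bits: int, srcwid: int, destwid: int) -> int:
    """Convert one element from the srcwid format to the destwid format."""
    if srcwid == destwid:
        # Same width on both sides: nothing to convert, acts as a move.
        return src_bits
    src_fp, src_int = _FMT[srcwid]
    dest_fp, dest_int = _FMT[destwid]
    # Decode the source bits at the source element width.
    (value,) = struct.unpack(src_fp, struct.pack(src_int, src_bits))
    # Re-encode at the destination element width (rounds when narrowing).
    (dest_bits,) = struct.unpack(dest_int, struct.pack(dest_fp, value))
    return dest_bits

# 64-bit "spread" 1.0 compacted to a 32-bit element, and back again.
compact = sv_fcvt_sketch(0x3FF0000000000000, 64, 32)
print(hex(compact))                              # 0x3f800000
print(hex(sv_fcvt_sketch(compact, 32, 64)))      # 0x3ff0000000000000
```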

it *might* be possible, with some careful analysis, to allow for fmv itself to perform this conversion process.

Comment 2 Luke Kenneth Casson Leighton 2020-12-31 19:11:08 GMT

fmr p150 v3.0B 4.6.5

my feeling is that it would be reasonable to have these perform fcvt between the src elwidth and dest elwidth, so that follow-up FP operations that are also elwidth-overridden see ("understand") the exact same FP format.

which brings us to an interesting point: what the heck does running single-precision FP ops on elwidth=32 even mean??

single precision FP Ops on elwidth=default means "do the op @ FP32 but distribute the bits across FP64"

my feeling is that this behaviour should be preserved at lower elwidth.

i.e. "do the op @ FP16 but distribute the bits across FP32".

i.e. "single precision" ops are redefined to mean "do the op at half the precision"

fascinatingly, if this is followed and the dest elwidth is *also* FP16, then there is a way to get faster computation even when the src elwidth is FP32 formatted.