an IEEE 754 FP "multiply" pipeline is needed, for FP16/32/64.
also needed is an optional parameter specifying how many pipeline
stages the actual multiplication is to take.
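
one possible reading of "how many stages", as a rough sketch only: the
mantissa multiply is split into partial products, one per stage, and
accumulated. the function name `staged_multiply` and its parameters are
hypothetical, and this is a software model in Python, not the pipeline's
actual HDL.

```python
def staged_multiply(a, b, width, stages):
    """Multiply a by b (b < 2**width) using `stages` partial products.

    Hypothetical model: each loop iteration stands in for one pipeline
    stage, which multiplies a by one chunk of b and adds the shifted
    result to the accumulator.
    """
    chunk = -(-width // stages)          # ceil(width / stages) bits per stage
    mask = (1 << chunk) - 1
    acc = 0
    for s in range(stages):
        part = (b >> (s * chunk)) & mask  # bits of b handled by this stage
        acc += (a * part) << (s * chunk)  # accumulate the partial product
    return acc


# 11-bit FP16 mantissas (hidden bit included), multiplied in 3 stages
print(staged_multiply(0x7BB, 0x738, width=11, stages=3) == 0x7BB * 0x738)
```

with stages=1 this degenerates to a single full-width multiply, so the
stage count only trades latency against the width of each partial product.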
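FP16 mul pipeline bug (operands a, b, then the expected result):
* 0xe7bb 0x81ce 0x2afa (returns 0x2af9)
* 0x0113 0xf569 0xb5d0 (returns 0xb5ce)
in both cases one operand is subnormal (exponent field zero).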
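
the expected values above can be cross-checked in software. a minimal
sketch, assuming Python's `struct` half-precision format (`e`) as the
reference; the helper names are made up for illustration. the product of
two FP16 values is exact in a double, so the only rounding happens when
packing back to FP16.

```python
import struct


def fp16_from_bits(bits):
    """Interpret a 16-bit integer as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def fp16_to_bits(value):
    """Round a Python float to half precision and return its bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def fp16_mul_ref(a_bits, b_bits):
    """Reference FP16 multiply: exact product in double, one rounding."""
    # note: struct raises OverflowError if the product exceeds FP16 range
    return fp16_to_bits(fp16_from_bits(a_bits) * fp16_from_bits(b_bits))


print(hex(fp16_mul_ref(0xE7BB, 0x81CE)))  # 0x2afa
print(hex(fp16_mul_ref(0x0113, 0xF569)))  # 0xb5d0
```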
found source of inaccuracy: alignment (pre-normalisation) of a and b
was entirely missing!
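
for reference, a sketch of what pre-normalisation does to a subnormal
operand, assuming the usual approach of shifting the mantissa left until
the hidden-bit position is set and decrementing the exponent to match.
`fp16_decode` and `prenormalise` are hypothetical names, not the
pipeline's own, and inf/NaN are deliberately not handled.

```python
def fp16_decode(bits):
    """Split FP16 bits into (sign, unbiased exponent, 11-bit mantissa).

    Subnormals (exponent field 0) get exponent -14 and no hidden bit.
    Inf/NaN (exponent field 31) are not treated specially in this sketch.
    """
    sign = (bits >> 15) & 1
    exp_field = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if exp_field == 0:
        return sign, -14, frac
    return sign, exp_field - 15, frac | 0x400


def prenormalise(exp, mant, width=11):
    """Shift mant left until bit (width - 1) is set, adjusting exp."""
    if mant == 0:
        return exp, mant              # zero cannot be normalised
    while (mant >> (width - 1)) == 0:
        mant <<= 1
        exp -= 1
    return exp, mant


sign, exp, mant = fp16_decode(0x81CE)     # subnormal operand from the bug
print(prenormalise(exp, mant))            # (-16, 1848), i.e. mantissa 0x738
```

without this step the subnormal mantissa enters the multiplier with
leading zeros, so the product has fewer significant bits in the positions
that rounding looks at, which is consistent with the low-order-bit errors
reported above.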
unit tests pass: ran several tens of thousands of tests on FP16 and FP32.