FSGNJ is needed, 16/32/64-bit, all three versions: MV, ABS, NEG.
(In reply to Luke Kenneth Casson Leighton from comment #0) > FSGNJ is needed, 16/32/64-bit, all three versions: MV, ABS, NEG. note that fsgnj* can do more than just mv, abs, and neg -- we need to make sure that we implement more than just those pseudo-instructions the implementation seems utterly trivial, just wires a not gate and an xor gate
I'd like to take a stab at this, however I'm not sure where to put it. Should it be made a separate module in the ieee754 directory or should it go somewhere else (like part of an existing module?) Also, I've been looking through the FPU modules and I've noticed there are two "styles" of modules: - Pipelined - like fpmul/pipeline.py - State machine - like fpmul/fmul.py Since this module is only a couple of gates and can be combinatorial, I'd imagine it should probably have a similar interface to the state machine variety, but I'm not sure.
(In reply to Michael Nolan from comment #2) > Also, I've been looking through the FPU modules and I've noticed there are > two "styles" of modules: > > - Pipelined - like fpmul/pipeline.py > - State machine - like fpmul/fmul.py > > Since this module is only a couple of gates and can be combinatorial, I'd > imagine it should probably have a similar interface to the state machine > variety, but I'm not sure. From what I recall, the state-machine modules were added to serve as a reference for conversion to pipelined form. The RISC-V sign-manipulation instructions are simple enough that they can be a single-stage pipeline.
(In reply to Michael Nolan from comment #2) > I'd like to take a stab at this, cool! this is pretty easy, not rocket science. > however I'm not sure where to put it. start cookie-cut from e.g. the fclass one or the fcvt int to float one. > Should it be made a separate module in the ieee754 directory yes. > or should it go > somewhere else (like part of an existing module?) no, see hdl_workflow > Also, I've been looking through the FPU modules and I've noticed there are > two "styles" of modules: > > - Pipelined - like fpmul/pipeline.py > - State machine - like fpmul/fmul.py yes. right. i was experimenting. a lot. the idea was, FSM or pipeline, it's not your decision, you just create combinatorial building blocks and the user decides how to join them together. it worked, but got complicated and i left it for another day. leave FSM for now. > Since this module is only a couple of gates and can be combinatorial, I'd > imagine it should probably have a similar interface to the state machine > variety, but I'm not sure. you'll see on inspecting the (simpler) fcvts that it's mostly setup blurb. gimme sec let me find some links on git.libre-riscv.org and illustrate
(In reply to Jacob Lifshay from comment #3) > From what I recall, the state-machine modules were added to serve as a > reference for conversion to pipelined form. pretty much. i used them as stepping stones in a looong incremental conversion process, making sure there was always something to test at all times. it was a pig :) > The RISC-V sign-manipulation > instructions are simple enough that they can be a single-stage pipeline. yes.
https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/fcvt/int2float.py;h=fcf3e34cf3025b02995b76251d363834640b7deb;hb=HEAD right. copy that, aaall of it, call it fcvt.py or something in a new subdirectory ieee754/fcvt, rename as appropriate (global search replace). replace lines 32 to 119. lines 18 and 19 set up a 2 bit op comment, 0x3 == 0b00 / 0b01 / 0b10 for mv, abs, neg. https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/fcvt/pipeline.py;h=5e6cc19dee7f98c971d1ea7e2266339b4bf4d2ec;hb=HEAD copy this except delete lines 117 onwards and search/replace FPCVTF2IntMuxInOut with the new fcvt class name. i am goung to have to think how the classes stack together (it's been a while) leave that with me an __init__.py is necessary. the test dir gets copied too, cooy just this file https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/fcvt/test/test_fcvt_f2int_pipe.py;h=4e01d8c2bc6915a6465162eae48decdeb710645f;hb=HEAD now you should notice there, opwidth has to be set to 2 because, duh, you have 3 ops, 0b00 0b01 and 0b10 and to test them each you have to set opcode=.... the callback function in the unit test obviously has to change. if you do this: $ python3 >>> from sfpy import Float32 >>> x = Float32(1.0) >>> dir(x) you will get a list of functions, you're looking for bits (for mv), a sgn function and an abs function. those give you the function to call in the test converters see lines 10 to 68 all of which you replace with fsgn_mv_f32_to_f32 etc etc etc etc.
Thanks for the help! I have something working more or less, but I'd like to formally verify it too.
(In reply to Michael Nolan from comment #7) > Thanks for the help! I have something working more or less, cool that was quick. mind you i only allocated a small budget to it precisely because it was likely to be. i'd say commit and push it however i need your agreement (on-list) to the charter, first. if you can do that first, then i can add your ssh key, then push, then review, then approve, and then we can set you up an RfP. apologies it's quite a few hoops, i am delighted that someone else is happy to work on the FPU. > but I'd like to > formally verify it too. that's a little trickier as it's outside of everyone currently in the teams' experience. and also a separate bugreport. and a different Grant etc etc
> i need your agreement (on-list) to the charter, first. I (just now) replied to the HDL Workflow thread with my agreement. > that's a little trickier as it's outside of everyone currently in the teams' experience. I should probably have mentioned this in my intro post, but I have some experience with formal verification (the yosys flavor anyway). Provided the pipeline control signals don't screw things up too much, it should be pretty straightforward.
(In reply to Michael Nolan from comment #9) > > i need your agreement (on-list) to the charter, first. > > I (just now) replied to the HDL Workflow thread with my agreement. appreciated. > > that's a little trickier as it's outside of everyone currently in the teams' experience. > > I should probably have mentioned this in my intro post, but I have some > experience with formal verification (the yosys flavor anyway). Provided the > pipeline control signals don't screw things up too much, it should be pretty > straightforward. niiice. that'll come in real handy. you'll be interested to know that one of the pipelines uses nmigen FIFO, and that that does actually have a formal proof in nmigen as a unit test. it's worth taking a look at. there is a lot to do here which is why formal proofs has its own entire budget. let's keep this particular bugreport on topic though.
> let's keep this particular bugreport on topic though. Alrighty
https://git.libre-riscv.org/?p=ieee754fpu.git;a=commitdiff;h=2352155642c82227e24e8782c6de61601938aa56 that's funny, i watched the progression of the commits, i wondered how long it would take to work out you could index by -1 to get at the sign bit for arbitrary float length :) there are other classes in FPBase i think which calculate the mantissa and exponent offset and length etc. so that you don't have to hardcode those values. also if we want to FP8, FP80 or FP128 in the future no code has to change except a one-line addition. so, the last bit here: to check in the RISCV spec as to whether normalisation needs to occur, certainly for F.MV i know it doesn't however for other ops i am not sure. one way to check is to track down the equivalent code for rocket-chip.
https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/fsgnj/fsgnj.py;h=c43e180c0c55d6e12ee081a5b6e236aad6a1e554;hb=2352155642c82227e24e8782c6de61601938aa56 nice. have you done this before? :) some comments on lines 44 46 48 for each of the opcodes would be nice. and one around 42, "first do sign" or something then line 52, "then copy mantissa/exp unmodified" oh, also, line 41, Signals are all resetless. the reason is because they are combinatorial blocks, where the latches (pipelines) are reset. actually because of the way the 6600 works they don't need it. the 6600 DMs isolate garbage during init.
https://github.com/chipsalliance/rocket-chip/blob/master/src/main/scala/tile/FPU.scala#L514 class FPtoFP. what's going on there with widening-conversions, i have no idea, given that the spec specifically says that NaN payloads are preserved relevant part of spec: Floating-point to floating-point sign-injection instructions, FSGNJ.S, FSGNJN.S, and FSGNJX.S, produce a result that takes all bits except the sign bit from rs1. For FSGNJ, the result'ss sign bit is rs2'ss sign bit; for FSGNJN, the result'ss sign bit is the opposite of rs2'ss sign bit; and for FSGNJX, the sign bit is the XOR of the sign bits of rs1 and rs2. Sign-injection instructions do not set floating-point exception flags, nor do they canonicalize NaNs. Note, FSGNJ.S rx, ry, ry moves ry to rx (assembler pseudoinstruction FMV.S rx, ry); FSGNJN.S rx, ry, ry moves the negation of ry to rx (assembler pseudoinstruction FNEG.S rx, ry); and FSGNJX.S rx, ry, ry moves the absolute value of ry to rx (assembler pseudoinstruction FABS.S rx, ry).
https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/fsgnj/pipeline.py;h=63f3a8a089072db75affe5ccf5fea4faf90c42bc;hb=HEAD cool! you worked out how to remove the denormalisation phase already!
*click*, apologies michael: i just realised something. although Switch/Case statements ultimately map down to Mux() or in rare cases PMux, for the FPU code we can't use them.... or If() statements (basically anything with a "with" context). the reason is down to the deployment of the PartitionedSignal class, which will perform operations in parallel. to support m.Case(), m.If() etc. we would need to do a seeerious amount of coding, possibly even modifying nmigen itself. by keeping to Mux as a decision-maker plus basic arithmetic operators we can write a parallel-PartitionSignal-aware-Mux() plus parallel-aware arith and the job of converting all code to dynamic SIMD is a by-the-numbers process pretty much replacing Signal with PartitionedSignal across all FP classes. apologies i forgot to mention that.
> the reason is down to the deployment of the PartitionedSignal class, which > will perform operations in parallel. to support m.Case(), m.If() etc. we > would need to do a seeerious amount of coding, possibly even modifying > nmigen itself. So what you're saying is: If(), Switch() and friends are fine for modules that are strictly scalar, but will not work if the module is converted to SIMD. > apologies i forgot to mention that. All good
Created attachment 19 [details] screenshot of fsgnj
(In reply to Michael Nolan from comment #17) > > the reason is down to the deployment of the PartitionedSignal class, which > > will perform operations in parallel. to support m.Case(), m.If() etc. we > > would need to do a seeerious amount of coding, possibly even modifying > > nmigen itself. > > So what you're saying is: If(), Switch() and friends are fine for modules > that are strictly scalar, but will not work if the module is converted to > SIMD. correct. the partition mask will at some point be passed in through a Mix-In class such that there should be very little in the way of changes. > > apologies i forgot to mention that. > > All good btw if you want to see the actual gate-level graph, which i highly recommend getting into the habit of doing, install graphviz and xdot and use this: yosys > read_ilang test_fsgnj_something_something.il > show top or type "show <tab> <tab>" to get a list of modules. the one you want is fsgnj$4 you can see it here: http://bugs.libre-riscv.org/attachment.cgi?id=19 what you've written basically falls very neatly into the "acceptably small size range such that it's easy to see what's going on and also will be dead-easy to do the layout for using alliance/coriolis2" the block "proc group_0" is where the switch/case statement is, which needs replacing with some Mux()es. those will basically be: Mux(op[1], Mux(op[0], 0b11 calculation, 0b10 calculation), Mux(op[0], 0b01 calculation, 0b00 calculation) ) something like that, where 0b11 will be "default" err i think.
> correct. the partition mask will at some point be passed in through a > Mix-In class such that there should be very little in the way of changes. That's pretty neat! > the block "proc group_0" is where the switch/case statement is, which > needs replacing with some Mux()es. those will basically be: > Mux(op[1], > Mux(op[0], 0b11 calculation, 0b10 calculation), > Mux(op[0], 0b01 calculation, 0b00 calculation) > ) I added something like this on my latest commit. However I defined the behavior of opcode 0b11 to be the same as 0b10 (Saving 1 mux in the process). Since the RISCV spec doesn't define the behavior of opcode 0b11, this seems safe to do (right? Rocket does it)
Created attachment 20 [details] yosys screenshot
(In reply to Michael Nolan from comment #20) > > correct. the partition mask will at some point be passed in through a > > Mix-In class such that there should be very little in the way of changes. > > That's pretty neat! yeah :) testing's going to be a pig. far more cases... *sigh* :) > > the block "proc group_0" is where the switch/case statement is, which > > needs replacing with some Mux()es. those will basically be: > > > Mux(op[1], > > Mux(op[0], 0b11 calculation, 0b10 calculation), > > Mux(op[0], 0b01 calculation, 0b00 calculation) > > ) > > I added something like this on my latest commit. However I defined the > behavior of opcode 0b11 to be the same as 0b10 (Saving 1 mux in the > process). Since the RISCV spec doesn't define the behavior of opcode 0b11, > this seems safe to do (right? Rocket does it) yehyeh makes perfect sense to me. okaaay, so this gets interesting. take a look at the second screenshot (after doing a git pull and re-run the unit tests you can generate it yourself if you prefer). http://bugs.libre-riscv.org/attachment.cgi?id=20 can you see that there's some spurious inexplicable maths going on, there, compared to the first graphviz? http://bugs.libre-riscv.org/attachment.cgi?id=19 that extra "stuff" is because inside the FPNumDecode, for convenience i got it to subtract the "offset" that's part of an exponent, and the "create()" function adds it back on. so if you do yosys "show sc_decode_a" you'll find a whole stack of extra stuff, none of which is actually necessary. we know it's not necessary because the previous version - without FPNumDecode - worked perfectly well and pass all unit tests. this is why i emphasise in https://libre-riscv.org/HDL_workflow/ that it's essential to run the yosys "show" command as part of the workflow. you get to see the actual graph - the actual gates created - and go "oink?? what's that doing in there??" :) second (minor) thing, you'd created a Signal "sign" and then had immediately overwritten it with a python variable *also* named "sign" which was the result of the Mux(). this seems to be a common source of confusion in nmigen, i've seen this done before: a python variable is created and then overwritten. remember that nmigen creates ASTs (abstract syntax trees) and so anything added to m.d.comb (or m.d.sync) is completely unused. third thing: again, which i spotted only after looking at the graphviz: Mux(opcode[0], ~b.s, b.s) was extremely puzzling in the graph however is quite logical in the code. i thought about it and realised that the Mux could be replaced with an XOR gate. sign = opcode[0] ^ ~b.s however that's really not very obvious so i provided a (terse) explanatory comment. i'd suggest getting rid of FPNumDecode, and returning to b[-1] etc. can i leave that with you?
> okaaay, so this gets interesting. take a look at the second > screenshot (after doing a git pull and re-run the unit tests > you can generate it yourself if you prefer). > > http://bugs.libre-riscv.org/attachment.cgi?id=20 > > can you see that there's some spurious inexplicable maths > going on, there, compared to the first graphviz? > > http://bugs.libre-riscv.org/attachment.cgi?id=19 > > that extra "stuff" is because inside the FPNumDecode, for > convenience i got it to subtract the "offset" that's part > of an exponent, and the "create()" function adds it back on. > > so if you do yosys "show sc_decode_a" you'll find a whole stack > of extra stuff, none of which is actually necessary. > > we know it's not necessary because the previous version - without > FPNumDecode - worked perfectly well and pass all unit tests. > > this is why i emphasise in https://libre-riscv.org/HDL_workflow/ > that it's essential to run the yosys "show" command as part of the > workflow. you get to see the actual graph - the actual gates > created - and go "oink?? what's that doing in there??" I'll admit, I did run the yosys show, but was only looking at the Mux part, and didn't notice that extra add. I'm a little surprised that part doesn't get optimized away (I played around with flattening fsignj and running yosys' optimizations on it). I wonder if it's because FPNumDecode adds some extra bits to the exponant when it decodes it. > > second (minor) thing, you'd created a Signal "sign" and then had > immediately overwritten it with a python variable *also* named > "sign" which was the result of the Mux(). > > this seems to be a common source of confusion in nmigen, i've seen > this done before: a python variable is created and then overwritten. > remember that nmigen creates ASTs (abstract syntax trees) and so > anything added to m.d.comb (or m.d.sync) is completely unused. This was why I was looking at the graph to see if the mux was actually there. Though it and https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/fpmul/specialcases.py;h=066d2b9f0fc15dd153cb1c7b262901636b7fe344;hb=HEAD (line 88) seem to work ok, is that because the result variable is added to the nmigen ast in the "m.d.comb +=" > third thing: again, which i spotted only after looking at the > graphviz: Mux(opcode[0], ~b.s, b.s) was extremely puzzling > in the graph however is quite logical in the code. > > i thought about it and realised that the Mux could be replaced with > an XOR gate. sign = opcode[0] ^ ~b.s > > however that's really not very obvious so i provided a (terse) > explanatory comment. Note, this should probably read (and does in fsgnj.py) as `sign = opcode[0] ^ b.s` This doesn't seem any more obvious in the graph to me than the Mux and invert so idk. > i'd suggest getting rid of FPNumDecode, and returning to b[-1] etc. > can i leave that with you? Sure Edit: Whoops, looks like I replied to the mailing list post, not the bugzilla one
> second (minor) thing, you'd created a Signal "sign" and then had > immediately overwritten it with a python variable *also* named > "sign" which was the result of the Mux(). Oh, I see what you mean now. sign = Signal() # 1 sign = Mux(op[0],...) # 2 sign = Mux(op[1],...) # 3 I thought you were referring to lines 2 and 3 overwriting sign, not line 1 being overwritten.