this example does not compute overflow correctly in div output_stage.py: lst = ["divw. 3, 1, 2"] initial_regs = [0] * 32 initial_regs[1] = 0x61c1cc3b80f2a6af initial_regs[2] = 0x9dc66a7622c32bc0 run fu/div/test/test_pipe_caller.py it is test_6_regression()
jacob can you take a look at this one, there's something odd and i can't work it out commit f759f42491c04c7d28401b9d08aa9b0fa0bc7f7d (HEAD -> master, origin/master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Fri Jul 10 00:18:53 2020 +0100 add regression test for div overflow case see https://bugs.libre-soc.org/show_bug.cgi?id=425
I ran that test after fixing some breakage from recent changes in nmigen. It passed. Could you point out in more detail what's broken? I ran it with: python src/soc/fu/div/test/test_pipe_caller.py DivTestCase.test_6_regression
The results I get from running it using power-instruction-analyzer's model is: >>> import power_instruction_analyzer as pia >>> print(pia.divw_(pia.InstructionInput(ra=0x61c1cc3b80f2a6af,rb=0x9dc66a7622c32bc0,rc=0))) {"rt":"0xFFFFFFFD","cr0":{"lt":false,"gt":true,"eq":false,"so":false}}
(In reply to Jacob Lifshay from comment #2) > I ran that test after fixing some breakage from recent changes in nmigen. > > It passed. Could you point out in more detail what's broken? > > I ran it with: > python src/soc/fu/div/test/test_pipe_caller.py DivTestCase.test_6_regression Aha, I was using DivTestCase instead of DIVTestCase. oops
(In reply to Jacob Lifshay from comment #4) > (In reply to Jacob Lifshay from comment #2) > > I ran that test after fixing some breakage from recent changes in nmigen. > > > > It passed. Could you point out in more detail what's broken? > > > > I ran it with: > > python src/soc/fu/div/test/test_pipe_caller.py DivTestCase.test_6_regression > > Aha, I was using DivTestCase instead of DIVTestCase. oops Well, I got the test to fail, the problem is either the pipeline is broken to the point where ra and rb in DivSetupStage and lots of other places are always 0, or nmigen's simulator is broken where it writes zeros to random wires in the .vcd file. I'd have to figure out where to add print statements for debugging or something, since there are so many layers of abstraction that I'm a bit lost. I'm not really sure how to write a test case that just sets the inputs of the pipeline to values and runs it without needing all the extra stuff for handling instruction cancellation, matching against qemu, or matching against the model from the Power spec.
(In reply to Jacob Lifshay from comment > Well, I got the test to fail, the problem is either the pipeline is broken > to the point where ra and rb in DivSetupStage and lots of other places are > always 0, or nmigen's simulator is broken where it writes zeros to random > wires in the .vcd file. gtkwave unfortunately has bugs. > I'd have to figure out where to add print statements for debugging or > something, since there are so many layers of abstraction that I'm a bit lost. a FSM would be one hell of a lot aimpler. > I'm not really sure how to write a test case that just sets the inputs of > the pipeline to values and runs it without needing all the extra stuff for > handling instruction cancellation, matching against qemu, or matching > against the model from the Power spec. the result 0xfffffffd is correct. it's the overflow calculation that is not. i tested that by manually setting overflow to zero. if you check the git commit diff at comment 1 you will see the places where i added code comments for the locations which need checking. i also listed the function. it is possible by using yield x.y.z to drill down the clase structure and print things that you need. if they are private variables add them to self. temporarily i typically use print dir(instance) to find the members, then do that again print dir(instance.subinstance) it is laborious but gets there i will help more tomorrow when it is not 3am.
the class that contains the failing code is DivOutputStage. therefore we need to "get at" DivOutputStage. the gtkwave tree (and yosys graphvis) show the tree into the failing code as: alu -> pipe_end -> output_stage the classes/instances are: alu=DIVBasePipe -> pipe_end=DivStagesEnd -> div_out=DivOutputStage div_out is local therefore i am committing a change which makes it accessible: +++ b/src/soc/fu/div/pipeline.py @@ -34,6 +34,7 @@ class DivStagesEnd(PipeModBaseChain): core_final = DivCoreFinalStage(self.pspec) div_out = DivOutputStage(self.pspec) alu_out = DivMulOutputStage(self.pspec) + self.div_out = div_out # debugging - bug #425 return [core_final, div_out, alu_out] commit f11ffd6b4be87763d7b93a77215212e1004f76e3 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Fri Jul 10 10:16:28 2020 +0100 add debugging chain for #425 now we can look at that, back in fu/div/test/test_pipe_caller.py: vld = yield alu.n.valid_o while not vld: yield vld = yield alu.n.valid_o print ("bug track", alu.pipe_end.div_out) yield this has to be printed out *during* that analysis because it's a pipeline, and if we leave the inspection too late the data is gone. after valid_o has dropped is.... too late. ok so running the test with that, outputted lots of copies of the DivOutputStage object. now for something more useful: vld = yield alu.n.valid_o while not vld: yield vld = yield alu.n.valid_o # bug #425 investigation do = alu.pipe_end.div_out dive_abs_ov32 = yield do.i.dive_abs_ov32 quotient_neg = yield do.quotient_neg print ("dive_abs_ov32", hex(dive_abs_ov32)) print ("quotient_neg", hex(quotient_neg)) yield that outputs this: dive_abs_ov32 0x0 quotient_neg 0x1 dive_abs_ov32 0x0 quotient_neg 0x1 ok let's add some more: # bug #425 investigation do = alu.pipe_end.div_out ctx_op = do.i.ctx.op is_32bit = yield ctx_op.is_32bit is_signed = yield ctx_op.is_signed quotient_root = yield do.i.core.quotient_root dive_abs_ov32 = yield do.i.dive_abs_ov32 div_by_zero = yield do.i.div_by_zero quotient_neg = yield do.quotient_neg print ("32bit", hex(is_32bit)) print ("signed", hex(is_signed)) print ("quotient_root", hex(quotient_root)) print ("div_by_zero", hex(div_by_zero)) print ("dive_abs_ov32", hex(dive_abs_ov32)) print ("quotient_neg", hex(quotient_neg)) print ("") committed with this: commit 1040fedb60ba617b0a85997330c10e5769c06051 (HEAD -> master, origin/master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Fri Jul 10 10:40:15 2020 +0100 add more debug output for #425 and we get this output: 32bit 0x1 signed 0x1 quotient_root 0xfc00000000000003 div_by_zero 0x0 dive_abs_ov32 0x0 quotient_neg 0x1 so that's 32-bit, and signed. therefore sign_bit_mask is 0x8000000. therefore, 0xfc0000000000003 > 0x8000000 is successful. therefore, overflow is indicated when it should *not* be indicated. what i therefore suspect is that when op.is_32bit is set, the top bits of abs_quotient (quotient_root) must be ignored. but that dive_abs_ov32 obviously should not. the relevant code in microwatt divide.vhdl is here: if is_32bit = '0' then did_ovf <= overflow or (is_signed and (sresult(64) xor sresult(63))); elsif is_signed = '1' then if ovf32 = '1' or sresult(32) /= sresult(31) then did_ovf <= '1'; end if; else did_ovf <= ovf32; end if; where it can be seen that they XOR bit 64 and 63 for 64-bit overflow detection in the signed case and for 32-bit overflow, bit 32 and 31 are (effectively) XORed together. i'm going to give that a shot, see what happens (translate microwatt overflow detection into nmigen)
(In reply to Luke Kenneth Casson Leighton from comment #7) > > and we get this output: > > 32bit 0x1 > signed 0x1 > quotient_root 0xfc00000000000003 > div_by_zero 0x0 > dive_abs_ov32 0x0 > quotient_neg 0x1 that quotient_root is incorrect, assuming we're still using test case #6, check the divpipecore inputs. alternatively, nmigen just currently has a simulator bug which is also showing up as zero vcd values.
(In reply to Jacob Lifshay from comment #8) > (In reply to Luke Kenneth Casson Leighton from comment #7) > > > > > and we get this output: > > > > 32bit 0x1 > > signed 0x1 > > quotient_root 0xfc00000000000003 > > div_by_zero 0x0 > > dive_abs_ov32 0x0 > > quotient_neg 0x1 > > that quotient_root is incorrect, assuming we're still using test case #6, > check the divpipecore inputs. alternatively, nmigen just currently has a > simulator bug which is also showing up as zero vcd values. there should not be any spuriously set msb bits
ok so that translation worked. that resulted in that particular regression "working". however on re-enabling the random testing (s/def tst_/def test_/g) _another_ test came up (test_7_regression) which this time is when overflow *should* be set (according to the simulator) but that the overflow detection (microwatt-style this time) is *not* detecting. the input: def test_7_regression(self): # https://bugs.libre-soc.org/show_bug.cgi?id=425 lst = ["divw. 3, 1, 2"] initial_regs = [0] * 32 initial_regs[1] = 0xf1791627e05e8096 initial_regs[2] = 0xffc868bf4573da0b self.run_tst_program(Program(lst), initial_regs) the hardware output (from the debug output): res output {'cr_a': 2, 'xer_ov': 0, 'o': 0, 'xer_so': 0} the simulator output (from the debug output) sim output {'o': 0, 'cr_a': 3, 'xer_ov': 1, 'xer_so': 1} cr_a[1] is correct - the output is zero. cr_a[0] contains a copy of XER.ov which is wrong. actually the sim output might be wrong, here. it should never be set to 0b1. the two legal values are 0b00 and 0b11, NEVER 0b01.
(In reply to Jacob Lifshay from comment #9) > there should not be any spuriously set msb bits ok i checked when setting compare_len back to bitwidth * 3 and quotient_root was set to 0x1ffffffffffffffffd at the end. i suspect then that there are not enough bits in some places, which tends to suggest that there may be an undiscovered bug in the div_pipe code for other cases. all of this is pointing strongly towards simply using microwatt's FSM for now.
(In reply to Luke Kenneth Casson Leighton from comment #10) > actually the sim output might be wrong, here. it should never be > set to 0b1. the two legal values are 0b00 and 0b11, NEVER 0b01. it was wrong. fixed. commit f25aad525dfc184b5a8407e52b62a499dfb182f4 Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Fri Jul 10 16:04:08 2020 +0100 check for div_overflow equal to None rather than == 1 going through some more regression tests, it looks like we're out of the woods. the div_by_zero flag is not being propagated through the pipeline stages to the output stage, that was shown by this test: def test_div_by_zero_1(self): lst = ["divw. 3, 1, 2"] initial_regs = [0] * 32 initial_regs[1] = 0x1 initial_regs[2] = 0x0 self.run_tst_program(Program(lst), initial_regs) i'll sort that now.
ahh i think i now know why gtkwave (nmigen) "appeared" not to be working and also why the overflow calculation wasn't working: you'd missed out propagating the full set of parameters down the pipeline chain class CoreBaseData(DIVInputData): def __init__(self, pspec, core_data_class): super().__init__(pspec) self.core = core_data_class(pspec.core_config) self.divisor_neg = Signal(reset_less=True) self.dividend_neg = Signal(reset_less=True) self.div_by_zero = Signal(reset_less=True) <- missing self.dive_abs_ov32 = Signal(reset_less=True) <- missing self.dive_abs_ov64 = Signal(reset_less=True) <- missing +++ b/src/soc/fu/div/core_stages.py @@ -27,10 +27,12 @@ class DivCoreBaseStage(PipeModBase): def elaborate(self, platform): m = Module() + # pass-through on non-core parameters m.d.comb += self.o.eq_without_core(self.i) m.submodules.core = self.core + # copy parameters to/from divremsqrt core into the Base, here. m.d.comb += self.core.i.eq(self.i.core) m.d.comb += self.o.core.eq(self.core.o) diff --git a/src/soc/fu/div/pipe_data.py b/src/soc/fu/div/pipe_data.py index fa67aded..02f169f9 100644 --- a/src/soc/fu/div/pipe_data.py +++ b/src/soc/fu/div/pipe_data.py @@ -67,7 +67,10 @@ class CoreBaseData(DIVInputData): def eq_without_core(self, rhs): return super().eq(rhs) + \ [self.divisor_neg.eq(rhs.divisor_neg), - self.dividend_neg.eq(rhs.dividend_neg)] + self.dividend_neg.eq(rhs.dividend_neg), + self.dive_abs_ov32.eq(rhs.dive_abs_ov32), + self.dive_abs_ov64.eq(rhs.dive_abs_ov64), + self.div_by_zero.eq(rhs.div_by_zero)]
moving on to divuw - unsigned div - we now find that trunc_div is a *signed* div sigh. currently sorting that...