Bug 425 - DIV overflow not being calculated correctly
Summary: DIV overflow not being calculated correctly
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Jacob Lifshay
URL:
Depends on:
Blocks: 324
  Show dependency treegraph
 
Reported: 2020-07-10 00:17 BST by Luke Kenneth Casson Leighton
Modified: 2020-07-10 16:58 BST (History)
1 user (show)

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2020-07-10 00:17:32 BST
this example does not compute overflow correctly in div output_stage.py:

        lst = ["divw. 3, 1, 2"]
        initial_regs = [0] * 32
        initial_regs[1] = 0x61c1cc3b80f2a6af
        initial_regs[2] = 0x9dc66a7622c32bc0

run fu/div/test/test_pipe_caller.py it is test_6_regression()
Comment 1 Luke Kenneth Casson Leighton 2020-07-10 00:19:46 BST
jacob can you take a look at this one, there's something odd and i can't
work it out

commit f759f42491c04c7d28401b9d08aa9b0fa0bc7f7d (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Fri Jul 10 00:18:53 2020 +0100

    add regression test for div overflow case
    see https://bugs.libre-soc.org/show_bug.cgi?id=425
Comment 2 Jacob Lifshay 2020-07-10 01:29:14 BST
I ran that test after fixing some breakage from recent changes in nmigen.

It passed. Could you point out in more detail what's broken?

I ran it with:
python src/soc/fu/div/test/test_pipe_caller.py DivTestCase.test_6_regression
Comment 3 Jacob Lifshay 2020-07-10 01:33:08 BST
The results I get from running it using power-instruction-analyzer's model is:
>>> import power_instruction_analyzer as pia
>>> print(pia.divw_(pia.InstructionInput(ra=0x61c1cc3b80f2a6af,rb=0x9dc66a7622c32bc0,rc=0)))
{"rt":"0xFFFFFFFD","cr0":{"lt":false,"gt":true,"eq":false,"so":false}}
Comment 4 Jacob Lifshay 2020-07-10 01:37:31 BST
(In reply to Jacob Lifshay from comment #2)
> I ran that test after fixing some breakage from recent changes in nmigen.
> 
> It passed. Could you point out in more detail what's broken?
> 
> I ran it with:
> python src/soc/fu/div/test/test_pipe_caller.py DivTestCase.test_6_regression

Aha, I was using DivTestCase instead of DIVTestCase. oops
Comment 5 Jacob Lifshay 2020-07-10 02:33:11 BST
(In reply to Jacob Lifshay from comment #4)
> (In reply to Jacob Lifshay from comment #2)
> > I ran that test after fixing some breakage from recent changes in nmigen.
> > 
> > It passed. Could you point out in more detail what's broken?
> > 
> > I ran it with:
> > python src/soc/fu/div/test/test_pipe_caller.py DivTestCase.test_6_regression
> 
> Aha, I was using DivTestCase instead of DIVTestCase. oops

Well, I got the test to fail, the problem is either the pipeline is broken to the point where ra and rb in DivSetupStage and lots of other places are always 0, or nmigen's simulator is broken where it writes zeros to random wires in the .vcd file.

I'd have to figure out where to add print statements for debugging or something, since there are so many layers of abstraction that I'm a bit lost.

I'm not really sure how to write a test case that just sets the inputs of the pipeline to values and runs it without needing all the extra stuff for handling instruction cancellation, matching against qemu, or matching against the model from the Power spec.
Comment 6 Luke Kenneth Casson Leighton 2020-07-10 03:07:08 BST
(In reply to Jacob Lifshay from comment 

> Well, I got the test to fail, the problem is either the pipeline is broken
> to the point where ra and rb in DivSetupStage and lots of other places are
> always 0, or nmigen's simulator is broken where it writes zeros to random
> wires in the .vcd file.

gtkwave unfortunately has bugs.

> I'd have to figure out where to add print statements for debugging or
> something, since there are so many layers of abstraction that I'm a bit lost.

a FSM would be one hell of a lot aimpler.

> I'm not really sure how to write a test case that just sets the inputs of
> the pipeline to values and runs it without needing all the extra stuff for
> handling instruction cancellation, matching against qemu, or matching
> against the model from the Power spec.

the result 0xfffffffd is correct.  it's the overflow calculation that is not.

i tested that by manually setting overflow to zero.

if you check the git commit diff at comment 1 you will see the places where i added code comments for the locations which need checking.

i also listed the function.

it is possible by using yield x.y.z to drill down the clase structure and print things that you need.  if they are private variables add them to self. temporarily

i typically use print dir(instance) to find the members, then do that again print dir(instance.subinstance) it is laborious but gets there

i will help more tomorrow when it is not 3am.
Comment 7 Luke Kenneth Casson Leighton 2020-07-10 10:41:18 BST
the class that contains the failing code is DivOutputStage.
therefore we need to "get at" DivOutputStage.



the gtkwave tree (and yosys graphvis) show the tree into the failing
code as:

alu -> pipe_end -> output_stage

the classes/instances are:

alu=DIVBasePipe -> pipe_end=DivStagesEnd -> div_out=DivOutputStage

div_out is local therefore i am committing a change which makes it
accessible:

+++ b/src/soc/fu/div/pipeline.py
@@ -34,6 +34,7 @@ class DivStagesEnd(PipeModBaseChain):
         core_final = DivCoreFinalStage(self.pspec)
         div_out = DivOutputStage(self.pspec)
         alu_out = DivMulOutputStage(self.pspec)
+        self.div_out = div_out # debugging - bug #425
         return [core_final, div_out, alu_out]

 commit f11ffd6b4be87763d7b93a77215212e1004f76e3 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Fri Jul 10 10:16:28 2020 +0100

    add debugging chain for #425


now we can look at that, back in fu/div/test/test_pipe_caller.py:
                    vld = yield alu.n.valid_o
                    while not vld:
                        yield
                        vld = yield alu.n.valid_o
                        print ("bug track", alu.pipe_end.div_out)
                    yield

this has to be printed out *during* that analysis because it's a pipeline,
and if we leave the inspection too late the data is gone.

after valid_o has dropped is.... too late.

ok so running the test with that, outputted lots of copies of the
DivOutputStage object.  now for something more useful:

                    vld = yield alu.n.valid_o
                    while not vld:
                        yield
                        vld = yield alu.n.valid_o
                        # bug #425 investigation
                        do = alu.pipe_end.div_out
                        dive_abs_ov32 = yield do.i.dive_abs_ov32
                        quotient_neg = yield do.quotient_neg
                        print ("dive_abs_ov32", hex(dive_abs_ov32))
                        print ("quotient_neg", hex(quotient_neg))
                    yield

that outputs this:

dive_abs_ov32 0x0
quotient_neg 0x1
dive_abs_ov32 0x0
quotient_neg 0x1


ok let's add some more:

                        # bug #425 investigation
                        do = alu.pipe_end.div_out
                        ctx_op = do.i.ctx.op
                        is_32bit = yield ctx_op.is_32bit
                        is_signed = yield ctx_op.is_signed
                        quotient_root = yield do.i.core.quotient_root
                        dive_abs_ov32 = yield do.i.dive_abs_ov32
                        div_by_zero = yield do.i.div_by_zero
                        quotient_neg = yield do.quotient_neg
                        print ("32bit", hex(is_32bit))
                        print ("signed", hex(is_signed))
                        print ("quotient_root", hex(quotient_root))
                        print ("div_by_zero", hex(div_by_zero))
                        print ("dive_abs_ov32", hex(dive_abs_ov32))
                        print ("quotient_neg", hex(quotient_neg))
                        print ("")

committed with this:

commit 1040fedb60ba617b0a85997330c10e5769c06051 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Fri Jul 10 10:40:15 2020 +0100

    add more debug output for #425


and we get this output:

32bit 0x1
signed 0x1
quotient_root 0xfc00000000000003
div_by_zero 0x0
dive_abs_ov32 0x0
quotient_neg 0x1

so that's 32-bit, and signed.  therefore sign_bit_mask is 0x8000000.

therefore, 0xfc0000000000003 > 0x8000000 is successful.

therefore, overflow is indicated when it should *not* be indicated.

what i therefore suspect is that when op.is_32bit is set, the top
bits of abs_quotient (quotient_root) must be ignored.  but that
dive_abs_ov32 obviously should not.


the relevant code in microwatt divide.vhdl is here:

        if is_32bit = '0' then
            did_ovf <= overflow or (is_signed and (sresult(64) xor sresult(63)));
        elsif is_signed = '1' then
            if ovf32 = '1' or sresult(32) /= sresult(31) then
                did_ovf <= '1';
            end if;
        else
            did_ovf <= ovf32;
        end if;


where it can be seen that they XOR bit 64 and 63 for 64-bit overflow
detection in the signed case

and for 32-bit overflow, bit 32 and 31 are (effectively) XORed together.

i'm going to give that a shot, see what happens (translate microwatt
overflow detection into nmigen)
Comment 8 Jacob Lifshay 2020-07-10 10:53:00 BST
(In reply to Luke Kenneth Casson Leighton from comment #7)

> 
> and we get this output:
> 
> 32bit 0x1
> signed 0x1
> quotient_root 0xfc00000000000003
> div_by_zero 0x0
> dive_abs_ov32 0x0
> quotient_neg 0x1

that quotient_root is incorrect, assuming we're still using test case #6, check the divpipecore inputs. alternatively, nmigen just currently has a simulator bug which is also showing up as zero vcd values.
Comment 9 Jacob Lifshay 2020-07-10 10:55:02 BST
(In reply to Jacob Lifshay from comment #8)
> (In reply to Luke Kenneth Casson Leighton from comment #7)
> 
> > 
> > and we get this output:
> > 
> > 32bit 0x1
> > signed 0x1
> > quotient_root 0xfc00000000000003
> > div_by_zero 0x0
> > dive_abs_ov32 0x0
> > quotient_neg 0x1
> 
> that quotient_root is incorrect, assuming we're still using test case #6,
> check the divpipecore inputs. alternatively, nmigen just currently has a
> simulator bug which is also showing up as zero vcd values.

there should not be any spuriously set msb bits
Comment 10 Luke Kenneth Casson Leighton 2020-07-10 11:26:33 BST
ok so that translation worked.  that resulted in that particular
regression "working".  however on re-enabling the random testing
(s/def tst_/def test_/g) _another_ test came up (test_7_regression)
which this time is when overflow *should* be set (according to the
simulator) but that the overflow detection (microwatt-style this
time) is *not* detecting.

the input:

    def test_7_regression(self):
        # https://bugs.libre-soc.org/show_bug.cgi?id=425
        lst = ["divw. 3, 1, 2"]
        initial_regs = [0] * 32
        initial_regs[1] = 0xf1791627e05e8096
        initial_regs[2] = 0xffc868bf4573da0b
        self.run_tst_program(Program(lst), initial_regs)


the hardware output (from the debug output):

     res output {'cr_a': 2, 'xer_ov': 0, 'o': 0, 'xer_so': 0}

the simulator output (from the debug output)

     sim output {'o': 0, 'cr_a': 3, 'xer_ov': 1, 'xer_so': 1}

cr_a[1] is correct - the output is zero.
cr_a[0] contains a copy of XER.ov which is wrong.

actually the sim output might be wrong, here.  it should never be
set to 0b1.  the two legal values are 0b00 and 0b11, NEVER 0b01.
Comment 11 Luke Kenneth Casson Leighton 2020-07-10 11:36:48 BST
(In reply to Jacob Lifshay from comment #9)

> there should not be any spuriously set msb bits

ok i checked when setting compare_len back to bitwidth * 3 and quotient_root
was set to 0x1ffffffffffffffffd at the end.

i suspect then that there are not enough bits in some places, which
tends to suggest that there may be an undiscovered bug in the
div_pipe code for other cases.

all of this is pointing strongly towards simply using microwatt's FSM
for now.
Comment 12 Luke Kenneth Casson Leighton 2020-07-10 16:19:54 BST
(In reply to Luke Kenneth Casson Leighton from comment #10)

> actually the sim output might be wrong, here.  it should never be
> set to 0b1.  the two legal values are 0b00 and 0b11, NEVER 0b01.

it was wrong.  fixed.  

commit f25aad525dfc184b5a8407e52b62a499dfb182f4
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Fri Jul 10 16:04:08 2020 +0100

    check for div_overflow equal to None rather than == 1


going through some more regression tests, it looks like we're out of the
woods.  the div_by_zero flag is not being propagated through the
pipeline stages to the output stage, that was shown by this test:

    def test_div_by_zero_1(self):
        lst = ["divw. 3, 1, 2"]
        initial_regs = [0] * 32
        initial_regs[1] = 0x1
        initial_regs[2] = 0x0
        self.run_tst_program(Program(lst), initial_regs)

i'll sort that now.
Comment 13 Luke Kenneth Casson Leighton 2020-07-10 16:37:47 BST
ahh i think i now know why gtkwave (nmigen) "appeared" not to be working
and also why the overflow calculation wasn't working: you'd missed out
propagating the full set of parameters down the pipeline chain

class CoreBaseData(DIVInputData):
    def __init__(self, pspec, core_data_class):
        super().__init__(pspec)
        self.core = core_data_class(pspec.core_config)
        self.divisor_neg = Signal(reset_less=True)
        self.dividend_neg = Signal(reset_less=True)
        self.div_by_zero = Signal(reset_less=True)  <- missing
        self.dive_abs_ov32 = Signal(reset_less=True) <- missing
        self.dive_abs_ov64 = Signal(reset_less=True) <- missing


+++ b/src/soc/fu/div/core_stages.py
@@ -27,10 +27,12 @@ class DivCoreBaseStage(PipeModBase):
     def elaborate(self, platform):
         m = Module()
 
+        # pass-through on non-core parameters
         m.d.comb += self.o.eq_without_core(self.i)
 
         m.submodules.core = self.core
 
+        # copy parameters to/from divremsqrt core into the Base, here.
         m.d.comb += self.core.i.eq(self.i.core)
         m.d.comb += self.o.core.eq(self.core.o)
 
diff --git a/src/soc/fu/div/pipe_data.py b/src/soc/fu/div/pipe_data.py
index fa67aded..02f169f9 100644
--- a/src/soc/fu/div/pipe_data.py
+++ b/src/soc/fu/div/pipe_data.py
@@ -67,7 +67,10 @@ class CoreBaseData(DIVInputData):
     def eq_without_core(self, rhs):
         return super().eq(rhs) + \
             [self.divisor_neg.eq(rhs.divisor_neg),
-             self.dividend_neg.eq(rhs.dividend_neg)]
+             self.dividend_neg.eq(rhs.dividend_neg),
+             self.dive_abs_ov32.eq(rhs.dive_abs_ov32),
+             self.dive_abs_ov64.eq(rhs.dive_abs_ov64),
+             self.div_by_zero.eq(rhs.div_by_zero)]
Comment 14 Luke Kenneth Casson Leighton 2020-07-10 16:58:27 BST
moving on to divuw - unsigned div - we now find that trunc_div is
a *signed* div sigh.  currently sorting that...