Bug 432 - mulld pseudocode overflow calculation incorrect
Summary: mulld pseudocode overflow calculation incorrect
Status: IN_PROGRESS
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Specification (show other bugs)
Version: unspecified
Hardware: All All
: --- normal
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks: 323 383
  Show dependency treegraph
 
Reported: 2020-07-15 15:38 BST by Jacob Lifshay
Modified: 2020-07-15 16:42 BST (History)
1 user (show)

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jacob Lifshay 2020-07-15 15:38:29 BST
I was reading some of the pseudocode, and discovered that the overflow calculation is incorrect.

it currently is:
prod[0:127] <- MULS((RA), (RB))
RT <- prod[64:127]
overflow <- ((prod[0:63] != 0x0000_0000_0000_0000) &
             (prod[0:63] != 0xffff_ffff_ffff_ffff))

it should be:
prod[0:127] <- MULS((RA), (RB))
RT <- prod[64:127]
overflow <- ((prod[0:64] != 0x0_0000_0000_0000_0000) &
             (prod[0:64] != 0x1_ffff_ffff_ffff_ffff))

since otherwise a product such as 1 << 63 is marked as not overflowing when it is clearly out of range since it's greater than the max i64. likewise for -1 << 64, which is smaller than the min i64.

reporting as a bug rather than just altering the pseudocode so we can check any existing code that is built off the incorrect pseudocode
Comment 1 Luke Kenneth Casson Leighton 2020-07-15 15:42:27 BST
let's double-check because it's directly from microwatt source
(which has been out long enough to work), let me find the right section
Comment 2 Luke Kenneth Casson Leighton 2020-07-15 15:52:45 BST
https://github.com/antonblanchard/microwatt/blob/master/multiply.vhdl#L72

        if v.multiply_pipeline(PIPELINE_DEPTH-1).is_32bit = '1' then
            ov := (or d(63 downto 31)) and not (and d(63 downto 31));
        else
            ov := (or d(127 downto 63)) and not (and d(127 downto 63));
        end if;

let's take the 64-bit case (because this is mulld).

* 127 downto 63 is 63 to 128 inclusive in nmigen/python
* however the numbering in the pseudocode is *inverted* but *NOT*
  python-style "end+1"

therefore the numbering should be:

* 127-127 to 127-63 which is indeed 0 to 64

i'll go ahead and correct that.
Comment 3 Luke Kenneth Casson Leighton 2020-07-15 15:54:11 BST
look reasonable?

diff --git a/openpower/isa/fixedarith.mdwn b/openpower/isa/fixedarith.mdwn
index 1931c72..5f55660 100644
--- a/openpower/isa/fixedarith.mdwn
+++ b/openpower/isa/fixedarith.mdwn
@@ -350,8 +350,8 @@ Pseudo-code:
 
     prod[0:63] <- MULS((RA)[32:63], (RB)[32:63])
     RT <- prod
-    overflow <- ((prod[0:31] != 0x0000_0000) &
-                 (prod[0:31] != 0xffff_ffff))
+    overflow <- ((prod[0:32] != 0x0_0000_0000) &
+                 (prod[0:32] != 0x1_ffff_ffff))
 
 Special Registers Altered:
 
@@ -564,8 +564,8 @@ Pseudo-code:
 
     prod[0:127] <- MULS((RA), (RB))
     RT <- prod[64:127]
-    overflow <- ((prod[0:63] != 0x0000_0000_0000_0000) &
-                 (prod[0:63] != 0xffff_ffff_ffff_ffff))
+    overflow <- ((prod[0:64] != 0x0_0000_0000_0000_0000) &
+                 (prod[0:64] != 0x1_ffff_ffff_ffff_ffff))
 
 Special Registers Altered:
Comment 4 Jacob Lifshay 2020-07-15 15:54:33 BST
sec 3.3.9.1 of powerisa v3.1:
If OE=1 then OV and OV32 are set to 1 if the product can-
not be represented in 64 bits.

so, it doesn't explicitly say u64 or i64

however, I did include test cases in power-instruction-analyzer which wouldn't overflow according to the buggy pseudocode but are still marked as overflowing in the mulld instruction model:
-1i64 * -0x8000_0000_0000_0000i64
2i64 * -0x8000_0000_0000_0000i64

instruction model code:
https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/fa9a4d450614adcdda9ec08dca40423492dcdbbd/src/instr_models.rs#L341

pub fn mulldo(inputs: InstructionInput) -> InstructionResult {
    let ra = inputs.ra as i64;
    let rb = inputs.rb as i64;
    let result = ra.wrapping_mul(rb) as u64;
    let overflow = ra.checked_mul(rb).is_none();
    InstructionResult {
        rt: Some(result),
        overflow: Some(OverflowFlags::from_overflow(overflow)),
        ..InstructionResult::default()
    }
}
Comment 5 Jacob Lifshay 2020-07-15 15:56:37 BST
(In reply to Luke Kenneth Casson Leighton from comment #3)
> look reasonable?

looks good to me!
Comment 6 Luke Kenneth Casson Leighton 2020-07-15 15:57:56 BST
likewise this one?

@@ -51,10 +56,10 @@ class MulMainStage3(PipeModBase):
                 # compute overflow
                 mul_ov = Signal(reset_less=True)
                 with m.If(is_32bit):
-                    m32 = mul_o[32:64]
+                    m31 = mul_o[31:64] # yes really bits 31 to 63 (incl)
                     comb += mul_ov.eq(m32.bool() & ~m32.all())
                 with m.Else():
-                    m64 = mul_o[64:128]
+                    m64 = mul_o[63:128] # yes really bits 63 to 127 (incl)
                     comb += mul_ov.eq(m64.bool() & ~m64.all())
Comment 7 Luke Kenneth Casson Leighton 2020-07-15 16:02:05 BST
(In reply to Jacob Lifshay from comment #5)
> (In reply to Luke Kenneth Casson Leighton from comment #3)
> > look reasonable?
> 
> looks good to me!

ok great.  unit tests pass too.  however they're simulator-HDL not
simulator-qemu.  if similar overflow unit tests had been added to run against
qemu this would have been caught earlier.
Comment 8 Jacob Lifshay 2020-07-15 16:03:59 BST
(In reply to Luke Kenneth Casson Leighton from comment #6)
> likewise this one?

yes, but...
> 
> @@ -51,10 +56,10 @@ class MulMainStage3(PipeModBase):
>                  # compute overflow
>                  mul_ov = Signal(reset_less=True)
>                  with m.If(is_32bit):
> -                    m32 = mul_o[32:64]
> +                    m31 = mul_o[31:64] # yes really bits 31 to 63 (incl)

add comment explaining that we're checking that the top 32 bits is the sign-extended version of the bottom 32 bits.
>                      comb += mul_ov.eq(m32.bool() & ~m32.all())

change to m31
>                  with m.Else():
> -                    m64 = mul_o[64:128]
> +                    m64 = mul_o[63:128] # yes really bits 63 to 127 (incl)
add similar sign-extended comment
>                      comb += mul_ov.eq(m64.bool() & ~m64.all())

change to m65
Comment 9 Luke Kenneth Casson Leighton 2020-07-15 16:11:52 BST
commit d139c34e6c337d692341609236c1d00579a33df5 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Jul 15 16:11:19 2020 +0100

    add better comments on mul overflow
Comment 10 Jacob Lifshay 2020-07-15 16:29:01 BST
(In reply to Luke Kenneth Casson Leighton from comment #7)
> (In reply to Jacob Lifshay from comment #5)
> > (In reply to Luke Kenneth Casson Leighton from comment #3)
> > > look reasonable?
> > 
> > looks good to me!
> 
> ok great.  unit tests pass too.  however they're simulator-HDL not
> simulator-qemu.  if similar overflow unit tests had been added to run against
> qemu this would have been caught earlier.

qemu is known to not match POWER9, so some tests should fail. I added the Python  bindings to power-instruction-analyzer specifically so we could write unit tests against its instruction models, since they are likely to catch cases where POWER9 and qemu disagree. So, to use it:

import power_instruction_analyzer as pia

def unit_test_mulldo_(self, ra, rb, rt, so, ov, ov32, cr0):
    # ra, rb, and rt are u64
    # so, ov, and ov32 are bool

    # includes rc for madd*, didn't yet set a default value
    # so rc=<something> required for now:
    inputs = pia.InstructionInputs(ra=ra, rb=rb, rc=0)
    # all functions assumes SO flag starts as 0
    outputs = pia.mulldo_(inputs) # the mulldo. instruction
    self.assertEqual(outputs.rt, rt)
    self.assertEqual(outputs.overflow.so, so)
    self.assertEqual(outputs.overflow.ov, ov)
    self.assertEqual(outputs.overflow.ov32, ov32)
    self.assertEqual(outputs.cr0.lt, bool(cr0 & 8))
    self.assertEqual(outputs.cr0.gt, bool(cr0 & 4))
    self.assertEqual(outputs.cr0.eq, bool(cr0 & 2))
    self.assertEqual(outputs.cr0.so, bool(cr0 & 1))
Comment 11 Jacob Lifshay 2020-07-15 16:31:20 BST
(In reply to Jacob Lifshay from comment #10)
> fn unit_test_mulldo_(self, ra, rb, rt, so, ov, ov32, cr0):

oops, should be def not fn
Comment 12 Luke Kenneth Casson Leighton 2020-07-15 16:32:05 BST
thank you jacob for spotting this. am reasonably happy with this one however would prefer to leave it open until qemu tests against the simulator are added.  also crossref to the Formal Proof as something to watch out for
Comment 13 Luke Kenneth Casson Leighton 2020-07-15 16:37:25 BST
(In reply to Jacob Lifshay from comment #10)

> qemu is known to not match POWER9, so some tests should fail.

arg yes.

> I added the
> Python  bindings to power-instruction-simulator specifically so we could
> write unit tests against its instruction models, since they are likely to
> catch cases where POWER9 and qemu disagree. So, to use it:
> 
> import power_instruction_analyzer as pia

>     inputs = pia.InstructionInputs(ra=ra, rb=rb, rc=0)
>     # all functions assumes SO flag starts as 0
>     outputs = pia.mulldo_(inputs) # the mulldo. instruction


ooo niice

ok so that makes pia a build/test dependency.

hm ok so a build script for pia is needed, in dev-setup-env at the minimum.  plus updating the wiki and INSTALL.txt

will raise a new bugreport on this