I was reading some of the pseudocode, and discovered that the overflow calculation is incorrect. it currently is: prod[0:127] <- MULS((RA), (RB)) RT <- prod[64:127] overflow <- ((prod[0:63] != 0x0000_0000_0000_0000) & (prod[0:63] != 0xffff_ffff_ffff_ffff)) it should be: prod[0:127] <- MULS((RA), (RB)) RT <- prod[64:127] overflow <- ((prod[0:64] != 0x0_0000_0000_0000_0000) & (prod[0:64] != 0x1_ffff_ffff_ffff_ffff)) since otherwise a product such as 1 << 63 is marked as not overflowing when it is clearly out of range since it's greater than the max i64. likewise for -1 << 64, which is smaller than the min i64. reporting as a bug rather than just altering the pseudocode so we can check any existing code that is built off the incorrect pseudocode
let's double-check because it's directly from microwatt source (which has been out long enough to work), let me find the right section
https://github.com/antonblanchard/microwatt/blob/master/multiply.vhdl#L72 if v.multiply_pipeline(PIPELINE_DEPTH-1).is_32bit = '1' then ov := (or d(63 downto 31)) and not (and d(63 downto 31)); else ov := (or d(127 downto 63)) and not (and d(127 downto 63)); end if; let's take the 64-bit case (because this is mulld). * 127 downto 63 is 63 to 128 inclusive in nmigen/python * however the numbering in the pseudocode is *inverted* but *NOT* python-style "end+1" therefore the numbering should be: * 127-127 to 127-63 which is indeed 0 to 64 i'll go ahead and correct that.
look reasonable? diff --git a/openpower/isa/fixedarith.mdwn b/openpower/isa/fixedarith.mdwn index 1931c72..5f55660 100644 --- a/openpower/isa/fixedarith.mdwn +++ b/openpower/isa/fixedarith.mdwn @@ -350,8 +350,8 @@ Pseudo-code: prod[0:63] <- MULS((RA)[32:63], (RB)[32:63]) RT <- prod - overflow <- ((prod[0:31] != 0x0000_0000) & - (prod[0:31] != 0xffff_ffff)) + overflow <- ((prod[0:32] != 0x0_0000_0000) & + (prod[0:32] != 0x1_ffff_ffff)) Special Registers Altered: @@ -564,8 +564,8 @@ Pseudo-code: prod[0:127] <- MULS((RA), (RB)) RT <- prod[64:127] - overflow <- ((prod[0:63] != 0x0000_0000_0000_0000) & - (prod[0:63] != 0xffff_ffff_ffff_ffff)) + overflow <- ((prod[0:64] != 0x0_0000_0000_0000_0000) & + (prod[0:64] != 0x1_ffff_ffff_ffff_ffff)) Special Registers Altered:
sec 3.3.9.1 of powerisa v3.1: If OE=1 then OV and OV32 are set to 1 if the product can- not be represented in 64 bits. so, it doesn't explicitly say u64 or i64 however, I did include test cases in power-instruction-analyzer which wouldn't overflow according to the buggy pseudocode but are still marked as overflowing in the mulld instruction model: -1i64 * -0x8000_0000_0000_0000i64 2i64 * -0x8000_0000_0000_0000i64 instruction model code: https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/fa9a4d450614adcdda9ec08dca40423492dcdbbd/src/instr_models.rs#L341 pub fn mulldo(inputs: InstructionInput) -> InstructionResult { let ra = inputs.ra as i64; let rb = inputs.rb as i64; let result = ra.wrapping_mul(rb) as u64; let overflow = ra.checked_mul(rb).is_none(); InstructionResult { rt: Some(result), overflow: Some(OverflowFlags::from_overflow(overflow)), ..InstructionResult::default() } }
(In reply to Luke Kenneth Casson Leighton from comment #3) > look reasonable? looks good to me!
likewise this one? @@ -51,10 +56,10 @@ class MulMainStage3(PipeModBase): # compute overflow mul_ov = Signal(reset_less=True) with m.If(is_32bit): - m32 = mul_o[32:64] + m31 = mul_o[31:64] # yes really bits 31 to 63 (incl) comb += mul_ov.eq(m32.bool() & ~m32.all()) with m.Else(): - m64 = mul_o[64:128] + m64 = mul_o[63:128] # yes really bits 63 to 127 (incl) comb += mul_ov.eq(m64.bool() & ~m64.all())
(In reply to Jacob Lifshay from comment #5) > (In reply to Luke Kenneth Casson Leighton from comment #3) > > look reasonable? > > looks good to me! ok great. unit tests pass too. however they're simulator-HDL not simulator-qemu. if similar overflow unit tests had been added to run against qemu this would have been caught earlier.
(In reply to Luke Kenneth Casson Leighton from comment #6) > likewise this one? yes, but... > > @@ -51,10 +56,10 @@ class MulMainStage3(PipeModBase): > # compute overflow > mul_ov = Signal(reset_less=True) > with m.If(is_32bit): > - m32 = mul_o[32:64] > + m31 = mul_o[31:64] # yes really bits 31 to 63 (incl) add comment explaining that we're checking that the top 32 bits is the sign-extended version of the bottom 32 bits. > comb += mul_ov.eq(m32.bool() & ~m32.all()) change to m31 > with m.Else(): > - m64 = mul_o[64:128] > + m64 = mul_o[63:128] # yes really bits 63 to 127 (incl) add similar sign-extended comment > comb += mul_ov.eq(m64.bool() & ~m64.all()) change to m65
commit d139c34e6c337d692341609236c1d00579a33df5 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Wed Jul 15 16:11:19 2020 +0100 add better comments on mul overflow
(In reply to Luke Kenneth Casson Leighton from comment #7) > (In reply to Jacob Lifshay from comment #5) > > (In reply to Luke Kenneth Casson Leighton from comment #3) > > > look reasonable? > > > > looks good to me! > > ok great. unit tests pass too. however they're simulator-HDL not > simulator-qemu. if similar overflow unit tests had been added to run against > qemu this would have been caught earlier. qemu is known to not match POWER9, so some tests should fail. I added the Python bindings to power-instruction-analyzer specifically so we could write unit tests against its instruction models, since they are likely to catch cases where POWER9 and qemu disagree. So, to use it: import power_instruction_analyzer as pia def unit_test_mulldo_(self, ra, rb, rt, so, ov, ov32, cr0): # ra, rb, and rt are u64 # so, ov, and ov32 are bool # includes rc for madd*, didn't yet set a default value # so rc=<something> required for now: inputs = pia.InstructionInputs(ra=ra, rb=rb, rc=0) # all functions assumes SO flag starts as 0 outputs = pia.mulldo_(inputs) # the mulldo. instruction self.assertEqual(outputs.rt, rt) self.assertEqual(outputs.overflow.so, so) self.assertEqual(outputs.overflow.ov, ov) self.assertEqual(outputs.overflow.ov32, ov32) self.assertEqual(outputs.cr0.lt, bool(cr0 & 8)) self.assertEqual(outputs.cr0.gt, bool(cr0 & 4)) self.assertEqual(outputs.cr0.eq, bool(cr0 & 2)) self.assertEqual(outputs.cr0.so, bool(cr0 & 1))
(In reply to Jacob Lifshay from comment #10) > fn unit_test_mulldo_(self, ra, rb, rt, so, ov, ov32, cr0): oops, should be def not fn
thank you jacob for spotting this. am reasonably happy with this one however would prefer to leave it open until qemu tests against the simulator are added. also crossref to the Formal Proof as something to watch out for
(In reply to Jacob Lifshay from comment #10) > qemu is known to not match POWER9, so some tests should fail. arg yes. > I added the > Python bindings to power-instruction-simulator specifically so we could > write unit tests against its instruction models, since they are likely to > catch cases where POWER9 and qemu disagree. So, to use it: > > import power_instruction_analyzer as pia > inputs = pia.InstructionInputs(ra=ra, rb=rb, rc=0) > # all functions assumes SO flag starts as 0 > outputs = pia.mulldo_(inputs) # the mulldo. instruction ooo niice ok so that makes pia a build/test dependency. hm ok so a build script for pia is needed, in dev-setup-env at the minimum. plus updating the wiki and INSTALL.txt will raise a new bugreport on this