324 – create POWER DIV pipeline

Bug 324 - create POWER DIV pipeline

Summary: create POWER DIV pipeline

Status:	RESOLVED FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code (show other bugs)
Version:	unspecified
Hardware:	Other Linux

Importance:	--- enhancement
Assignee:	Jacob Lifshay

URL:

Depends on:	356 396 413 420 425 437 438 439 433 449 471
Blocks:	383
	Show dependency tree / graph

Reported:	2020-05-19 13:03 BST by Luke Kenneth Casson Leighton
Modified:	2020-09-23 18:46 BST (History)
CC List:	3 users (show)

See Also:	305 417 449
NLnet milestone:	NLNet.2019.10.043.Wishbone
total budget (EUR) for completion of task and all subtasks:	1500
budget (EUR) for this task, excluding subtasks' budget:	1500
parent task for budget allocation:	383
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	programmerjake={amount=1000,paid=2020-09-22} lkcl={amount=500,paid=2020-08-14}

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Luke Kenneth Casson Leighton 2020-05-19 13:03:54 BST

a DIV (and MOD) pipeline is needed similar to the other soc.fu pipelines.

Comment 1 Luke Kenneth Casson Leighton 2020-05-19 13:16:10 BST

register allocations, for fu.div.pipe_data.py

looks to be identical to ALUInputData and ALUOutputData.

https://libre-soc.org/openpower/isa/fixedarith/

# Divide

XO-Form   divw* RT,RA,RB

Special Registers Altered:

* CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1)
* SO OV OV32  (if OE=1)

# Modulo

X-Form  mod* RT,RA,RB

no special regs altered

Comment 2 Luke Kenneth Casson Leighton 2020-05-20 01:21:28 BST

https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_enums.py;hb=HEAD

TODO, add DIV to Function enum, and back in the (annoyingly separated) CSV files change DIV and MOD operations from type ALU Function Unit to type DIV.

Comment 3 Jacob Lifshay 2020-05-22 22:03:14 BST

after looking through the spec some, it turns out that div and rem don't actually trap on division by zero, so that makes them much easier to implement. The numerical results are undefined on division-by-zero/overflow, so I will need to run some tests on a Power processor to see which results we should produce, hopefully they are the same as for RISC-V.

Comment 4 Luke Kenneth Casson Leighton 2020-05-22 23:58:29 BST

(In reply to Jacob Lifshay from comment #3)
> after looking through the spec some, it turns out that div and rem don't
> actually trap on division by zero, so that makes them much easier to
> implement. The numerical results are undefined on division-by-zero/overflow,
> so I will need to run some tests on a Power processor to see which results
> we should produce, hopefully they are the same as for RISC-V.

good idea.

Comment 5 Jacob Lifshay 2020-05-26 04:55:20 BST

I wrote a simple Rust program that runs different Power instructions (just div* and mod* for now) and put the results here:
https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/master/output-for-initial-2-ga7250d2.txt

Source here:
https://salsa.debian.org/Kazan-team/power-instruction-analyzer

result_prev is assigned to the result register before executing the tested instruction since sometimes processors just skip writing to a register in exceptional cases

From what I can tell, for all error cases for div and mod instructions, POWER9 just writes 0 to the result register.

For the instructions with just 32-bit results, from what I can tell from a relatively small number of test cases, it zero extends for all signed/unsigned div instructions, zero extends for unsigned mod instructions, and sign extends for signed mod instructions.

Comment 6 Jacob Lifshay 2020-05-28 07:51:35 BST

I added a Rust model for the div and rem instructions as well as converted the output format to JSON since that's waay easier to ingest than the custom output format I had before. The model matches the native POWER9 instructions for all the cases I tested.

I also added conditional compilation code that allows running just the model on any computer rather than just on POWER9:
Run it with:
cargo run --no-default-features

Model functions:
https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/master/src/instr_models.rs

This should make it easier to copy the behavior of POWER9 in the nmigen div pipeline.

Comment 7 Luke Kenneth Casson Leighton 2020-05-28 11:03:53 BST

(In reply to Jacob Lifshay from comment #6)
> I added a Rust model for the div and rem instructions as well as converted
> the output format to JSON since that's waay easier to ingest than the custom
> output format I had before.

:)

> This should make it easier to copy the behavior of POWER9 in the nmigen div
> pipeline.

ah fantastic.  yes, looking at microwatt, the synthesis of all the various
options is quite obtuse (and brilliant).

i've found it is essential to have some form of guide in a familiar language.
it's really useful that these are also binary-executable on native POWER9
as well as really clear documented actual implementations.

Comment 8 Jacob Lifshay 2020-06-10 08:00:24 BST

Created the setup stage that translates from the input data to the format expected by DivPipeCore but didn't create tests yet, am going to be done for today. Still have to double check that I got the number of fractional bits correct.

Comment 9 Luke Kenneth Casson Leighton 2020-06-10 12:00:59 BST

(In reply to Jacob Lifshay from comment #8)
> Created the setup stage that translates from the input data to the format
> expected by DivPipeCore but didn't create tests yet, am going to be done for
> today. Still have to double check that I got the number of fractional bits
> correct.

fantastic.  as you can probably see i'm slowly going through the test_pipe_caller.py code removing code-duplication.  for now i recommend
cookie-cutting fu/alu/test_pipe_caller.py in its entirety, because i
noticed that these use OE/Rc.  

* divdu RT,RA,RB (OE=0 Rc=0)

Special Registers Altered:

    CR0                     (if Rc=1)
    SO OV OV32             (if OE=1)

that means they're exactly like fixedarith, for *both* input and output
data, so you can straight-cookie-cut alu test_pipe_caller.py

i will drop in an fu/compunits/test/test_div_compunit.py

Comment 10 Luke Kenneth Casson Leighton 2020-06-10 12:32:54 BST

(In reply to Luke Kenneth Casson Leighton from comment #9)

> that means they're exactly like fixedarith, for *both* input and output
> data, so you can straight-cookie-cut alu test_pipe_caller.py

dropped in a copy for you (the one that was there already needed
updating).

Comment 11 Luke Kenneth Casson Leighton 2020-06-19 11:33:21 BST

jacob i've added soc/simulator/test_sim_div.py and immediately ran into
both pseudocode and spec issues.  both divsw and modsw contain a
"minus" symbol following the setup of the divisor:

    divisor [0:31] <- (RB)[32:63]-

this has no explanation in the spec notation as to what it means.
also, the variables "dividend" and "divisor" do not exist.  given
that this is the pseudocode we're talking about - the actual spec -
i am reluctant to make significant modifications.  we may however
have to.

i have already made this change:

    RT[0:31 ] <- undefined

-->

    RT[0:31 ] <- undefined[0:31]

however i am not happy with it.

Comment 12 Luke Kenneth Casson Leighton 2020-06-19 11:42:00 BST

i had to change decoder/isa/fixedarith.py to this, to get it to work:

    @inject()
    def op_divw(self, RA, RB, RT):
        dividend = RA[32:64]
        divisor = RB[32:64]
        RT[32:64] = dividend // divisor
        RT[0:32] = undefined[0:32]
        return (RT,)

fixedarith.py is auto-generated by pywriter.py

oh, i also updated the parser so that it will output floordiv not div,
and updated SelectableInt so that it has an __floordiv__ function.  the
parser was previously outputting this:

        RT[32:64] = dividend / divisor

it will be necessary to check the behaviour of signed/unsigned and
potentially code up the correct behaviour in SelectableInt.__mod__
and SelectableInt.__floordiv__:

>>> 5//0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
>>> 

>>> -1 % 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
>>> 


yes, definitely.  we can't have that.  those will need to be caught in
SelectableInt and the correct value returned.


also this, from the spec for modsw:

If an attempt is made to perform any of the divisions
0x8000_0000 % -1
<anything> % 0

and for modsd:

If an attempt is made to perform any of the divisions
<anything> % 0 0x8000_0000_0000_0000 % -1

that's likely going to have to be special-cased as well, or at least tested
and we work out what is supposed to be put into RT.  interestingly these
do *not* result in overflow flags being set.

Comment 13 Jacob Lifshay 2020-06-19 14:05:33 BST

The spec pseudo-code doesn't contain all the required information to implement any of the div or mod operations since only the prose states the conditions for overflow (which *DO* show up in the overflow flags when div*o is used). Additionally, only the prose states if the division is to be interpreted as signed or unsigned.

All divisions are truncating division, not floor division, so we should translate them to function calls to custom trunc_div and trunc_rem functions rather than python's // and % operators (since the mod* instructions implement the mathematical remainder of truncating division operation rather than the mathematical modulo operation).

Code not tested:

def trunc_div(n, d):
    f = getattr(n, "trunc_div", None)
    if f is not None:
        return f(d)
    fr = getattr(d, "rtrunc_div", None)
    if fr is not None:
        return fr(n)
    abs_n = abs(n)
    abs_d = abs(d)
    abs_q = n // d
    if (n < 0) == (d < 0):
        return abs_q
    return -abs_q

def trunc_rem(n, d):
    f = getattr(n, "trunc_rem", None)
    if f is not None:
        return f(d)
    fr = getattr(d, "rtrunc_rem", None)
    if fr is not None:
        return fr(n)
    return n - d * trunc_div(n, d)

Comment 14 Jacob Lifshay 2020-06-19 14:23:34 BST

(In reply to Luke Kenneth Casson Leighton from comment #12)
> also this, from the spec for modsw:
> 
> If an attempt is made to perform any of the divisions
> 0x8000_0000 % -1
> <anything> % 0
> 
> and for modsd:
> 
> If an attempt is made to perform any of the divisions
> <anything> % 0 0x8000_0000_0000_0000 % -1

Those are just the conditions where the corresponding div* instruction would overflow.

> 
> that's likely going to have to be special-cased as well, or at least tested
> and we work out what is supposed to be put into RT.

already done:
https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/13dae100c6bc5685059195010ceb46ae68b9f306/src/instr_models.rs#L195

> interestingly these
> do *not* result in overflow flags being set.

To clarify my previous statement, the OV, OV32, and SO flags are only set when the div*o variant of the instructions is selected but that doesn't exist for the mod* instructions, that's because div* is XO-form and mod* is X-form. I'm assuming ALUOutputStage will handle that wrinkle, though didn't yet check.

Comment 15 Luke Kenneth Casson Leighton 2020-06-19 14:52:01 BST

(In reply to Jacob Lifshay from comment #14)
> (In reply to Luke Kenneth Casson Leighton from comment #12)
> > If an attempt is made to perform any of the divisions
> > <anything> % 0 0x8000_0000_0000_0000 % -1
> 
> Those are just the conditions where the corresponding div* instruction would
> overflow.

ok.
 
> > 
> > that's likely going to have to be special-cased as well, or at least tested
> > and we work out what is supposed to be put into RT.
> 
> already done:
> https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/
> 13dae100c6bc5685059195010ceb46ae68b9f306/src/instr_models.rs#L195

brilliant.
 
> > interestingly these
> > do *not* result in overflow flags being set.
> 
> To clarify my previous statement, the OV, OV32, and SO flags are only set
> when the div*o variant of the instructions is selected but that doesn't
> exist for the mod* instructions, that's because div* is XO-form and mod* is
> X-form. I'm assuming ALUOutputStage will handle that wrinkle, though didn't
> yet check.

yes it does.  microwatt's source (now in the CSV files) sets on a per-op basis
whether followup OV/OV32/SO shall be actioned.  the Simulator ISACaller uses
the *exact* same PowerDecoder2 to determine this exact same information.

Comment 16 Luke Kenneth Casson Leighton 2020-06-19 15:00:40 BST

(In reply to Jacob Lifshay from comment #13)
> The spec pseudo-code doesn't contain all the required information to
> implement any of the div or mod operations since only the prose states the
> conditions for overflow (which *DO* show up in the overflow flags when div*o
> is used).

answered this one.

> All divisions are truncating division, not floor division, so we should
> translate them to function calls to custom trunc_div and trunc_rem functions
> rather than python's // and % operators (since the mod* instructions
> implement the mathematical remainder of truncating division operation rather
> than the mathematical modulo operation).

urk. yuk.  that's going to be a bad hack (based on the name of the variables
involved - what is being assigned to).  it's doable - just not pretty.

> Code not tested:
> 
> def trunc_div(n, d):
> def trunc_rem(n, d):

ok i've put both of these into soc.decoder.helpers will commit shortly.

btw a trick, jacob: open a copy of decoder/isa/all.py in vim, run
"pywriter.py fixedarith" then *overwrite* all.py restoring the former
version.

this will save several minutes when re-running pywriter.py.  it would be
nice for this process to be more efficient (all.py not to be statically
written but to be dynamic instead).

Comment 17 Luke Kenneth Casson Leighton 2020-06-19 15:15:45 BST

(In reply to Luke Kenneth Casson Leighton from comment #16)
> (In reply to Jacob Lifshay from comment #13)
> > All divisions are truncating division, not floor division, so we should
> > translate them to function calls to custom trunc_div and trunc_rem functions
> > rather than python's // and % operators (since the mod* instructions
> > implement the mathematical remainder of truncating division operation rather
> > than the mathematical modulo operation).
> 
> urk. yuk.  that's going to be a bad hack (based on the name of the variables
> involved - what is being assigned to).  it's doable - just not pretty.

done:

[master 428ee14] add in really bad hack which calls trunc_div or trunc_mod https://bugs.libre-soc.org/show_bug.cgi?id=324#c16

i have one more thing left to do (one more bad hack) - identify where
"undefined" is used and copy only the bits required from it.

Comment 18 Jacob Lifshay 2020-06-19 15:23:10 BST

(In reply to Luke Kenneth Casson Leighton from comment #17)
> (In reply to Luke Kenneth Casson Leighton from comment #16)
> > (In reply to Jacob Lifshay from comment #13)
> > > All divisions are truncating division, not floor division, so we should
> > > translate them to function calls to custom trunc_div and trunc_rem functions
> > > rather than python's // and % operators (since the mod* instructions
> > > implement the mathematical remainder of truncating division operation rather
> > > than the mathematical modulo operation).
> > 
> > urk. yuk.  that's going to be a bad hack (based on the name of the variables
> > involved - what is being assigned to).  it's doable - just not pretty.
> 
> done:
> 
> [master 428ee14] add in really bad hack which calls trunc_div or trunc_mod
> https://bugs.libre-soc.org/show_bug.cgi?id=324#c16

I was thinking that the spec's / and % operators would always translate to trunc_div and trunc_rem unless they are the FP operators -- if all inputs are non-negative there is no difference between trunc_* and floor div/mod.

I don't think the hack with detecting if the variable name is `dividend` is necessary.

are there any other uses of the div and rem operators?

Comment 19 Luke Kenneth Casson Leighton 2020-06-19 15:24:53 BST

(In reply to Luke Kenneth Casson Leighton from comment #17)

> i have one more thing left to do (one more bad hack) - identify where
> "undefined" is used and copy only the bits required from it.

done.

this is a good sign:

    RT[32:64] = trunc_div(dividend, divisor)
  File "/home/lkcl/src/libresoc/soc/src/soc/decoder/helpers.py", line 20, in trunc_div
    abs_n = abs(n)
TypeError: bad operand type for abs(): 'SelectableInt'

and appears to be sorted by this:

diff --git a/src/soc/decoder/selectable_int.py b/src/soc/decoder/selectable_int.py
index 7264090..eebada6 100644
--- a/src/soc/decoder/selectable_int.py
+++ b/src/soc/decoder/selectable_int.py
@@ -181,6 +181,8 @@ class SelectableInt:
         return self._op(or_, b)
     def __xor__(self, b):
         return self._op(xor, b)
+    def __abs__(self):
+        return SelectableInt(0, self.bits) - self

Comment 20 Jacob Lifshay 2020-06-19 15:26:42 BST

(In reply to Jacob Lifshay from comment #18)
> (In reply to Luke Kenneth Casson Leighton from comment #17)
> > done:
> > 
> > [master 428ee14] add in really bad hack which calls trunc_div or trunc_mod
> > https://bugs.libre-soc.org/show_bug.cgi?id=324#c16
> 
> I was thinking that the spec's / and % operators would always translate to
> trunc_div and trunc_rem unless they are the FP operators -- if all inputs
> are non-negative there is no difference between trunc_* and floor div/mod.
> 
> I don't think the hack with detecting if the variable name is `dividend` is
> necessary.
> 
> are there any other uses of the div and rem operators?

according to the spec, the integer div and rem operators are always truncating division:

÷ Division, with result truncated to integer
% Remainder of integer division

Comment 21 Jacob Lifshay 2020-06-19 15:29:55 BST

(In reply to Luke Kenneth Casson Leighton from comment #19)
> (In reply to Luke Kenneth Casson Leighton from comment #17)
> 
> > i have one more thing left to do (one more bad hack) - identify where
> > "undefined" is used and copy only the bits required from it.
> 
> done.
> 
> this is a good sign:
> 
>     RT[32:64] = trunc_div(dividend, divisor)
>   File "/home/lkcl/src/libresoc/soc/src/soc/decoder/helpers.py", line 20, in
> trunc_div
>     abs_n = abs(n)
> TypeError: bad operand type for abs(): 'SelectableInt'
> 
> and appears to be sorted by this:
> 
> diff --git a/src/soc/decoder/selectable_int.py
> b/src/soc/decoder/selectable_int.py
> index 7264090..eebada6 100644
> --- a/src/soc/decoder/selectable_int.py
> +++ b/src/soc/decoder/selectable_int.py
> @@ -181,6 +181,8 @@ class SelectableInt:
>          return self._op(or_, b)
>      def __xor__(self, b):
>          return self._op(xor, b)
> +    def __abs__(self):
> +        return SelectableInt(0, self.bits) - self

I was thinking SelectableInt and other custom classes would implement the trunc_div and rtrunc_div member functions, which is the trunc_div version of __sub__ and __rsub__. Similarly for trunc_rem and rtrunc_rem.

Comment 22 Luke Kenneth Casson Leighton 2020-06-19 15:31:06 BST

(In reply to Jacob Lifshay from comment #18)
> I was thinking that the spec's / and % operators would always translate to
> trunc_div and trunc_rem unless they are the FP operators -- if all inputs
> are non-negative there is no difference between trunc_* and floor div/mod.
> 
> I don't think the hack with detecting if the variable name is `dividend` is
> necessary.

well... it's in now :)
 
> are there any other uses of the div and rem operators?

don't know: didn't want to take the risk - something to keep an eye on though.

$ python3 simulator/test_div_sim.py 

this now works.  i'm just adding a quick moduw test as well.  which also works.

remember to run pywriter.py after a git pull.

Comment 23 Luke Kenneth Casson Leighton 2020-06-19 15:35:34 BST

(In reply to Jacob Lifshay from comment #21)

> I was thinking SelectableInt and other custom classes would implement the
> trunc_div and rtrunc_div member functions, which is the trunc_div version of
> __sub__ and __rsub__. Similarly for trunc_rem and rtrunc_rem.

the problem is that if we do that, SelectableInt behaves unpredictably.

SelectableInt is designed not to be a "POWER9 compliant integer arithmetic"
class, it's designed to be a direct one-to-one and onto map of the behaviour
of the python *int* class - just one that happens to have a bitlength associated
with it.

thus we genuinely expect SelectableInt's floordiv and truncdiv behaviour to
be exactly that of int, and consequently if in any code anywhere we need to
perform a div (or mod), we expect it to actually work.

here's the thing: although you may not have been aware of it, trunc_div and
trunc_rem are *behaving as expected* despite the fact that you wrote them
assuming that they would take python ints as arguments!

and that's *exactly* why SelectableInt.__floordiv__ and SelectableInt.__truncdiv__
should not implement POWER9 div behaviour :)

Comment 24 Jacob Lifshay 2020-06-19 15:51:13 BST

(In reply to Luke Kenneth Casson Leighton from comment #23)
> (In reply to Jacob Lifshay from comment #21)
> 
> > I was thinking SelectableInt and other custom classes would implement the
> > trunc_div and rtrunc_div member functions, which is the trunc_div version of
> > __sub__ and __rsub__. Similarly for trunc_rem and rtrunc_rem.
> 
> the problem is that if we do that, SelectableInt behaves unpredictably.
> 
> SelectableInt is designed not to be a "POWER9 compliant integer arithmetic"
> class, it's designed to be a direct one-to-one and onto map of the behaviour
> of the python *int* class - just one that happens to have a bitlength
> associated
> with it.
> 
> thus we genuinely expect SelectableInt's floordiv and truncdiv behaviour to
> be exactly that of int, and consequently if in any code anywhere we need to
> perform a div (or mod), we expect it to actually work.

they behave the same as int even if the trunc_div and trunc_rem member functions are overridden with a custom implementation that calculates the same results.

The member functions are designed to work for classes like Signal or other special classes that are sufficiently unlike int that they needs to be manually implemented.

the trunc_div free function calls the member functions if they exist -- just like abs(a) calls a.__abs__()

there's no reason trunc_div should or would match __floor_div__ -- they're different operations.

I was thinking about implementing trunc_div as a monkey-patched member function of int, but was not sure that was a good idea or if int can even be monkey-patched.

> 
> here's the thing: although you may not have been aware of it, trunc_div and
> trunc_rem are *behaving as expected* despite the fact that you wrote them
> assuming that they would take python ints as arguments!

that's mostly expected.

> 
> and that's *exactly* why SelectableInt.__floordiv__ and
> SelectableInt.__truncdiv__
> should not implement POWER9 div behaviour :)

__floor_div__ is a flooring-division operator, of course it shouldn't be truncating division, that's what trunc_div is for :)

Comment 25 Luke Kenneth Casson Leighton 2020-06-19 16:04:06 BST

(In reply to Jacob Lifshay from comment #24)
> (In reply to Luke Kenneth Casson Leighton from comment #23)
> > (In reply to Jacob Lifshay from comment #21)
> > 
> > > I was thinking SelectableInt and other custom classes would implement the
> > > trunc_div and rtrunc_div member functions, which is the trunc_div version of
> > > __sub__ and __rsub__. Similarly for trunc_rem and rtrunc_rem.

right. ok.  i am with you now.  this will have to happen later... but *not* as part *of* SelectableInt, but as a module (or class) that has the operations.

in this way we will implement SimpleV at that level.... *not* in SelectableInt itself.

basically we are missing a class named "Power9CompliantSelectableInt"

which later will become:
SimpleVAwarePower9IshSelectableInt

> > 
> > the problem is that if we do that, SelectableInt behaves unpredictably.
> > 
> > SelectableInt is designed not to be a "POWER9 compliant integer arithmetic"
> > class, it's designed to be a direct one-to-one and onto map of the behaviour
> > of the python *int* class - just one that happens to have a bitlength
> > associated
> > with it.
> > 
> > thus we genuinely expect SelectableInt's floordiv and truncdiv behaviour to
> > be exactly that of int, and consequently if in any code anywhere we need to
> > perform a div (or mod), we expect it to actually work.
> 
> they behave the same as int even if the trunc_div and trunc_rem member
> functions are overridden with a custom implementation that calculates the
> same results.

i'm not understanding you, sorry.  trunc_div and trunc_rem are standalone functions not members so cannot be overridden.

the parser now specifically identifies the div and mod circumstances and calls them directly, *bypassing* the % and / operator overloads entirely.

this is something we have to do anyway for all operators (in order to be able to do dynamic SIMD backend for SimpleV)

 
> The member functions are designed to work for classes like Signal or other
> special classes that are sufficiently unlike int that they needs to be
> manually implemented.
> 
> the trunc_div free function calls the member functions if they exist -- just
> like abs(a) calls a.__abs__()
> 
> there's no reason trunc_div should or would match __floor_div__ -- they're
> different operations.

yes sorry i meant rem

> I was thinking about implementing trunc_div as a monkey-patched member
> function of int, but was not sure that was a good idea or if int can even be
> monkey-patched.

it can't, and changing int behaviour would cause havoc on a global scale.  we would need to create yet another int wrapper class and use it everywhere.

this would be terribly slow (and unnecessary)

> > 
> > here's the thing: although you may not have been aware of it, trunc_div and
> > trunc_rem are *behaving as expected* despite the fact that you wrote them
> > assuming that they would take python ints as arguments!
> 
> that's mostly expected.
> 
> > 
> > and that's *exactly* why SelectableInt.__floordiv__ and
> > SelectableInt.__truncdiv__
> > should not implement POWER9 div behaviour :)
> 
> __floor_div__ is a flooring-division operator, of course it shouldn't be
> truncating division, that's what trunc_div is for :)

sorry i meant rem.

Comment 26 Jacob Lifshay 2020-06-19 16:19:39 BST

(In reply to Luke Kenneth Casson Leighton from comment #25)
> (In reply to Jacob Lifshay from comment #24)
> > 
> > they behave the same as int even if the trunc_div and trunc_rem member
> > functions are overridden with a custom implementation that calculates the
> > same results.
> 
> i'm not understanding you, sorry.  trunc_div and trunc_rem are standalone
> functions not members so cannot be overridden.

The standalone functions tries to call the member functions if they exist, this is to emulate python's behavior where a - b calls a.__sub__(b) or b.__rsub__(a).

So, if I had a custom class:

class MyClass:
    def trunc_div(self, d):
        print("trunc_div:", self, "/", d)
        return 0
    def rtrunc_div(self, n):
        print("rtrunc_div:", n, "/", self)
        return 0
    def __repr__(self):
        return "MyClass"

trunc_div(MyClass(), 123)
trunc_div(MyClass(), MyClass())
trunc_div(456, MyClass())

outputs:
trunc_div: MyClass / 123
trunc_div: MyClass / MyClass
rtrunc_div: 456 / MyClass

Comment 27 Luke Kenneth Casson Leighton 2020-06-19 17:03:29 BST

(In reply to Jacob Lifshay from comment #26)

> The standalone functions tries to call the member functions if they exist,
> this is to emulate python's behavior where a - b calls a.__sub__(b) or
> b.__rsub__(a).
> 
> So, if I had a custom class:
> 
> class MyClass:
>     def trunc_div(self, d):
>         print("trunc_div:", self, "/", d)
>         return 0
>     def rtrunc_div(self, n):
>         print("rtrunc_div:", n, "/", self)
>         return 0
>     def __repr__(self):
>         return "MyClass"
> 
> trunc_div(MyClass(), 123)
> trunc_div(MyClass(), MyClass())
> trunc_div(456, MyClass())
> 
> outputs:
> trunc_div: MyClass / 123
> trunc_div: MyClass / MyClass
> rtrunc_div: 456 / MyClass

only the functions in the operator module work this way.  the above is not
how python works.  or, if it is, it's literally the first time i've heard
about it in 20 years!

>>> trunc_div(MyClass(), 123)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'trunc_div' is not defined

yep, it's not how python works.  you have to call MyClass().trunc_div(123)
or define a trunc_div (global, non-class) function:

def truncdiv(a, b):
    if isinstance(a, MyClass):
         return a.truncdiv(b)
    if isinstance(b, MyClass):
         return b.reverse_truncdiv(a)
    raise Exception()


operator.truncdiv on the other hand is *specifically* written to do what you
are expecting (except calling __truncdiv__, where operator.add calls __add__,
etc. etc. for each function operator.{xxx} it calls __xxx__ )

so the function operator.add goes something like:

def add(a, b):
   if hasattr(a, "__add__"):
       return a.__add__(b)
   else:
       return b.__radd__(a)

or close to it.  it *might* do some checking to see which is the largest
(most accurate) class.  in order this would be something like int, float, complex
where float is capable of representing all ints, and complex is capable of
representing all floats.

therefore, we are *bypassing* that with the functions that are now in
soc/decoder/helpers.py

really we should not in any way be using the operator overloads  __add__ etc.
at all at the POWER9 level.

at the POWER9 level they should all be done in the format MyClass and then
have the parser.py generate AST which calls those functions directly as
2-argument functions.

you can see this is done here for "!=" already, in parser.py:

def make_ne_compare(arg):
    (left, right) = arg
    return ast.Call(ast.Name("ne", ast.Load()), (left, right), [])


this will create python code "ne(RA, 0)" when it sees "RA != 0" in the
pseudocode.

*every* operator needs to be like this.

if you want to make it "clearer", it should be called "POWER9_ne",
and "POWER9_trunc_div" etc. however the reason we've not done this is
because it makes the auto-generated code pretty much unreadable :)

the places to be compliant with POWER9 are in those functions "POWER9_ne".
*not* in SelectableInt.  SelectableInt is very specifically designed to
be that "int override" class that you mentioned (the monkey-patch one).
*not* as a POWER9-compatible class.

yes we need that POWER9-compatible class, but not right now.  helpers.ne,
helpers.trunc_div, helpers.trunc_rem "do that job" and the plan is to
move them into a MyClass-style-formatted object... but not right now.

when we add SimpleV it will be absolutely essential to add that MyClass-like
object, because that MyClass-like object will be told what the SIMD partitioning
currently is, what VL currently is, and it will also have to be passed the
full Register File.

but... not right now.

Comment 28 Jacob Lifshay 2020-06-19 21:08:24 BST

(In reply to Luke Kenneth Casson Leighton from comment #27)
> (In reply to Jacob Lifshay from comment #26)
> 
> > The standalone functions tries to call the member functions if they exist,
> > this is to emulate python's behavior where a - b calls a.__sub__(b) or
> > b.__rsub__(a).
> > 
> > So, if I had a custom class:
> > 
> > class MyClass:
> >     def trunc_div(self, d):
> >         print("trunc_div:", self, "/", d)
> >         return 0
> >     def rtrunc_div(self, n):
> >         print("rtrunc_div:", n, "/", self)
> >         return 0
> >     def __repr__(self):
> >         return "MyClass"
> > 
> > trunc_div(MyClass(), 123)
> > trunc_div(MyClass(), MyClass())
> > trunc_div(456, MyClass())
> > 
> > outputs:
> > trunc_div: MyClass / 123
> > trunc_div: MyClass / MyClass
> > rtrunc_div: 456 / MyClass
> 
> only the functions in the operator module work this way.  the above is not
> how python works.  or, if it is, it's literally the first time i've heard
> about it in 20 years!

no, what I posted is exactly how python works, assuming the trunc_div function which you copied to soc.git earlier today is in scope (apparently that wasn't obvious):
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/helpers.py;h=ef6610e5c89d7d0e0316c216bf51faf8486f46df;hb=HEAD#l13


result from python repl:

...
>>> trunc_div(MyClass(), 123)
trunc_div: MyClass / 123
0
>>> trunc_div(MyClass(), MyClass())
trunc_div: MyClass / MyClass
0
>>> trunc_div(456, MyClass())
rtrunc_div: 456 / MyClass
0
>>>

Comment 29 Jacob Lifshay 2020-06-19 21:22:14 BST

(In reply to Luke Kenneth Casson Leighton from comment #27)
> if you want to make it "clearer", it should be called "POWER9_ne",
> and "POWER9_trunc_div" etc. however the reason we've not done this is
> because it makes the auto-generated code pretty much unreadable :)

That trunc_div function isn't intended to be Power ISA compatible (or POWER9, which is not even what we're implementing -- we're implementing Libre-SOC's first core -- not POWER9, they are different cpus -- please avoid calling the Power ISA "POWER9", they are different things. That's like if AMD called their hypothetical new processor core "Sandy Bridge" or "Pentium", which are widely associated with Intel).

trunc_div is intended to be the equivalent of python's // operator, except that it rounds differently -- the same way Rust, C, C++, Java, x86, and Power do.

Similarly with trunc_rem.

Comment 30 Luke Kenneth Casson Leighton 2020-06-19 21:22:46 BST

(In reply to Jacob Lifshay from comment #28)
> > only the functions in the operator module work this way.  the above is not
> > how python works.  or, if it is, it's literally the first time i've heard
> > about it in 20 years!
> 
> no, what I posted is exactly how python works, assuming the trunc_div
> function which you copied to soc.git earlier today is in scope (apparently
> that wasn't obvious):

not at all :)  i simply committed it and moved rapidly on with the focus on the
parser and pywriter :)

thank you for pointing it out: i'm with you now.

> https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/helpers.py;
> h=ef6610e5c89d7d0e0316c216bf51faf8486f46df;hb=HEAD#l13
> 
> 
> result from python repl:
> 
> ...
> >>> trunc_div(MyClass(), 123)
> trunc_div: MyClass / 123
> 0
> >>> trunc_div(MyClass(), MyClass())
> trunc_div: MyClass / MyClass
> 0
> >>> trunc_div(456, MyClass())
> rtrunc_div: 456 / MyClass
> 0
> >>>

ok.  right.  so this would be merging some of the functionality and responsibility
of the (future-needed) class i mentioned (something like Power9SimpleVSelectableInt)
with something that's inspired by how operator.{xxx} work.

at the moment the functions are called from parser.py, they're assumed to be
SelectableInts(), those SelectableInts are assumed to function as integers
not as Power9SimpleVCompatibleSelectableInt.

we do very much need to move this on very quickly.  can we leave the discussion
of future work related to SimpleV for another time?

unless you can see a way in which not having a Power9CompatibleSelectableInt
class would be a major blocker to achieving the (extremely close) Oct 2020
tapeout deadline?

for example, overflow exceptions.  actually, even there, i see no reason why we should
need a special Power9CompatibleSelectableInt for that: it should be possible to simply 
catch the exception and drop a flag into the SelectableInt.

decoder/isa/caller.py could watch for that and set the overflow bits in XER if
it was required to do so.

Comment 31 Luke Kenneth Casson Leighton 2020-06-19 21:33:14 BST

(In reply to Jacob Lifshay from comment #29)

> That trunc_div function isn't intended to be Power ISA compatible 

ah then that will be the source of confusion: i was expecting it to
provide the functionality of... of... something-that-i-don't-have-a-name-for...

> (or
> POWER9, which is not even what we're implementing -- we're implementing
> Libre-SOC's first core -- not POWER9, they are different cpus -- please
> avoid calling the Power ISA "POWER9", they are different things. That's like
> if AMD called their hypothetical new processor core "Sandy Bridge" or
> "Pentium", which are widely associated with Intel).

urrrr... :)  ok then we need to know the exact name.  POWER ISA V3.0B is a
bit of a handful.  POWER 3.0B?  we've been asked by OpenPOWER Foundation to
be compatible with that.  short-hand i was referring to it as POWER9.

> trunc_div is intended to be the equivalent of python's // operator, except
> that it rounds differently -- the same way Rust, C, C++, Java, x86, and
> Power do.
> 
> Similarly with trunc_rem.

ok right: then it's... (POWER 3.0B?) compatible, would that be ok to call it?

if so, it would be part of the (using the appropriate name, now?)
Power30BCompatibleSelectableInt class.

however, right now, we're going for a collection of *functions* that are
Power 3.0B compatible, and those functions *use* SelectableInt, whose behavior
is expected to be directly equal to python int class.

when those functions require state (which they don't right now), it becomes
not just appropriate but necessary to move them to a class.

also if we move _those_ functions to a class, we have to also add (move)
helpers.ne, eq, lt, gt, ge and then *also* add sub, add, mul, everything
and then do a bit of a rewrite on parser.py to put it into use.

and that's a lot of work which is quite a major distraction.

i'm quite happy for that work to be done *if* you can see a good reason as to
why not having it is a major blocker to the October 2020 tapeout?

Comment 32 Jacob Lifshay 2020-06-19 21:55:47 BST

(In reply to Luke Kenneth Casson Leighton from comment #31)
> (In reply to Jacob Lifshay from comment #29)
> 
> > That trunc_div function isn't intended to be Power ISA compatible 
> 
> ah then that will be the source of confusion: i was expecting it to
> provide the functionality of... of...
> something-that-i-don't-have-a-name-for...
> 
> > (or
> > POWER9, which is not even what we're implementing -- we're implementing
> > Libre-SOC's first core -- not POWER9, they are different cpus -- please
> > avoid calling the Power ISA "POWER9", they are different things. That's like
> > if AMD called their hypothetical new processor core "Sandy Bridge" or
> > "Pentium", which are widely associated with Intel).
> 
> urrrr... :)  ok then we need to know the exact name.  POWER ISA V3.0B is a
> bit of a handful.  POWER 3.0B?  we've been asked by OpenPOWER Foundation to
> be compatible with that.  short-hand i was referring to it as POWER9.

If I were to pick a replacement name for POWER9 when you meant the Power ISA, I'd probably just pick Power, since are docs clearly state (or should state) that we're compatible with Power ISA v3.0B (or v3.1B). We can assume someone reading our code knows that since that's what we advertise (or at least knows that they should figure out which version we implement).
 
> > trunc_div is intended to be the equivalent of python's // operator, except
> > that it rounds differently -- the same way Rust, C, C++, Java, x86, and
> > Power do.
> > 
> > Similarly with trunc_rem.
> 
> ok right: then it's... (POWER 3.0B?) compatible, would that be ok to call it?

I was thinking we'd put it in nmutil, since it's useful for way more than the instruction simulator. As an example, I wrote a very similar function as part of ieee754.div_rem_sqrt_rsqrt.algorithm.

if you want something strictly compatible with IBM's POWER9 cpu, then translate (or create python bindings to) the code in:
https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/master/src/instr_models.rs

Comment 33 Luke Kenneth Casson Leighton 2020-06-19 22:29:55 BST

(In reply to Jacob Lifshay from comment #32)

> If I were to pick a replacement name for POWER9 when you meant the Power
> ISA, I'd probably just pick Power,

unfortunately, Power is a general english word.

> since are docs clearly state (or should
> state) that we're compatible with Power ISA v3.0B (or v3.1B). We can assume
> someone reading our code knows that since that's what we advertise (or at
> least knows that they should figure out which version we implement).

heck, *i* can't :)  or i do but not how to refer to it.  "Power" is far too
general.  i have conversations with ordinary people, i have to refer them to
PowerPC and mention "G4 Mac" and their eyes un-glaze at that point :)


> > > trunc_div is intended to be the equivalent of python's // operator, except
> > > that it rounds differently -- the same way Rust, C, C++, Java, x86, and
> > > Power do.
> > > 
> > > Similarly with trunc_rem.
> > 
> > ok right: then it's... (POWER 3.0B?) compatible, would that be ok to call it?
> 
> I was thinking we'd put it in nmutil, since it's useful for way more than
> the instruction simulator.

done.  nmutil.divmod.py - feel free to move it you think of a better name,
remember to update helpers.py though

> As an example, I wrote a very similar function as
> part of ieee754.div_rem_sqrt_rsqrt.algorithm.

ahhhh okaaay

> if you want something strictly compatible with IBM's POWER9 cpu, then
> translate (or create python bindings to) the code in:
> https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/master/
> src/instr_models.rs

right.  ok.  so that is tending to suggest that... in decode/helpers.py,
we *do* need a function called power_trunc_div which does the detection
of overflow.

which is slightly different from the other cases (other arithmetic cases)
in that you can't really post-analyse the *result*.

ok i'll take care of that if you can concentrate on the hardware side.
i may need help with the unit tests in test_div_sim.py
(if you know what do to, there, do just throw some in and i will chew
through them making sure that the simulator code is "correct" against them)

somewhere in the middle of the hardware pipeline you'll need to generate
that overflow condition and pass it through.  whether that's a special
flag or whether you actually use the xer_ov/32 data structure is up to you,
but it has to come _out_ as xer_ov/32 and xer_ca/32 when it goes into
ALUOutputStage.

        self.dive_abs_overflow_32 = Signal(reset_less=True)
        self.dive_abs_overflow_64 = Signal(reset_less=True)

ahhh excellent, you're on the ball :)

Comment 34 Luke Kenneth Casson Leighton 2020-06-19 22:30:27 BST

fyi

lkcl@fizzy:~/src/libresoc/soc/src/soc$ git commit -a -m 'whitespace update'
[master 1cd6c71] whitespace update

Comment 35 Jacob Lifshay 2020-06-19 22:54:28 BST

(In reply to Luke Kenneth Casson Leighton from comment #33)
> (In reply to Jacob Lifshay from comment #32)
> 
> > If I were to pick a replacement name for POWER9 when you meant the Power
> > ISA, I'd probably just pick Power,
> 
> unfortunately, Power is a general english word.
> 
> > since are docs clearly state (or should
> > state) that we're compatible with Power ISA v3.0B (or v3.1B). We can assume
> > someone reading our code knows that since that's what we advertise (or at
> > least knows that they should figure out which version we implement).
> 
> heck, *i* can't :)  or i do but not how to refer to it.  "Power" is far too
> general.  i have conversations with ordinary people, i have to refer them to
> PowerPC and mention "G4 Mac" and their eyes un-glaze at that point :)

Use PowerISA? PowerPC seems overly specific to old versions of the ISA, though that *is* what gcc, llvm, and rust uses (powerpc64). Maybe PPC64?
 
> somewhere in the middle of the hardware pipeline you'll need to generate
> that overflow condition and pass it through.  whether that's a special
> flag or whether you actually use the xer_ov/32 data structure is up to you,
> but it has to come _out_ as xer_ov/32 and xer_ca/32 when it goes into
> ALUOutputStage.
> 
>         self.dive_abs_overflow_32 = Signal(reset_less=True)
>         self.dive_abs_overflow_64 = Signal(reset_less=True)

see DivOutputStage for the rest of the overflow calculation:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/div/output_stage.py;h=88a9ec1eec4bbc4ebb3fece42bbe235a13ae57ff;hb=HEAD#l65

> 
> ahhh excellent, you're on the ball :)

yup, only missing unit tests and formal proofs.

Comment 36 Luke Kenneth Casson Leighton 2020-06-19 23:10:06 BST

(In reply to Jacob Lifshay from comment #35)

> Use PowerISA? 

i'm happy with that.

> PowerPC seems overly specific to old versions of the ISA,
> though that *is* what gcc, llvm, and rust uses (powerpc64). Maybe PPC64?

sounds like it excludes 32bit.
 

> > 
> >         self.dive_abs_overflow_32 = Signal(reset_less=True)
> >         self.dive_abs_overflow_64 = Signal(reset_less=True)
> 
> see DivOutputStage for the rest of the overflow calculation:
> https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/div/output_stage.py;
> h=88a9ec1eec4bbc4ebb3fece42bbe235a13ae57ff;hb=HEAD#l65
> 
> > 
> > ahhh excellent, you're on the ball :)
> 
> yup, only missing unit tests and formal proofs.

ok you should be able to just import the class in test_div_sim.py and use it in fu/div/test/test_pipe_caller.py

can i leave that with you as it's 11pm here?

Comment 37 Luke Kenneth Casson Leighton 2020-06-29 16:25:35 BST

i have fu/div/test/test_pipe_caller.py running, i have no idea if it is correct or not.

jacob can you please take a look?

Comment 38 Jacob Lifshay 2020-06-29 21:20:41 BST

argh, Chrome deleted all the text I was typing.

Anyway, it looks like a good start, though I don't know for sure as I'm not familiar with the testing helper functions.

It needs some additional testing:
all the other div/mod instructions and their variants:
divdeo
divdeuo
divdo
divduo
divweo
divweuo
divwo
divwuo
modsd
modud
modsw
moduw

It needs some non-random test cases to hit the edge cases:
https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/13dae100c6bc5685059195010ceb46ae68b9f306/src/main.rs#L210

soc.fu.div.setup_stage.DivSetupStage.elaborate was incorrectly changed:
https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=8cc95f83fe437f4ac1eb5ebe57265bed30c0078c
...
-        # pass through op
+        # pass through core data
 
-        comb += self.o.op.eq(op)
+        comb += self.o.core.eq(core_o)

That should be op since op is used in the last div stage. self.o.core.eq(core_o) is the same as self.o.core.eq(self.o.core), so is useless.

Comment 39 Luke Kenneth Casson Leighton 2020-06-29 21:28:21 BST

(In reply to Jacob Lifshay from comment #38)
> argh, Chrome deleted all the text I was typing.
> 
> Anyway, it looks like a good start, though I don't know for sure as I'm not
> familiar with the testing helper functions.

it's very simple, just cookie-cut test_rand_div and ignore everything else.

> It needs some additional testing:
> all the other div/mod instructions and their variants:
> It needs some non-random test cases to hit the edge cases:
> https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/
> 13dae100c6bc5685059195010ceb46ae68b9f306/src/main.rs#L210

can you add these? there is so much to do and i am pretty much doing
everything right now.
 
> soc.fu.div.setup_stage.DivSetupStage.elaborate was incorrectly changed:
> https://git.libre-soc.org/?p=soc.git;a=commitdiff;
> h=8cc95f83fe437f4ac1eb5ebe57265bed30c0078c
> ...
> -        # pass through op
> +        # pass through core data
>  
> -        comb += self.o.op.eq(op)
> +        comb += self.o.core.eq(core_o)
> 
> That should be op since op is used in the last div stage.
> self.o.core.eq(core_o) is the same as self.o.core.eq(self.o.core), so is
> useless.

self.o.op is not defined, which is why i tried to find something to set it to.
if you can run the unit test "python3 fu/div/test/test_pipe_caller.py" and
investigate that would be really helpful.

Comment 40 Luke Kenneth Casson Leighton 2020-06-30 11:57:51 BST

(In reply to Luke Kenneth Casson Leighton from comment #39)

> self.o.op is not defined, which is why i tried to find something to set it
> to.
> if you can run the unit test "python3 fu/div/test/test_pipe_caller.py" and
> investigate that would be really helpful.

i went over everything that gets set up as part of self.o.core and
it looks like all parts of DivPipeCoreInputData are established:

class DivPipeCoreInputData:

        self.dividend = Signal(bw + fw, reset_less=reset_less)
        self.divisor_radicand = Signal(bw, reset_less=reset_less)
        self.operation = DP.create_signal(reset_less=reset_less)

operation:

        comb += core_o.operation.eq(int(DivPipeCoreOperation.UDivRem))

dividend and divisor:

        dividend_o = core_o.dividend
        divisor_o = core_o.divisor_radicand

these also get set up.  so i believe the self.o.op.eq() - presumably
intended to be self.o.core.eq() - is redundant.

Comment 41 Luke Kenneth Casson Leighton 2020-06-30 13:05:34 BST

btw jacob i noticed that on qemu, divw sets RT=RA when RB=0.

can you check that behaviour on POWER9?

Comment 42 Luke Kenneth Casson Leighton 2020-07-02 20:21:02 BST

btw several combinatorial stages chained together is absolutely fine
as long as it's not too many.  i'm going to try 8 (it's a 180nm ASIC)

Comment 43 Jacob Lifshay 2020-07-02 20:36:43 BST

(In reply to Luke Kenneth Casson Leighton from comment #40)
> (In reply to Luke Kenneth Casson Leighton from comment #39)
> 
> > self.o.op is not defined, which is why i tried to find something to set it
> > to.
> > if you can run the unit test "python3 fu/div/test/test_pipe_caller.py" and
> > investigate that would be really helpful.
> 
> i went over everything that gets set up as part of self.o.core and
> it looks like all parts of DivPipeCoreInputData are established:
> <snip>
> these also get set up.  so i believe the self.o.op.eq() - presumably
> intended to be self.o.core.eq() - is redundant.

No, self.o.op.eq() was intended to copy the PowerISA opcode since that's needed after DivPipeCore finishes its work. Notice that CoreInputData derives from CoreBaseData, which derives from DivInputData, which should contain a field for the PowerISA opcode, which is what I assumed self.o.op is.

Comment 44 Jacob Lifshay 2020-07-02 20:43:30 BST

(In reply to Luke Kenneth Casson Leighton from comment #41)
> btw jacob i noticed that on qemu, divw sets RT=RA when RB=0.
> 
> can you check that behaviour on POWER9?

IIRC, on POWER9, division by zero results in RT=0 for all div*/mod* instructions. I specifically included that as a test case in power-instruction-analyzer since that's a case where the result is undefined according to the PowerISA spec. For divwo, see:
https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/13dae100c6bc5685059195010ceb46ae68b9f306/src/instr_models.rs#L153

Comment 45 Jacob Lifshay 2020-07-02 20:46:34 BST

(In reply to Luke Kenneth Casson Leighton from comment #42)
> btw several combinatorial stages chained together is absolutely fine
> as long as it's not too many.  i'm going to try 8 (it's a 180nm ASIC)

Seems excessive since then there would be a gate depth of on the order of 100 per pipeline stage, I seriously doubt we can get any decent clock speed from something *that* deep.

Comment 46 Luke Kenneth Casson Leighton 2020-07-02 20:49:54 BST

On Thu, Jul 2, 2020 at 8:36 PM bugzilla-daemon--- via libre-soc-bugs <libre-soc-bugs@lists.libre-riscv.org> wrote:

> No, self.o.op.eq() was intended to copy the PowerISA opcode since that's needed
> after DivPipeCore finishes its work.

i said already: it doesn't exist.  self.o.op does not exist.  if you ran the unit test as i suggested, you would have found this out.

> Notice that CoreInputData derives from
> CoreBaseData, which derives from DivInputData,

which derives from IntegerData, which contains "ctx" (pipeline context).
ctx contains "op", which is where the operation information comes from

> which should contain a field for
> the PowerISA opcode, which is what I assumed self.o.op is.

self.i.ctx (and self.o.ctx) contains the internal op from PowerDecode2, actually a subset (CompTrapOpSubset).

therefore, this gets the InternalOp (a member of CompTrapOpSubset):

        # convenience variables
        op, a, b = self.i.ctx.op, self.i.a, self.i.b

and therefore this copies the operation subset information needed
down the pipeline chain:

        ###### sticky overflow and context, both pass-through #####

        comb += self.o.xer_so.eq(self.i.xer_so)
-->     comb += self.o.ctx.eq(self.i.ctx)        <----

Comment 47 Jacob Lifshay 2020-07-02 20:54:16 BST

(In reply to Luke Kenneth Casson Leighton from comment #46)
> On Thu, Jul 2, 2020 at 8:36 PM bugzilla-daemon--- via libre-soc-bugs
> <libre-soc-bugs@lists.libre-riscv.org> wrote:
> 
> > No, self.o.op.eq() was intended to copy the PowerISA opcode since that's needed
> > after DivPipeCore finishes its work.
> 
> i said already: it doesn't exist.  self.o.op does not exist. 

yes, I know. I was explaining why I wrote the code the way it was.

Thank you for fixing it. :)

Comment 48 Luke Kenneth Casson Leighton 2020-07-02 20:55:39 BST

(In reply to Jacob Lifshay from comment #44)
> (In reply to Luke Kenneth Casson Leighton from comment #41)
> > btw jacob i noticed that on qemu, divw sets RT=RA when RB=0.
> > 
> > can you check that behaviour on POWER9?
> 
> IIRC, on POWER9, division by zero results in RT=0 for all div*/mod*
> instructions. I specifically included that as a test case in
> power-instruction-analyzer since that's a case where the result is undefined
> according to the PowerISA spec. For divwo, see:
> https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/
> 13dae100c6bc5685059195010ceb46ae68b9f306/src/instr_models.rs#L153

qemu definitely sets RT=RA for divw when RB=0.  note: not divwo - *divw*.

which is why i asked if you could check.  hugh and paul mackerras mentioned
that this *might* have been old behaviour on e.g. POWER7 or POWER8.

if you run the unit test simulator/test_div_sim.py (and edit it
accordingly) you will be able to see for yourself.

======================================================================
FAIL: test_0_divw (__main__.DivZeroDecoderTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "simulator/test_div_sim.py", line 80, in test_0_divw
    self.run_tst_program(program, [1, 2, 3])
  File "/home/lkcl/src/libresoc/soc/src/soc/simulator/test_sim.py", line 252, in run_tst_program
    self.qemu_register_compare(simulator, q, reglist)
  File "/home/lkcl/src/libresoc/soc/src/soc/simulator/test_sim.py", line 290, in qemu_register_compare
    self.assertEqual(qemu_val, sim_val)
AssertionError: 22136 != 0

Comment 49 Luke Kenneth Casson Leighton 2020-07-02 20:56:01 BST

(In reply to Jacob Lifshay from comment #47)

> yes, I know. I was explaining why I wrote the code the way it was.

ahh ok :)

Comment 50 Luke Kenneth Casson Leighton 2020-07-02 21:01:03 BST

(In reply to Jacob Lifshay from comment #45)
> (In reply to Luke Kenneth Casson Leighton from comment #42)
> > btw several combinatorial stages chained together is absolutely fine
> > as long as it's not too many.  i'm going to try 8 (it's a 180nm ASIC)
> 
> Seems excessive since then there would be a gate depth of on the order of
> 100 per pipeline stage, I seriously doubt we can get any decent clock speed
> from something *that* deep.

yehh i don't honestly know, should be running ltp to find out
(which is currently having a hard time with the number of register
"loops" detected...)

Comment 51 Jacob Lifshay 2020-07-02 21:01:45 BST

(In reply to Luke Kenneth Casson Leighton from comment #48)
> (In reply to Jacob Lifshay from comment #44)
> > (In reply to Luke Kenneth Casson Leighton from comment #41)
> > > btw jacob i noticed that on qemu, divw sets RT=RA when RB=0.
> > > 
> > > can you check that behaviour on POWER9?
> > 
> > IIRC, on POWER9, division by zero results in RT=0 for all div*/mod*
> > instructions. I specifically included that as a test case in
> > power-instruction-analyzer since that's a case where the result is undefined
> > according to the PowerISA spec. For divwo, see:
> > https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/
> > 13dae100c6bc5685059195010ceb46ae68b9f306/src/instr_models.rs#L153
> 
> qemu definitely sets RT=RA for divw when RB=0.  note: not divwo - *divw*.
> 
> which is why i asked if you could check.  hugh and paul mackerras mentioned
> that this *might* have been old behaviour on e.g. POWER7 or POWER8.

Ok, I'll check once I write the code. I was assuming that the div*o instructions produced the same results as all other div* variants. I'll also check "div*." and "div*o." variants (which I find their naming to be annoying due to confusion with the end of a sentence).

Comment 52 Luke Kenneth Casson Leighton 2020-07-02 21:13:39 BST

(In reply to Jacob Lifshay from comment #51)

> Ok, I'll check once I write the code. I was assuming that the div*o
> instructions produced the same results as all other div* variants. I'll also
> check "div*." and "div*o." variants (which I find their naming to be
> annoying due to confusion with the end of a sentence).

sigh i know.  and can't be a python function name.  we converted to underscore

Comment 53 Jacob Lifshay 2020-07-09 04:52:08 BST

(In reply to Luke Kenneth Casson Leighton from comment #48)
> (In reply to Jacob Lifshay from comment #44)
> > (In reply to Luke Kenneth Casson Leighton from comment #41)
> > > btw jacob i noticed that on qemu, divw sets RT=RA when RB=0.
> > > 
> > > can you check that behaviour on POWER9?
> > 
> > IIRC, on POWER9, division by zero results in RT=0 for all div*/mod*
> > instructions. I specifically included that as a test case in
> > power-instruction-analyzer since that's a case where the result is undefined
> > according to the PowerISA spec. For divwo, see:
> > https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/
> > 13dae100c6bc5685059195010ceb46ae68b9f306/src/instr_models.rs#L153
> 
> qemu definitely sets RT=RA for divw when RB=0.  note: not divwo - *divw*.
> 
> which is why i asked if you could check.  hugh and paul mackerras mentioned
> that this *might* have been old behaviour on e.g. POWER7 or POWER8.

I just updated power-instruction-analyzer to include all div*[o][.] variants, they all match the original instr_models.rs -- all of `divw`, `divwo`, `divw.`, and `divwo.` all provide the exact same results when tested on POWER9.

So, unless we made a mistake somewhere, qemu doesn't match POWER9's behavior (which is to be expected, since that part is undefined).

I also added all instructions to the Python API, so you can just call the functions in instr_model.rs:
(requires a recent version of Rust -- a nightly version is no longer required since PyO3 v0.11)

$ git clone https://salsa.debian.org/Kazan-team/power-instruction-analyzer.git
$ source <path-to-venv>/bin/activate
$ cd power-instruction-analyzer
$ pip install .
$ python
>>> import power_instruction_analyzer as pia
>>> inputs = pia.InstructionInput(ra=0x1234, rb=0, rc=0)
>>> print(pia.divw(inputs))
>>> print(pia.divwo(inputs))
>>> print(pia.divw_(inputs))
>>> print(pia.divwo_(inputs))

it will output (last output line split over two lines:
{"rt":"0x0"}
{"rt":"0x0","so":true,"ov":true,"ov32":true}
{"rt":"0x0","cr0":{"lt":false,"gt":false,"eq":true,"so":false}}
{"rt":"0x0","so":true,"ov":true,"ov32":true,
"cr0":{"lt":false,"gt":false,"eq":true,"so":true}}

Comment 54 Jacob Lifshay 2020-07-09 05:29:13 BST

(In reply to Jacob Lifshay from comment #53)
> So, unless we made a mistake somewhere, qemu doesn't match POWER9's behavior
> (which is to be expected, since that part is undefined).

Turns out I forgot to enable running using native assembly instructions when testing on the POWER9 server --- oops

Comment 55 Jacob Lifshay 2020-07-09 06:24:58 BST

(In reply to Jacob Lifshay from comment #54)
> (In reply to Jacob Lifshay from comment #53)
> > So, unless we made a mistake somewhere, qemu doesn't match POWER9's behavior
> > (which is to be expected, since that part is undefined).
> 
> Turns out I forgot to enable running using native assembly instructions when
> testing on the POWER9 server --- oops

Got that fixed. it still doesn't match qemu.

I also added the mul* and madd* instructions.

The instruction models now match POWER9 for all tested cases for all of:

divde
divdeo
divde.
divdeo.
divdeu
divdeuo
divdeu.
divdeuo.
divd
divdo
divd.
divdo.
divdu
divduo
divdu.
divduo.
divwe
divweo
divwe.
divweo.
divweu
divweuo
divweu.
divweuo.
divw
divwo
divw.
divwo.
divwu
divwuo
divwu.
divwuo.
modsd
modud
modsw
moduw
mullw
mullwo
mullw.
mullwo.
mulhw
mulhw.
mulhwu
mulhwu.
mulld
mulldo
mulld.
mulldo.
mulhd
mulhd.
mulhdu
mulhdu.
maddhd
maddhdu
maddld

Comment 56 Luke Kenneth Casson Leighton 2020-07-09 09:30:18 BST

(In reply to Jacob Lifshay from comment #53)
> (In reply to Luke Kenneth Casson Leighton from comment #48)
> > (In reply to Jacob Lifshay from comment #44)
> > > (In reply to Luke Kenneth Casson Leighton from comment #41)
> > > > btw jacob i noticed that on qemu, divw sets RT=RA when RB=0.
> > > > 
> > > > can you check that behaviour on POWER9?
> > > 
> > > IIRC, on POWER9, division by zero results in RT=0 for all div*/mod*
> > > instructions. I specifically included that as a test case in
> > > power-instruction-analyzer since that's a case where the result is undefined
> > > according to the PowerISA spec. For divwo, see:
> > > https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/
> > > 13dae100c6bc5685059195010ceb46ae68b9f306/src/instr_models.rs#L153
> > 
> > qemu definitely sets RT=RA for divw when RB=0.  note: not divwo - *divw*.
> > 
> > which is why i asked if you could check.  hugh and paul mackerras mentioned
> > that this *might* have been old behaviour on e.g. POWER7 or POWER8.
> 
> I just updated power-instruction-analyzer to include all div*[o][.]
> variants, they all match the original instr_models.rs -- all of `divw`,
> `divwo`, `divw.`, and `divwo.` all provide the exact same results when
> tested on POWER9.

ah fantastic.  this is really helpful 

> So, unless we made a mistake somewhere, qemu doesn't match POWER9's behavior
> (which is to be expected, since that part is undefined).

still annoying though as it means manual checking
> it will output (last output line split over two lines:
> {"rt":"0x0"}
> {"rt":"0x0","so":true,"ov":true,"ov32":true}
> {"rt":"0x0","cr0":{"lt":false,"gt":false,"eq":true,"so":false}}
> {"rt":"0x0","so":true,"ov":true,"ov32":true,
> "cr0":{"lt":false,"gt":false,"eq":true,"so":true}}

ah this is _really_ useful as it helps check the output regspec allocations
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/alu/pipe_data.py;hb=HEAD

  17     regspec = [('INT', 'o', '0:63'),
  18                ('CR', 'cr_a', '0:3'),
  19                ('XER', 'xer_ca', '34,45'), # bit0: ca, bit1: ca32
  20                ('XER', 'xer_ov', '33,44'), # bit0: ov, bit1: ov32
  21                ('XER', 'xer_so', '32')]

hmmm i'm not seeing CA or CA32 in the output list.  going over the
page https://libre-soc.org/openpower/isa/fixedarith/ which comes from
the spec, that would seem to be the case (CA/CA32 not involved),
would you agree?

Comment 57 Jacob Lifshay 2020-07-09 09:49:50 BST

(In reply to Luke Kenneth Casson Leighton from comment #56)
> (In reply to Jacob Lifshay from comment #53)
> > it will output (last output line split over two lines:
> > {"rt":"0x0"}
> > {"rt":"0x0","so":true,"ov":true,"ov32":true}
> > {"rt":"0x0","cr0":{"lt":false,"gt":false,"eq":true,"so":false}}
> > {"rt":"0x0","so":true,"ov":true,"ov32":true,
> > "cr0":{"lt":false,"gt":false,"eq":true,"so":true}}
> 
> ah this is _really_ useful as it helps check the output regspec allocations
> https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/alu/pipe_data.py;
> hb=HEAD

I manually entered the output regs for each instruction, so I'm much more likely to have accidentally missed an output than if you use the csv files we derived from a working core.

Comment 58 Luke Kenneth Casson Leighton 2020-07-09 10:34:41 BST

(In reply to Jacob Lifshay from comment #57)

> I manually entered the output regs for each instruction, 

ah no problem.  well it is eyes-on, anyway

> so I'm much more
> likely to have accidentally missed an output than if you use the csv files
> we derived from a working core.

the csv files came from microwatt decode1.vhdl - they don't quite contain
the level of detail we need in all cases.

commit 5a5fbe7f8de03420334049c1b305c3eff46adde7 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Thu Jul 9 10:32:57 2020 +0100

    create new DivMulOutputData which does not have CA/CA32

Comment 59 Luke Kenneth Casson Leighton 2020-07-09 13:28:26 BST

jacob: whoops :)

index 2dc3196..0f1a549 100644
--- a/src/nmutil/divmod.py
+++ b/src/nmutil/divmod.py
@@ -3,8 +3,11 @@
 def trunc_div(n, d):
     abs_n = abs(n)
     abs_d = abs(d)
-    abs_q = n // d
+    abs_q = abs_n // abs_d

Comment 60 Luke Kenneth Casson Leighton 2020-07-10 21:17:26 BST

commit 9cde3002d533352fdf8ed464fa46d4acd548b051 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Fri Jul 10 21:09:30 2020 +0100

    add a DIVS function as separate and discrete from floor_div
    likewise for MODS and MULS

this requires re-running pywriter.py - a small cheat is to open
all.py, save it, just run "pywriter.py fixedarith" then copy all.py
back again.  (decoder/isa/*.py)

Comment 61 Luke Kenneth Casson Leighton 2020-07-24 11:42:18 BST

sorry, jacob: i tried building pia (see comments in #433) and it was
impossible to follow.

i've therefore reverted the commit that adds it as a critical dependency
because it completely terminates all and any possibility of running
not just the test_pipe_caller.py unit test but everything else in upstream
testing as well (test_issuer.py etc).

i will try to reproduce the divdeu bug without pia.

also, two things:

* please don't add "typing" directly to .py files.  add them to a .pyi file
  instead.

* remember to keep lines to below 80 chars several additions were not, which
  made reviewing the diffs harder than they should be:

-            with self.subTest(check="pia", sim_o=sim_o, pia_result=str(pia_resu
lt)): <<<---

* remember to single-purpose commit.  modifications to ISACaller had
  been added (without an explanatory comment) that had nothing to do
  with the addition of pia itself.

Comment 62 Luke Kenneth Casson Leighton 2020-07-24 12:56:38 BST

(In reply to Luke Kenneth Casson Leighton from comment #61)
> sorry, jacob: i tried building pia (see comments in #433) and it was
> impossible to follow.
> 
> i've therefore reverted the commit that adds it as a critical dependency
> because it completely terminates all and any possibility of running
> not just the test_pipe_caller.py unit test but everything else in upstream
> testing as well (test_issuer.py etc).
> 
> i will try to reproduce the divdeu bug without pia.

still investigating.  i've moved the long tests to test_div_pipe_caller_long.py 

encountered errors in just the non-long ones.

Comment 63 Luke Kenneth Casson Leighton 2020-07-24 14:52:16 BST

oof.  bit of a marathon clean-up / code-morph session, got through
most of it now.

* sim tests seem to work great.
* divpipe tests seem to work great
* FSM not so much, yet.

re-adding use of pia has to be done with some considerable care: not
just ensuring that other developers can (easily) install pia, there is
more to it.

test_div_compunit.py - which doesn't exist and i need to write it - is
a "mirror" of test_div_pipe_caller.py.

basically it does exactly the same job except this time it has a
MultiCompUnit "Manager" on the front-end.

this is a critically important test that makes sure that register
allocations are correct for both read and write, and in particular
that the pipeline sets "self.o.{something}.ok" correctly.

if those are _not_ correct we are in serious trouble because the
hardware will sit there - permanently - waiting for a register
flag to be raised that is never going to happen.


the issue is: the functions for checking the test output are used
by both test_div_pipe_caller *and* the test_compunit.

they have to be developed *in conjunction*.

in addition to that: test_core.py then takes aaallll the test_compunit*
tests and re-runs them *again*, this time actually connecting to register
files.

by not adding support fully for pia into the underlying infrastructure,
critically important unit tests could no longer be run.

therefore, first things first:

1)  bug #436.  full documentation and wait for at least one other
    person to repro the build / install instructions for pia

2)  test_div_compunit needs to be added

3)  *then* pia can be added as a build dependency, making sure that
    it works with both test_div_pipe_caller.py *and* test_div_compunit.py

Comment 64 Cole Poirier 2020-07-24 18:34:29 BST

(In reply to Luke Kenneth Casson Leighton from comment #63)
[snip]
> re-adding use of pia has to be done with some considerable care: not
> just ensuring that other developers can (easily) install pia, there is
> more to it.
[snip]
> by not adding support fully for pia into the underlying infrastructure,
> critically important unit tests could no longer be run.
> 
> therefore, first things first:
> 
> 1)  bug #436.  full documentation and wait for at least one other
>     person to repro the build / install instructions for pia

I'll take a look at this as my project for today. Documenting soclayout can be done later.

Comment 65 Luke Kenneth Casson Leighton 2020-07-24 18:55:34 BST

(In reply to Cole Poirier from comment #64)

> > 1)  bug #436.  full documentation and wait for at least one other
> >     person to repro the build / install instructions for pia
> 
> I'll take a look at this as my project for today. 

great, because that's holding me up.

Comment 66 Cole Poirier 2020-07-24 21:59:36 BST

(In reply to Cole Poirier from comment #64)
> (In reply to Luke Kenneth Casson Leighton from comment #63)
> [snip]
> > re-adding use of pia has to be done with some considerable care: not
> > just ensuring that other developers can (easily) install pia, there is
> > more to it.
> [snip]
> > by not adding support fully for pia into the underlying infrastructure,
> > critically important unit tests could no longer be run.
> > 
> > therefore, first things first:
> > 
> > 1)  bug #436.  full documentation and wait for at least one other
> >     person to repro the build / install instructions for pia
> 
> I'll take a look at this as my project for today. Documenting soclayout can
> be done later.

Luke, you should be able to get power_instruction_analyzer setup and installed properly by following https://libre-soc.org/HDL_workflow/ section 6.7 power_instruction_analyzer (pia)

Please let me know if you encounter bugs.

Comment 67 Luke Kenneth Casson Leighton 2020-07-24 22:15:20 BST

this should be discussed under the appropriate bug report.

Comment 68 Luke Kenneth Casson Leighton 2020-07-25 16:05:19 BST

ok that's added, cookie-cut from the alu compunit.

commit db6b993384779aa8e4580b1391ac9d3b384fae66 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sat Jul 25 15:34:16 2020 +0100

    add div compunit test

interestingly i had to add an extra check which tested when the
pipeline n.ready_o was valid: it was pure coincidence with all
other (single-stage) compunit tests that the data *happened*
to be valid.

so when re-adding the pia checking, jacob, those *two* unit tests must
be run and must work.  actually i'll give it a go.

git diff b5245 9b501

Comment 69 Jacob Lifshay 2020-08-05 04:52:02 BST

As of soc.git commit 837d9fbdd54265a63a07e475b6d85313cadf2927, all tests pass, so I declare this complete.

I had to fix bug 449 for the div pipeline to get the tests to pass.

Comment 70 Luke Kenneth Casson Leighton 2020-08-05 12:38:17 BST

(In reply to Jacob Lifshay from comment #69)
> As of soc.git commit 837d9fbdd54265a63a07e475b6d85313cadf2927, all tests
> pass, so I declare this complete.

it's a leeettle premature: test_issuer.py needed to be run with the
DivTestCases

commit b6223dc2bd7125b6d12a5cd8a220fa76ca124549 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Aug 5 12:32:45 2020 +0100

    add div test cases into test_issuer.py


checking that it compiles in coriolis2 needed to be done (it does)

commit 4cb01cbff73a8e5cf74282f260be6a9a1f666b00 (origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Wed Aug 5 10:29:44 2020 +0100

    add div FSM as default for test_issuer in verilog and ilang gen


and the verilog simulation checked (still TODO)

and, really, i wanted the "skip if there are 8 bytes of zeros"
capability that's in microwatt's divide.vhdl FSM.  which can be
done as an enhancement under a separate bugreport, given that
it's strictly speaking an optimisation.

apaaart from all that... yes :)

remember to track it on your libre-soc homepage, cut/paste from
here https://libre-soc.org/lkcl

Comment 71 Jacob Lifshay 2020-08-05 22:44:59 BST

(In reply to Luke Kenneth Casson Leighton from comment #70)
> and, really, i wanted the "skip if there are 8 bytes of zeros"
> capability that's in microwatt's divide.vhdl FSM.  which can be
> done as an enhancement under a separate bugreport, given that
> it's strictly speaking an optimisation.

if I'm going to spend any time optimizing, I think it would be better spent converting the FSM to a radix 4 or 8 divider and/or adding support for skipping ahead by using find-first-set-bit on both remainder and divisor to skip runs of zeros. increasing the radix would multiply the area to several adders/subtractors instead of one. adding the run-skipping would require 2 find first set bit circuits and a 128-bit variable shifter for dividend_quotient, which I think might be a bit too much unless we limit the max shift amount to 7 bits or so.

> remember to track it on your libre-soc homepage, cut/paste from
> here https://libre-soc.org/lkcl

already done.

Comment 72 Luke Kenneth Casson Leighton 2020-08-05 23:34:22 BST

(In reply to Jacob Lifshay from comment #71)
> (In reply to Luke Kenneth Casson Leighton from comment #70)
> > and, really, i wanted the "skip if there are 8 bytes of zeros"
> > capability that's in microwatt's divide.vhdl FSM.  which can be
> > done as an enhancement under a separate bugreport, given that
> > it's strictly speaking an optimisation.
> 
> if I'm going to spend any time optimizing, I think it would be better spent
> converting the FSM to a radix 4 or 8 divider and/or adding support for
> skipping ahead by using find-first-set-bit on both remainder and divisor to
> skip runs of zeros.

that's an interesting idea.  microwatt just tests "is the next 8 bits of q == 0" if so skip 16 bits.

that is a fixed shift amount, so is particularly efficient.

let's raise a new bugreport and discuss it there.

> increasing the radix would multiply the area to several
> adders/subtractors instead of one. adding the run-skipping would require 2
> find first set bit circuits and a 128-bit variable shifter for
> dividend_quotient, which I think might be a bit too much unless we limit the
> max shift amount to 7 bits or so.

well, you'll find this hilarious: currently, the div FSM is so small that it is difficult to compute a block in coriolis2 which will take the inputs and outputs on one side.

we actually *need* it to be bigger :)

Comment 73 Jacob Lifshay 2020-08-18 02:04:50 BST

I think the value should be 1500 EUR split 2/3 to me and 1/3 to Luke.
Luke, does that sound about right?

Also, please decide where the funding should come from, since bug 383's funding appears to be all used up, though I have no clue where. We need to clean up our budget tracking.

Comment 74 Luke Kenneth Casson Leighton 2020-08-18 09:56:03 BST

(In reply to Jacob Lifshay from comment #73)
> I think the value should be 1500 EUR split 2/3 to me and 1/3 to Luke.
> Luke, does that sound about right?

i initially went "eek" because it has been very slow progress however going over everything actually the productivity matches that. 

> Also, please decide where the funding should come from, since bug 383's
> funding appears to be all used up, though I have no clue where.

the subtask budget field had not yet been filled in.

Comment 75 Jacob Lifshay 2020-09-13 23:15:39 BST

(In reply to Luke Kenneth Casson Leighton from comment #74)
> (In reply to Jacob Lifshay from comment #73)
> > Also, please decide where the funding should come from, since bug 383's
> > funding appears to be all used up, though I have no clue where.
> 
> the subtask budget field had not yet been filled in.

So, everything's completed for this bug in terms of getting paid? IIRC there was something else you wanted done first, Luke.