a DIV (and MOD) pipeline is needed similar to the other soc.fu pipelines.
register allocations, for fu.div.pipe_data.py looks to be identical to ALUInputData and ALUOutputData. https://libre-soc.org/openpower/isa/fixedarith/ # Divide XO-Form divw* RT,RA,RB Special Registers Altered: * CR0 (bits 0:2 undefined in 64-bit mode) (if Rc=1) * SO OV OV32 (if OE=1) # Modulo X-Form mod* RT,RA,RB no special regs altered
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/power_enums.py;hb=HEAD TODO, add DIV to Function enum, and back in the (annoyingly separated) CSV files change DIV and MOD operations from type ALU Function Unit to type DIV.
after looking through the spec some, it turns out that div and rem don't actually trap on division by zero, so that makes them much easier to implement. The numerical results are undefined on division-by-zero/overflow, so I will need to run some tests on a Power processor to see which results we should produce, hopefully they are the same as for RISC-V.
(In reply to Jacob Lifshay from comment #3) > after looking through the spec some, it turns out that div and rem don't > actually trap on division by zero, so that makes them much easier to > implement. The numerical results are undefined on division-by-zero/overflow, > so I will need to run some tests on a Power processor to see which results > we should produce, hopefully they are the same as for RISC-V. good idea.
I wrote a simple Rust program that runs different Power instructions (just div* and mod* for now) and put the results here: https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/master/output-for-initial-2-ga7250d2.txt Source here: https://salsa.debian.org/Kazan-team/power-instruction-analyzer result_prev is assigned to the result register before executing the tested instruction since sometimes processors just skip writing to a register in exceptional cases From what I can tell, for all error cases for div and mod instructions, POWER9 just writes 0 to the result register. For the instructions with just 32-bit results, from what I can tell from a relatively small number of test cases, it zero extends for all signed/unsigned div instructions, zero extends for unsigned mod instructions, and sign extends for signed mod instructions.
I added a Rust model for the div and rem instructions as well as converted the output format to JSON since that's waay easier to ingest than the custom output format I had before. The model matches the native POWER9 instructions for all the cases I tested. I also added conditional compilation code that allows running just the model on any computer rather than just on POWER9: Run it with: cargo run --no-default-features Model functions: https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/master/src/instr_models.rs This should make it easier to copy the behavior of POWER9 in the nmigen div pipeline.
(In reply to Jacob Lifshay from comment #6) > I added a Rust model for the div and rem instructions as well as converted > the output format to JSON since that's waay easier to ingest than the custom > output format I had before. :) > This should make it easier to copy the behavior of POWER9 in the nmigen div > pipeline. ah fantastic. yes, looking at microwatt, the synthesis of all the various options is quite obtuse (and brilliant). i've found it is essential to have some form of guide in a familiar language. it's really useful that these are also binary-executable on native POWER9 as well as really clear documented actual implementations.
Created the setup stage that translates from the input data to the format expected by DivPipeCore but didn't create tests yet, am going to be done for today. Still have to double check that I got the number of fractional bits correct.
(In reply to Jacob Lifshay from comment #8) > Created the setup stage that translates from the input data to the format > expected by DivPipeCore but didn't create tests yet, am going to be done for > today. Still have to double check that I got the number of fractional bits > correct. fantastic. as you can probably see i'm slowly going through the test_pipe_caller.py code removing code-duplication. for now i recommend cookie-cutting fu/alu/test_pipe_caller.py in its entirety, because i noticed that these use OE/Rc. * divdu RT,RA,RB (OE=0 Rc=0) Special Registers Altered: CR0 (if Rc=1) SO OV OV32 (if OE=1) that means they're exactly like fixedarith, for *both* input and output data, so you can straight-cookie-cut alu test_pipe_caller.py i will drop in an fu/compunits/test/test_div_compunit.py
(In reply to Luke Kenneth Casson Leighton from comment #9) > that means they're exactly like fixedarith, for *both* input and output > data, so you can straight-cookie-cut alu test_pipe_caller.py dropped in a copy for you (the one that was there already needed updating).
jacob i've added soc/simulator/test_sim_div.py and immediately ran into both pseudocode and spec issues. both divsw and modsw contain a "minus" symbol following the setup of the divisor: divisor [0:31] <- (RB)[32:63]- this has no explanation in the spec notation as to what it means. also, the variables "dividend" and "divisor" do not exist. given that this is the pseudocode we're talking about - the actual spec - i am reluctant to make significant modifications. we may however have to. i have already made this change: RT[0:31 ] <- undefined --> RT[0:31 ] <- undefined[0:31] however i am not happy with it.
i had to change decoder/isa/fixedarith.py to this, to get it to work: @inject() def op_divw(self, RA, RB, RT): dividend = RA[32:64] divisor = RB[32:64] RT[32:64] = dividend // divisor RT[0:32] = undefined[0:32] return (RT,) fixedarith.py is auto-generated by pywriter.py oh, i also updated the parser so that it will output floordiv not div, and updated SelectableInt so that it has an __floordiv__ function. the parser was previously outputting this: RT[32:64] = dividend / divisor it will be necessary to check the behaviour of signed/unsigned and potentially code up the correct behaviour in SelectableInt.__mod__ and SelectableInt.__floordiv__: >>> 5//0 Traceback (most recent call last): File "<stdin>", line 1, in <module> ZeroDivisionError: integer division or modulo by zero >>> >>> -1 % 0 Traceback (most recent call last): File "<stdin>", line 1, in <module> ZeroDivisionError: integer division or modulo by zero >>> yes, definitely. we can't have that. those will need to be caught in SelectableInt and the correct value returned. also this, from the spec for modsw: If an attempt is made to perform any of the divisions 0x8000_0000 % -1 <anything> % 0 and for modsd: If an attempt is made to perform any of the divisions <anything> % 0 0x8000_0000_0000_0000 % -1 that's likely going to have to be special-cased as well, or at least tested and we work out what is supposed to be put into RT. interestingly these do *not* result in overflow flags being set.
The spec pseudo-code doesn't contain all the required information to implement any of the div or mod operations since only the prose states the conditions for overflow (which *DO* show up in the overflow flags when div*o is used). Additionally, only the prose states if the division is to be interpreted as signed or unsigned. All divisions are truncating division, not floor division, so we should translate them to function calls to custom trunc_div and trunc_rem functions rather than python's // and % operators (since the mod* instructions implement the mathematical remainder of truncating division operation rather than the mathematical modulo operation). Code not tested: def trunc_div(n, d): f = getattr(n, "trunc_div", None) if f is not None: return f(d) fr = getattr(d, "rtrunc_div", None) if fr is not None: return fr(n) abs_n = abs(n) abs_d = abs(d) abs_q = n // d if (n < 0) == (d < 0): return abs_q return -abs_q def trunc_rem(n, d): f = getattr(n, "trunc_rem", None) if f is not None: return f(d) fr = getattr(d, "rtrunc_rem", None) if fr is not None: return fr(n) return n - d * trunc_div(n, d)
(In reply to Luke Kenneth Casson Leighton from comment #12) > also this, from the spec for modsw: > > If an attempt is made to perform any of the divisions > 0x8000_0000 % -1 > <anything> % 0 > > and for modsd: > > If an attempt is made to perform any of the divisions > <anything> % 0 0x8000_0000_0000_0000 % -1 Those are just the conditions where the corresponding div* instruction would overflow. > > that's likely going to have to be special-cased as well, or at least tested > and we work out what is supposed to be put into RT. already done: https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/13dae100c6bc5685059195010ceb46ae68b9f306/src/instr_models.rs#L195 > interestingly these > do *not* result in overflow flags being set. To clarify my previous statement, the OV, OV32, and SO flags are only set when the div*o variant of the instructions is selected but that doesn't exist for the mod* instructions, that's because div* is XO-form and mod* is X-form. I'm assuming ALUOutputStage will handle that wrinkle, though didn't yet check.
(In reply to Jacob Lifshay from comment #14) > (In reply to Luke Kenneth Casson Leighton from comment #12) > > If an attempt is made to perform any of the divisions > > <anything> % 0 0x8000_0000_0000_0000 % -1 > > Those are just the conditions where the corresponding div* instruction would > overflow. ok. > > > > that's likely going to have to be special-cased as well, or at least tested > > and we work out what is supposed to be put into RT. > > already done: > https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/ > 13dae100c6bc5685059195010ceb46ae68b9f306/src/instr_models.rs#L195 brilliant. > > interestingly these > > do *not* result in overflow flags being set. > > To clarify my previous statement, the OV, OV32, and SO flags are only set > when the div*o variant of the instructions is selected but that doesn't > exist for the mod* instructions, that's because div* is XO-form and mod* is > X-form. I'm assuming ALUOutputStage will handle that wrinkle, though didn't > yet check. yes it does. microwatt's source (now in the CSV files) sets on a per-op basis whether followup OV/OV32/SO shall be actioned. the Simulator ISACaller uses the *exact* same PowerDecoder2 to determine this exact same information.
(In reply to Jacob Lifshay from comment #13) > The spec pseudo-code doesn't contain all the required information to > implement any of the div or mod operations since only the prose states the > conditions for overflow (which *DO* show up in the overflow flags when div*o > is used). answered this one. > All divisions are truncating division, not floor division, so we should > translate them to function calls to custom trunc_div and trunc_rem functions > rather than python's // and % operators (since the mod* instructions > implement the mathematical remainder of truncating division operation rather > than the mathematical modulo operation). urk. yuk. that's going to be a bad hack (based on the name of the variables involved - what is being assigned to). it's doable - just not pretty. > Code not tested: > > def trunc_div(n, d): > def trunc_rem(n, d): ok i've put both of these into soc.decoder.helpers will commit shortly. btw a trick, jacob: open a copy of decoder/isa/all.py in vim, run "pywriter.py fixedarith" then *overwrite* all.py restoring the former version. this will save several minutes when re-running pywriter.py. it would be nice for this process to be more efficient (all.py not to be statically written but to be dynamic instead).
(In reply to Luke Kenneth Casson Leighton from comment #16) > (In reply to Jacob Lifshay from comment #13) > > All divisions are truncating division, not floor division, so we should > > translate them to function calls to custom trunc_div and trunc_rem functions > > rather than python's // and % operators (since the mod* instructions > > implement the mathematical remainder of truncating division operation rather > > than the mathematical modulo operation). > > urk. yuk. that's going to be a bad hack (based on the name of the variables > involved - what is being assigned to). it's doable - just not pretty. done: [master 428ee14] add in really bad hack which calls trunc_div or trunc_mod https://bugs.libre-soc.org/show_bug.cgi?id=324#c16 i have one more thing left to do (one more bad hack) - identify where "undefined" is used and copy only the bits required from it.
(In reply to Luke Kenneth Casson Leighton from comment #17) > (In reply to Luke Kenneth Casson Leighton from comment #16) > > (In reply to Jacob Lifshay from comment #13) > > > All divisions are truncating division, not floor division, so we should > > > translate them to function calls to custom trunc_div and trunc_rem functions > > > rather than python's // and % operators (since the mod* instructions > > > implement the mathematical remainder of truncating division operation rather > > > than the mathematical modulo operation). > > > > urk. yuk. that's going to be a bad hack (based on the name of the variables > > involved - what is being assigned to). it's doable - just not pretty. > > done: > > [master 428ee14] add in really bad hack which calls trunc_div or trunc_mod > https://bugs.libre-soc.org/show_bug.cgi?id=324#c16 I was thinking that the spec's / and % operators would always translate to trunc_div and trunc_rem unless they are the FP operators -- if all inputs are non-negative there is no difference between trunc_* and floor div/mod. I don't think the hack with detecting if the variable name is `dividend` is necessary. are there any other uses of the div and rem operators?
(In reply to Luke Kenneth Casson Leighton from comment #17) > i have one more thing left to do (one more bad hack) - identify where > "undefined" is used and copy only the bits required from it. done. this is a good sign: RT[32:64] = trunc_div(dividend, divisor) File "/home/lkcl/src/libresoc/soc/src/soc/decoder/helpers.py", line 20, in trunc_div abs_n = abs(n) TypeError: bad operand type for abs(): 'SelectableInt' and appears to be sorted by this: diff --git a/src/soc/decoder/selectable_int.py b/src/soc/decoder/selectable_int.py index 7264090..eebada6 100644 --- a/src/soc/decoder/selectable_int.py +++ b/src/soc/decoder/selectable_int.py @@ -181,6 +181,8 @@ class SelectableInt: return self._op(or_, b) def __xor__(self, b): return self._op(xor, b) + def __abs__(self): + return SelectableInt(0, self.bits) - self
(In reply to Jacob Lifshay from comment #18) > (In reply to Luke Kenneth Casson Leighton from comment #17) > > done: > > > > [master 428ee14] add in really bad hack which calls trunc_div or trunc_mod > > https://bugs.libre-soc.org/show_bug.cgi?id=324#c16 > > I was thinking that the spec's / and % operators would always translate to > trunc_div and trunc_rem unless they are the FP operators -- if all inputs > are non-negative there is no difference between trunc_* and floor div/mod. > > I don't think the hack with detecting if the variable name is `dividend` is > necessary. > > are there any other uses of the div and rem operators? according to the spec, the integer div and rem operators are always truncating division: ÷ Division, with result truncated to integer % Remainder of integer division
(In reply to Luke Kenneth Casson Leighton from comment #19) > (In reply to Luke Kenneth Casson Leighton from comment #17) > > > i have one more thing left to do (one more bad hack) - identify where > > "undefined" is used and copy only the bits required from it. > > done. > > this is a good sign: > > RT[32:64] = trunc_div(dividend, divisor) > File "/home/lkcl/src/libresoc/soc/src/soc/decoder/helpers.py", line 20, in > trunc_div > abs_n = abs(n) > TypeError: bad operand type for abs(): 'SelectableInt' > > and appears to be sorted by this: > > diff --git a/src/soc/decoder/selectable_int.py > b/src/soc/decoder/selectable_int.py > index 7264090..eebada6 100644 > --- a/src/soc/decoder/selectable_int.py > +++ b/src/soc/decoder/selectable_int.py > @@ -181,6 +181,8 @@ class SelectableInt: > return self._op(or_, b) > def __xor__(self, b): > return self._op(xor, b) > + def __abs__(self): > + return SelectableInt(0, self.bits) - self I was thinking SelectableInt and other custom classes would implement the trunc_div and rtrunc_div member functions, which is the trunc_div version of __sub__ and __rsub__. Similarly for trunc_rem and rtrunc_rem.
(In reply to Jacob Lifshay from comment #18) > I was thinking that the spec's / and % operators would always translate to > trunc_div and trunc_rem unless they are the FP operators -- if all inputs > are non-negative there is no difference between trunc_* and floor div/mod. > > I don't think the hack with detecting if the variable name is `dividend` is > necessary. well... it's in now :) > are there any other uses of the div and rem operators? don't know: didn't want to take the risk - something to keep an eye on though. $ python3 simulator/test_div_sim.py this now works. i'm just adding a quick moduw test as well. which also works. remember to run pywriter.py after a git pull.
(In reply to Jacob Lifshay from comment #21) > I was thinking SelectableInt and other custom classes would implement the > trunc_div and rtrunc_div member functions, which is the trunc_div version of > __sub__ and __rsub__. Similarly for trunc_rem and rtrunc_rem. the problem is that if we do that, SelectableInt behaves unpredictably. SelectableInt is designed not to be a "POWER9 compliant integer arithmetic" class, it's designed to be a direct one-to-one and onto map of the behaviour of the python *int* class - just one that happens to have a bitlength associated with it. thus we genuinely expect SelectableInt's floordiv and truncdiv behaviour to be exactly that of int, and consequently if in any code anywhere we need to perform a div (or mod), we expect it to actually work. here's the thing: although you may not have been aware of it, trunc_div and trunc_rem are *behaving as expected* despite the fact that you wrote them assuming that they would take python ints as arguments! and that's *exactly* why SelectableInt.__floordiv__ and SelectableInt.__truncdiv__ should not implement POWER9 div behaviour :)
(In reply to Luke Kenneth Casson Leighton from comment #23) > (In reply to Jacob Lifshay from comment #21) > > > I was thinking SelectableInt and other custom classes would implement the > > trunc_div and rtrunc_div member functions, which is the trunc_div version of > > __sub__ and __rsub__. Similarly for trunc_rem and rtrunc_rem. > > the problem is that if we do that, SelectableInt behaves unpredictably. > > SelectableInt is designed not to be a "POWER9 compliant integer arithmetic" > class, it's designed to be a direct one-to-one and onto map of the behaviour > of the python *int* class - just one that happens to have a bitlength > associated > with it. > > thus we genuinely expect SelectableInt's floordiv and truncdiv behaviour to > be exactly that of int, and consequently if in any code anywhere we need to > perform a div (or mod), we expect it to actually work. they behave the same as int even if the trunc_div and trunc_rem member functions are overridden with a custom implementation that calculates the same results. The member functions are designed to work for classes like Signal or other special classes that are sufficiently unlike int that they needs to be manually implemented. the trunc_div free function calls the member functions if they exist -- just like abs(a) calls a.__abs__() there's no reason trunc_div should or would match __floor_div__ -- they're different operations. I was thinking about implementing trunc_div as a monkey-patched member function of int, but was not sure that was a good idea or if int can even be monkey-patched. > > here's the thing: although you may not have been aware of it, trunc_div and > trunc_rem are *behaving as expected* despite the fact that you wrote them > assuming that they would take python ints as arguments! that's mostly expected. > > and that's *exactly* why SelectableInt.__floordiv__ and > SelectableInt.__truncdiv__ > should not implement POWER9 div behaviour :) __floor_div__ is a flooring-division operator, of course it shouldn't be truncating division, that's what trunc_div is for :)
(In reply to Jacob Lifshay from comment #24) > (In reply to Luke Kenneth Casson Leighton from comment #23) > > (In reply to Jacob Lifshay from comment #21) > > > > > I was thinking SelectableInt and other custom classes would implement the > > > trunc_div and rtrunc_div member functions, which is the trunc_div version of > > > __sub__ and __rsub__. Similarly for trunc_rem and rtrunc_rem. right. ok. i am with you now. this will have to happen later... but *not* as part *of* SelectableInt, but as a module (or class) that has the operations. in this way we will implement SimpleV at that level.... *not* in SelectableInt itself. basically we are missing a class named "Power9CompliantSelectableInt" which later will become: SimpleVAwarePower9IshSelectableInt > > > > the problem is that if we do that, SelectableInt behaves unpredictably. > > > > SelectableInt is designed not to be a "POWER9 compliant integer arithmetic" > > class, it's designed to be a direct one-to-one and onto map of the behaviour > > of the python *int* class - just one that happens to have a bitlength > > associated > > with it. > > > > thus we genuinely expect SelectableInt's floordiv and truncdiv behaviour to > > be exactly that of int, and consequently if in any code anywhere we need to > > perform a div (or mod), we expect it to actually work. > > they behave the same as int even if the trunc_div and trunc_rem member > functions are overridden with a custom implementation that calculates the > same results. i'm not understanding you, sorry. trunc_div and trunc_rem are standalone functions not members so cannot be overridden. the parser now specifically identifies the div and mod circumstances and calls them directly, *bypassing* the % and / operator overloads entirely. this is something we have to do anyway for all operators (in order to be able to do dynamic SIMD backend for SimpleV) > The member functions are designed to work for classes like Signal or other > special classes that are sufficiently unlike int that they needs to be > manually implemented. > > the trunc_div free function calls the member functions if they exist -- just > like abs(a) calls a.__abs__() > > there's no reason trunc_div should or would match __floor_div__ -- they're > different operations. yes sorry i meant rem > I was thinking about implementing trunc_div as a monkey-patched member > function of int, but was not sure that was a good idea or if int can even be > monkey-patched. it can't, and changing int behaviour would cause havoc on a global scale. we would need to create yet another int wrapper class and use it everywhere. this would be terribly slow (and unnecessary) > > > > here's the thing: although you may not have been aware of it, trunc_div and > > trunc_rem are *behaving as expected* despite the fact that you wrote them > > assuming that they would take python ints as arguments! > > that's mostly expected. > > > > > and that's *exactly* why SelectableInt.__floordiv__ and > > SelectableInt.__truncdiv__ > > should not implement POWER9 div behaviour :) > > __floor_div__ is a flooring-division operator, of course it shouldn't be > truncating division, that's what trunc_div is for :) sorry i meant rem.
(In reply to Luke Kenneth Casson Leighton from comment #25) > (In reply to Jacob Lifshay from comment #24) > > > > they behave the same as int even if the trunc_div and trunc_rem member > > functions are overridden with a custom implementation that calculates the > > same results. > > i'm not understanding you, sorry. trunc_div and trunc_rem are standalone > functions not members so cannot be overridden. The standalone functions tries to call the member functions if they exist, this is to emulate python's behavior where a - b calls a.__sub__(b) or b.__rsub__(a). So, if I had a custom class: class MyClass: def trunc_div(self, d): print("trunc_div:", self, "/", d) return 0 def rtrunc_div(self, n): print("rtrunc_div:", n, "/", self) return 0 def __repr__(self): return "MyClass" trunc_div(MyClass(), 123) trunc_div(MyClass(), MyClass()) trunc_div(456, MyClass()) outputs: trunc_div: MyClass / 123 trunc_div: MyClass / MyClass rtrunc_div: 456 / MyClass
(In reply to Jacob Lifshay from comment #26) > The standalone functions tries to call the member functions if they exist, > this is to emulate python's behavior where a - b calls a.__sub__(b) or > b.__rsub__(a). > > So, if I had a custom class: > > class MyClass: > def trunc_div(self, d): > print("trunc_div:", self, "/", d) > return 0 > def rtrunc_div(self, n): > print("rtrunc_div:", n, "/", self) > return 0 > def __repr__(self): > return "MyClass" > > trunc_div(MyClass(), 123) > trunc_div(MyClass(), MyClass()) > trunc_div(456, MyClass()) > > outputs: > trunc_div: MyClass / 123 > trunc_div: MyClass / MyClass > rtrunc_div: 456 / MyClass only the functions in the operator module work this way. the above is not how python works. or, if it is, it's literally the first time i've heard about it in 20 years! >>> trunc_div(MyClass(), 123) Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'trunc_div' is not defined yep, it's not how python works. you have to call MyClass().trunc_div(123) or define a trunc_div (global, non-class) function: def truncdiv(a, b): if isinstance(a, MyClass): return a.truncdiv(b) if isinstance(b, MyClass): return b.reverse_truncdiv(a) raise Exception() operator.truncdiv on the other hand is *specifically* written to do what you are expecting (except calling __truncdiv__, where operator.add calls __add__, etc. etc. for each function operator.{xxx} it calls __xxx__ ) so the function operator.add goes something like: def add(a, b): if hasattr(a, "__add__"): return a.__add__(b) else: return b.__radd__(a) or close to it. it *might* do some checking to see which is the largest (most accurate) class. in order this would be something like int, float, complex where float is capable of representing all ints, and complex is capable of representing all floats. therefore, we are *bypassing* that with the functions that are now in soc/decoder/helpers.py really we should not in any way be using the operator overloads __add__ etc. at all at the POWER9 level. at the POWER9 level they should all be done in the format MyClass and then have the parser.py generate AST which calls those functions directly as 2-argument functions. you can see this is done here for "!=" already, in parser.py: def make_ne_compare(arg): (left, right) = arg return ast.Call(ast.Name("ne", ast.Load()), (left, right), []) this will create python code "ne(RA, 0)" when it sees "RA != 0" in the pseudocode. *every* operator needs to be like this. if you want to make it "clearer", it should be called "POWER9_ne", and "POWER9_trunc_div" etc. however the reason we've not done this is because it makes the auto-generated code pretty much unreadable :) the places to be compliant with POWER9 are in those functions "POWER9_ne". *not* in SelectableInt. SelectableInt is very specifically designed to be that "int override" class that you mentioned (the monkey-patch one). *not* as a POWER9-compatible class. yes we need that POWER9-compatible class, but not right now. helpers.ne, helpers.trunc_div, helpers.trunc_rem "do that job" and the plan is to move them into a MyClass-style-formatted object... but not right now. when we add SimpleV it will be absolutely essential to add that MyClass-like object, because that MyClass-like object will be told what the SIMD partitioning currently is, what VL currently is, and it will also have to be passed the full Register File. but... not right now.
(In reply to Luke Kenneth Casson Leighton from comment #27) > (In reply to Jacob Lifshay from comment #26) > > > The standalone functions tries to call the member functions if they exist, > > this is to emulate python's behavior where a - b calls a.__sub__(b) or > > b.__rsub__(a). > > > > So, if I had a custom class: > > > > class MyClass: > > def trunc_div(self, d): > > print("trunc_div:", self, "/", d) > > return 0 > > def rtrunc_div(self, n): > > print("rtrunc_div:", n, "/", self) > > return 0 > > def __repr__(self): > > return "MyClass" > > > > trunc_div(MyClass(), 123) > > trunc_div(MyClass(), MyClass()) > > trunc_div(456, MyClass()) > > > > outputs: > > trunc_div: MyClass / 123 > > trunc_div: MyClass / MyClass > > rtrunc_div: 456 / MyClass > > only the functions in the operator module work this way. the above is not > how python works. or, if it is, it's literally the first time i've heard > about it in 20 years! no, what I posted is exactly how python works, assuming the trunc_div function which you copied to soc.git earlier today is in scope (apparently that wasn't obvious): https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/helpers.py;h=ef6610e5c89d7d0e0316c216bf51faf8486f46df;hb=HEAD#l13 result from python repl: ... >>> trunc_div(MyClass(), 123) trunc_div: MyClass / 123 0 >>> trunc_div(MyClass(), MyClass()) trunc_div: MyClass / MyClass 0 >>> trunc_div(456, MyClass()) rtrunc_div: 456 / MyClass 0 >>>
(In reply to Luke Kenneth Casson Leighton from comment #27) > if you want to make it "clearer", it should be called "POWER9_ne", > and "POWER9_trunc_div" etc. however the reason we've not done this is > because it makes the auto-generated code pretty much unreadable :) That trunc_div function isn't intended to be Power ISA compatible (or POWER9, which is not even what we're implementing -- we're implementing Libre-SOC's first core -- not POWER9, they are different cpus -- please avoid calling the Power ISA "POWER9", they are different things. That's like if AMD called their hypothetical new processor core "Sandy Bridge" or "Pentium", which are widely associated with Intel). trunc_div is intended to be the equivalent of python's // operator, except that it rounds differently -- the same way Rust, C, C++, Java, x86, and Power do. Similarly with trunc_rem.
(In reply to Jacob Lifshay from comment #28) > > only the functions in the operator module work this way. the above is not > > how python works. or, if it is, it's literally the first time i've heard > > about it in 20 years! > > no, what I posted is exactly how python works, assuming the trunc_div > function which you copied to soc.git earlier today is in scope (apparently > that wasn't obvious): not at all :) i simply committed it and moved rapidly on with the focus on the parser and pywriter :) thank you for pointing it out: i'm with you now. > https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/helpers.py; > h=ef6610e5c89d7d0e0316c216bf51faf8486f46df;hb=HEAD#l13 > > > result from python repl: > > ... > >>> trunc_div(MyClass(), 123) > trunc_div: MyClass / 123 > 0 > >>> trunc_div(MyClass(), MyClass()) > trunc_div: MyClass / MyClass > 0 > >>> trunc_div(456, MyClass()) > rtrunc_div: 456 / MyClass > 0 > >>> ok. right. so this would be merging some of the functionality and responsibility of the (future-needed) class i mentioned (something like Power9SimpleVSelectableInt) with something that's inspired by how operator.{xxx} work. at the moment the functions are called from parser.py, they're assumed to be SelectableInts(), those SelectableInts are assumed to function as integers not as Power9SimpleVCompatibleSelectableInt. we do very much need to move this on very quickly. can we leave the discussion of future work related to SimpleV for another time? unless you can see a way in which not having a Power9CompatibleSelectableInt class would be a major blocker to achieving the (extremely close) Oct 2020 tapeout deadline? for example, overflow exceptions. actually, even there, i see no reason why we should need a special Power9CompatibleSelectableInt for that: it should be possible to simply catch the exception and drop a flag into the SelectableInt. decoder/isa/caller.py could watch for that and set the overflow bits in XER if it was required to do so.
(In reply to Jacob Lifshay from comment #29) > That trunc_div function isn't intended to be Power ISA compatible ah then that will be the source of confusion: i was expecting it to provide the functionality of... of... something-that-i-don't-have-a-name-for... > (or > POWER9, which is not even what we're implementing -- we're implementing > Libre-SOC's first core -- not POWER9, they are different cpus -- please > avoid calling the Power ISA "POWER9", they are different things. That's like > if AMD called their hypothetical new processor core "Sandy Bridge" or > "Pentium", which are widely associated with Intel). urrrr... :) ok then we need to know the exact name. POWER ISA V3.0B is a bit of a handful. POWER 3.0B? we've been asked by OpenPOWER Foundation to be compatible with that. short-hand i was referring to it as POWER9. > trunc_div is intended to be the equivalent of python's // operator, except > that it rounds differently -- the same way Rust, C, C++, Java, x86, and > Power do. > > Similarly with trunc_rem. ok right: then it's... (POWER 3.0B?) compatible, would that be ok to call it? if so, it would be part of the (using the appropriate name, now?) Power30BCompatibleSelectableInt class. however, right now, we're going for a collection of *functions* that are Power 3.0B compatible, and those functions *use* SelectableInt, whose behavior is expected to be directly equal to python int class. when those functions require state (which they don't right now), it becomes not just appropriate but necessary to move them to a class. also if we move _those_ functions to a class, we have to also add (move) helpers.ne, eq, lt, gt, ge and then *also* add sub, add, mul, everything and then do a bit of a rewrite on parser.py to put it into use. and that's a lot of work which is quite a major distraction. i'm quite happy for that work to be done *if* you can see a good reason as to why not having it is a major blocker to the October 2020 tapeout?
(In reply to Luke Kenneth Casson Leighton from comment #31) > (In reply to Jacob Lifshay from comment #29) > > > That trunc_div function isn't intended to be Power ISA compatible > > ah then that will be the source of confusion: i was expecting it to > provide the functionality of... of... > something-that-i-don't-have-a-name-for... > > > (or > > POWER9, which is not even what we're implementing -- we're implementing > > Libre-SOC's first core -- not POWER9, they are different cpus -- please > > avoid calling the Power ISA "POWER9", they are different things. That's like > > if AMD called their hypothetical new processor core "Sandy Bridge" or > > "Pentium", which are widely associated with Intel). > > urrrr... :) ok then we need to know the exact name. POWER ISA V3.0B is a > bit of a handful. POWER 3.0B? we've been asked by OpenPOWER Foundation to > be compatible with that. short-hand i was referring to it as POWER9. If I were to pick a replacement name for POWER9 when you meant the Power ISA, I'd probably just pick Power, since are docs clearly state (or should state) that we're compatible with Power ISA v3.0B (or v3.1B). We can assume someone reading our code knows that since that's what we advertise (or at least knows that they should figure out which version we implement). > > trunc_div is intended to be the equivalent of python's // operator, except > > that it rounds differently -- the same way Rust, C, C++, Java, x86, and > > Power do. > > > > Similarly with trunc_rem. > > ok right: then it's... (POWER 3.0B?) compatible, would that be ok to call it? I was thinking we'd put it in nmutil, since it's useful for way more than the instruction simulator. As an example, I wrote a very similar function as part of ieee754.div_rem_sqrt_rsqrt.algorithm. if you want something strictly compatible with IBM's POWER9 cpu, then translate (or create python bindings to) the code in: https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/master/src/instr_models.rs
(In reply to Jacob Lifshay from comment #32) > If I were to pick a replacement name for POWER9 when you meant the Power > ISA, I'd probably just pick Power, unfortunately, Power is a general english word. > since are docs clearly state (or should > state) that we're compatible with Power ISA v3.0B (or v3.1B). We can assume > someone reading our code knows that since that's what we advertise (or at > least knows that they should figure out which version we implement). heck, *i* can't :) or i do but not how to refer to it. "Power" is far too general. i have conversations with ordinary people, i have to refer them to PowerPC and mention "G4 Mac" and their eyes un-glaze at that point :) > > > trunc_div is intended to be the equivalent of python's // operator, except > > > that it rounds differently -- the same way Rust, C, C++, Java, x86, and > > > Power do. > > > > > > Similarly with trunc_rem. > > > > ok right: then it's... (POWER 3.0B?) compatible, would that be ok to call it? > > I was thinking we'd put it in nmutil, since it's useful for way more than > the instruction simulator. done. nmutil.divmod.py - feel free to move it you think of a better name, remember to update helpers.py though > As an example, I wrote a very similar function as > part of ieee754.div_rem_sqrt_rsqrt.algorithm. ahhhh okaaay > if you want something strictly compatible with IBM's POWER9 cpu, then > translate (or create python bindings to) the code in: > https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/master/ > src/instr_models.rs right. ok. so that is tending to suggest that... in decode/helpers.py, we *do* need a function called power_trunc_div which does the detection of overflow. which is slightly different from the other cases (other arithmetic cases) in that you can't really post-analyse the *result*. ok i'll take care of that if you can concentrate on the hardware side. i may need help with the unit tests in test_div_sim.py (if you know what do to, there, do just throw some in and i will chew through them making sure that the simulator code is "correct" against them) somewhere in the middle of the hardware pipeline you'll need to generate that overflow condition and pass it through. whether that's a special flag or whether you actually use the xer_ov/32 data structure is up to you, but it has to come _out_ as xer_ov/32 and xer_ca/32 when it goes into ALUOutputStage. self.dive_abs_overflow_32 = Signal(reset_less=True) self.dive_abs_overflow_64 = Signal(reset_less=True) ahhh excellent, you're on the ball :)
fyi lkcl@fizzy:~/src/libresoc/soc/src/soc$ git commit -a -m 'whitespace update' [master 1cd6c71] whitespace update
(In reply to Luke Kenneth Casson Leighton from comment #33) > (In reply to Jacob Lifshay from comment #32) > > > If I were to pick a replacement name for POWER9 when you meant the Power > > ISA, I'd probably just pick Power, > > unfortunately, Power is a general english word. > > > since are docs clearly state (or should > > state) that we're compatible with Power ISA v3.0B (or v3.1B). We can assume > > someone reading our code knows that since that's what we advertise (or at > > least knows that they should figure out which version we implement). > > heck, *i* can't :) or i do but not how to refer to it. "Power" is far too > general. i have conversations with ordinary people, i have to refer them to > PowerPC and mention "G4 Mac" and their eyes un-glaze at that point :) Use PowerISA? PowerPC seems overly specific to old versions of the ISA, though that *is* what gcc, llvm, and rust uses (powerpc64). Maybe PPC64? > somewhere in the middle of the hardware pipeline you'll need to generate > that overflow condition and pass it through. whether that's a special > flag or whether you actually use the xer_ov/32 data structure is up to you, > but it has to come _out_ as xer_ov/32 and xer_ca/32 when it goes into > ALUOutputStage. > > self.dive_abs_overflow_32 = Signal(reset_less=True) > self.dive_abs_overflow_64 = Signal(reset_less=True) see DivOutputStage for the rest of the overflow calculation: https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/div/output_stage.py;h=88a9ec1eec4bbc4ebb3fece42bbe235a13ae57ff;hb=HEAD#l65 > > ahhh excellent, you're on the ball :) yup, only missing unit tests and formal proofs.
(In reply to Jacob Lifshay from comment #35) > Use PowerISA? i'm happy with that. > PowerPC seems overly specific to old versions of the ISA, > though that *is* what gcc, llvm, and rust uses (powerpc64). Maybe PPC64? sounds like it excludes 32bit. > > > > self.dive_abs_overflow_32 = Signal(reset_less=True) > > self.dive_abs_overflow_64 = Signal(reset_less=True) > > see DivOutputStage for the rest of the overflow calculation: > https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/div/output_stage.py; > h=88a9ec1eec4bbc4ebb3fece42bbe235a13ae57ff;hb=HEAD#l65 > > > > > ahhh excellent, you're on the ball :) > > yup, only missing unit tests and formal proofs. ok you should be able to just import the class in test_div_sim.py and use it in fu/div/test/test_pipe_caller.py can i leave that with you as it's 11pm here?
i have fu/div/test/test_pipe_caller.py running, i have no idea if it is correct or not. jacob can you please take a look?
argh, Chrome deleted all the text I was typing. Anyway, it looks like a good start, though I don't know for sure as I'm not familiar with the testing helper functions. It needs some additional testing: all the other div/mod instructions and their variants: divdeo divdeuo divdo divduo divweo divweuo divwo divwuo modsd modud modsw moduw It needs some non-random test cases to hit the edge cases: https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/13dae100c6bc5685059195010ceb46ae68b9f306/src/main.rs#L210 soc.fu.div.setup_stage.DivSetupStage.elaborate was incorrectly changed: https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=8cc95f83fe437f4ac1eb5ebe57265bed30c0078c ... - # pass through op + # pass through core data - comb += self.o.op.eq(op) + comb += self.o.core.eq(core_o) That should be op since op is used in the last div stage. self.o.core.eq(core_o) is the same as self.o.core.eq(self.o.core), so is useless.
(In reply to Jacob Lifshay from comment #38) > argh, Chrome deleted all the text I was typing. > > Anyway, it looks like a good start, though I don't know for sure as I'm not > familiar with the testing helper functions. it's very simple, just cookie-cut test_rand_div and ignore everything else. > It needs some additional testing: > all the other div/mod instructions and their variants: > It needs some non-random test cases to hit the edge cases: > https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/ > 13dae100c6bc5685059195010ceb46ae68b9f306/src/main.rs#L210 can you add these? there is so much to do and i am pretty much doing everything right now. > soc.fu.div.setup_stage.DivSetupStage.elaborate was incorrectly changed: > https://git.libre-soc.org/?p=soc.git;a=commitdiff; > h=8cc95f83fe437f4ac1eb5ebe57265bed30c0078c > ... > - # pass through op > + # pass through core data > > - comb += self.o.op.eq(op) > + comb += self.o.core.eq(core_o) > > That should be op since op is used in the last div stage. > self.o.core.eq(core_o) is the same as self.o.core.eq(self.o.core), so is > useless. self.o.op is not defined, which is why i tried to find something to set it to. if you can run the unit test "python3 fu/div/test/test_pipe_caller.py" and investigate that would be really helpful.
(In reply to Luke Kenneth Casson Leighton from comment #39) > self.o.op is not defined, which is why i tried to find something to set it > to. > if you can run the unit test "python3 fu/div/test/test_pipe_caller.py" and > investigate that would be really helpful. i went over everything that gets set up as part of self.o.core and it looks like all parts of DivPipeCoreInputData are established: class DivPipeCoreInputData: self.dividend = Signal(bw + fw, reset_less=reset_less) self.divisor_radicand = Signal(bw, reset_less=reset_less) self.operation = DP.create_signal(reset_less=reset_less) operation: comb += core_o.operation.eq(int(DivPipeCoreOperation.UDivRem)) dividend and divisor: dividend_o = core_o.dividend divisor_o = core_o.divisor_radicand these also get set up. so i believe the self.o.op.eq() - presumably intended to be self.o.core.eq() - is redundant.
btw jacob i noticed that on qemu, divw sets RT=RA when RB=0. can you check that behaviour on POWER9?
btw several combinatorial stages chained together is absolutely fine as long as it's not too many. i'm going to try 8 (it's a 180nm ASIC)
(In reply to Luke Kenneth Casson Leighton from comment #40) > (In reply to Luke Kenneth Casson Leighton from comment #39) > > > self.o.op is not defined, which is why i tried to find something to set it > > to. > > if you can run the unit test "python3 fu/div/test/test_pipe_caller.py" and > > investigate that would be really helpful. > > i went over everything that gets set up as part of self.o.core and > it looks like all parts of DivPipeCoreInputData are established: > <snip> > these also get set up. so i believe the self.o.op.eq() - presumably > intended to be self.o.core.eq() - is redundant. No, self.o.op.eq() was intended to copy the PowerISA opcode since that's needed after DivPipeCore finishes its work. Notice that CoreInputData derives from CoreBaseData, which derives from DivInputData, which should contain a field for the PowerISA opcode, which is what I assumed self.o.op is.
(In reply to Luke Kenneth Casson Leighton from comment #41) > btw jacob i noticed that on qemu, divw sets RT=RA when RB=0. > > can you check that behaviour on POWER9? IIRC, on POWER9, division by zero results in RT=0 for all div*/mod* instructions. I specifically included that as a test case in power-instruction-analyzer since that's a case where the result is undefined according to the PowerISA spec. For divwo, see: https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/13dae100c6bc5685059195010ceb46ae68b9f306/src/instr_models.rs#L153
(In reply to Luke Kenneth Casson Leighton from comment #42) > btw several combinatorial stages chained together is absolutely fine > as long as it's not too many. i'm going to try 8 (it's a 180nm ASIC) Seems excessive since then there would be a gate depth of on the order of 100 per pipeline stage, I seriously doubt we can get any decent clock speed from something *that* deep.
On Thu, Jul 2, 2020 at 8:36 PM bugzilla-daemon--- via libre-soc-bugs <libre-soc-bugs@lists.libre-riscv.org> wrote: > No, self.o.op.eq() was intended to copy the PowerISA opcode since that's needed > after DivPipeCore finishes its work. i said already: it doesn't exist. self.o.op does not exist. if you ran the unit test as i suggested, you would have found this out. > Notice that CoreInputData derives from > CoreBaseData, which derives from DivInputData, which derives from IntegerData, which contains "ctx" (pipeline context). ctx contains "op", which is where the operation information comes from > which should contain a field for > the PowerISA opcode, which is what I assumed self.o.op is. self.i.ctx (and self.o.ctx) contains the internal op from PowerDecode2, actually a subset (CompTrapOpSubset). therefore, this gets the InternalOp (a member of CompTrapOpSubset): # convenience variables op, a, b = self.i.ctx.op, self.i.a, self.i.b and therefore this copies the operation subset information needed down the pipeline chain: ###### sticky overflow and context, both pass-through ##### comb += self.o.xer_so.eq(self.i.xer_so) --> comb += self.o.ctx.eq(self.i.ctx) <----
(In reply to Luke Kenneth Casson Leighton from comment #46) > On Thu, Jul 2, 2020 at 8:36 PM bugzilla-daemon--- via libre-soc-bugs > <libre-soc-bugs@lists.libre-riscv.org> wrote: > > > No, self.o.op.eq() was intended to copy the PowerISA opcode since that's needed > > after DivPipeCore finishes its work. > > i said already: it doesn't exist. self.o.op does not exist. yes, I know. I was explaining why I wrote the code the way it was. Thank you for fixing it. :)
(In reply to Jacob Lifshay from comment #44) > (In reply to Luke Kenneth Casson Leighton from comment #41) > > btw jacob i noticed that on qemu, divw sets RT=RA when RB=0. > > > > can you check that behaviour on POWER9? > > IIRC, on POWER9, division by zero results in RT=0 for all div*/mod* > instructions. I specifically included that as a test case in > power-instruction-analyzer since that's a case where the result is undefined > according to the PowerISA spec. For divwo, see: > https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/ > 13dae100c6bc5685059195010ceb46ae68b9f306/src/instr_models.rs#L153 qemu definitely sets RT=RA for divw when RB=0. note: not divwo - *divw*. which is why i asked if you could check. hugh and paul mackerras mentioned that this *might* have been old behaviour on e.g. POWER7 or POWER8. if you run the unit test simulator/test_div_sim.py (and edit it accordingly) you will be able to see for yourself. ====================================================================== FAIL: test_0_divw (__main__.DivZeroDecoderTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "simulator/test_div_sim.py", line 80, in test_0_divw self.run_tst_program(program, [1, 2, 3]) File "/home/lkcl/src/libresoc/soc/src/soc/simulator/test_sim.py", line 252, in run_tst_program self.qemu_register_compare(simulator, q, reglist) File "/home/lkcl/src/libresoc/soc/src/soc/simulator/test_sim.py", line 290, in qemu_register_compare self.assertEqual(qemu_val, sim_val) AssertionError: 22136 != 0
(In reply to Jacob Lifshay from comment #47) > yes, I know. I was explaining why I wrote the code the way it was. ahh ok :)
(In reply to Jacob Lifshay from comment #45) > (In reply to Luke Kenneth Casson Leighton from comment #42) > > btw several combinatorial stages chained together is absolutely fine > > as long as it's not too many. i'm going to try 8 (it's a 180nm ASIC) > > Seems excessive since then there would be a gate depth of on the order of > 100 per pipeline stage, I seriously doubt we can get any decent clock speed > from something *that* deep. yehh i don't honestly know, should be running ltp to find out (which is currently having a hard time with the number of register "loops" detected...)
(In reply to Luke Kenneth Casson Leighton from comment #48) > (In reply to Jacob Lifshay from comment #44) > > (In reply to Luke Kenneth Casson Leighton from comment #41) > > > btw jacob i noticed that on qemu, divw sets RT=RA when RB=0. > > > > > > can you check that behaviour on POWER9? > > > > IIRC, on POWER9, division by zero results in RT=0 for all div*/mod* > > instructions. I specifically included that as a test case in > > power-instruction-analyzer since that's a case where the result is undefined > > according to the PowerISA spec. For divwo, see: > > https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/ > > 13dae100c6bc5685059195010ceb46ae68b9f306/src/instr_models.rs#L153 > > qemu definitely sets RT=RA for divw when RB=0. note: not divwo - *divw*. > > which is why i asked if you could check. hugh and paul mackerras mentioned > that this *might* have been old behaviour on e.g. POWER7 or POWER8. Ok, I'll check once I write the code. I was assuming that the div*o instructions produced the same results as all other div* variants. I'll also check "div*." and "div*o." variants (which I find their naming to be annoying due to confusion with the end of a sentence).
(In reply to Jacob Lifshay from comment #51) > Ok, I'll check once I write the code. I was assuming that the div*o > instructions produced the same results as all other div* variants. I'll also > check "div*." and "div*o." variants (which I find their naming to be > annoying due to confusion with the end of a sentence). sigh i know. and can't be a python function name. we converted to underscore
(In reply to Luke Kenneth Casson Leighton from comment #48) > (In reply to Jacob Lifshay from comment #44) > > (In reply to Luke Kenneth Casson Leighton from comment #41) > > > btw jacob i noticed that on qemu, divw sets RT=RA when RB=0. > > > > > > can you check that behaviour on POWER9? > > > > IIRC, on POWER9, division by zero results in RT=0 for all div*/mod* > > instructions. I specifically included that as a test case in > > power-instruction-analyzer since that's a case where the result is undefined > > according to the PowerISA spec. For divwo, see: > > https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/ > > 13dae100c6bc5685059195010ceb46ae68b9f306/src/instr_models.rs#L153 > > qemu definitely sets RT=RA for divw when RB=0. note: not divwo - *divw*. > > which is why i asked if you could check. hugh and paul mackerras mentioned > that this *might* have been old behaviour on e.g. POWER7 or POWER8. I just updated power-instruction-analyzer to include all div*[o][.] variants, they all match the original instr_models.rs -- all of `divw`, `divwo`, `divw.`, and `divwo.` all provide the exact same results when tested on POWER9. So, unless we made a mistake somewhere, qemu doesn't match POWER9's behavior (which is to be expected, since that part is undefined). I also added all instructions to the Python API, so you can just call the functions in instr_model.rs: (requires a recent version of Rust -- a nightly version is no longer required since PyO3 v0.11) $ git clone https://salsa.debian.org/Kazan-team/power-instruction-analyzer.git $ source <path-to-venv>/bin/activate $ cd power-instruction-analyzer $ pip install . $ python >>> import power_instruction_analyzer as pia >>> inputs = pia.InstructionInput(ra=0x1234, rb=0, rc=0) >>> print(pia.divw(inputs)) >>> print(pia.divwo(inputs)) >>> print(pia.divw_(inputs)) >>> print(pia.divwo_(inputs)) it will output (last output line split over two lines: {"rt":"0x0"} {"rt":"0x0","so":true,"ov":true,"ov32":true} {"rt":"0x0","cr0":{"lt":false,"gt":false,"eq":true,"so":false}} {"rt":"0x0","so":true,"ov":true,"ov32":true, "cr0":{"lt":false,"gt":false,"eq":true,"so":true}}
(In reply to Jacob Lifshay from comment #53) > So, unless we made a mistake somewhere, qemu doesn't match POWER9's behavior > (which is to be expected, since that part is undefined). Turns out I forgot to enable running using native assembly instructions when testing on the POWER9 server --- oops
(In reply to Jacob Lifshay from comment #54) > (In reply to Jacob Lifshay from comment #53) > > So, unless we made a mistake somewhere, qemu doesn't match POWER9's behavior > > (which is to be expected, since that part is undefined). > > Turns out I forgot to enable running using native assembly instructions when > testing on the POWER9 server --- oops Got that fixed. it still doesn't match qemu. I also added the mul* and madd* instructions. The instruction models now match POWER9 for all tested cases for all of: divde divdeo divde. divdeo. divdeu divdeuo divdeu. divdeuo. divd divdo divd. divdo. divdu divduo divdu. divduo. divwe divweo divwe. divweo. divweu divweuo divweu. divweuo. divw divwo divw. divwo. divwu divwuo divwu. divwuo. modsd modud modsw moduw mullw mullwo mullw. mullwo. mulhw mulhw. mulhwu mulhwu. mulld mulldo mulld. mulldo. mulhd mulhd. mulhdu mulhdu. maddhd maddhdu maddld
(In reply to Jacob Lifshay from comment #53) > (In reply to Luke Kenneth Casson Leighton from comment #48) > > (In reply to Jacob Lifshay from comment #44) > > > (In reply to Luke Kenneth Casson Leighton from comment #41) > > > > btw jacob i noticed that on qemu, divw sets RT=RA when RB=0. > > > > > > > > can you check that behaviour on POWER9? > > > > > > IIRC, on POWER9, division by zero results in RT=0 for all div*/mod* > > > instructions. I specifically included that as a test case in > > > power-instruction-analyzer since that's a case where the result is undefined > > > according to the PowerISA spec. For divwo, see: > > > https://salsa.debian.org/Kazan-team/power-instruction-analyzer/-/blob/ > > > 13dae100c6bc5685059195010ceb46ae68b9f306/src/instr_models.rs#L153 > > > > qemu definitely sets RT=RA for divw when RB=0. note: not divwo - *divw*. > > > > which is why i asked if you could check. hugh and paul mackerras mentioned > > that this *might* have been old behaviour on e.g. POWER7 or POWER8. > > I just updated power-instruction-analyzer to include all div*[o][.] > variants, they all match the original instr_models.rs -- all of `divw`, > `divwo`, `divw.`, and `divwo.` all provide the exact same results when > tested on POWER9. ah fantastic. this is really helpful > So, unless we made a mistake somewhere, qemu doesn't match POWER9's behavior > (which is to be expected, since that part is undefined). still annoying though as it means manual checking > it will output (last output line split over two lines: > {"rt":"0x0"} > {"rt":"0x0","so":true,"ov":true,"ov32":true} > {"rt":"0x0","cr0":{"lt":false,"gt":false,"eq":true,"so":false}} > {"rt":"0x0","so":true,"ov":true,"ov32":true, > "cr0":{"lt":false,"gt":false,"eq":true,"so":true}} ah this is _really_ useful as it helps check the output regspec allocations https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/alu/pipe_data.py;hb=HEAD 17 regspec = [('INT', 'o', '0:63'), 18 ('CR', 'cr_a', '0:3'), 19 ('XER', 'xer_ca', '34,45'), # bit0: ca, bit1: ca32 20 ('XER', 'xer_ov', '33,44'), # bit0: ov, bit1: ov32 21 ('XER', 'xer_so', '32')] hmmm i'm not seeing CA or CA32 in the output list. going over the page https://libre-soc.org/openpower/isa/fixedarith/ which comes from the spec, that would seem to be the case (CA/CA32 not involved), would you agree?
(In reply to Luke Kenneth Casson Leighton from comment #56) > (In reply to Jacob Lifshay from comment #53) > > it will output (last output line split over two lines: > > {"rt":"0x0"} > > {"rt":"0x0","so":true,"ov":true,"ov32":true} > > {"rt":"0x0","cr0":{"lt":false,"gt":false,"eq":true,"so":false}} > > {"rt":"0x0","so":true,"ov":true,"ov32":true, > > "cr0":{"lt":false,"gt":false,"eq":true,"so":true}} > > ah this is _really_ useful as it helps check the output regspec allocations > https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/fu/alu/pipe_data.py; > hb=HEAD I manually entered the output regs for each instruction, so I'm much more likely to have accidentally missed an output than if you use the csv files we derived from a working core.
(In reply to Jacob Lifshay from comment #57) > I manually entered the output regs for each instruction, ah no problem. well it is eyes-on, anyway > so I'm much more > likely to have accidentally missed an output than if you use the csv files > we derived from a working core. the csv files came from microwatt decode1.vhdl - they don't quite contain the level of detail we need in all cases. commit 5a5fbe7f8de03420334049c1b305c3eff46adde7 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Thu Jul 9 10:32:57 2020 +0100 create new DivMulOutputData which does not have CA/CA32
jacob: whoops :) index 2dc3196..0f1a549 100644 --- a/src/nmutil/divmod.py +++ b/src/nmutil/divmod.py @@ -3,8 +3,11 @@ def trunc_div(n, d): abs_n = abs(n) abs_d = abs(d) - abs_q = n // d + abs_q = abs_n // abs_d
commit 9cde3002d533352fdf8ed464fa46d4acd548b051 (HEAD -> master, origin/master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Fri Jul 10 21:09:30 2020 +0100 add a DIVS function as separate and discrete from floor_div likewise for MODS and MULS this requires re-running pywriter.py - a small cheat is to open all.py, save it, just run "pywriter.py fixedarith" then copy all.py back again. (decoder/isa/*.py)
sorry, jacob: i tried building pia (see comments in #433) and it was impossible to follow. i've therefore reverted the commit that adds it as a critical dependency because it completely terminates all and any possibility of running not just the test_pipe_caller.py unit test but everything else in upstream testing as well (test_issuer.py etc). i will try to reproduce the divdeu bug without pia. also, two things: * please don't add "typing" directly to .py files. add them to a .pyi file instead. * remember to keep lines to below 80 chars several additions were not, which made reviewing the diffs harder than they should be: - with self.subTest(check="pia", sim_o=sim_o, pia_result=str(pia_resu lt)): <<<--- * remember to single-purpose commit. modifications to ISACaller had been added (without an explanatory comment) that had nothing to do with the addition of pia itself.
(In reply to Luke Kenneth Casson Leighton from comment #61) > sorry, jacob: i tried building pia (see comments in #433) and it was > impossible to follow. > > i've therefore reverted the commit that adds it as a critical dependency > because it completely terminates all and any possibility of running > not just the test_pipe_caller.py unit test but everything else in upstream > testing as well (test_issuer.py etc). > > i will try to reproduce the divdeu bug without pia. still investigating. i've moved the long tests to test_div_pipe_caller_long.py encountered errors in just the non-long ones.
oof. bit of a marathon clean-up / code-morph session, got through most of it now. * sim tests seem to work great. * divpipe tests seem to work great * FSM not so much, yet. re-adding use of pia has to be done with some considerable care: not just ensuring that other developers can (easily) install pia, there is more to it. test_div_compunit.py - which doesn't exist and i need to write it - is a "mirror" of test_div_pipe_caller.py. basically it does exactly the same job except this time it has a MultiCompUnit "Manager" on the front-end. this is a critically important test that makes sure that register allocations are correct for both read and write, and in particular that the pipeline sets "self.o.{something}.ok" correctly. if those are _not_ correct we are in serious trouble because the hardware will sit there - permanently - waiting for a register flag to be raised that is never going to happen. the issue is: the functions for checking the test output are used by both test_div_pipe_caller *and* the test_compunit. they have to be developed *in conjunction*. in addition to that: test_core.py then takes aaallll the test_compunit* tests and re-runs them *again*, this time actually connecting to register files. by not adding support fully for pia into the underlying infrastructure, critically important unit tests could no longer be run. therefore, first things first: 1) bug #436. full documentation and wait for at least one other person to repro the build / install instructions for pia 2) test_div_compunit needs to be added 3) *then* pia can be added as a build dependency, making sure that it works with both test_div_pipe_caller.py *and* test_div_compunit.py
(In reply to Luke Kenneth Casson Leighton from comment #63) [snip] > re-adding use of pia has to be done with some considerable care: not > just ensuring that other developers can (easily) install pia, there is > more to it. [snip] > by not adding support fully for pia into the underlying infrastructure, > critically important unit tests could no longer be run. > > therefore, first things first: > > 1) bug #436. full documentation and wait for at least one other > person to repro the build / install instructions for pia I'll take a look at this as my project for today. Documenting soclayout can be done later.
(In reply to Cole Poirier from comment #64) > > 1) bug #436. full documentation and wait for at least one other > > person to repro the build / install instructions for pia > > I'll take a look at this as my project for today. great, because that's holding me up.
(In reply to Cole Poirier from comment #64) > (In reply to Luke Kenneth Casson Leighton from comment #63) > [snip] > > re-adding use of pia has to be done with some considerable care: not > > just ensuring that other developers can (easily) install pia, there is > > more to it. > [snip] > > by not adding support fully for pia into the underlying infrastructure, > > critically important unit tests could no longer be run. > > > > therefore, first things first: > > > > 1) bug #436. full documentation and wait for at least one other > > person to repro the build / install instructions for pia > > I'll take a look at this as my project for today. Documenting soclayout can > be done later. Luke, you should be able to get power_instruction_analyzer setup and installed properly by following https://libre-soc.org/HDL_workflow/ section 6.7 power_instruction_analyzer (pia) Please let me know if you encounter bugs.
this should be discussed under the appropriate bug report.
ok that's added, cookie-cut from the alu compunit. commit db6b993384779aa8e4580b1391ac9d3b384fae66 (HEAD -> master, origin/master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Sat Jul 25 15:34:16 2020 +0100 add div compunit test interestingly i had to add an extra check which tested when the pipeline n.ready_o was valid: it was pure coincidence with all other (single-stage) compunit tests that the data *happened* to be valid. so when re-adding the pia checking, jacob, those *two* unit tests must be run and must work. actually i'll give it a go. git diff b5245 9b501
As of soc.git commit 837d9fbdd54265a63a07e475b6d85313cadf2927, all tests pass, so I declare this complete. I had to fix bug 449 for the div pipeline to get the tests to pass.
(In reply to Jacob Lifshay from comment #69) > As of soc.git commit 837d9fbdd54265a63a07e475b6d85313cadf2927, all tests > pass, so I declare this complete. it's a leeettle premature: test_issuer.py needed to be run with the DivTestCases commit b6223dc2bd7125b6d12a5cd8a220fa76ca124549 (HEAD -> master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Wed Aug 5 12:32:45 2020 +0100 add div test cases into test_issuer.py checking that it compiles in coriolis2 needed to be done (it does) commit 4cb01cbff73a8e5cf74282f260be6a9a1f666b00 (origin/master) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Wed Aug 5 10:29:44 2020 +0100 add div FSM as default for test_issuer in verilog and ilang gen and the verilog simulation checked (still TODO) and, really, i wanted the "skip if there are 8 bytes of zeros" capability that's in microwatt's divide.vhdl FSM. which can be done as an enhancement under a separate bugreport, given that it's strictly speaking an optimisation. apaaart from all that... yes :) remember to track it on your libre-soc homepage, cut/paste from here https://libre-soc.org/lkcl
(In reply to Luke Kenneth Casson Leighton from comment #70) > and, really, i wanted the "skip if there are 8 bytes of zeros" > capability that's in microwatt's divide.vhdl FSM. which can be > done as an enhancement under a separate bugreport, given that > it's strictly speaking an optimisation. if I'm going to spend any time optimizing, I think it would be better spent converting the FSM to a radix 4 or 8 divider and/or adding support for skipping ahead by using find-first-set-bit on both remainder and divisor to skip runs of zeros. increasing the radix would multiply the area to several adders/subtractors instead of one. adding the run-skipping would require 2 find first set bit circuits and a 128-bit variable shifter for dividend_quotient, which I think might be a bit too much unless we limit the max shift amount to 7 bits or so. > remember to track it on your libre-soc homepage, cut/paste from > here https://libre-soc.org/lkcl already done.
(In reply to Jacob Lifshay from comment #71) > (In reply to Luke Kenneth Casson Leighton from comment #70) > > and, really, i wanted the "skip if there are 8 bytes of zeros" > > capability that's in microwatt's divide.vhdl FSM. which can be > > done as an enhancement under a separate bugreport, given that > > it's strictly speaking an optimisation. > > if I'm going to spend any time optimizing, I think it would be better spent > converting the FSM to a radix 4 or 8 divider and/or adding support for > skipping ahead by using find-first-set-bit on both remainder and divisor to > skip runs of zeros. that's an interesting idea. microwatt just tests "is the next 8 bits of q == 0" if so skip 16 bits. that is a fixed shift amount, so is particularly efficient. let's raise a new bugreport and discuss it there. > increasing the radix would multiply the area to several > adders/subtractors instead of one. adding the run-skipping would require 2 > find first set bit circuits and a 128-bit variable shifter for > dividend_quotient, which I think might be a bit too much unless we limit the > max shift amount to 7 bits or so. well, you'll find this hilarious: currently, the div FSM is so small that it is difficult to compute a block in coriolis2 which will take the inputs and outputs on one side. we actually *need* it to be bigger :)
I think the value should be 1500 EUR split 2/3 to me and 1/3 to Luke. Luke, does that sound about right? Also, please decide where the funding should come from, since bug 383's funding appears to be all used up, though I have no clue where. We need to clean up our budget tracking.
(In reply to Jacob Lifshay from comment #73) > I think the value should be 1500 EUR split 2/3 to me and 1/3 to Luke. > Luke, does that sound about right? i initially went "eek" because it has been very slow progress however going over everything actually the productivity matches that. > Also, please decide where the funding should come from, since bug 383's > funding appears to be all used up, though I have no clue where. the subtask budget field had not yet been filled in.
(In reply to Luke Kenneth Casson Leighton from comment #74) > (In reply to Jacob Lifshay from comment #73) > > Also, please decide where the funding should come from, since bug 383's > > funding appears to be all used up, though I have no clue where. > > the subtask budget field had not yet been filled in. So, everything's completed for this bug in terms of getting paid? IIRC there was something else you wanted done first, Luke.