the vector opcodes need adding: to the CSV files, unit tests, etc. https://libre-soc.org/openpower/sv/vector_ops/
andrey, can you do carry-prop (cprop) first? i took a look last night at the page jacob found
https://en.m.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#TBM_(Trailing_Bit_Manipulation)
and noticed there are patterns:
* pattern 1: x / ~x
* pattern 2: x+1 / x-1 / ~(x+1) / -x
* pattern 3: | / & / ^
from that it becomes possible to create a suite of instructions covering every possible combination of those 3 patterns (5 bits), so i will need time to sort that out.
carry-prop, however, is clear and dead-easy as well: one line of pseudo-code (ok, 3):
    P = (RA)
    G = (RB)
    RT = ((P|G)+G)^P
the relevant line from the table on the bitmanip page, https://libre-soc.org/openpower/sv/bitmanip/, is this:
    NN RT RA RB 0 11 0001 110 Rc vec cprop X-Form
from which you can construct the appropriate XO-Field to go into minor_22.csv.
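A minimal Python sketch of those three lines, handy for checking results by hand before writing the av.mdwn entry. The function name and the xlen parameter are illustrative, not from the spec; results are truncated to xlen bits the way a 64-bit register would truncate them.

```python
def cprop(ra: int, rb: int, xlen: int = 64) -> int:
    """Sketch of cprop: P = (RA), G = (RB), RT = ((P|G)+G)^P, truncated to xlen bits."""
    mask = (1 << xlen) - 1
    p = ra & mask  # P: propagate bits
    g = rb & mask  # G: generate bits
    return (((p | g) + g) ^ p) & mask


if __name__ == "__main__":
    # G in bit 0, P in bit 1: the carry generated at bit 0 shows up in
    # bit 1 and is propagated on into bit 2
    print(bin(cprop(0b0010, 0b0001)))  # prints 0b110
```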
andrey, if you cookie-cut, say, maxs from here and replace the pseudocode (and description), you'll have everything you need: https://libre-soc.org/openpower/isa/av/
we'll also want shifting by 1 bit to cover finding up to and including/excluding lowest set bit:
    x ^ (x - 1)          => set up to lowest set bit inclusive
    (x ^ (x - 1)) >> 1   => set up to lowest set bit exclusive
we'll also want the option to bit-reverse both input and output, so we can do first set msb rather than first set lsb.
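A small Python sketch of those two expressions plus the bit-reversed variant. XLEN=64 and the helper names are illustrative assumptions, not existing code:

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def upto_lsb_inclusive(x: int) -> int:
    """x ^ (x - 1): ones up to and including the lowest set bit (all ones if x == 0)."""
    return (x ^ (x - 1)) & MASK


def upto_lsb_exclusive(x: int) -> int:
    """(x ^ (x - 1)) >> 1: ones strictly below the lowest set bit."""
    return upto_lsb_inclusive(x) >> 1


def bitrev(x: int) -> int:
    """Reverse the bit order of an XLEN-bit value."""
    return int(format(x & MASK, f"0{XLEN}b")[::-1], 2)


def from_msb_inclusive(x: int) -> int:
    """Bit-reverse input and output: ones from the top down to the highest set bit."""
    return bitrev(upto_lsb_inclusive(bitrev(x)))
```

For example, 0b01011000 gives 0b1111 from the inclusive form and 0b0111 from the exclusive form; an all-zero input gives all ones from the inclusive form.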
The nomenclature for pseudo-code is in the PowerISA spec, sections 1.3.2 onwards.
These are the instructions Luke gave at the end of the call yesterday:
    pywriter
    add av.mdwn
    pywriter noall av
I added the changes to av.mdwn and minor_22.csv (I don't have write permission to openpower-isa; will push once given).
Now on to some questions:
Does cprop stand for Carry Propagate? What does it actually do? Does it take bits lower down and shift them up? I tried calculating the pseudo-code on paper with two 4-bit numbers (RA: 1011, RB: 0110, result: 1111), but didn't understand the significance of the result.
Also, is cprop a bitmanip instruction? If so, does it need to go into bitmanip.mdwn?
In minor_22.csv, the entries are:
    opcode,unit,internal op,in1,in2,in3,out,CR in,CR out,inv A,inv out,cry in,cry ou
From the pseudo-code alone I can't tell whether carry in/out are being used. It looks like there are only two inputs, RA and RB, and one output, RT. After looking at other instructions, Rc seems to determine something (a 1-bit field).
(In reply to Jacob Lifshay from comment #3)
> we'll also want shifting by 1 bit to cover finding up to and
> including/excluding lowest set bit.
that's 6 mode bits
> x ^ (x - 1) => set up to lowest set bit inclusive
> (x ^ (x - 1)) >> 1 => set up to lowest set bit exclusive
>
> we'll also want the option to bit-reverse both input and output so we can do
> first set msb rather than first set lsb.
that's 8 mode bits.
this needs 5 bits:
+def bmask(mode, RA, RB=None, zero=False):
+    RT = RA if RB is not None and not zero else 0
+    mask = RB if RB is not None else 0xffffffffffffffff
+    a1 = RA if mode&1 else ~RA
+    mode2 = (mode >> 1) & 0b11
+    if mode2 == 0:
+        a2 = -RA
+    if mode2 == 1:
+        a2 = RA-1
+    if mode2 == 2:
+        a2 = RA+1
+    if mode2 == 3:
+        a2 = ~(RA+1)
+    a1 = a1 & mask
+    a2 = a2 & mask
+    mode3 = (mode >> 3) & 0b11
+    if mode3 == 0:
+        RT = a1 | a2
+    if mode3 == 1:
+        RT = a1 & a2
+    if mode3 == 2:
+        RT = a1 ^ a2
+    return RT & mask
* 10-bit XO is the "norm" for X-Form
* 5-bit XO is the "norm" for high-cost forms (VA-Form, for example), leaving 5 bits for Mode
however, with a budget of only 10 bits for XO:
* 6 bits of mode leaves only 4 bits for XO
* 8 bits of mode leaves only 2 bits for XO
the table on the bitmanip page has room - barely - for more opcodes unless grevlogw is removed
https://libre-soc.org/openpower/sv/bitmanip/
and even then, it would be without an Rc=1 option.
also i was planning to add a "merge" option L=1 (zero=True/False in the pseudocode above), if practical, which leaves only 1 bit, and that's an entire Major Opcode for the entire instruction.
the only other alternative is to start absorbing some of the 5-XO-bit portions of Major 19, Major 31, etc., and if we propose too many of those the ISA WG is going to get pissed off.
(In reply to Andrey Miroshnikov from comment #4)
> The nomenclature for pseudo-code is in the PowerISA spec, sections 1.3.2
> onwards.
>
> These are the instructions Luke gave at the end of the call yesterday:
> pywriter
> add av.mdwn
> pywriter noall av
>
> I added the changes to av.mdwn and minor_22.csv (don't have write permission
> to openpower-isa, will push once given).
>
> Now on to some question:
>
> Does cprop stand for Carry Propagate?
yes.
> What does it actually do?
computes the carry bit(s) needed for big-integer math in a single instruction.
> Does it take
> bits lower down, and shift them up?
> I tried calculating the pseudo-code with two 4-bit numbers (RA:1011,
> RB:0110, result: 1111) on paper, didn't understand the signifance of the
> result.
>
> Also is cprop a bitmanip instruction?
yes.
> If so, does it need to go into bitmanip.mdwn?
doesn't matter for now
> In the minor_22.csv, the entries are:
> opcode,unit,internal op,in1,in2,in3,out,CR in,CR out,inv A,inv out,cry
> in,cry ou
>
> From the pseudo-code alone I can't tell if carry in/out are being used.
look at fixedarith.mdwn.
> It
> looks like there are only two inputs: RA, RB; one output RT.
and a co-result, CR0 (which comes from the Rc=1 option), hence why the page needs two entries: "cprop RT,RA,RB" *and* "cprop. RT,RA,RB"
> After looking
> at other instructions, Rc seems to determine something (1-bit bitfield).
yes. remember i said "just cookie-cut maxs literally"? that includes its entry in minor_22.csv (sorry, forgot to emphasise that).
just cut/paste that line, update column 1 (---NNNNNN), update column 2 (s/maxs/cprop), and the rest is good.
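To make "computes the carry bit(s) needed for big-integer math" concrete, a hedged Python sketch of how cprop could slot into a multi-limb add: each 64-bit limb pair is added independently, G marks limbs that overflowed, P marks limbs that came out all-ones, and a single cprop yields the carry into every limb at once. bigint_add and the little-endian limb lists are illustrative assumptions, not an existing API:

```python
def cprop(p: int, g: int, width: int) -> int:
    """((P|G)+G)^P over `width` bits: bit i is the carry INTO limb i."""
    mask = (1 << width) - 1
    return (((p | g) + g) ^ p) & mask


def bigint_add(a_limbs, b_limbs, xlen=64):
    """Add equal-length little-endian limb lists, using one cprop for all carries."""
    mask = (1 << xlen) - 1
    n = len(a_limbs)
    sums = [a + b for a, b in zip(a_limbs, b_limbs)]  # independent per-limb adds
    partial = [s & mask for s in sums]
    g = sum(1 << i for i, s in enumerate(sums) if s >> xlen)     # limb overflowed
    p = sum(1 << i for i, r in enumerate(partial) if r == mask)  # limb is all ones
    carries = cprop(p, g, n + 1)  # one extra bit so the final carry-out is visible
    result = [(partial[i] + ((carries >> i) & 1)) & mask for i in range(n)]
    return result, (carries >> n) & 1
```

As a cross-check of the paper example in comment #4: at 4 bits, RA=1011 and RB=0110 come out as 1110 rather than 1111. That input sets bit 1 in both P and G, which a plain per-limb add never produces, since a limb that overflowed cannot also come out all-ones.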
https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/bmask.py;hb=HEAD
this is currently producing the right answer for the first example params when mode=0b001110, but not for the others, when mask is non-zero:
    m = 0b11000011
    v3 = 0b10010100 # vmsbf.m v2, v3
    v2 = 0b01000011 # v2
it's important to replicate the full functionality of sof/sif/sbf, and that includes having a "predicate mask" (aka a GPR which might happen to be r3, r10 or r31).
i'm currently brute-force experimenting with bmask.py to find something vaguely resembling the output of sbf.py
https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/sbf.py;hb=HEAD
ha! found them!
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=7b6eb743caafb5bc6846d2d47c3040025a961460
--- a/openpower/sv/bmask.py
+++ b/openpower/sv/bmask.py
@@ -1,6 +1,7 @@
 def bmask(mode, RA, RB=None, zero=False):
     RT = RA if RB is not None and not zero else 0
     mask = RB if RB is not None else 0xffffffffffffffff
+    RA = RA & mask
     a1 = RA if mode&1 else ~RA
     mode2 = (mode >> 1) & 0b11
     if mode2 == 0:
@@ -22,7 +23,9 @@ def bmask(mode, RA, RB=None, zero=False):
         RT = a1 ^ a2
     return RT & mask
 
-SBF = 0b001110
+SBF = 0b01010
+SOF = 0b01001
+SIF = 0b10000 # 10011 also works no idea why yet
i'm so happy and full of great joy. w00t. etc.
that's masking working properly *and* covering the entirety of the x86 BMI1 and TBM bitmanip set, *in a way that can be used for Vector Masks*. frickin cool.
now, there's one set of mode-bits (0b11000 -> 0b11111) which in *theory* could be used for another mode, like you suggest in comment #3 (shift-down-by-one), although, to be honest, if it's *really* just "shift-down by one" or "invert" input or invert output, i'm inclined to suggest just using an extra 32-bit instruction for that (sradi).
it depends on whether mask interacts with things and makes life more complex than just "do a shift afterwards".
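For anyone wanting to reproduce this, here is bmask.py reassembled from the patch in comment #5 plus the one-line fix above. The three vector-mask constants come from the diff; the BMI1/TBM mode numbers in the table were worked out by hand from the bit layout (bit 0: x/~x, bits 1-2: -x, x-1, x+1, ~(x+1), bits 3-4: |, &, ^) and are illustrative, not taken from any spec:

```python
M64 = 0xffffffffffffffff


def bmask(mode, RA, RB=None, zero=False):
    """bmask.py as in comment #5's patch, with the RA = RA & mask fix applied."""
    RT = RA if RB is not None and not zero else 0
    mask = RB if RB is not None else M64
    RA = RA & mask                  # the fix: only consider active (masked-in) bits
    a1 = RA if mode & 1 else ~RA    # bit 0: x or ~x
    mode2 = (mode >> 1) & 0b11      # bits 1-2: -x, x-1, x+1, ~(x+1)
    if mode2 == 0:
        a2 = -RA
    if mode2 == 1:
        a2 = RA - 1
    if mode2 == 2:
        a2 = RA + 1
    if mode2 == 3:
        a2 = ~(RA + 1)
    a1 = a1 & mask
    a2 = a2 & mask
    mode3 = (mode >> 3) & 0b11      # bits 3-4: |, &, ^ (3 falls through to initial RT)
    if mode3 == 0:
        RT = a1 | a2
    if mode3 == 1:
        RT = a1 & a2
    if mode3 == 2:
        RT = a1 ^ a2
    return RT & mask


SBF = 0b01010  # set-before-first
SOF = 0b01001  # set-only-first
SIF = 0b10000  # set-including-first ("10011 also works")

# hand-derived modes for the BMI1/TBM expressions (illustrative, not from a spec)
BMI1_TBM = {
    "BLSI": (0b01001, lambda x: x & -x),
    "BLSMSK": (0b10011, lambda x: x ^ (x - 1)),
    "BLSR": (0b01011, lambda x: x & (x - 1)),
    "BLCFILL": (0b01101, lambda x: x & (x + 1)),
    "BLCI": (0b00111, lambda x: x | ~(x + 1)),
    "BLCIC": (0b01100, lambda x: ~x & (x + 1)),
    "BLCMSK": (0b10101, lambda x: x ^ (x + 1)),
    "BLCS": (0b00101, lambda x: x | (x + 1)),
    "BLSFILL": (0b00011, lambda x: x | (x - 1)),
    "BLSIC": (0b00010, lambda x: ~x | (x - 1)),
    "T1MSKC": (0b00100, lambda x: ~x | (x + 1)),
    "TZMSK": (0b01010, lambda x: ~x & (x - 1)),
}
```

On the "no idea why yet" for 0b10011: SIF computes ~x ^ -x, and since -x == ~(x - 1), that is ~x ^ ~(x - 1) == x ^ (x - 1), which is exactly what 0b10011 (BLSMSK) computes. SBF and SOF likewise land on TZMSK and BLSI.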
https://libre-soc.org/openpower/sv/vector_ops/discussion/
bmask pseudocode draft at the top, BM2-Form has been added, all pieces in place to add this in.
me on irc:
> lkcl, imho sv.adde is sufficient for biginteger add, cprop is rendered
> redundant because you can just do the trick of having your 256-bit
> simd unit do a 256-bit add and forward co from the previous clock cycle
> to ci in the current cycle to get full-speed bigint add
> so imho we should remove cprop
(In reply to Jacob Lifshay from comment #10)
> me on irc:
> > lkcl, imho sv.adde is sufficient for biginteger add, cprop is rendered
> > redundant because you can just do the trick of having your 256-bit
> > simd unit do a 256-bit add and forward co from the previous clock cycle
> > to ci in the current cycle to get full-speed bigint add
>
> > so imho we should remove cprop
lkcl:
> programmerjake, i was kinda thinking either well beyond 256, 512 or
> 1024, and also of other circumstances involving carry
> and, also, for other vector mask purposes, problem being it was 20
> years ago i worked with the Aspex ASP
me:
> beyond 1024 bits? just use the CA register to hold carry between
> one vector add and the next. also, scalar adde can be used as a
> carry propagate instruction like cprop, but with the inputs encoded
> differently.
> for adde RT, RA, RB: set the bit in RA when the element add
> produces >= 0xFFFF...FFFF, set the bit in RB when the element add overflows.
> the same sv.adde 256-bit and carry forwarding tricks work for sv.subfe
> so, imho cprop is still rendered unnecessary
(In reply to Luke Kenneth Casson Leighton from comment #9) > https://libre-soc.org/openpower/sv/vector_ops/discussion/ > > bmask pseudocode draft at the top, BM2-Form has been added, > all pieces in place to add this in. I tried to make a minor_22.csv entry for bmask based on the info in https://libre-soc.org/openpower/sv/bitmanip/ but I don't really understand this well enough as the instruction bitfields are different: 10001,L,mode,ALU,OP_BMASK,RA,RB,NONE,RT,NONE,NONE,0,0,ZERO,0,NONE,0,0,0,0,0,0,1,0,0,bmask,X,,1,unofficial until submitted and approved/renumbered by the opf isa wg
(In reply to Andrey Miroshnikov from comment #12)
> I tried to make a minor_22.csv entry for bmask based on the info in
> https://libre-soc.org/openpower/sv/bitmanip/
>
> but I don't really understand this well enough as the instruction bitfields
> are different:
> 10001,L,mode,ALU,OP_BMASK,RA,RB,NONE,RT,NONE,NONE,0,0,ZERO,0,NONE,0,0,0,0,0,
> 0,1,0,0,bmask,X,,1,unofficial until submitted and approved/renumbered by the
> opf isa wg
rright, ok, look at the CSV headings
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isatables/minor_22.csv;hb=HEAD
    opcode,unit,internal op,in1,in2,in3,out,CR in,CR out,
    10001,L,mode,ALU,OP_BMASK,RA,RB,NONE
    opcode=10001
    unit=L??
    internal op = mode??
    in1 = ALU??
that can't be right, can it? how about this:
    opcode,unit,internal op,in1,in2,in3,out,CR in,CR out,
    10001,ALU,OP_BMASK,RA,RB,NONE
    opcode=10001            good
    unit=ALU                ah ha! that's making sense
    internal op=OP_BMASK    ok that's better
    in1=RA                  looking more like it
so that must be close. what about the rest (at the end)?
    sgn,rc,lk,sgl pipe,comment,form,CONDITIONS,unofficial,comment2
    > 0,1,0,0,bmask,X,,1,unofficial un
    sgn=0           # ok
    rc=1            # wrong, it's not an Rc=1.
    lk=0            # ok
    sgl pipe=0      # ok
    comment=bmask   # correct
    form=X          # wrong, it's listed as BM2-Form
so that last bit should be:
    0,NONE,0,0,bmask,BM2,,1,unofficial un...
now, there's *one* more thing, which is slightly complicated. look closely at the OP_SETVL and e.g. OP_MINMAX entries:
    -----11011-,VL,OP_SETVL,
    -----011001,VL,OP_SVSHAPE,
    -----111001,VL,OP_SVREMAP,
    -----10011-,VL,OP_SVSTEP,
    0111001110-,ALU,OP_MINMAX,
    0011001110-,ALU,OP_MINMAX,
    ...
now let's look at the corresponding bitmanip table:
https://libre-soc.org/openpower/sv/bitmanip/
setvl:
    0.5  26....30  31  name   Form
    NN   11 011    Rc  setvl  SVL-Form
av max:
    0.5  21..25  26....30  31  name   Form
    NN   01110   01110     Rc  avmax  X-Form
can you see how in the bit-positions "21..25" for setvl, there is "-----"? this says to the PowerDecoder "don't try to match against those bits".
so we need to do the same thing for bmask, ***BUT***, look again at the table:
bmask:
    0.5  26....30  31  name   Form
    NN   L 1000    1   bmask  BM2-Form
so that's going to be:
* five "-"s in bit positions 21..25
* one "-" in bit position 26 (for the "L")
* four bits "1000" in 27..30
* one "1" in bit 31
to give:
    ------10001
so where you had this:
    opcode,unit,internal op,in1,in2,in3,out,CR in,CR out,
    10001,ALU,OP_BMASK,RA,RB,NONE....
it should in fact be this:
    opcode,unit,internal op,in1,in2,in3,out,CR in,CR out,
    ------10001,ALU,OP_BMASK,RA,RB,NONE....
that says "match ONLY bits 27..31 against 10001 but IGNORE 21..26".
a *second* job of PowerDecoder is to look up the av.mdwn file and get the "Form" (BM2) and the line "bmask RT,RA,RB,mode,L"; then the job of power_fields.py is to decode fields.txt, look at BM2 and find the bit-positions of L and mode (oh, and RT, RA and RB).
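Two checks from this walkthrough can be scripted: lining a CSV row up against the column headings, and matching an opcode string with "-" don't-cares against MSB0 bits 21..31. A toy Python sketch, assuming only the heading names quoted above; it is not the actual PowerDecoder code, and bmask_like_insn is a hypothetical encoder that just fills the BM2-Form positions listed above:

```python
import csv
import io

# Only the leading and trailing minor_22.csv headings quoted in this thread;
# the columns between "CR out" and "sgn" are deliberately left unlabelled.
HEAD = ["opcode", "unit", "internal op", "in1", "in2", "in3", "out", "CR in", "CR out"]
TAIL = ["sgn", "rc", "lk", "sgl pipe", "comment", "form", "CONDITIONS",
        "unofficial", "comment2"]


def label_row(row_text: str):
    """Pair the first and last fields of a CSV row with the known headings."""
    fields = next(csv.reader(io.StringIO(row_text)))
    return dict(zip(HEAD, fields[:len(HEAD)])), dict(zip(TAIL, fields[-len(TAIL):]))


def msb0_field(insn: int, start: int, end: int) -> str:
    """MSB0 bits start..end (inclusive) of a 32-bit instruction, as a '0'/'1' string."""
    width = end - start + 1
    return format((insn >> (31 - end)) & ((1 << width) - 1), f"0{width}b")


def matches(pattern: str, bits: str) -> bool:
    """True if every non-'-' character of the pattern equals the corresponding bit."""
    return len(pattern) == len(bits) and all(p in ("-", b) for p, b in zip(pattern, bits))


def bmask_like_insn(bits21_25: int, L: int) -> int:
    """Hypothetical encoder: major opcode 22 in bits 0..5, arbitrary bits 21..25
    and L (bit 26), then "1000" in bits 27..30 and "1" in bit 31 (all MSB0)."""
    return (22 << 26) | (bits21_25 << 6) | (L << 5) | (0b1000 << 1) | 1


ANDREY_ROW = ("10001,L,mode,ALU,OP_BMASK,RA,RB,NONE,RT,NONE,NONE,0,0,ZERO,0,NONE,"
              "0,0,0,0,0,0,1,0,0,bmask,X,,1,unofficial until submitted and "
              "approved/renumbered by the opf isa wg")

if __name__ == "__main__":
    head, tail = label_row(ANDREY_ROW)
    print(head["unit"], head["internal op"], head["in1"])  # the "??" columns
    print(tail["rc"], tail["form"])                        # the two "wrong" ones
    insn = bmask_like_insn(0b10101, 1)
    print(msb0_field(insn, 21, 31), matches("------10001", msb0_field(insn, 21, 31)))
```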
(In reply to Jacob Lifshay from comment #11)
> for adde RT, RA, RB: set the bit in RA when the element add
as a Vectorised instruction, producing a vector of carry-propagation bits that way is ultra-expensive: it triggers an astounding number of register hazards and causes huge numbers of 64-bit registers to be used for the sole purpose of storing single binary digits.
cprop is one single 32-bit scalar instruction that produces up to 64 bits of carry-propagation results.
follow-on for context:
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_decoder.py;h=2b4799c6dcf53a43879a167250d52389bd83ee7a;hb=8d1e13117cc677247b93542cec6adcf6fc7fd841#l739
    Subdecoder(pattern=22, opcodes=get_csv("minor_22.csv"),
               opint=False, bitsel=(0, 11), suffix=None,
that says that the pattern-matcher is looking for a string (opint=False --> "---NN-NN---"), and that it's looking for a pattern of length 11 in MSB0 bit-positions 21..31 (python range 0,11).
yes. i know. bitsel counts in LSB0 (0..10 is MSB0 31..21), and with python range-numbering the end is +1, sigh.
so that's why minor_22.csv has opcodes involving "-" don't-cares, and why each one has to be exactly 11 characters long.
I made the pseudo-code, but PyWriter doesn't like it:
    if (RB) = 0 then RT <- 0 else RT <- (RA)
    if (RB) = 0 then mask <- 0xffffffffffffffff else mask <- (RB)
    RA <- RA & mask
    if (mode&1) = 1 then a1 <- (RA) else a1 <- (~RA)
    mode2 <- (mode >> 1) & 0b11
    if mode2 = 0 then a2 <- -(RA)
    if mode2 = 1 then a2 <- (RA)-1
    if mode2 = 2 then a2 <- (RA)+1
    if mode2 = 3 then a2 <- ~((RA)+1)
    a1 <- a1 & mask
    a2 <- a2 & mask
    mode3 <- (mode >> 3) & 0b11
    if mode3 = 0 then RT <- a1 | a2
    if mode3 = 1 then RT <- a1 & a2
    if mode3 = 2 then RT <- a1 ^ a2
The PowerISA doc says switch statements are supported, but I haven't checked whether PyWriter supports them. I'll continue on this tomorrow.
(In reply to Luke Kenneth Casson Leighton from comment #14)
> (In reply to Jacob Lifshay from comment #11)
>
> > for adde RT, RA, RB: set the bit in RA when the element add
>
> as a Vectorised instruction
I'm referring to *scalar* adde; all those references to vector elements are there to illustrate how to set the bits in the input registers to make adde do what you want.
>
> cprop is one single 32-bit scalar instruction that produces up to
> 64 bits of carry-propagation results.
adde is one single 32-bit scalar instruction that produces 65 bits of carry-propagation results (64 in RT, 1 in CA).
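A small Python model of the point above, to make the comparison concrete. The adde model and the RA/RB encoding are one reading of comment #11 (RA bit set when the element sum is >= 0xFFFF...FFFF, which is P|G; RB bit set when it overflows, which is G), not an agreed design, and the names are illustrative:

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def adde(ra: int, rb: int, ca: int):
    """Model of scalar adde: RT = (RA) + (RB) + CA, returning (RT, new CA)."""
    total = (ra & MASK) + (rb & MASK) + (ca & 1)
    return total & MASK, total >> XLEN


def cprop(p: int, g: int) -> int:
    """cprop as proposed: RT = ((P|G)+G)^P, truncated to XLEN bits."""
    return (((p | g) + g) ^ p) & MASK


def adde_carry_propagate(p: int, g: int, ca: int = 0):
    """Encoding as read here: RA = P|G ("element sum >= 0xFFFF...FFFF"),
    RB = G ("element add overflowed"); CA chains in from the previous group."""
    return adde(p | g, g, ca)
```

With CA=0, adde's RT is exactly cprop's result XORed with P (both start from (P|G)+G), and the carry out of the top limb lands in CA, ready to chain into the next adde.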
(In reply to Andrey Miroshnikov from comment #16)
> I made the pseudo-code, by PyWriter doesn't like it:
(In reply to Luke Kenneth Casson Leighton from comment #9)
  vvvvvvvvvvvvvvvvv
> https://libre-soc.org/openpower/sv/vector_ops/discussion/
  ^^^^^^^^^^^^^^^^^
  vvvvvvvvvvvvvvvvvvvvvv
> bmask pseudocode draft at the top,
  ^^^^^^^^^^^^^^^^^^^^^^
I thought I had replied before, but apparently I forgot to click the submit button.
(In reply to Luke Kenneth Casson Leighton from comment #5)
> (In reply to Jacob Lifshay from comment #3)
> > we'll also want shifting by 1 bit to cover finding up to and
> > including/excluding lowest set bit.
>
> that's 6 mode bits
>
> > x ^ (x - 1) => set up to lowest set bit inclusive
> > (x ^ (x - 1)) >> 1 => set up to lowest set bit exclusive
> >
> > we'll also want the option to bit-reverse both input and output so we can do
> > first set msb rather than first set lsb.
>
> that's 8 mode bits.
it's actually 7: bit-reverse only happens on both or neither of the input and output.
>
> this needs 5 bits:
>
> +def bmask(mode, RA, RB=None, zero=False):
> +    RT = RA if RB is not None and not zero else 0
> +    mask = RB if RB is not None else 0xffffffffffffffff
> +    a1 = RA if mode&1 else ~RA
> +    mode2 = (mode >> 1) & 0b11
> +    if mode2 == 0:
> +        a2 = -RA
> +    if mode2 == 1:
> +        a2 = RA-1
> +    if mode2 == 2:
> +        a2 = RA+1
this is redundant since RA + 1 == -(~RA)
> +    if mode2 == 3:
> +        a2 = ~(RA+1)
this is redundant since ~(RA + 1) = (~RA) - 1
removing both of those saves 1 more bit, making it 6 bits with all of my proposed additions.
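Both identities are easy to confirm in Python, where ~x == -x - 1 holds exactly for every int. The sketch below (names are illustrative) shows that the four a2 options are just "x or ~x, then +1 or -1", which is the observation that makes the two extra options redundant:

```python
def a2_original(ra: int, mode2: int) -> int:
    """The four a2 choices from the bmask sketch, indexed by mode2."""
    return [-ra, ra - 1, ra + 1, ~(ra + 1)][mode2]


def a2_from_invert_and_step(ra: int, invert: bool, plus_one: bool) -> int:
    """The same values built as (x or ~x) followed by +1 or -1."""
    a = ~ra if invert else ra
    return a + 1 if plus_one else a - 1


# mode2 -> (invert, plus_one), using -x == ~x + 1 and ~(x + 1) == ~x - 1
MODE2_AS_STEP = {0: (True, True), 1: (False, False), 2: (False, True), 3: (True, False)}
```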
(In reply to Jacob Lifshay from comment #19)
> (In reply to Luke Kenneth Casson Leighton from comment #5)
> > +def bmask(mode, RA, RB=None, zero=False):
> > +    RT = RA if RB is not None and not zero else 0
> > +    mask = RB if RB is not None else 0xffffffffffffffff
> > +    a1 = RA if mode&1 else ~RA
> > +    mode2 = (mode >> 1) & 0b11
> > +    if mode2 == 0:
> > +        a2 = -RA
> > +    if mode2 == 1:
> > +        a2 = RA-1
> > +    if mode2 == 2:
> > +        a2 = RA+1
>
> this is redundant since RA + 1 == -(~RA)
>
> > +    if mode2 == 3:
> > +        a2 = ~(RA+1)
>
> this is redundant since ~(RA + 1) = (~RA) - 1
>
> removing both of those saves 1 more bit, making it 6 bits with all of my
> proposed additions.
thinking about it a bit more, imho the mode2 options should be `RA - 1` and `RA + 1`, since that saves gates by not requiring xor gates on the output of the add.
(In reply to Jacob Lifshay from comment #20)
> thinking about a bit more, imho the mode2 options should be `RA - 1` and `RA
> + 1` since that saves gates, not requiring xor gates on the output of the
> add.
so i am sort-of getting it, but only because OP_ADD, copied from microwatt, is already subdivided down into:
* select a or neg-input-a
* select add 1/0/CA
* select output or neg-output
and the end result is to create an amazing number of arithmetic ops with the exact same add hardware.
here is the a / neg-a selection anyway:
+ a1 = RA if mode&1 else ~RA
if i understand correctly, what you are saying is that the mode-bits can be "morphed" to do the same thing?
saving one bit only to spend it on another doesn't totally make sense: if they are totally equivalent there's not much point, BUT if things can be morphed such that it fits *directly* with the existing OP_ADD (ok, except the OR, AND and XOR), that's worth pursuing because it saves gates.
(In reply to Luke Kenneth Casson Leighton from comment #21)
> (In reply to Jacob Lifshay from comment #20)
> here is the a / neg-a selection anyway:
>
> + a1 = RA if mode&1 else ~RA
that's bitwise-not, not neg -- I get your point anyway...
>
> if i understand correctly, what you are saying is
> that the mode-bits can be "morphed" to do the same
> thing?
sorta... I'm saying you only need 1 mode2 bit.
The idea is that, currently, add/subf/etc. are basically:
    a = ~RA if subtracting else RA
    carry_in = 0
    if subtracting:
        carry_in = 1
    RT = a + RB + carry_in
bmask would (ignoring mask and bit-reverse and shifting) do:
    a = ~RA if imm & 0b1 else RA
    b = 1 if imm & 0b10 else -1  # mode2
    carry_in = 0
    y = a + b + carry_in
    v00 = 0
    v01 = v10 = bool(imm & 0b100)
    v11 = bool(imm & 0b1000)
    if v00 == v01 == v10 == v11 == 0:
        raise IllegalInstruction("other instructions can use the spare space")
    # 64x 4-in muxes -- basically a binlog operation:
    # probably saves gates over muxing over and, or, and xor
    table = [v00, v01, v10, v11]
    RT = 0
    for i in range(64):
        ra_bit = bool(RA & (1 << i))
        y_bit = bool(y & (1 << i))
        RT |= table[(ra_bit << 1) | y_bit] << i
(In reply to Jacob Lifshay from comment #22)
> > + a1 = RA if mode&1 else ~RA
>
> that's bitwise-not, not neg --
yes. that's directly from the pseudocode expressions; it took me a while to spot, they're so similar in small fonts.
https://en.m.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#TBM_(Trailing_Bit_Manipulation)
    XOP.LZ.09 01 /1  BLCFILL  Fill from lowest clear bit               x & (x + 1)
    XOP.LZ.09 02 /6  BLCI     Isolate lowest clear bit                 x | ~(x + 1)
    XOP.LZ.09 01 /5  BLCIC    Isolate lowest clear bit and complement  ~x & (x + 1)
    XOP.LZ.09 02 /1  BLCMSK   Mask from lowest clear bit               x ^ (x + 1)
    XOP.LZ.09 01 /3  BLCS     Set lowest clear bit                     x | (x + 1)
    XOP.LZ.09 01 /2  BLSFILL  Fill from lowest set bit                 x | (x - 1)
    XOP.LZ.09 01 /6  BLSIC    Isolate lowest set bit n compl.          ~x | (x - 1)
    XOP.LZ.09 01 /7  T1MSKC   Inverse mask from trailing ones          ~x | (x + 1)
    XOP.LZ.09 01 /4  TZMSK    Mask from trailing zeros                 ~x & (x - 1)
and, further up, BMI1:
    VEX.LZ.0F38 F3 /3  BLSI    Extract lowest set isolated bit  x & -x
    VEX.LZ.0F38 F3 /2  BLSMSK  Get mask up to lowest set bit    x ^ (x - 1)
    VEX.LZ.0F38 F3 /1  BLSR    Reset lowest set bit             x & (x - 1)
so this separates out 3 expression groups:
1. x / ~x - this is a1
2. & / ^ / | - this is mode3
3. -x / x-1 / x+1 / ~(x+1) - this is a2
however, on top of that, to get the same set-before-first, set-only-first and set-including-first effect, an *additional* mask is added.
> I get you point anyway...
so relieved you can interpret fuzzy-logic :)
> The idea is that, currently add/subf/etc. are basically:
>
>     a = ~RA if subtracting else RA
>     carry_in = 0
>     if subtracting:
>         carry_in = 1
>     RT = a + RB + carry_in
(and an output-invert)
    if inverted_out:
        RT = ~RT
> bmask would (ignoring mask and bit-reverse and shifting) do:
mask is quite important (critical to include), and also i found it...
difficult to work out (sotto voce: i had to guess, and eventually found it)
>     a = ~RA if imm & 0b1 else RA
>     b = 1 if imm & 0b10 else -1 # mode2
>     carry_in = 0
>     y = a + b + carry_in
ok, so this calculates expression (3), is that correct? (with some of the equivalence-conversions, (~RA)+1, i believe it is)
>     v00 = 0
>     v01 = v10 = bool(imm & 0b100)
>     v11 = bool(imm & 0b1000)
ahh, a LUT2... it looks like... it's doing and/or/xor. so that's expression (2)
>     # 64x 4-in muxes -- basically a binlog operation:
>     # probably saves gates over muxing over and, or, and xor
>     table = [v00, v01, v10, v11]
>     RT = 0
>     for i in range(64):
>         ra_bit = bool(RA & (1 << i))
>         y_bit = bool(y & (1 << i))
>         RT |= table[(ra_bit << 1) | y_bit] << i
and the ra input here is not expression (1), which is where the equivalence chain falls over for me.
i *suspect* that if an extra bit for output-inversion is included then that might work, as above:
    v00 = 0
    v01 = v10 = bool(imm & 0b100)
    v11 = bool(imm & 0b1000)
    # out-inversion built-in to LUT2?
    v00 ^= bool(imm & 0b10000)
    v01 ^= bool(imm & 0b10000)
    v10 ^= bool(imm & 0b10000)
    v11 ^= bool(imm & 0b10000)
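A rough Python sketch combining Jacob's LUT2 formulation with the output-inversion bit suggested here, taken as imm bit 4 (the whole bit assignment is an assumption for illustration). A brute-force search then asks, for each BMI1/TBM expression in the table above, whether some 5-bit imm reproduces it; matching on a handful of sample values is only a spot check, not a proof:

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def lut_bmask(imm: int, ra: int):
    """LUT2 form plus an output-invert bit (imm bit 4); None when illegal."""
    x = ra & MASK
    a = ~x if imm & 0b1 else x
    b = 1 if imm & 0b10 else -1          # the single mode2 bit
    y = (a + b) & MASK                   # x+1, x-1, -x or ~(x+1)
    v = bool(imm & 0b100)                # v01 == v10
    w = bool(imm & 0b1000)               # v11
    if not v and not w:
        return None                      # all-zero LUT: spare encoding space
    inv = bool(imm & 0b10000)
    table = [inv, v ^ inv, v ^ inv, w ^ inv]  # index = (ra_bit << 1) | y_bit
    # the 64 4-input muxes, written as an OR of the selected minterms
    minterms = [~x & ~y, ~x & y, x & ~y, x & y]
    rt = 0
    for selected, term in zip(table, minterms):
        if selected:
            rt |= term
    return rt & MASK


TARGETS = {  # BMI1 / TBM expressions from the table above
    "BLSI": lambda x: x & -x,
    "BLSMSK": lambda x: x ^ (x - 1),
    "BLSR": lambda x: x & (x - 1),
    "BLCFILL": lambda x: x & (x + 1),
    "BLCI": lambda x: x | ~(x + 1),
    "BLCIC": lambda x: ~x & (x + 1),
    "BLCMSK": lambda x: x ^ (x + 1),
    "BLCS": lambda x: x | (x + 1),
    "BLSFILL": lambda x: x | (x - 1),
    "BLSIC": lambda x: ~x | (x - 1),
    "T1MSKC": lambda x: ~x | (x + 1),
    "TZMSK": lambda x: ~x & (x - 1),
}
SAMPLES = [0, 1, 2, 3, 0b1011000, 1 << 63, MASK, 0x0123456789ABCDEF, 0xF0F0000000000FFF]


def find_imm(f):
    """First 5-bit imm whose output matches f on every sample, else None."""
    for imm in range(32):
        if all(lut_bmask(imm, x) == f(x) & MASK for x in SAMPLES):
            return imm
    return None
```

With inversion available, the four ~x forms become De Morgan duals of x forms, e.g. ~x & (x - 1) == ~(x | -x), so for these twelve expressions the LUT never needs ~RA as its first input.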
https://libre-soc.org/irclog/%23libre-soc.2022-06-24.log.html#t2022-06-24T23:07:05
Andrey, i'm answering here:
> just made another test case for bmask, but as I was about to run it,
> noticed the generated python function in av.py is bad
ah no it isn't.
> it generates these statements "if eq(_RB, 0)"
that's correct. it's from: "if _RB = 0"
> but now noticed "mode" is not defined.
it's called "bm", and i renamed it in the demo-code bmask.py to help there
https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=c7e7ea66564c9c2ba1a1b1f931b6d37f0269b72a
> It's part of the opcode, but does it need to be
> one of the input arguments?
rright, ok, so this is... a complex part of python programming.
the pseudo-code is "global" in nature (it was never intended to be used as an *actual* programming language: we were literally the first people in the world to *make* it a strictly-defined deterministic programming language).
the key word there is "global".
python is different: there are in fact *local* and global variables, and there are strict rules about how those are separated, in particular in class functions.
now, so as not to make an absolute pig's ear out of the pseudocode compiler, i made a deliberate decision to bypass some of the rules set by python.
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/caller.py;h=941a89f0bb2eedfc323b3a51c52464bd8cbd03ef;hb=a7f3fa7ab2c87d75d0c562eb12d73e01d19095f1#l1950
that's based on a stackoverflow question, "how do i explicitly inject variables into a function namespace", and, as jacob later says, it basically allows the X-Form / BM2-Form / whatever-Form opcode fields to be "injected" into the Simulator function.
look here, in av.py:
    from openpower.decoder.isa.caller import inject
    vvvvvv
    -> @inject() <-
    ^^^^^
    def op_bmask(self, RB, RA):
        if eq(_RB, 0):
            mask = concat(1, repeat=self.XLEN)
        else:
            mask = RB
> Also in your pseudo-code
in *the* pseudocode.
> you're skipping bit 0 of "mode" (starting with mode[1])
(bear in mind i renamed "mode" to "bm")
no, i haven't. you've misinterpreted this:
    a1 = ra if bm&1 else ~ra
that's an *INTEGER* (bm) and the expression "bm&1" is testing BIT ZERO.
it helps to view it as this:
    a1 = ra if bm&0b00001 else ~ra
or this:
    a1 = ra if bm&(1<<0) else ~ra
the "0" there refers to "bit 0". if you were correct (which you're not), then it would be:
    a1 = ra if bm&2 else ~ra
*that* would be testing bit 1.
BUT butbutbut PLEASE REMEMBER that whilst the python code bmask.py is in "normal" order (LSB0), the av.mdwn is in ***MSB0 ORDER***.
thus, these two pseudocode lines from av.mdwn here:
    a1 <- ra
    if bm[4] = 0 then a1 <- ¬ra
*ARE* repeat *ARE* repeat *ARE* directly and EXACTLY equivalent to this from bmask.py:
    a1 = ra if bm&1 else ~ra
why? because bm is 5 bits long, and therefore bm[4] refers to the LEAST significant bit, **NOT** repeat **NOT** to the most significant bit.
bottom line: *do not* modify the pseudocode, it is correct. if you think you have to modify it, please stop and think "*why* is this correct, what have i not understood about the utter mind-melting weirdness that takes everyone who ever sees MSB0 ordering for the first time 6 months to get used to".
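Both points in this reply can be illustrated together in a stripped-down Python sketch: an opcode field (bm) injected into a function's global namespace for the duration of one call, and an av.mdwn-style MSB0 test of bm[4] that agrees with bmask.py's bm&1 for every 5-bit value. call_with_fields, msb0_bit and op_a1_demo are hypothetical names; this shows the general stackoverflow-style technique, not the actual caller.py inject code:

```python
_MISSING = object()


def msb0_bit(value: int, index: int, width: int) -> int:
    """Bit `index` of a `width`-bit field in MSB0 numbering (index 0 = most significant)."""
    return (value >> (width - 1 - index)) & 1


def call_with_fields(func, fields, *args, **kwargs):
    """Run func with `fields` temporarily visible as module globals, then restore them."""
    namespace = func.__globals__
    saved = {name: namespace.get(name, _MISSING) for name in fields}
    namespace.update(fields)
    try:
        return func(*args, **kwargs)
    finally:
        for name, old in saved.items():
            if old is _MISSING:
                del namespace[name]
            else:
                namespace[name] = old


def op_a1_demo(ra: int) -> int:
    # av.mdwn style:  a1 <- ra ; if bm[4] = 0 then a1 <- ¬ra
    # `bm` is not a parameter: it is looked up in the module globals,
    # where call_with_fields has placed the decoded opcode field.
    a1 = ra
    if msb0_bit(bm, 4, 5) == 0:  # noqa: F821 (bm is injected at call time)
        a1 = ~ra
    return a1


def a1_bmask_py(bm: int, ra: int) -> int:
    # bmask.py style: plain LSB0 integer test
    return ra if bm & 1 else ~ra
```

The try/finally puts back whatever was there before, so one instruction's fields cannot leak into the next call. Mutating module globals like this is not thread-safe.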
unit test added for bmask, pseudocode confirmed functional. https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=c642f6cbdba0c5eeb2e327735f3f58f145c6363a
unit tests all good. closing. https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=e2ced8a9c0db4853e216a19a96e40569241823b3