This is the next logical step after a temporary "magic" instruction. Things work with the magic instruction, but disassembler does not get us a real listing. Once we support the real prefix, this would make objdump work correctly.
Exactly as I predicted in https://bugs.libre-soc.org/show_bug.cgi?id=849#c2, in scope of task 849, I already successfully introduced some changes related to prefix. For assembler, we already operate with the first part as with prefix; it's therefore only the disassembler which will need more tuning to be able to recognize the instructions.
The rest of this task will need more tuning. We'll basically have to implement what we have for assembler, but in an opposite direction. Let's assume the code below. sv.addi %r3, %r1, 125 sv.extsw./ff=eq 5, 67 With the current objdump, this code is printed as below: 0: 00 00 40 05 .long 0x5400000 4: 7d 00 61 38 addi r3,r1,125 8: 04 02 40 05 .long 0x5400204 c: b5 07 65 7c extsw. r5,r3 I already managed to make objdump recognize this as a whole instruction, and even added our prefix: 0: 00 00 40 05 sv.addi r3,r1,125 4: 7d 00 61 38 8: 04 02 40 05 sv.extsw. r5,r3 c: b5 07 65 7c The rest, like reconstructing the operands and qualifiers, will take much more time. We'll basically have to invent custom "extraction" routines for each and every operand, and also perhaps hack other stuff around to affect the instruction name with qualifiers. Stay tuned.
Here's what we have in binutils: #define A(op, xop, rc) \ (OP (op) \ | ((((uint64_t)(xop)) & 0x1f) << 1) \ | (((uint64_t)(rc)) & 1)) #define A_MASK A (0x3f, 0x1f, 1) {"fsub", A(63,20,0), AFRC_MASK, PPCCOM, PPCEFS|PPCVLE, {FRT, FRA, FRB}}, {"fsub.", A(63,20,1), AFRC_MASK, PPCCOM, PPCEFS|PPCVLE, {FRT, FRA, FRB}}, Notice A(63,20,0) and A(63,20,1) fields. They serve the purpose of instantiating the "opcode" field; this field is exactly what we use for lookup in disassembler code. We can get the following upon calling openpower.decoder.power_svp64.SVP64RM.get_svp64_csv: minor_63.csv shows that the major opcode is 63 (op == 63). 20 can be obtained from -----11100 record for fsub at minor_63.csv. 0/1 corresponds to RC field. My question is, do we have some common technique to construct the whole "opcode" stuff? I mean, I understand the logic for cases like major.csv (where the OP is a simple integer) and for cases like minor_#.csv (where OP is # and XOP is determined by the entry). But how do we address other CSVs? Perhaps there's a simple way to get the "opcode" immediately in Python? For the record, here are CSVs we currently handle: paths = ( "minor_19.csv", "minor_30.csv", "minor_31.csv", "minor_58.csv", "minor_62.csv", "minor_22.csv", "minor_5.csv", "minor_63.csv", "minor_59.csv", "major.csv", "extra.csv", )
We can cheat and, for each SVP64 instruction name, find it in vanilla PPC entries, and get opcode there. The issues here are: 1) disassembler doesn't build the hash table for names, it only hashes opcodes; 2) even if we add it, it wastes time upon startup.
(In reply to Dmitry Selyutin from comment #3) > My question is, do we have some common technique to construct the whole > "opcode" stuff? yyees... sort-of, i mean it's part of that wossname bugreport the one for cleaning up (automated) decode svp64.py ah! bug #838. https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_decoder.py;hb=HEAD#l702 returns an object of type TopPowerDecoder which contains the pattern-recognition list of XOs and associated bit-positions where they start from. just... err... in LSB0 order not MSB0. > I mean, I understand the logic for cases like major.csv > (where the OP is a simple integer) and for cases like minor_#.csv (where OP > is # and XOP is determined by the entry). But how do we address other CSVs? > Perhaps there's a simple way to get the "opcode" immediately in Python? well, they're bit-pattern-matching, a mask plus a match. in *theory* they could be turned into a single match plus start/end because i don't believe there's any that are "---111---00101---" i.e. all bits are contiguous. i think. have to check.
739 Subdecoder(pattern=22, opcodes=get_csv("minor_22.csv"), 740 opint=False, bitsel=(0, 11), suffix=None, subdecoders=[]), 20 0110001110-,ALU,OP_CPROP, 21 ------10001,ALU,OP_BMASK, 22 -----00011-,ALU,OP_FMVIS, 23 -----01011-,ALU,OP_FISHMV, so here you can parse the minor_22.cvs bitpattern -------10001 and go "oh, bitsel starts from 0 to 11 but actually we are only interested in 0...4 because the rest is '-'s. so actually this turns into [0:4] == 0b10001 and for fmvis that would be: "oh, "-----00011-" bitpattern starts at 1 so actually we want 1...5 this turns [1:5] == 0b00011 something like that.
yep none of these are "----1111----000" they are all contiguous 22.csv:-----11011-, 22.csv:-----011001, 22.csv:-----101001, 22.csv:-----111001, 22.csv:-----10011-, 22.csv:0111001110-, 22.csv:0011001110-, 22.csv:0101001110-, 22.csv:0001001110-, 22.csv:1101001110-, 22.csv:1011110110-, 22.csv:1001110110-, 22.csv:1111110110-, 22.csv:0111110110-, 22.csv:0110001110-, 22.csv:------10001, 22.csv:-----00011-, 22.csv:-----01011-, 59.csv:-----10010,F 59.csv:-----10100,F 59.csv:-----10101,F 59.csv:-----10110,F 59.csv:-----11000,F 59.csv:-----11001,F 59.csv:-----11010,F 59.csv:-----11100,F 59.csv:-----11101,F 59.csv:-----11110,F 59.csv:-----11111,F 59.csv:-----00100,F 59.csv:-----00101,F 59.csv:-----00110,F 59.csv:-----00111,F 59.csv:-----01101,F 59.csv:-----01111,F 5.csv:--------00-,S 5.csv:0010010110-,S 5.csv:-011010110-,S 5.csv:0010110110-,S 5.csv:0011110110-,S 63.csv:-----10010,F 63.csv:-----10100,F 63.csv:-----10101,F 63.csv:-----10110,F 63.csv:-----10111,F 63.csv:-----11000,F 63.csv:-----11001,F 63.csv:-----11010,F 63.csv:-----11100,F 63.csv:-----11101,F 63.csv:-----11110,F 63.csv:-----11111,F
What really always pisses me off is that many our sources mix pure-Python concepts and stuff like Signal and similar crap. The pure-Python code which generates the C code for binutils should not really expect stuff like Signal or verilog or rtutil. This seriously complicates the options to use the code in a simple way. Why, for God's sake, if I want to generate binutils code, should I install verilog or nmigen!? I know, the reason is that nmigen and verilog needs were there way before; still we should care about it now. But, putting this rage aside for a while (I'm not yet done with it, we'll return to this question eventually, because this is fundamentally wrong): divide_opcodes has most of what we need. I'll think of how to handle this. Either I'll create a PowerDecoder (duplicating some of create_pdecode stuff) or will handle it in a bit different way.
(In reply to Dmitry Selyutin from comment #8) > What really always pisses me off is that many our sources mix pure-Python > concepts and stuff like Signal and similar crap. The pure-Python code which > generates the C code for binutils should not really expect stuff like Signal > or verilog or rtutil. This seriously complicates the options to use the code > in a simple way. Why, for God's sake, if I want to generate binutils code, > should I install verilog or nmigen!? I know, the reason is that nmigen and > verilog needs were there way before; still we should care about it now. because when PowerDecoder was first written 2 years ago it was initially only intended for HDL. i then wanted to use it in the ISACaller Simulator and thought, "ah HA! if i pass in an abstract class instance which has the exact same capability as nmigen Signal, nmigen Switch, nmigen Case and nmigen Mux, i can make a PowerDecoder which does NOT depend on nmigen" it should be fairly obvious from the fact that this has not *yet* happened that it was decided that (a) there wasn't time and (b) PowerDecoder is already so horrendously abstract that it would cause people to run away screaming... > But, putting this rage aside for a while (I'm not yet done with it, we'll > return to this question eventually, because this is fundamentally wrong): > divide_opcodes has most of what we need. I'll think of how to handle this. > Either I'll create a PowerDecoder (duplicating some of create_pdecode stuff) > or will handle it in a bit different way. no please REALLY do not duplicate the functionality of PowerDecoder, it is extremely sophisticated and it would be a nightmare to have two sets of code. that's my first instinctive reaction. the second reaction is: actually it may only be about 250 lines so you might get away with it, but it is really complex data structures so making me nervous. my third reaction is, you've completely missed the point (aside from the import, which could be solved by splitting out the appropriate code) and it is this: you don't *actually* need to call any nmigen at all, you only need to walk the data structures (the static hierarchy of Named Tuples) to get the XOs. in other words what you are seeing as a dependency on nmigen is ONLY caused by these lines: 767 return TopPowerDecoder(32, dec, name=name, col_subset=col_subset, 768 row_subset=row_subset, 769 conditions=conditions) 770 if you move the entire contents of create_pdecode except that one line into a submodule and move Subdecoder (line 110) to the same file i think you'll find the problem (for you) has completely gone away. sorry i meant to spell that out. yes v. likely that divide_opcodes() creates the mask needed for spotting XO patterns, feel free to create a class that duplicates the recursive walk of PowerDecoder but only after moving the majority of create_pdecode and Subdecoder to a separate module, calling that new function from power_decoder.py:create_pdecode(), and making damn sure everything still works, first. the last thing we need is two duplicate places of supported opcodes. bottom line is, the information you are after is actually obtainable and does not need nmigen, at all.
(In reply to Luke Kenneth Casson Leighton from comment #9) > i then wanted to use it in the ISACaller Simulator and thought, > "ah HA! if i pass in an abstract class instance which has the exact > same capability as nmigen Signal, nmigen Switch, nmigen Case and nmigen > Mux, i can make a PowerDecoder which does NOT depend on nmigen" Noooooo, this'd be mad. I meant quite the opposite: there's no need to implement the decoder here, at all. The opcode should simply be part of instruction data. More details below. > it should be fairly obvious from the fact that this has not *yet* > happened that it was decided that (a) there wasn't time and (b) PowerDecoder > is already so horrendously abstract that it would cause people to run > away screaming... The thing I'm talking about is _not_ PowerDecoder. I'm talking of basic instruction-related information. Providing the way to quickly calculate the opcode (both with and without the operands, but at least without them) is natural. My point is, it's not expected to be buried that deep inside the decoder. The opcode is expected to be part of data which is associated with the instruction. When we parse the tables and output some stuff about the instructions, the basic opcode (operand-less) must be part of this information. > if you move the entire contents of create_pdecode except that one line > into a submodule and move Subdecoder (line 110) to the same file i think > you'll find the problem (for you) has completely gone away. Subdecoders are just named tuples; the logic is buried inside TopPowerDecoder, or, more correctly, inside its base class, PowerDecoder. Both are Elaboratable, both do way more than obtaining the simple information (building dynamic switches, creating some Signals and whatever stuff is under the hood). My point is, for code generation, there's no need to rely on any of these: basically the only stuff needed is what divide_opcodes produces. And my point is, this information should be associated with instruction, not be hidden. > sorry i meant to spell that out. No, this is actually perfect that you do it, and I'm really grateful you're doing this. First, we must discuss it; second, I'm really glad that you shed the light to many topics, because historical background and architectural details are priceless. > yes v. likely that divide_opcodes() creates the mask needed for spotting > XO patterns, feel free to create a class that duplicates the recursive > walk of PowerDecoder but only after moving the majority of create_pdecode > and Subdecoder to a separate module, calling that new function from > power_decoder.py:create_pdecode(), and making damn sure everything still > works, first. I'm thinking of having the opcode coming as part of classes which are returned by get_svp64_csv wrapper. That is, for each instruction, it's perfectly doable. That is, think of it as "parse each and every CSV with the sacred knowledge of bitsels, suffices, etc., but don't create the decoder, just calculate all the information about the instructions that can be calculated statically, associating the data on per-instruction basis". Basically, yes, create_pdecode w/o the decoder, kinda like a big map of instructions (again, each instruction is a dict-like stuff, opcode included). > bottom line is, the information you are after is actually obtainable > and does not need nmigen, at all. This exactly what I'm talking of. Good to know we're on the same page.
So, TL;DR: what I've been thinking of is kinda like InstructionDatabase class, which parses all CSVs, a collection of instructions (likely searchable by both name/opcode). This instruction database parses all CSVs and also has some intimate knowledge about bitsels, suffixes, etc. InstructionDatabase would keep a collection of Instruction instances; each Instruction would provide the data, including stuff which needs some calculations. For example, we calculate some SVP64 fields based on CSVs, but the calculations are hidden inside CSV parser (get_svp64_csv). Instead, this should be hidden inside Instruction class. Same applies to the opcode. Also, the logic with bitsels and similar stuff should be wrapped too, exactly like create_pdecode does, except that it doesn't need to create the actual decoder. :-) Also, I should apologize, I think the overall tone of the first message was unnecessarily harsh. I had to dig into this for almost two consecutive days (sleep and food excluded), so it made me somewhat impatient and nervous. :-) I'm sorry, please don't consider it personal.
rright. ok. so the CSV files originally came from microwatt decode1.vhdl. the missing information which is provided by create_decode() is, "what the hell does column 1 (opcode) actually mean". that's all. and rhe information that is missing is: 1. bitsel (start,end) 2. opint that's it. that's all. that's the *only* information that each CSV file is missing. strictly speaking even opint can be determined "does this opcode have minus-symbols in it, yes no". now, about 6 weeks ago we added the ability to put comments into the CSV files. i have no problem *at all* with putting bitsel into the 1st line of the CSV file as a comment: # bitsel=0:5 and then: 1) adapting create_pdecode to read it 2) forgetting about create_pdecode() entirely for what you want. once thar bitsel info is associated with the CSV file itself then power_decode.py no longer is the canonical location for being able to interpret the CSV files. then get_csv_file() can add two extra columns on read, one is an XO field, the other XO_mask. it should be obvious what they are.
(In reply to Dmitry Selyutin from comment #11) > So, TL;DR: what I've been thinking of is kinda like InstructionDatabase > class, which parses all CSVs, a collection of instructions (likely > searchable by both name/opcode). This instruction database parses all CSVs > and also has some intimate knowledge about bitsels, suffixes, etc. > InstructionDatabase would keep a collection of Instruction instances; each > Instruction would provide the data, including stuff which needs some > calculations. i like it. a better version of get_csv(). > For example, we calculate some SVP64 fields based on CSVs, but the > calculations are hidden inside CSV parser (get_svp64_csv). Instead, this > should be hidden inside Instruction class. Same applies to the opcode. Also, > the logic with bitsels and similar stuff should be wrapped too, exactly like > create_pdecode does, except that it doesn't need to create the actual > decoder. :-) bear in mind the,stuff i wrote in comment #12 applies to scalar 32bit ops, absolutely nothing to do with svp64 at all... but yes, i agree. > Also, I should apologize, I think the overall tone of the first message was > unnecessarily harsh. I had to dig into this for almost two consecutive days > (sleep and food excluded), so it made me somewhat impatient and nervous. :-) > I'm sorry, please don't consider it personal. pffh i recognise a stress-out when i see one because i do it all the time. if you'd said "stress stress blah blah oh and it's YOUR fault totally" then my alarms would be hitting the defcon 4 button, but i saw nothing to get concerned about at all. just a perfectly normal "aaaaaaa!" moment.
Good news folks! With the recent code which I implemented in openpower-isa, the opcode value and mask generation, we're able to retrieve the whole set of stuff needed to decode everything, including the corresponding `struct svp64_record`. I'm already marking the instructions as 8-byte, and output the correct name. The next step is to decode the operands values as needed.
Adding new enum to sv_binutils is easy. The patch... https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=783d7c6c2ab159b4ca38d6fa6254697e8151939f ...the results: enum svp64_function { SVP64_FUNCTION_NONE, SVP64_FUNCTION_ALU, SVP64_FUNCTION_LDST, SVP64_FUNCTION_SHIFT_ROT, SVP64_FUNCTION_LOGICAL, SVP64_FUNCTION_BRANCH, SVP64_FUNCTION_CR, SVP64_FUNCTION_TRAP, SVP64_FUNCTION_MUL, SVP64_FUNCTION_DIV, SVP64_FUNCTION_SPR, SVP64_FUNCTION_MMU, SVP64_FUNCTION_SV, SVP64_FUNCTION_VL, SVP64_FUNCTION_FPU, }; struct svp64_desc { uint64_t function : 4; /* snip */ }; { .name = "sv.ldx", .opcode = { .value = 0x7c00002a, .mask = 0xfc0007fe, }, .desc = { .function = SVP64_FUNCTION_LDST, /* snip */ }, },
As of today, the development version of binutils is already capable of producing this: $ ./objdump -dr -Mlibresoc /tmp/test.o /tmp/test.o: file format elf64-powerpcle Disassembly of section .text: 0000000000000000 <.text>: 0: e0 18 40 05 sv.add r127,r1,*r67 4: 14 82 e1 7f
Completed extra and started working on specifiers. I already took the decision on how the modes handling would look, and fixed multiple issues in scope of it, the most important are: 1. We generated the Function emum instead SVMode enum. This is not what we want, we differ by SVMode categories. Since there already was Mode in sv_binutils, that one was renamed to ModeConst: this is a candidate to be deprecated anyway. 2. We'll need Rc value which binutils don't have (they simply have both "add" and "add." entries). We, however, need this to affect the mode matching. Therefore yet another bit from Desc was occupied, leaving us only 5 unused bits. Note, however, that this bit was freed by switch from Function (4 bits) to SVMode (3 bits). The changes can be found in binutils branch at openpower-isa.
I haven't checked everything yet, but normal modes seem to work. 0: e0 3f 4c 05 sv.add./dw=8 *r3,*r7,*r11 4: 15 12 01 7c 8: e0 3f 48 05 sv.add./dw=16 *r3,*r7,*r11 c: 15 12 01 7c 10: e0 3f 44 05 sv.add./dw=32 *r3,*r7,*r11 14: 15 12 01 7c 18: e0 3f 43 05 sv.add./sw=8 *r3,*r7,*r11 1c: 15 12 01 7c 20: e0 3f 42 05 sv.add./sw=16 *r3,*r7,*r11 24: 15 12 01 7c 28: e0 3f 41 05 sv.add./sw=32 *r3,*r7,*r11 2c: 15 12 01 7c 30: e0 3f 4e 05 sv.add./dw=8/sw=16 *r3,*r7,*r11 34: 15 12 01 7c 38: e0 3f 49 05 sv.add./dw=16/sw=32 *r3,*r7,*r11 3c: 15 12 01 7c 40: e0 3f 47 05 sv.add./dw=32/sw=8 *r3,*r7,*r11 44: 15 12 01 7c 48: e0 3f 45 05 sv.add./w=32 *r3,*r7,*r11 4c: 15 12 01 7c 50: e0 3f 4f 05 sv.add./w=8 *r3,*r7,*r11 54: 15 12 01 7c 58: e0 3f 4a 05 sv.add./w=16 *r3,*r7,*r11 5c: 15 12 01 7c 60: e0 7f 4d 05 sv.add./dw=8/sw=32/vec2 *r3,*r7,*r11 64: 15 12 01 7c 68: e0 3f 60 05 sv.add./m=r3 *r3,*r7,*r11 6c: 15 12 01 7c 70: e0 3f 50 05 sv.add./m=1<<r3 *r3,*r7,*r11 74: 15 12 01 7c 78: e0 3f d0 05 sv.add./m=~r10 *r3,*r7,*r11 7c: 15 12 01 7c 80: e0 3f e0 07 sv.add./m=so *r3,*r7,*r11 84: 15 12 01 7c 88: e0 3f d0 07 sv.add./m=ne *r3,*r7,*r11 8c: 15 12 01 7c 90: e0 3f 40 07 sv.add. *r3,*r7,*r11 94: 15 12 01 7c 98: e0 3f 40 05 sv.add. *r3,*r7,*r11 9c: 15 12 01 7c
0000000000000000 <vpx_get4x4sse_cs_svp64_real>: 0: 00 00 e0 38 li r7,0 4: a4 0f 84 78 sldi r4,r4,1 8: a4 0f c6 78 sldi r6,r6,1 c: b6 07 00 58 setvl r0,r0,4,0,1,1 10: 00 30 40 05 sv.lha *r10,0(r3) 14: 00 00 43 a8 18: 14 22 63 7c add r3,r3,r4 1c: 00 30 40 05 sv.lha *r14,0(r3) 20: 00 00 63 a8 24: 14 22 63 7c add r3,r3,r4 28: 00 30 40 05 sv.lha *r18,0(r3) 2c: 00 00 83 a8 30: 14 22 63 7c add r3,r3,r4 34: 00 30 40 05 sv.lha *r22,0(r3) 38: 00 00 a3 a8 3c: 00 00 00 60 nop 40: 00 30 40 05 sv.lha *r26,0(r5) 44: 00 00 c5 a8 48: 14 32 a5 7c add r5,r5,r6 4c: 00 30 40 05 sv.lha *r30,0(r5) 50: 00 00 e5 a8 54: 14 32 a5 7c add r5,r5,r6 58: 00 30 40 05 sv.lha *r34,0(r5) 5c: 00 00 05 a9 60: 14 32 a5 7c add r5,r5,r6 64: 00 30 40 05 sv.lha *r38,0(r5) 68: 00 00 25 a9 6c: b6 1f 00 58 setvl r0,r0,16,0,1,1 70: c0 36 40 05 sv.subf *r42,*r10,*r26 74: 50 30 42 7d 78: c0 36 40 05 sv.mulld *r58,*r42,*r42 7c: d2 51 ca 7d 80: 04 06 40 05 sv.add/mr r7,*r58,r7 84: 14 3a ee 7c 88: 78 3b e3 7c mr r3,r7 8c: 20 00 80 4e blr