The current binutils implementation acts in a way similar to pysvp64asm. This, however, has a huge limitation: we parse operands on our own, and this somewhat duplicates an existing ecosystem. Since we don't use the vanilla code, we also lack support for some its features, like macro substitution. Ideally we should refactor the ppc_assemble (originally md_assemble) in a way so that it deals with SVP instructions directly. This would, in turn, inflict some changes to lexer. The vanilla binutils code is not ready for even dealing with SVP64 operands, since the way we treat the operand depend on operation modes, and binutils don't even have way to pass this context (other than via global variable, which is bad, since we deal on per-instruction basis).
i honestly thought that the parsing of macro substitution and arithmetic computation of expressions were a completely separate pre-procesing phase. what you are saying is that the macro substitution involving register names is *not* separate, which retrospectively makes sense it is worthwhile checking that arithmetic pre-processing is separate or not. i believe it is valid to have this: addi 8/4, r0, 9-5 because those numbers may come from #defines and .set declarations, which must be parse out and eventually become: addi 2, r0, 4 if the arithmetic macro substitution is *not* a pre-processing phase then there exists the opportunity to ensure that this does not get borked: sv.addi/vec2 2, r0, 4 because the "/" is wrongly interpreted as a divide operation if that cannot be solved then sigh we have to use a different character. discussed previously, "?" is free and is not used as an arithmetic op. personally i think this looks silly but hey sv.addi?vec2?sm=r3?ew=8 2, r0, 4
OK, I had to spend some hours in binutils code and I think I came up with the solution. We will simply let the underlying PPC machinery handle overall operands parsing, but, at the same time, will extend it with the special flag. This means that we will also behave differently than pysvp64asm: we'll no longer have a separate stage where we deal with plain old PPC instruction as string. Instead, we'll fetch the special `powerpc_opcode`, but will let the code know it's an SVP64-extended instruction. Bad news: I need time for refactoring, rebasing and code cleanup. Good news: we'll use everything PPC provides us, and also will be much closer to 845 (because the prefix handling already assumes we don't have a two-stage pipeline).
(In reply to Dmitry Selyutin from comment #2) > flag. This means that we will also behave differently than pysvp64asm: we'll > no longer have a separate stage where we deal with plain old PPC instruction > as string. hey, as long as the ".set" macro-hack keeps working and the assembly files are compatible between the two, it's all good > Bad news: I need time for refactoring, rebasing and code cleanup. > Good news: we'll use everything PPC provides us, and also will be much > closer to 845 (because the prefix handling already assumes we don't have a > two-stage pipeline). do consider giving alan and peter a heads-up. also if it means that there's a solution to using "/" in "sv.xxx/yyy/zzz" because it's explicitly avoided, that would be really good. i just tried this in a test.s and it actually worked addi 4/2,2,1 "powerpc64le-linux-gnu-as test.s" did not barf on that.
I recently wrote to Alan, several days ago, to ask whether I can publish patches for a preliminary review. That was before I found the problem with macros. :-) A quick question: we don't need macros support in the instruction modifiers, do we? Vanilla binutils don't evaluate instruction names, and for us all these modifiers act as an instruction name, right? That would simplify stuff a lot, to be honest.
(In reply to Dmitry Selyutin from comment #4) > I recently wrote to Alan, several days ago, to ask whether I can publish > patches for a preliminary review. That was before I found the problem with > macros. :-) > > A quick question: we don't need macros support in the instruction modifiers, > do we? sv.instruction/ew=8 for example? mmm... drat, i can see a use-case for being able to macro-substitute the "8" in "ew=8". and also "sm=something". but, not much else. not the "ew=" itself, not the "sm=" itself, but the arguments. why? because whilst "ew=" is part of the instruction name, the "8" is like an operand. (the only reason it wasn't added *as* an operand RT,RA,RB,ew=8,sm=r3 was because that would seriously interfere with the decoding of operands)
It seems that the support is ready, except for pseudo-operands in expressions like `${A}=${B}`, e.g. `ew=8`.
(In reply to Dmitry Selyutin from comment #6) > It seems that the support is ready, except for pseudo-operands in > expressions like `${A}=${B}`, e.g. `ew=8`. the "ew=" side, being part of the instruction, we don't want to macro-fy. however the "8", definitely. NN means "yes macrofy" other things: * vec2/3/4 no... maybe. a case could be made for vecN * ew=NN yes elwidths * sm=NN dm=NN yes predicate mask sources r3 r10 r31 CR fields all options * modes: no. - no mr - no satu/sats - etc err... really that's it. * elwidths including selelecting "default" * predicate sources including selecting "all 1s" * vec2/3/4 maybe, including selecting "no subvec"
(In reply to Luke Kenneth Casson Leighton from comment #7) > > * vec2/3/4 no... maybe. a case could be made for vecN This must follow all-or-nothing paradigm. Otherwise, at best, the vec qualifiers are ill-formed, or the whole parsing is inconsistent. Options are: a) allow every qualifier to be replaced with macro, that is, every mode can be a macro; 2) replace vec2/vec3/vec4 with vec=N. I'm more inclined to option 2. Modes look like an essential part of the instruction itself. Also, allowing to substitute modes opens a Pandora's box.
(In reply to Dmitry Selyutin from comment #8) > 2) replace vec2/vec3/vec4 with vec=N. like the principle (consistency) but is too long and not recognised by 3D developers. nuts to it. consider it a mode. will explain why on irc latwr
(In reply to Luke Kenneth Casson Leighton from comment #9) > (In reply to Dmitry Selyutin from comment #8) > > > 2) replace vec2/vec3/vec4 with vec=N. > > like the principle (consistency) but is too long and not recognised by 3D > developers. > > nuts to it. consider it a mode. will explain why on irc latwr i originally proposed subvl=1-4, where subvl=1 is the default value
(In reply to Jacob Lifshay from comment #10) > i originally proposed subvl=1-4, where subvl=1 is the default value you have a habit from lack of experience in proposing assembler mnemonics that are far too long. people used to computing from the 1980s and 1990s consider CPU time to be precious. russian assembler writers of the time were amongst some of the best in the world, out of necessity: due to the berlin wall, large capacity memory and high end cpus were impossible to obtain. hard no on 6 letter mnemonics when 2-4 will do.
(In reply to Luke Kenneth Casson Leighton from comment #11) > people used to computing from > the 1980s and 1990s consider CPU time to be precious. russian > assembler writers of the time were amongst some of the best in > the world, out of necessity: due to the berlin wall, large > capacity memory and high end cpus were impossible to obtain. that reason is no longer valid: the amount of time and memory used by assemblers is a very small fraction of what a standard compiler uses, i'd expect >99% of the runtime is used for optimizations rather than in the assembler. increasing the time it would take to assemble all of chromium (1GB of binary with debug info) by 1/10 of a second is an acceptable tradeoff for increased clarity imho. > > hard no on 6 letter mnemonics when 2-4 will do. i disagree...have you ever looked at nvidia's ptx or amdgpu assembler? they use pretty long mnemonics I assume because they're easier to read. wasm also has long mnemonics: i32x4.extadd_pairwise_i16x8_u basically a lot of recent ISAs have longer mnemonics because they're easier to read/understand.
(In reply to Jacob Lifshay from comment #12) > that reason is no longer valid: the amount of time and memory used by > assemblers is a very small fraction of what a standard compiler uses, i'd > expect >99% of the runtime is used for optimizations rather than in the > assembler. increasing the time it would take to assemble all of chromium > (1GB of binary with debug info) by 1/10 of a second is an acceptable > tradeoff for increased clarity imho. that's at the proportion compared to total compile time that you deciding to light a candle in your room probably has a bigger effect due to decreased clock speeds from higher temperatures than having longer mnemonics would.
(In reply to Jacob Lifshay from comment #10) > (In reply to Luke Kenneth Casson Leighton from comment #9) > > (In reply to Dmitry Selyutin from comment #8) > > > > > 2) replace vec2/vec3/vec4 with vec=N. > > i originally proposed subvl=1-4, where subvl=1 is the default value if you want a 3-char mnemonic, how about svl=N imho it's better because it explicitly contains sub-vector which is the part you're actually setting, not the vector length which imho is implied (to an uninformed reader) by vec=N.
I'm OK with any good name for it. My point is that, regardless of the name, it must be consistent per se and be consistent with anything around. Either all modes must allow macros, or none of them. If only "value" part of the mode is allowed to be a macro, then, to be a macro, vec2/vec3/vec4 must be converted to a key-value form. It's not hard to change the code around to make this place work, but this would introduce an inconsistency. Allowing macros in modes would be kinda opening the Pandora's box and make the assembly code barely readable. So, let's either have some key-value mnemonic for vecs, or let's drop the option to allow macros for these.
(In reply to Dmitry Selyutin from comment #15) > I'm OK with any good name for it. dmitry, see comment #9
sw/ew are ready, thinking of how to handle the predicates. Outlined some general considerations in IRC: https://libre-soc.org/irclog/%23libre-soc.2022-07-09.log.html#t2022-07-09T14:50:14.
(In reply to Dmitry Selyutin from comment #17) > general considerations in IRC: > https://libre-soc.org/irclog/%23libre-soc.2022-07-09.log.html#t2022-07-09T14: > 50:14. The code I checked was: .set coco, 8 .set jum, coco .set bo, 32 sv.setb/sw=jum/ew=bo 5, 31 Note that this works in a bit hackish fashion, binutils don't like '/' as separator and think I'm trying to divide something. ¯\_(ツ)_/¯ I therefore discard the separator for a while, until we're done with parsing this expression.
Raised yet another on binutils mailing list about tricky cases like 1<<r3, ~r3 etc.
This proves to be more difficult than I initially thought, since PPC doesn't allow any registers. This code fails: .set REG, %r0 extsw REG, 2 ...and so does this code: .set REG, r0 extsw REG, 2 I've raised the discussion on this as part of the original thread: https://sourceware.org/pipermail/binutils/2022-July/121731.html. Luke, a side note, unless -mregnames is given, this means that we should use 1<<%r3 (1<<r3 also works once -mregnames is active).
(In reply to Dmitry Selyutin from comment #20) > Luke, a side note, unless -mregnames is given, this means that we should use > 1<<%r3 (1<<r3 also works once -mregnames is active). bleuch, far too many random characters. stuff it, let's do "u3" instead, and !sm=r10 * sm=r3 (no change) * sm=u3 (old 1<<r3) * !sm=r31 (old sm=~r31) but keep all 8 cr field options and if people use "!sm=ne" issue a warning, "use sm=eq instead"
(In reply to Luke Kenneth Casson Leighton from comment #21) > (In reply to Dmitry Selyutin from comment #20) > bleuch, far too many random characters. These characters are _not_ random. These characters come from the way this assembly work in PPC. My point is, if 1<<r3 is the most obvious and evident choice, the way it's described in GNU assembly is 1<<%r3, or via -mregnames flag. I suggest at least keeping this as an alias for u3 (which, in my opinion, is not obvious at all). > * sm=r3 (no change) > * sm=u3 (old 1<<r3) > * !sm=r31 (old sm=~r31) I think we will be able to handle ~r31 (well, again, in GNU assembly parlour it's ~%r31, unless -mregnames is present). Actually once the registers are supported it boils down to handling a little expression tree. I'm working on it.
I had to spend a damn lot of time there, but it seems I managed to get registers working in macros. With this applied, expressions like ~%r3/~r3 and 1<<%r3/1<<r3 can be implemented as well. https://sourceware.org/pipermail/binutils/2022-July/121772.html
Yet another day lost on macro wars, this time to adopt the vector notation to the recent register support in macros. Also more iterations on that patch. https://sourceware.org/pipermail/binutils/2022-July/121777.html I hope that tomorrow I'll be able to at least some of notations we wanted. Perhaps we'll need to introduce some special pseudo-operand for this purpose, though.
Remark: what I mean by pseudo-operand is something which makes context only for binutils internals. Syntactically nothing is changed.
(In reply to Dmitry Selyutin from comment #22) > (In reply to Luke Kenneth Casson Leighton from comment #21) > > (In reply to Dmitry Selyutin from comment #20) > > bleuch, far too many random characters. > > These characters are _not_ random. i know: i mean, i'm counting the number, ignoring what they actually are, and 9 characters is exceeding a threshold i feel comfortable with. (10 including /) / 1 sm= 3 1<< 3 %r3 3 total of 10, it's way too much. > These characters come from the way this > assembly work in PPC. My point is, if 1<<r3 is the most obvious and evident > choice, the way it's described in GNU assembly is 1<<%r3, or via -mregnames > flag. I suggest at least keeping this as an alias for u3 (which, in my > opinion, is not obvious at all). agreed: i just can't think of a good short mnemonic. does 1<<%3 work? how about dropping the 1 and using one "<"? sm=<%3 or "^"? sm=^r3 sm=^%3 would a special-case pattern in the lexer work (if "^" has not been used), i am wondering if the unary-bit form is going too far in the macro substitution. if it should be excluded as a special case. if you want sm=1<<r3 you *know* you want it, and you cannot use 1<<r10 or 1<<r31 so macros will not help you.
(In reply to Luke Kenneth Casson Leighton from comment #26) > total of 10, it's way too much. ...so those who feel uncomfortable can use the macro? :-) > does 1<<%3 work? > how about dropping the 1 and using one "<"? sm=<%3 > or "^"? sm=^r3 sm=^%3 These all break the principle of the least astonishment. The last one to the lesser extent, perhaps. Again, in GNU lingo, it is sm=^%r3. I like this option. + consists of the register and unary operand, as things like ~%r3; + really distinguishable; + follows the same principles as operands around; + short and sweet, easy to remember; - I'm not sure whether anyone in PPC world has ever seen anything similar. So I suggest we indeed use ^%r3 (^r3 with -mregnames).
(In reply to Dmitry Selyutin from comment #27) > So I suggest we indeed use ^%r3 (^r3 with -mregnames). like it. and it can be recognised in the lexer, even.
This took damn long time, but it seems to work now: .set coco, 1<<%r3 .set jum, eq .set bo, %r3 sv.extsw./ff=coco 5, 31 sv.extsw./pr=jum/m=bo *5, 31
If you want, I can play with that more and introduce an alias ^r3. I must say that gas expr system is somewhat stubborn and not really flexible, so even what we have now works only with some hacks (minor ones, but still hacks). Also, with -mregnames, even the following works: .set coco, 1<<r3 .set jum, eq .set bo, r3 sv.extsw./ff=coco 5, 31 sv.extsw./pr=jum/m=bo *5, 31
(In reply to Dmitry Selyutin from comment #30) > If you want, I can play with that more and introduce an alias ^r3. yes please, i know it took a long time, i should have been clearer: the length of "/sm=1<<%r3" is 10 characters which is far too many.
On svp64-ng branch, the following works: .set coco, eq .set jum, ^%r3 .set bo, 1<<%r3 sv.extsw./ff=coco *85, *%r84 sv.extsw./pr=jum/m=bo *5, *%r72 ...as well as does the following, with -mregnames: .set coco, eq .set jum, ^r3 .set bo, 1<<r3 sv.extsw./ff=coco *85, *r84 sv.extsw./pr=jum/m=bo *5, *r72
(In reply to Dmitry Selyutin from comment #32) > On svp64-ng branch, the following works: > .set coco, eq > .set jum, ^r3 > .set bo, 1<<r3 > sv.extsw./ff=coco *85, *r84 > sv.extsw./pr=jum/m=bo *5, *r72 hoorah, much shorter. ah. sorry it's taken me a while to spot this, ff and pr must only take CR-fields (inv + eq/lt/gt/so). there's no bit here to specify to use integer regs r3/r10/r31: https://libre-soc.org/openpower/sv/normal/#index2h1 0-1 2 3 4 description 01 inv CR-bit Rc=1: ffirst CR sel 11 inv CR-bit Rc=1: pred-result CR sel https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=gas/config/tc-ppc-svp64.c;h=7f27093cde7fff236e4d254b54072e8430a1aab5;hb=0abcbc8220d812de3261fc44c5c65989882c0cec#l304 304 svp64_decode_predicate (char *str, bool *cr, unsigned int *mask, bool crmode) 305 { ... 372 predicate = (enum svp64_predicate)exp.X_add_number; 373 *cr = (table[predicate].cr ? true : false); if (crmode && !(*cr)) { // pr= and ff= only allow CRs return NULL; } then ff and pr decode set true (others set false) 640 svp64_decode_ff (char *str, struct svp64_ctx *svp64) 641 { ... 650 iter = svp64_decode_predicate (str, &cr, &mask, true); 662 svp64_decode_pr (char *str, struct svp64_ctx *svp64) 663 { ... 672 iter = svp64_decode_predicate (str, &cr, &mask, true);
(In reply to Luke Kenneth Casson Leighton from comment #33) > (In reply to Dmitry Selyutin from comment #32) > > ah. sorry it's taken me a while to spot this, ff and pr must only take > CR-fields (inv + eq/lt/gt/so). there's no bit here to specify to use > integer regs r3/r10/r31: I guess I broke this between revisions, perhaps when I needed some code to be checked after implementing ^%r3 support. In svp64 branch, this is OK, I broke it for svp64-ng where I experimented with macros. Should be fine now.
(In reply to Dmitry Selyutin from comment #34) > I guess I broke this between revisions, perhaps when I needed some code to > be checked after implementing ^%r3 support. In svp64 branch, this is OK, I > broke it for svp64-ng where I experimented with macros. Should be fine now. 632 iter = svp64_decode_predicate (str, &cr, &mask); 633 if (!iter || !(cr || ((mask & SVP64_RC1_ACTIVE) != 0))) !cr - perfect, that stops it just as well.