a format needs to be agreed on for sv assembly mnemonics which is acceptable

draft implemented at
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/sv/trans/svp64.py;hb=HEAD#l597

gcc list post: https://gcc.gnu.org/pipermail/gcc/2021-March/234992.html
binutils list post: https://sourceware.org/pipermail/binutils/2021-March/115755.html

the goal here is to find the SVP64 assembly mnemonic format that is acceptable to all of:

* gcc developers
* binutils developers
* SV's developers and creators
* assembly code developers

clarity and "least conceptual disruption" should be given high priority. example: actually changing the number of arguments should be given a lower priority given that it implies that the underlying v3.0B mnemonic is entirely different when prefixed by SVP64.

the full set of modes is here: https://libre-soc.org/openpower/sv/svp64/
the overview describing the Simple-V concept is here: https://libre-soc.org/openpower/sv/overview/
clarifying the characteristics and requirements here (edit as required)

1 SVP64 embeds v3.0B scalar instructions as a hardware-level for-loop
2 implications: the scalar instruction needs to be "preserved" i.e. obvious from the SV-augmentation
3 the SV-augmentation must pass through gcc macro parsing without interference in that parsing
4 the SV-augmentation must meet binutils parsing requirements or at least not need significant changes to binutils
5 numerical-only register numbering should be possible
from the above, we can evaluate an idea kindly suggested by Segher, to fulfil requirement (5): that SVP64 augmentation of registers as Vector or scalar be done as an additional argument.

instead of:

    svadd 5.v, 6.v, 7.v

it would be:

    svadd 5, 6, 7, 1, 1, 1

or

    svadd 5, 6, 7, 0b111

the issue with this is that it breaks requirement (2), in multiple ways.

firstly, it gives the impression that SV is adding extra arguments to the *scalar* v3.0B instruction, when it is not (SVP64 is *augmenting* the registers)

secondly: some scalar instructions have optional arguments (sync is an alias for sync 0). if Vectorised it becomes ambiguous as to whether the optional argument applies to the underlying scalar operation or to the Vectorisation Augmentation/Embedding.

thirdly: adding new scalar instructions becomes problematic. because of this lack of abstraction, every new instruction added has to be evaluated carefully, inventing a new syntax not just for the scalar variant but also for the Vectorised variant.
the "fallback position":

    .long xxxxx; op N,N,N

where .long contains a v3.1 EXT01 major opcode

clearly, having gcc output such is not exactly desirable.

an alternative is the addition of an svp64 32 bit instruction:

    svp64 fields,qualifying,next,op

this would become very confusing due to the fact that the qualified v3.0 32 bit instruction following it will have non-obvious register numbers (SVP64 augments 5-bit register numbers RA etc, FRA etc and BA etc, and also 3-bit CR fields BFA etc)
what about just modifying the mnemonic:

what was:

    sv.add. 5.v, 10.s, 12.v
    sv.add 120.s, 120.s, 121.s
    sv.ldu 20.v, 8(23.s)

becomes:

    sv.add.vsv. 5, 10, 12
    sv.add.sss 120, 120, 121
    sv.ldu.vs 20, 8(23)
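to make the proposed rewrite concrete, here is a minimal sketch of the text-level transformation between the two forms. it is an illustration only: the function name move_suffixes_to_mnemonic is made up for this comment and is not part of svp64.py, and it assumes registers are written as a decimal number followed by ".v" or ".s".

```python
import re


def move_suffixes_to_mnemonic(line):
    """Rewrite 'sv.add. 5.v, 10.s, 12.v' as 'sv.add.vsv. 5, 10, 12'.

    Hypothetical helper: illustrates the proposal, not an existing API.
    """
    mnemonic, _, operands = line.strip().partition(" ")
    # a trailing "." on the mnemonic is the v3.0B Rc=1 marker: keep it last
    rc = mnemonic.endswith(".")
    if rc:
        mnemonic = mnemonic[:-1]
    tags = []

    def strip_tag(match):
        # remember the v/s tag in operand order, leave the bare number
        tags.append(match.group(2))
        return match.group(1)

    # a register number followed by .v or .s, e.g. "5.v" or "23.s"
    bare = re.sub(r"(\d+)\.([vs])\b", strip_tag, operands)
    return "%s.%s%s %s" % (mnemonic, "".join(tags),
                           "." if rc else "", bare.strip())


if __name__ == "__main__":
    for asm in ("sv.add. 5.v, 10.s, 12.v",
                "sv.add 120.s, 120.s, 121.s",
                "sv.ldu 20.v, 8(23.s)"):
        print(move_suffixes_to_mnemonic(asm))
```

note that the immediate 8 in the ldu case carries no tag, so the sketch leaves it alone: only tagged register numbers contribute a letter to the mnemonic.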
(In reply to Jacob Lifshay from comment #4)
> sv.add.vsv. 5, 10, 12
> sv.add.sss 120, 120, 121
> sv.ldu.vs 20, 8(23)

not keen on these as they separate out the vector-note from the register numbers. this makes it really hard to write assembly code (including hard to write unit tests).

additionally when immediate fields (L, sh, mb) are involved it becomes horribly confusing

also for LD/ST there is a notation for unit stride and element stride where it is clear which one is which by marking the immediate-offset rather than the register

    4.v(r3) vs 4(r3.v)

separating out svv to be part of the op is, well, if we're forced to, then we're forced to. the lack of clarity means it's definitely low on the list.
(In reply to Luke Kenneth Casson Leighton from comment #5)
> 4.v(r3)

I'd say this is waay less clear...
if the dot in register numbers is the only problem, we might as well drop it. v and s will do just fine ending the register number. but I'm curious as to what motivated segher's objections. was it because '.' could be part of a number?
(In reply to Alexandre Oliva from comment #7)
> if the dot in register numbers is the only problem, we might as well drop
> it. v and s will do just fine ending the register number. but I'm curious
> as to what motivated segher's objections. was it because '.' could be part
> of a number?

yes, he pointed out that "." anywhere will not be well received. i suspect the only reason that "." is allowed at all is because it's part of the OpenPOWER v3.0B spec for mnemonics:

    add. => set Rc=1
    add  => set Rc=0

he did say we need some way to support numerical-only register numbers. to achieve that i am inclined there to say "screw it" and simply have:

    svadd/extra=0xNN N1, N2, N3

as an "option" where the N1, N2, N3 is *verbatim* v3.0B *not* the *SV-augmented* register numbers

i.e. svadd/extra=0xNN N1, N2, N3 would translate to:

    .long 0x...NN    # that NN is the same bits from extra above
    add N1, N2, N3   # these do not change

in other words the actual v3.0B augmented opcode is clearly unmodified
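a minimal sketch of the splitting step for that "option" form, to show that the scalar part really does come out verbatim. assumptions, none of which are settled in this thread: the function name split_extra_form is made up, the qualifier is spelled exactly "extra=" followed by a hex value, and how those bits are placed inside the .long prefix word is deliberately left out because it is not specified here.

```python
def split_extra_form(line):
    """Split 'svadd/extra=0xNN N1, N2, N3' into (extra, 'add N1, N2, N3').

    Hypothetical helper: only separates the pieces, it does not encode
    the prefix word.
    """
    mnemonic, _, operands = line.strip().partition(" ")
    head, _, qualifier = mnemonic.partition("/")
    if not head.startswith("sv") or not qualifier.startswith("extra="):
        raise ValueError("not an sv<op>/extra=... instruction: %r" % line)
    # int() with base 16 accepts an optional "0x" prefix
    extra = int(qualifier[len("extra="):], 16)
    # the scalar v3.0B instruction: drop "sv", keep the operands untouched
    scalar = "%s %s" % (head[2:], operands.strip())
    return extra, scalar


if __name__ == "__main__":
    extra, scalar = split_extra_form("svadd/extra=0x2a 5, 6, 7")
    print(hex(extra), "|", scalar)
```

the point of the design is visible in the return value: the second element is an ordinary v3.0B instruction that an unmodified scalar assembler could consume.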
(In reply to Jacob Lifshay from comment #6)
> (In reply to Luke Kenneth Casson Leighton from comment #5)
> > 4.v(r3)
>
> I'd say this is waay less clear...

there does exist an alternative which is to have the ld/st be qualified with a mode. i thiiink... no i haven't put LD/ST modes into SVP64Asm yet so can't point you at it.

example (made up):

    svld/els/ew=8 RS, D(RA)

this says "els" (for element-strided) and the hw knows to deploy this:

    for i in range(VL):  # element-strided - the very unclear D.v(RA)
        Effective_Address = (i * D) + GPR[RA]

not:

    for i in range(VL):  # unit strided
        Effective_Address = D + (i * elwidth) + GPR[RA]
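the two addressing loops can be run side by side to see the difference. this is a sketch under stated assumptions: the function names are made up, base stands in for GPR[RA], and elwidth is taken to be the element size in bytes (the thread does not say which unit the hardware uses).

```python
def element_strided(base, d, vl):
    """Effective addresses for element-strided: EA = (i * D) + GPR[RA]."""
    return [i * d + base for i in range(vl)]


def unit_strided(base, d, vl, elwidth_bytes):
    """Effective addresses for unit-strided: EA = D + (i * elwidth) + GPR[RA].

    elwidth_bytes is an assumption: element width expressed in bytes.
    """
    return [d + i * elwidth_bytes + base for i in range(vl)]


if __name__ == "__main__":
    base = 0x1000  # stand-in for the contents of GPR[RA]
    print([hex(a) for a in element_strided(base, d=8, vl=4)])
    print([hex(a) for a in unit_strided(base, d=8, vl=4, elwidth_bytes=1)])
```

with element-strided the immediate D acts as the stride between elements, whereas with unit-strided D is a fixed offset and the stride comes from the element width. that is the distinction the "els" qualifier is intended to make explicit.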
also in conversation with Segher i suggested the prefixing of reg names

    "vr0" --> Vector-augmented r0
    "vf0" --> Vector-augmented f0

unfortunately he said that "vrNN" is already taken (an alias for vsrNN)

however now that i think about it, given that SV applies to *scalar* operations only there's no actual conflict, there.
commit 477bc257449d3679a5ffb9807609da1304606163 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sun Mar 14 14:54:45 2021 +0000

    remove "sv." and replace with "sv" in all SVP64Asm

i'm not keen on the lack of a separator. it tends to imply that "svadd" is a completely different instruction from "add".
just looking at binutils gnu-as tc-ppc.c
https://github.com/gitGNU/gnu_as/blob/29cc6aaccf8e8492046aa303b0282a4ee36d829c/config/tc-ppc.c#L2692

this is where the opcode parsing starts. the syntax we came up with (svadd/qualifiers/x/y/z operand1, operand2, ...) should be pretty easy to implement.

step 1: see if the opcode starts with "sv", if so, call a function that parses any "/" separated SVP64 qualifiers. these get inserted into the "upper" bits of insn (32-63) to be extracted later. at the end, replace "/" by "\0" and move str on a bit

step 2: gather operands as normal
https://github.com/gitGNU/gnu_as/blob/29cc6aaccf8e8492046aa303b0282a4ee36d829c/config/tc-ppc.c#L2748

the key function here seems to be ppc_optional_operand_value

also, ppc_insert_operand seems to be where the "magic" happens (actually putting the operand into the required location in the assembly opcode). there exists an "override" mechanism in the powerpc_operands table which ppc_insert_operand can call for doing "special" stuff.

this is where register_names are identified by dropping a structure into "ex" (type ExpressionS)
https://github.com/gitGNU/gnu_as/blob/29cc6aaccf8e8492046aa303b0282a4ee36d829c/config/tc-ppc.c#L2946

therefore it should be possible to hook into "register_names" and either spot the suffix "v" (or ".v", or whatever) or otherwise extend it to support EXTRA2/3.

step 3: extract the upper bits (32-63), which could even have EXT01 pre-inserted into them. output two assembled instructions (64 bit) rather than one (32 bit)

this is all actually pretty straightforward.
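steps 1 and 3 above can be sketched in a few lines of python before anyone touches the C. this is a model of the idea only, not binutils code: the function names are made up, qualifiers are returned as plain strings rather than encoded, and the order in which the two 32-bit words get emitted is not decided here.

```python
def split_sv_mnemonic(opcode):
    """Step 1: peel 'sv' and any '/'-separated qualifiers off the opcode.

    Returns (scalar_mnemonic, qualifiers). Non-sv opcodes pass through
    unchanged with an empty qualifier list. Hypothetical helper.
    """
    if not opcode.startswith("sv"):
        return opcode, []
    base, *qualifiers = opcode.split("/")
    return base[2:], qualifiers


def split_words(insn):
    """Step 3: split a 64-bit insn into (upper bits 32-63, lower bits 0-31)."""
    return (insn >> 32) & 0xFFFFFFFF, insn & 0xFFFFFFFF


if __name__ == "__main__":
    print(split_sv_mnemonic("svadd/qualifiers/x/y/z"))
    print(split_sv_mnemonic("add"))
    # arbitrary bit patterns, NOT real encodings
    upper, lower = split_words((0xAAAA5555 << 32) | 0x12345678)
    print(hex(upper), hex(lower))
```

in the real assembler the qualifier list would be turned into prefix bits during step 1 rather than carried around as strings. the sketch keeps them as strings so that the parsing can be checked on its own.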
an idea came up which would remove the need to modify gcc's rs6000.md opcode mnemonic generators significantly or even at all:

have gcc create the svp64 prefix as a new 32 bit "fake" instruction which binutils picks up and outputs with an EXT01 Primary Opcode.

this is "odd" to say the least but would require no alteration of *any* of the mnemonic generators that create v3.0B scalar instructions, except perhaps to provide the option to output the prefix.

needs more thought but would be considerably less intrusive than altering every single v3.0B scalar assembly generator to optionally output "sv." in front of the mnemonic or, worse, duplicating every single mnemonic generator to put an SV-augmented variant in an already overloaded file.
(In reply to Luke Kenneth Casson Leighton from comment #13)
> an idea came up which would remove the need to modify gcc's rs6000.md opcode
> mnemonic generators significantly or even at all:
>
> have gcc create the svp64 prefix as a new 32 bit "fake" instruction which
> binutils picks up and outputs with an EXT01 Primary Opcode.

That could definitely work -- Arm does something kinda like that with their Thumb-mode IT instruction which sets up a predicate to conditionally execute the next 1-4 instructions.
(In reply to Jacob Lifshay from comment #14)
> > have gcc create the svp64 prefix as a new 32 bit "fake" instruction which
> > binutils picks up and outputs with an EXT01 Primary Opcode.
>
> That could definitely work -- Arm does something kinda like that with their
> Thumb-mode IT instruction which sets up a predicate to conditionally execute
> the next 1-4 instructions.

ahh iiinteresting! there is a keyword "parallel" in the macro language which associates groups of other macros, typically predicates.

cntlz simple pattern
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/rs6000/rs6000.md;h=c0d7b1aff96801acea581c026c06c9be0b4a8cbd;hb=7b900dca607dceaae2db372365f682a4979c7826#l2379

there are attributes "predicable" and "multiple"
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/arm/thumb2.md;h=5772f4d0b76d23b48804f1dead36734a4ebc82e5;hb=7b900dca607dceaae2db372365f682a4979c7826#l159

then... urk. i started looking through the arm11 md file, no joy finding anything obvious.
David Edelsohn to Luke:

Hi, Luke

My suggestion is that the GCC support for SV look at print_operand() case '0' to adjust the register names and rs6000_asm_output_opcode() to adjust the mnemonic instead of changing the output template of all patterns. One can mark patterns as allowing SV, similar to prefixed instructions attributes.

I'm requesting that any changes be as localized and inconspicuous as possible. Adding attributes and updating the patterns that emit instructions keeps the changes less visible in the impact on the rest of the port.

Thanks, David
Alan Modra has helped advise on whether "/" would be acceptable, and given that we're not using "/" in immediate-operand computations, it should be fine.

"=" on the other hand is not. an available character that is unused by binutils macro-expansion is "?".
(In reply to Luke Kenneth Casson Leighton from comment #17)
> alain modra's helped advise on whether "/" would be acceptable, and given
> that we're not using "/" in immediate-operand computations, it should be
> fine.
>
> "=" on the other hand is not. an available character that is unused by
> binutils macro-expansion is "?".

I think we should use something other than "?" due to its very unintuitive nature:
https://libre-soc.org/irclog/%23libre-soc.2021-03-19.log.html#t2021-03-19T15:42:24
(In reply to Jacob Lifshay from comment #18)
> I think we should use something other than "?" due to it's very unintuitive
> nature:
> https://libre-soc.org/irclog/%23libre-soc.2021-03-19.log.html#t2021-03-19T15:42:24

there are very few choices. Alan Modra - the 15+ year experienced maintainer of the powerpc-binutils port - would not have suggested it if there were any other options.