Bug 615 - talk to binutils and gcc developers about acceptable sv assembly format
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Specification
Version: unspecified
Hardware: Other Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL: https://libre-soc.org/openpower/sv/im...
Depends on:
Blocks:
 
Reported: 2021-03-12 18:58 GMT by Luke Kenneth Casson Leighton
Modified: 2021-03-26 11:49 GMT

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Luke Kenneth Casson Leighton 2021-03-12 18:58:48 GMT
a format for sv assembly mnemonics needs to be agreed on that is acceptable to everyone involved

draft implemented at
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/sv/trans/svp64.py;hb=HEAD#l597

gcc list post:
https://gcc.gnu.org/pipermail/gcc/2021-March/234992.html

binutils list post:
https://sourceware.org/pipermail/binutils/2021-March/115755.html

the goal here is to find the SVP64 assembly mnemonic format that is acceptable to all of:

* gcc developers
* binutils developers
* SV's developers and creators
* assembly code developers

clarity and "least conceptual disruption" should be given high priority.
example: a syntax that changes the number of arguments should be given a
lower priority, because it implies that the underlying v3.0B mnemonic is
an entirely different instruction when prefixed by SVP64.

the full set of modes is here:
https://libre-soc.org/openpower/sv/svp64/

the overview describing the Simple-V concept is here:
https://libre-soc.org/openpower/sv/overview/
Comment 1 Luke Kenneth Casson Leighton 2021-03-13 23:22:45 GMT
clarifying the characteristics and requirements here (edit as required)

1 SVP64 embeds v3.0B scalar instructions
  as a hardware-level for-loop
2 implications: the scalar instruction
  needs to be "preserved" i.e. obvious
  from the SV-augmentation
3 the SV-augmentation must pass through
  gcc macro parsing without interference
  in that parsing
4 the SV-augmentation must meet binutils
  parsing requirements or at least not
  need significant changes to binutils
5 numerical-only register numbering should
  be possible
Comment 2 Luke Kenneth Casson Leighton 2021-03-13 23:33:07 GMT
from the above, we can evaluate an idea kindly suggested by Segher, to fulfil requirement (5): that SVP64 augmentation of registers as Vector or scalar be done as an additional argument.

instead of:

     svadd 5.v, 6.v, 7.v

it would be:

     svadd 5, 6, 7, 1, 1, 1
or   svadd 5, 6, 7, 0b111


the issue with this is that it breaks requirement (2), in multiple ways.

firstly, it gives the impression that SV is adding extra arguments to the *scalar* v3.0B instruction, when it is not (SVP64 is *augmenting* the registers)

secondly: some scalar instructions have optional arguments (sync is an alias for sync 0).  if Vectorised it becomes ambiguous as to whether the optional argument applies to the underlying scalar operation or to the Vectorisation Augmentation/Embedding.

thirdly: adding new scalar instructions becomes problematic.  because of this lack of abstraction, every new instruction has to be evaluated carefully and a new syntax invented not just for the scalar variant but also for the Vectorised variant.
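
To make the comparison concrete, here is a minimal Python sketch that rewrites the suffix notation into the two "extra argument" forms shown above. The function names are hypothetical, and the bit order of the mask (first operand in the most significant bit) is an assumption: the thread only shows 0b111, which does not reveal the order.

```python
def to_extra_args(operands):
    """Split '5.v, 6.v, 7.v' into plain register numbers and per-operand flags."""
    regs, flags = [], []
    for op in operands.split(","):
        num, _, tag = op.strip().partition(".")
        if tag not in ("v", "s"):
            raise ValueError(f"expected a .v or .s suffix on {op!r}")
        regs.append(int(num))
        flags.append(1 if tag == "v" else 0)  # 1 = Vector, 0 = scalar
    return regs, flags


def as_mask(flags):
    """Pack the flags into one integer (ASSUMED order: first operand = MSB)."""
    mask = 0
    for flag in flags:
        mask = (mask << 1) | flag
    return mask


regs, flags = to_extra_args("5.v, 6.v, 7.v")
# one extra argument per register
print("svadd", ", ".join(map(str, regs + flags)))
# a single extra bitmask argument
print("svadd", ", ".join(map(str, regs)) + ", " + bin(as_mask(flags)))
```

Either output has more operands than the scalar add, which is exactly the objection raised under requirement (2).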
Comment 3 Luke Kenneth Casson Leighton 2021-03-14 00:57:27 GMT
the "fallback position": ".long xxxxx; op N,N,N" where .long contains a v3.1 EXT01 major opcode

clearly, having gcc output such is not exactly desirable.

an alternative is the addition of an svp64 32 bit instruction:

     svp64 fields,qualifying,next,op

this would become very confusing due to the fact that the qualified v3.0 32 bit instruction following it will have non-obvious register numbers

(SVP64 augments 5-bit register numbers RA etc, FRA etc and BA etc, and also 3-bit CR fields BFA etc)
Comment 4 Jacob Lifshay 2021-03-14 02:59:57 GMT
what about just modifying the mnemonic:
what was:
sv.add. 5.v, 10.s, 12.v
sv.add 120.s, 120.s, 121.s
sv.ldu 20.v, 8(23.s)
becomes:
sv.add.vsv. 5, 10, 12
sv.add.sss 120, 120, 121
sv.ldu.vs 20, 8(23)
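
The rewrite above is mechanical, which a short Python sketch can show. This is only an illustration of the proposal, not code from SVP64Asm: the function name and the regular expression are hypothetical, and it assumes every register carries a .v or .s tag.

```python
import re

# a register number followed by ".v" or ".s", e.g. "5.v" or "23.s"
TAGGED_REG = re.compile(r"\b(\d+)\.([vs])\b")


def move_tags_to_mnemonic(line):
    """Turn 'sv.add. 5.v, 10.s, 12.v' into 'sv.add.vsv. 5, 10, 12'."""
    mnemonic, operands = line.split(None, 1)
    # collect the tags in operand order, then strip them from the operands
    tags = "".join(m.group(2) for m in TAGGED_REG.finditer(operands))
    plain = TAGGED_REG.sub(r"\1", operands)
    if mnemonic.endswith("."):
        # keep the trailing "." (Rc=1) at the very end of the mnemonic
        mnemonic = mnemonic[:-1] + "." + tags + "."
    else:
        mnemonic = mnemonic + "." + tags
    return f"{mnemonic} {plain}"


for asm in ("sv.add. 5.v, 10.s, 12.v",
            "sv.add 120.s, 120.s, 121.s",
            "sv.ldu 20.v, 8(23.s)"):
    print(move_tags_to_mnemonic(asm))
```

The sketch only handles tags on registers. It has no way to express a tag on an immediate offset, which is one of the cases discussed in the replies.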
Comment 5 Luke Kenneth Casson Leighton 2021-03-14 05:06:32 GMT
(In reply to Jacob Lifshay from comment #4)

> sv.add.vsv. 5, 10, 12
> sv.add.sss 120, 120, 121
> sv.ldu.vs 20, 8(23)

not keen on these as they separate out the vector-note from the register numbers.  this makes it really hard to write assembly code (including hard to write unit tests).

additionally when immediate fields (L, sh, mb) are involved it becomes horribly confusing

also for LD/ST there is a notation for unit stride and element stride where it is clear which one is which by marking the immediate-offset rather than the register

   4.v(r3)

vs

   4(r3.v)

separating out svv to be part of the op is, well, if we're forced to, then we're forced to.  the lack of clarity means it's definitely low on the list.
Comment 6 Jacob Lifshay 2021-03-14 05:29:35 GMT
(In reply to Luke Kenneth Casson Leighton from comment #5)
>    4.v(r3)

I'd say this is waay less clear...
Comment 7 Alexandre Oliva 2021-03-14 06:54:11 GMT
if the dot in register numbers is the only problem, we might as well drop it.  v and s will do just fine ending the register number.  but I'm curious as to what motivated segher's objections.  was it because '.' could be part of a number?
Comment 8 Luke Kenneth Casson Leighton 2021-03-14 13:06:16 GMT
(In reply to Alexandre Oliva from comment #7)

> if the dot in register numbers is the only problem, we might as well drop
> it.  v and s will do just fine ending the register number.  but I'm curious
> as to what motivated segher's objections.  was it because '.' could be part
> of a number?

yes, he pointed out that "." anywhere will not be well received.
i suspect the only reason that "." is allowed at all is because
it's part of the OpenPOWER v3.0B spec for mnemonics:

   add.   => set Rc=1
   add    => set Rc=0


he did say we need some way to support numerical-only register numbers.
to achieve that i am inclined to say "screw it" and simply have:

   svadd/extra=0xNN  N1, N2, N3

as an "option" where the N1, N2, N3 is *verbatim* v3.0B *not* the
*SV-augmented* register numbers

i.e. 

   svadd/extra=0xNN  N1, N2, N3

would translate to:

   .long 0x...NN    # that NN is the same bits from extra above
   add  N1, N2, N3  # these do not change

in other words the actual v3.0B augmented opcode is clearly unmodified
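
A minimal Python sketch of that translation, to show that the scalar part passes through untouched. Everything named here is an assumption for illustration: the function and regular expression are hypothetical, and SVP64_PREFIX_BASE is a made-up symbol standing in for the elided "0x..." part of the prefix word. Where the extra bits really sit inside the prefix is defined by the SVP64 spec and is not reproduced here.

```python
import re

# "sv<op>/extra=0xNN <operands>", e.g. "svadd/extra=0x15  3, 4, 5"
EXTRA_FORM = re.compile(r"^sv(\w+\.?)/extra=(0x[0-9a-fA-F]+)\s+(.*)$")


def expand_extra(line, prefix_symbol="SVP64_PREFIX_BASE"):
    """Return the two assembler lines for one sv<op>/extra=0xNN line."""
    m = EXTRA_FORM.match(line.strip())
    if m is None:
        raise ValueError(f"not an sv<op>/extra=0xNN line: {line!r}")
    scalar_op, extra, operands = m.groups()
    return [
        # prefix word: hypothetical symbolic base, OR-ed with the extra bits
        f".long {prefix_symbol} | {extra}",
        # the v3.0B instruction and its register numbers, copied verbatim
        f"{scalar_op} {operands}",
    ]


for out in expand_extra("svadd/extra=0x15  3, 4, 5"):
    print(out)
```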
Comment 9 Luke Kenneth Casson Leighton 2021-03-14 13:10:50 GMT
(In reply to Jacob Lifshay from comment #6)
> (In reply to Luke Kenneth Casson Leighton from comment #5)
> >    4.v(r3)
> 
> I'd say this is waay less clear...

there does exist an alternative which is to have the ld/st be qualified
with a mode.  i thiiink... no i haven't put LD/ST modes into SVP64Asm yet
so can't point you at it.

example (made up):

    svld/els/ew=8  RS, D(RA)

this says "els" (for element-strided) and the hw knows to deploy this:

    for i in range(VL):
         # element-strided - the very unclear D.v(RA)
         Effective_Address = (i * D) + GPR[RA]

not:

    for i in range(VL):
         # unit strided
         Effective_Address = D + (i * elwidth) + GPR[RA]
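
The two loops can be run side by side as plain Python to see how the addresses differ. This is a sketch only: the function names are hypothetical, GPR[RA] is passed in as a plain integer, and elwidth is assumed to be measured in bytes (so ew=8 bits becomes 1).

```python
def element_strided(base, d, vl):
    """Element-strided: the immediate D is the stride between elements."""
    return [(i * d) + base for i in range(vl)]


def unit_strided(base, d, vl, elwidth_bytes):
    """Unit-strided: D is a fixed offset, elements are packed contiguously."""
    return [d + (i * elwidth_bytes) + base for i in range(vl)]


# same RA contents, D and VL for both modes; ew=8 bits -> 1 byte (assumed)
print([hex(a) for a in element_strided(base=0x1000, d=16, vl=4)])
print([hex(a) for a in unit_strided(base=0x1000, d=16, vl=4, elwidth_bytes=1)])
```

With the same D, element-strided spreads the accesses D bytes apart starting at the base, while unit-strided starts at base + D and steps by the element width.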
Comment 10 Luke Kenneth Casson Leighton 2021-03-14 13:14:03 GMT
also in conversation with Segher i suggested the prefixing of reg names

    "vr0" --> Vector-augmented r0
    "vf0" --> Vector-augmented f0

unfortunately he said that "vrNN" is already taken (an alias for vsrNN)

however now that i think about it, given that SV applies to *scalar* operations only there's no actual conflict, there.
Comment 11 Luke Kenneth Casson Leighton 2021-03-14 14:57:22 GMT
commit 477bc257449d3679a5ffb9807609da1304606163 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sun Mar 14 14:54:45 2021 +0000

    remove "sv." and replace with "sv" in all SVP64Asm

i'm not keen on the lack of a separator. it tends to imply that
"svadd" is a completely different instruction from "add".
Comment 12 Luke Kenneth Casson Leighton 2021-03-14 16:44:23 GMT
just looking at binutils gnu-as tc-ppc.c
https://github.com/gitGNU/gnu_as/blob/29cc6aaccf8e8492046aa303b0282a4ee36d829c/config/tc-ppc.c#L2692

this is where the opcode parsing starts.  the syntax we came up with
(svadd/qualifiers/x/y/z operand1, operand2, ...) should be pretty easy
to implement.

step 1:

   see if the opcode starts with "sv", if so, call a function
   that parses any "/" separated SVP64 qualifiers.  these get
   inserted into the "upper" bits of insn (32-63) to be extracted
   later

   at the end, replace "/" by "\0" and move str on a bit

step 2:

   gather operands as normal
   https://github.com/gitGNU/gnu_as/blob/29cc6aaccf8e8492046aa303b0282a4ee36d829c/config/tc-ppc.c#L2748

   the key function here seems to be ppc_optional_operand_value

   also that ppc_insert_operand seems to be where the "magic" happens
   (actually putting the operand into the required location in the
   assembly opcode).

   there exists an "override" mechanism in the powerpc_operands
   table which ppc_insert_operand can call for doing "special"
   stuff.

   this is where register_names are identified by dropping a
   structure into "ex" (type ExpressionS)
   https://github.com/gitGNU/gnu_as/blob/29cc6aaccf8e8492046aa303b0282a4ee36d829c/config/tc-ppc.c#L2946

   therefore it should be possible to hook into "register_names"
   and either spot the suffix "v" (or ".v", or whatever) or otherwise
   extend it to support EXTRA2/3.

step 3:

   extract the upper bits (32-63), which could even have EXT01
   pre-inserted into them.

   output two assembled instructions (64 bit) rather than one
   (32 bit)

this is all actually pretty straightforward.
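
A rough Python model of steps 1 and 3, to illustrate the data flow only. It is not binutils code and does not use any tc-ppc.c function: all names are hypothetical, and how each qualifier maps to prefix bits is deliberately left out.

```python
def split_qualifiers(opcode):
    """Step 1: separate 'svadd/x/y/z' into the mnemonic and its qualifiers."""
    if not opcode.startswith("sv"):
        return opcode, []  # ordinary v3.0B opcode: nothing to do
    base, *qualifiers = opcode.split("/")
    return base, qualifiers


def pack(prefix32, insn32):
    """Keep the prefix in bits 32-63 of one 64-bit value during parsing."""
    if not (0 <= prefix32 < 2**32 and 0 <= insn32 < 2**32):
        raise ValueError("both words must fit in 32 bits")
    return (prefix32 << 32) | insn32


def unpack(insn64):
    """Step 3: pull the upper bits back out, giving two 32-bit words."""
    return (insn64 >> 32) & 0xFFFFFFFF, insn64 & 0xFFFFFFFF


print(split_qualifiers("svadd/qualifiers/x/y/z"))
print(split_qualifiers("add"))
# placeholder word values, not real encodings
prefix, insn = unpack(pack(0x12345678, 0x9ABCDEF0))
print(hex(prefix), hex(insn))
```

Step 2 (operand gathering) is left to the existing parser, which is the point of the approach: only the opcode string and the final output stage need to know about SVP64.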
Comment 13 Luke Kenneth Casson Leighton 2021-03-16 01:39:49 GMT
an idea came up which would remove the need to modify gcc's rs6000.md opcode mnemonic generators significantly or even at all:

have gcc create the svp64 prefix as a new 32 bit "fake" instruction which binutils picks up and outputs with an EXT01 Primary Opcode.

this is "odd" to say the least but would require no alteration of *any* of the mnemonic generators that create v3.0B scalar instructions, except perhaps to provide the option to output the prefix.

needs more thought but would be considerably less intrusive than altering every single v3.0B scalar assembly generator to optionally output "sv." in front of the mnemonic

or, worse, duplicating every single mnemonic generator to put an SV-augmented variant in an already overloaded file.
Comment 14 Jacob Lifshay 2021-03-16 03:20:05 GMT
(In reply to Luke Kenneth Casson Leighton from comment #13)
> an idea came up which would remove the need to modify gcc's rs6000.md opcode
> mnemonic generators significantly or even at all:
> 
> have gcc create the svp64 prefix as a new 32 bit "fake" instruction which
> binutils picks up and outputs with an EXT01 Primary Opcode.

That could definitely work -- Arm does something kinda like that with their Thumb-mode IT instruction which sets up a predicate to conditionally execute the next 1-4 instructions.
Comment 15 Luke Kenneth Casson Leighton 2021-03-16 12:43:57 GMT
(In reply to Jacob Lifshay from comment #14)

> > have gcc create the svp64 prefix as a new 32 bit "fake" instruction which
> > binutils picks up and outputs with an EXT01 Primary Opcode.
> 
> That could definitely work -- Arm does something kinda like that with their
> Thumb-mode IT instruction which sets up a predicate to conditionally execute
> the next 1-4 instructions.

ahh iiinteresting! there is a keyword "parallel" in the macro language which associates groups of other macros, typically predicates.

cntlz simple pattern

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/rs6000/rs6000.md;h=c0d7b1aff96801acea581c026c06c9be0b4a8cbd;hb=7b900dca607dceaae2db372365f682a4979c7826#l2379

there are attributes "predicable" and "multiple"

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/arm/thumb2.md;h=5772f4d0b76d23b48804f1dead36734a4ebc82e5;hb=7b900dca607dceaae2db372365f682a4979c7826#l159

then... urk.  i started looking through the arm11 md file, no joy finding anything obvious.
Comment 16 Luke Kenneth Casson Leighton 2021-03-16 18:44:51 GMT
David Edelsohn
to Luke
Hi, Luke

My suggestion is that the GCC support for SV look at print_operand()
case '0' to adjust the register names and rs6000_asm_output_opcode()
to adjust the mnemonic instead of changing the output template of all
patterns.  One can mark patterns as allowing SV, similar to prefixed
instructions attributes.

I'm requesting that any changes be as localized and inconspicuous as
possible.  Adding attributes and updating the patterns that emit
instructions keeps the changes less visible in the impact on the rest
of the port.

Thanks, David
Comment 17 Luke Kenneth Casson Leighton 2021-03-19 16:12:00 GMT
Alan Modra has helped advise on whether "/" would be acceptable, and given
that we're not using "/" in immediate-operand computations, it should be
fine.

"=" on the other hand is not.  an available character that is unused by
binutils macro-expansion is "?".
Comment 18 Jacob Lifshay 2021-03-19 16:37:36 GMT
(In reply to Luke Kenneth Casson Leighton from comment #17)
> Alan Modra has helped advise on whether "/" would be acceptable, and given
> that we're not using "/" in immediate-operand computations, it should be
> fine.
> 
> "=" on the other hand is not.  an available character that is unused by
> binutils macro-expansion is "?".

I think we should use something other than "?" due to its very unintuitive nature:
https://libre-soc.org/irclog/%23libre-soc.2021-03-19.log.html#t2021-03-19T15:42:24
Comment 19 Luke Kenneth Casson Leighton 2021-03-19 17:21:24 GMT
(In reply to Jacob Lifshay from comment #18)

> I think we should use something other than "?" due to its very unintuitive
> nature:
> https://libre-soc.org/irclog/%23libre-soc.2021-03-19.log.html#t2021-03-19T15:42:24

there are very few choices.  Alan Modra - the 15+ year experienced maintainer
of the powerpc-binutils port - would not have suggested it if there were any
other options.