849 – binutils: parse macros in SVP64 and refactor code

Bug 849 - binutils: parse macros in SVP64 and refactor code

Summary: binutils: parse macros in SVP64 and refactor code

Status:	RESOLVED FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code (show other bugs)
Version:	unspecified
Hardware:	PC Linux

Importance:	--- enhancement
Assignee:	Dmitry Selyutin

URL:	https://git.libre-soc.org/?p=binutils...

Depends on:
Blocks:

Reported:	2022-06-09 14:38 BST by Dmitry Selyutin
Modified:	2022-09-15 17:43 BST (History)
CC List:	3 users (show)

See Also:
NLnet milestone:	NLNet.2019.10.032.Formal
total budget (EUR) for completion of task and all subtasks:	2400
budget (EUR) for this task, excluding subtasks' budget:	2400
parent task for budget allocation:	550
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	[ghostmansd] amount=2100 submitted=2022-07-25 paid=2022-08-10 [lkcl] amount=300 submitted=2022-06-16 paid = 2022-09-06

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Dmitry Selyutin 2022-06-09 14:38:01 BST

The current binutils implementation acts in a way similar to pysvp64asm. This, however, has a huge limitation: we parse operands on our own, and this somewhat duplicates an existing ecosystem. Since we don't use the vanilla code, we also lack support for some its features, like macro substitution.

Ideally we should refactor the ppc_assemble (originally md_assemble) in a way so that it deals with SVP instructions directly. This would, in turn, inflict some changes to lexer. The vanilla binutils code is not ready for even dealing with SVP64 operands, since the way we treat the operand depend on operation modes, and binutils don't even have way to pass this context (other than via global variable, which is bad, since we deal on per-instruction basis).

Comment 1 Luke Kenneth Casson Leighton 2022-06-10 12:45:56 BST

i honestly thought that the parsing of macro substitution and arithmetic
computation of expressions were a completely separate pre-procesing phase.
what you are saying is that the macro substitution involving register
names is *not* separate, which retrospectively makes sense

it is worthwhile checking that arithmetic pre-processing is separate
or not.  i believe it is valid to have this:

     addi 8/4, r0, 9-5

because those numbers may come from #defines and .set declarations,
which must be parse out and eventually become:

     addi 2, r0, 4

if the arithmetic macro substitution is *not* a pre-processing
phase then there exists the opportunity to ensure that this does not
get borked:

    sv.addi/vec2 2, r0, 4

because the "/" is wrongly interpreted as a divide operation

if that cannot be solved then sigh we have to use a different character.
discussed previously, "?" is free and is not used as an arithmetic op.
personally i think this looks silly but hey


    sv.addi?vec2?sm=r3?ew=8 2, r0, 4

Comment 2 Dmitry Selyutin 2022-06-14 14:14:20 BST

OK, I had to spend some hours in binutils code and I think I came up with the solution. We will simply let the underlying PPC machinery handle overall operands parsing, but, at the same time, will extend it with the special flag. This means that we will also behave differently than pysvp64asm: we'll no longer have a separate stage where we deal with plain old PPC instruction as string. Instead, we'll fetch the special `powerpc_opcode`, but will let the code know it's an SVP64-extended instruction.

Bad news: I need time for refactoring, rebasing and code cleanup.
Good news: we'll use everything PPC provides us, and also will be much closer to 845 (because the prefix handling already assumes we don't have a two-stage pipeline).

Comment 3 Luke Kenneth Casson Leighton 2022-06-14 15:43:04 BST

(In reply to Dmitry Selyutin from comment #2)

> flag. This means that we will also behave differently than pysvp64asm: we'll
> no longer have a separate stage where we deal with plain old PPC instruction
> as string. 

hey, as long as the ".set" macro-hack keeps working and the
assembly files are compatible between the two, it's all good

> Bad news: I need time for refactoring, rebasing and code cleanup.
> Good news: we'll use everything PPC provides us, and also will be much
> closer to 845 (because the prefix handling already assumes we don't have a
> two-stage pipeline).

do consider giving alan and peter a heads-up. also if it means that
there's a solution to using "/" in "sv.xxx/yyy/zzz" because it's
explicitly avoided, that would be really good.  i just tried this
in a test.s and it actually worked

     addi 4/2,2,1

"powerpc64le-linux-gnu-as test.s" did not barf on that.

Comment 4 Dmitry Selyutin 2022-06-14 17:42:41 BST

I recently wrote to Alan, several days ago, to ask whether I can publish patches for a preliminary review. That was before I found the problem with macros. :-)

A quick question: we don't need macros support in the instruction modifiers, do we? Vanilla binutils don't evaluate instruction names, and for us all these modifiers act as an instruction name, right? That would simplify stuff a lot, to be honest.

Comment 5 Luke Kenneth Casson Leighton 2022-06-24 10:44:53 BST

(In reply to Dmitry Selyutin from comment #4)
> I recently wrote to Alan, several days ago, to ask whether I can publish
> patches for a preliminary review. That was before I found the problem with
> macros. :-)
> 
> A quick question: we don't need macros support in the instruction modifiers,
> do we? 

sv.instruction/ew=8 for example?

mmm... drat, i can see a use-case for being able to macro-substitute
the "8" in "ew=8".  and also "sm=something".  but, not much else.
not the "ew=" itself, not the "sm=" itself, but the arguments.

why? because whilst "ew=" is part of the instruction name, the
"8" is like an operand.

(the only reason it wasn't added *as* an operand RT,RA,RB,ew=8,sm=r3
 was because that would seriously interfere with the decoding of
 operands)

Comment 6 Dmitry Selyutin 2022-07-07 20:59:33 BST

It seems that the support is ready, except for pseudo-operands in expressions like `${A}=${B}`, e.g. `ew=8`.

Comment 7 Luke Kenneth Casson Leighton 2022-07-07 22:43:05 BST

(In reply to Dmitry Selyutin from comment #6)
> It seems that the support is ready, except for pseudo-operands in
> expressions like `${A}=${B}`, e.g. `ew=8`.

the "ew=" side, being part of the instruction, we don't want to macro-fy.
however the "8", definitely.  NN means "yes macrofy"

other things:

* vec2/3/4 no... maybe.  a case could be made for vecN
* ew=NN  yes elwidths
* sm=NN dm=NN yes predicate mask sources r3 r10 r31 CR fields all options
* modes: no.
  - no mr
  - no satu/sats
  - etc

err... really that's it.

* elwidths including selelecting "default"
* predicate sources including selecting "all 1s"
* vec2/3/4 maybe, including selecting "no subvec"

Comment 8 Dmitry Selyutin 2022-07-08 02:07:03 BST

(In reply to Luke Kenneth Casson Leighton from comment #7)
> 
> * vec2/3/4 no... maybe.  a case could be made for vecN

This must follow all-or-nothing paradigm. Otherwise, at best, the vec qualifiers are ill-formed, or the whole parsing is inconsistent. Options are:

a) allow every qualifier to be replaced with macro, that is, every mode can be a macro;
2) replace vec2/vec3/vec4 with vec=N.

I'm more inclined to option 2. Modes look like an essential part of the instruction itself. Also, allowing to substitute modes opens a Pandora's box.

Comment 9 Luke Kenneth Casson Leighton 2022-07-08 02:34:05 BST

(In reply to Dmitry Selyutin from comment #8)

> 2) replace vec2/vec3/vec4 with vec=N.

like the principle (consistency) but is too long and not recognised by 3D
developers.

nuts to it. consider it a mode.  will explain why on irc latwr

Comment 10 Jacob Lifshay 2022-07-08 02:39:12 BST

(In reply to Luke Kenneth Casson Leighton from comment #9)
> (In reply to Dmitry Selyutin from comment #8)
> 
> > 2) replace vec2/vec3/vec4 with vec=N.
> 
> like the principle (consistency) but is too long and not recognised by 3D
> developers.
> 
> nuts to it. consider it a mode.  will explain why on irc latwr

i originally proposed subvl=1-4, where subvl=1 is the default value

Comment 11 Luke Kenneth Casson Leighton 2022-07-08 02:46:26 BST

(In reply to Jacob Lifshay from comment #10)

> i originally proposed subvl=1-4, where subvl=1 is the default value

you have a habit from lack of experience in proposing assembler
mnemonics that are far too long.  people used to computing from
the 1980s and 1990s consider CPU time to be precious. russian
assembler writers of the time were amongst some of the best in
the world, out of necessity: due to the berlin wall, large
capacity memory and high end cpus were impossible to obtain.

hard no on 6 letter mnemonics when 2-4 will do.

Comment 12 Jacob Lifshay 2022-07-08 03:06:42 BST

(In reply to Luke Kenneth Casson Leighton from comment #11)
> people used to computing from
> the 1980s and 1990s consider CPU time to be precious. russian
> assembler writers of the time were amongst some of the best in
> the world, out of necessity: due to the berlin wall, large
> capacity memory and high end cpus were impossible to obtain.

that reason is no longer valid: the amount of time and memory used by assemblers is a very small fraction of what a standard compiler uses, i'd expect >99% of the runtime is used for optimizations rather than in the assembler. increasing the time it would take to assemble all of chromium (1GB of binary with debug info) by 1/10 of a second is an acceptable tradeoff for increased clarity imho.
> 
> hard no on 6 letter mnemonics when 2-4 will do.

i disagree...have you ever looked at nvidia's ptx or amdgpu assembler? they use pretty long mnemonics I assume because they're easier to read.

wasm also has long mnemonics:
i32x4.extadd_pairwise_i16x8_u

basically a lot of recent ISAs have longer mnemonics because they're easier to read/understand.

Comment 13 Jacob Lifshay 2022-07-08 03:12:36 BST

(In reply to Jacob Lifshay from comment #12)
> that reason is no longer valid: the amount of time and memory used by
> assemblers is a very small fraction of what a standard compiler uses, i'd
> expect >99% of the runtime is used for optimizations rather than in the
> assembler. increasing the time it would take to assemble all of chromium
> (1GB of binary with debug info) by 1/10 of a second is an acceptable
> tradeoff for increased clarity imho.

that's at the proportion compared to total compile time that you deciding to light a candle in your room probably has a bigger effect due to decreased clock speeds from higher temperatures than having longer mnemonics would.

Comment 14 Jacob Lifshay 2022-07-08 03:22:59 BST

(In reply to Jacob Lifshay from comment #10)
> (In reply to Luke Kenneth Casson Leighton from comment #9)
> > (In reply to Dmitry Selyutin from comment #8)
> > 
> > > 2) replace vec2/vec3/vec4 with vec=N.
> 
> i originally proposed subvl=1-4, where subvl=1 is the default value

if you want a 3-char mnemonic, how about svl=N

imho it's better because it explicitly contains sub-vector which is the part you're actually setting, not the vector length which imho is implied (to an uninformed reader) by vec=N.

Comment 15 Dmitry Selyutin 2022-07-08 06:15:12 BST

I'm OK with any good name for it. My point is that, regardless of the name, it must be consistent per se and be consistent with anything around. Either all modes must allow macros, or none of them. If only "value" part of the mode is allowed to be a macro, then, to be a macro, vec2/vec3/vec4 must be converted to a key-value form.
It's not hard to change the code around to make this place work, but this would introduce an inconsistency. Allowing macros in modes would be kinda opening the Pandora's box and make the assembly code barely readable. So, let's either have some key-value mnemonic for vecs, or let's drop the option to allow macros for these.

Comment 16 Luke Kenneth Casson Leighton 2022-07-08 13:00:34 BST

(In reply to Dmitry Selyutin from comment #15)
> I'm OK with any good name for it. 

dmitry, see comment #9

Comment 17 Dmitry Selyutin 2022-07-09 15:04:42 BST

sw/ew are ready, thinking of how to handle the predicates. Outlined some general considerations in IRC: https://libre-soc.org/irclog/%23libre-soc.2022-07-09.log.html#t2022-07-09T14:50:14.

Comment 18 Dmitry Selyutin 2022-07-09 15:33:52 BST

(In reply to Dmitry Selyutin from comment #17)
> general considerations in IRC:
> https://libre-soc.org/irclog/%23libre-soc.2022-07-09.log.html#t2022-07-09T14:
> 50:14.

The code I checked was:

.set coco, 8
.set jum, coco
.set bo, 32

sv.setb/sw=jum/ew=bo 5, 31

Note that this works in a bit hackish fashion, binutils don't like '/' as separator and think I'm trying to divide something. ¯\_(ツ)_/¯ I therefore discard the separator for a while, until we're done with parsing this expression.

Comment 19 Dmitry Selyutin 2022-07-09 18:07:08 BST

Raised yet another on binutils mailing list about tricky cases like 1<<r3, ~r3 etc.

Comment 20 Dmitry Selyutin 2022-07-11 15:32:48 BST

This proves to be more difficult than I initially thought, since PPC doesn't allow any registers. This code fails:

.set REG, %r0
extsw REG, 2

...and so does this code:

.set REG, r0
extsw REG, 2

I've raised the discussion on this as part of the original thread: https://sourceware.org/pipermail/binutils/2022-July/121731.html.

Luke, a side note, unless -mregnames is given, this means that we should use 1<<%r3 (1<<r3 also works once -mregnames is active).

Comment 21 Luke Kenneth Casson Leighton 2022-07-12 01:47:58 BST

(In reply to Dmitry Selyutin from comment #20)

> Luke, a side note, unless -mregnames is given, this means that we should use
> 1<<%r3 (1<<r3 also works once -mregnames is active).

bleuch, far too many random characters.
stuff it, let's do "u3" instead, and !sm=r10

* sm=r3   (no change)
* sm=u3   (old 1<<r3)
* !sm=r31  (old sm=~r31)

but keep all 8 cr field options and if people use
"!sm=ne" issue a warning, "use sm=eq instead"

Comment 22 Dmitry Selyutin 2022-07-12 12:22:19 BST

(In reply to Luke Kenneth Casson Leighton from comment #21)
> (In reply to Dmitry Selyutin from comment #20)
> bleuch, far too many random characters.

These characters are _not_ random. These characters come from the way this assembly work in PPC. My point is, if 1<<r3 is the most obvious and evident choice, the way it's described in GNU assembly is 1<<%r3, or via -mregnames flag. I suggest at least keeping this as an alias for u3 (which, in my opinion, is not obvious at all).

> * sm=r3   (no change)
> * sm=u3   (old 1<<r3)
> * !sm=r31  (old sm=~r31)

I think we will be able to handle ~r31 (well, again, in GNU assembly parlour it's ~%r31, unless -mregnames is present). Actually once the registers are supported it boils down to handling a little expression tree. I'm working on it.

Comment 23 Dmitry Selyutin 2022-07-12 14:54:05 BST

I had to spend a damn lot of time there, but it seems I managed to get registers working in macros. With this applied, expressions like ~%r3/~r3 and 1<<%r3/1<<r3 can be implemented as well.

https://sourceware.org/pipermail/binutils/2022-July/121772.html

Comment 24 Dmitry Selyutin 2022-07-12 22:46:18 BST

Yet another day lost on macro wars, this time to adopt the vector notation to the recent register support in macros. Also more iterations on that patch.

https://sourceware.org/pipermail/binutils/2022-July/121777.html

I hope that tomorrow I'll be able to at least some of notations we wanted.
Perhaps we'll need to introduce some special pseudo-operand for this purpose, though.

Comment 25 Dmitry Selyutin 2022-07-12 22:47:22 BST

Remark: what I mean by pseudo-operand is something which makes context only for binutils internals. Syntactically nothing is changed.

Comment 26 Luke Kenneth Casson Leighton 2022-07-13 04:54:44 BST

(In reply to Dmitry Selyutin from comment #22)
> (In reply to Luke Kenneth Casson Leighton from comment #21)
> > (In reply to Dmitry Selyutin from comment #20)
> > bleuch, far too many random characters.
> 
> These characters are _not_ random.

i know: i mean, i'm counting the number, ignoring what they actually are, and
9 characters is exceeding a threshold i feel comfortable with.
(10 including /)

/ 1
sm= 3
1<< 3
%r3 3

total of 10, it's way too much.

> These characters come from the way this
> assembly work in PPC. My point is, if 1<<r3 is the most obvious and evident
> choice, the way it's described in GNU assembly is 1<<%r3, or via -mregnames
> flag. I suggest at least keeping this as an alias for u3 (which, in my
> opinion, is not obvious at all).

agreed: i just can't think of a good short mnemonic.

does 1<<%3 work?

how about dropping the 1 and using one "<"? sm=<%3

or "^"? sm=^r3 sm=^%3

would a special-case pattern in the lexer work (if "^" has not
been used),

i am wondering if the unary-bit form is going too far in the
macro substitution.  if it should be excluded as a special
case.  if you want sm=1<<r3 you *know* you want it, and you
cannot use 1<<r10 or 1<<r31 so macros will not help you.

Comment 27 Dmitry Selyutin 2022-07-13 06:24:37 BST

(In reply to Luke Kenneth Casson Leighton from comment #26)
> total of 10, it's way too much.

...so those who feel uncomfortable can use the macro? :-)

> does 1<<%3 work?
> how about dropping the 1 and using one "<"? sm=<%3
> or "^"? sm=^r3 sm=^%3

These all break the principle of the least astonishment. The last one to the lesser extent, perhaps. Again, in GNU lingo, it is sm=^%r3.

I like this option.

+ consists of the register and unary operand, as things like ~%r3;
+ really distinguishable;
+ follows the same principles as operands around;
+ short and sweet, easy to remember;
- I'm not sure whether anyone in PPC world has ever seen anything similar.

So I suggest we indeed use ^%r3 (^r3 with -mregnames).

Comment 28 Luke Kenneth Casson Leighton 2022-07-13 15:21:36 BST

(In reply to Dmitry Selyutin from comment #27)

> So I suggest we indeed use ^%r3 (^r3 with -mregnames).

like it.  and it can be recognised in the lexer, even.

Comment 29 Dmitry Selyutin 2022-07-24 10:03:18 BST

This took damn long time, but it seems to work now:

.set coco, 1<<%r3
.set jum, eq
.set bo, %r3
sv.extsw./ff=coco 5, 31
sv.extsw./pr=jum/m=bo *5, 31

Comment 30 Dmitry Selyutin 2022-07-24 10:06:06 BST

If you want, I can play with that more and introduce an alias ^r3. I must say that gas expr system is somewhat stubborn and not really flexible, so even what we have now works only with some hacks (minor ones, but still hacks).

Also, with -mregnames, even the following works:

.set coco, 1<<r3
.set jum, eq
.set bo, r3
sv.extsw./ff=coco 5, 31
sv.extsw./pr=jum/m=bo *5, 31

Comment 31 Luke Kenneth Casson Leighton 2022-07-24 10:27:29 BST

(In reply to Dmitry Selyutin from comment #30)
> If you want, I can play with that more and introduce an alias ^r3. 

yes please, i know it took a long time, i should have been clearer:
the length of "/sm=1<<%r3" is 10 characters which is far too many.

Comment 32 Dmitry Selyutin 2022-07-24 19:22:31 BST

On svp64-ng branch, the following works:

.set coco, eq
.set jum, ^%r3
.set bo, 1<<%r3
sv.extsw./ff=coco *85, *%r84
sv.extsw./pr=jum/m=bo *5, *%r72

...as well as does the following, with -mregnames:

.set coco, eq
.set jum, ^r3
.set bo, 1<<r3
sv.extsw./ff=coco *85, *r84
sv.extsw./pr=jum/m=bo *5, *r72

Comment 33 Luke Kenneth Casson Leighton 2022-07-24 22:06:54 BST

(In reply to Dmitry Selyutin from comment #32)

> On svp64-ng branch, the following works:
> .set coco, eq
> .set jum, ^r3
> .set bo, 1<<r3
> sv.extsw./ff=coco *85, *r84
> sv.extsw./pr=jum/m=bo *5, *r72

hoorah, much shorter.

ah. sorry it's taken me a while to spot this, ff and pr must only take
CR-fields (inv + eq/lt/gt/so).  there's no bit here to specify to use
integer regs r3/r10/r31:

https://libre-soc.org/openpower/sv/normal/#index2h1

        0-1 	2 	3 4 	description
        01 	inv 	CR-bit 	Rc=1: ffirst CR sel
        11 	inv 	CR-bit 	Rc=1: pred-result CR sel

https://git.libre-soc.org/?p=binutils-gdb.git;a=blob;f=gas/config/tc-ppc-svp64.c;h=7f27093cde7fff236e4d254b54072e8430a1aab5;hb=0abcbc8220d812de3261fc44c5c65989882c0cec#l304

 304 svp64_decode_predicate (char *str, bool *cr, unsigned int *mask,
                             bool crmode)
 305 {
 ...
 372       predicate = (enum svp64_predicate)exp.X_add_number;
 373       *cr = (table[predicate].cr ? true : false);
           if (crmode && !(*cr)) { // pr= and ff= only allow CRs
               return NULL;
           }

then ff and pr decode set true (others set false)

 640 svp64_decode_ff (char *str, struct svp64_ctx *svp64)
 641 {
 ...
 650   iter = svp64_decode_predicate (str, &cr, &mask,
                                      true);

 662 svp64_decode_pr (char *str, struct svp64_ctx *svp64)
 663 {
 ...
 672   iter = svp64_decode_predicate (str, &cr, &mask,
                                      true);

Comment 34 Dmitry Selyutin 2022-07-24 22:28:21 BST

(In reply to Luke Kenneth Casson Leighton from comment #33)
> (In reply to Dmitry Selyutin from comment #32)
> 
> ah. sorry it's taken me a while to spot this, ff and pr must only take
> CR-fields (inv + eq/lt/gt/so).  there's no bit here to specify to use
> integer regs r3/r10/r31:

I guess I broke this between revisions, perhaps when I needed some code to be checked after implementing ^%r3 support. In svp64 branch, this is OK, I broke it for svp64-ng where I experimented with macros. Should be fine now.

Comment 35 Luke Kenneth Casson Leighton 2022-07-24 23:01:51 BST

(In reply to Dmitry Selyutin from comment #34)

> I guess I broke this between revisions, perhaps when I needed some code to
> be checked after implementing ^%r3 support. In svp64 branch, this is OK, I
> broke it for svp64-ng where I experimented with macros. Should be fine now.

 632   iter = svp64_decode_predicate (str, &cr, &mask);
 633   if (!iter || !(cr || ((mask & SVP64_RC1_ACTIVE) != 0)))

!cr - perfect, that stops it just as well.