845 – binutils: drop svmagic and deal with the prefix instead

Bug 845 - binutils: drop svmagic and deal with the prefix instead

Summary: binutils: drop svmagic and deal with the prefix instead

Status:	RESOLVED FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code (show other bugs)
Version:	unspecified
Hardware:	PC Linux

Importance:	--- normal
Assignee:	Dmitry Selyutin

URL:

Depends on:
Blocks:

Reported:	2022-06-02 13:22 BST by Dmitry Selyutin
Modified:	2022-10-06 13:17 BST (History)
CC List:	2 users (show)

See Also:	917
NLnet milestone:	NLNet.2019.10.032.Formal
total budget (EUR) for completion of task and all subtasks:	1150
budget (EUR) for this task, excluding subtasks' budget:	1150
parent task for budget allocation:	550
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	ghostmansd = { amount = 850, submitted = 2022-09-25, paid = 0222-10-06 } lkcl = { amount = 300, submitted = 2022-06-16, paid = 2022-09-06 }

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Dmitry Selyutin 2022-06-02 13:22:07 BST

This is the next logical step after a temporary "magic" instruction. Things work with the magic instruction, but disassembler does not get us a real listing. Once we support the real prefix, this would make objdump work correctly.

Comment 1 Dmitry Selyutin 2022-07-07 21:03:53 BST

Exactly as I predicted in https://bugs.libre-soc.org/show_bug.cgi?id=849#c2, in scope of task 849, I already successfully introduced some changes related to prefix. For assembler, we already operate with the first part as with prefix; it's therefore only the disassembler which will need more tuning to be able to recognize the instructions.

Comment 2 Dmitry Selyutin 2022-07-28 10:56:56 BST

The rest of this task will need more tuning. We'll basically have to implement what we have for assembler, but in an opposite direction.

Let's assume the code below.

sv.addi %r3, %r1, 125
sv.extsw./ff=eq 5, 67

With the current objdump, this code is printed as below:

   0:   00 00 40 05     .long 0x5400000
   4:   7d 00 61 38     addi    r3,r1,125
   8:   04 02 40 05     .long 0x5400204
   c:   b5 07 65 7c     extsw.  r5,r3

I already managed to make objdump recognize this as a whole instruction, and even added our prefix:

   0:   00 00 40 05     sv.addi r3,r1,125
   4:   7d 00 61 38 
   8:   04 02 40 05     sv.extsw. r5,r3
   c:   b5 07 65 7c

The rest, like reconstructing the operands and qualifiers, will take much more time. We'll basically have to invent custom "extraction" routines for each and every operand, and also perhaps hack other stuff around to affect the instruction name with qualifiers.

Stay tuned.

Comment 3 Dmitry Selyutin 2022-07-31 20:11:38 BST

Here's what we have in binutils:

#define A(op, xop, rc)				\
  (OP (op)					\
   | ((((uint64_t)(xop)) & 0x1f) << 1)	\
   | (((uint64_t)(rc)) & 1))
#define A_MASK A (0x3f, 0x1f, 1)
{"fsub",	A(63,20,0),	AFRC_MASK,   PPCCOM,	PPCEFS|PPCVLE,	{FRT, FRA, FRB}},
{"fsub.",	A(63,20,1),	AFRC_MASK,   PPCCOM,	PPCEFS|PPCVLE,	{FRT, FRA, FRB}},

Notice A(63,20,0) and A(63,20,1) fields. They serve the purpose of instantiating the "opcode" field; this field is exactly what we use for lookup in disassembler code.

We can get the following upon calling openpower.decoder.power_svp64.SVP64RM.get_svp64_csv:

minor_63.csv shows that the major opcode is 63 (op == 63).
20 can be obtained from -----11100 record for fsub at minor_63.csv.
0/1 corresponds to RC field.

My question is, do we have some common technique to construct the whole "opcode" stuff? I mean, I understand the logic for cases like major.csv (where the OP is a simple integer) and for cases like minor_#.csv (where OP is # and XOP is determined by the entry). But how do we address other CSVs? Perhaps there's a simple way to get the "opcode" immediately in Python?

For the record, here are CSVs we currently handle:

    paths = (
        "minor_19.csv",
        "minor_30.csv",
        "minor_31.csv",
        "minor_58.csv",
        "minor_62.csv",
        "minor_22.csv",
        "minor_5.csv",
        "minor_63.csv",
        "minor_59.csv",
        "major.csv",
        "extra.csv",
    )

Comment 4 Dmitry Selyutin 2022-07-31 20:13:40 BST

We can cheat and, for each SVP64 instruction name, find it in vanilla PPC entries, and get opcode there. The issues here are: 1) disassembler doesn't build the hash table for names, it only hashes opcodes; 2) even if we add it, it wastes time upon startup.

Comment 5 Luke Kenneth Casson Leighton 2022-07-31 22:06:35 BST

(In reply to Dmitry Selyutin from comment #3)

> My question is, do we have some common technique to construct the whole
> "opcode" stuff? 

yyees... sort-of, i mean it's part of that wossname bugreport
the one for cleaning up (automated) decode svp64.py  ah! bug #838.

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_decoder.py;hb=HEAD#l702

returns an object of type TopPowerDecoder which contains the
pattern-recognition list of XOs and associated bit-positions
where they start from.  just... err... in LSB0 order not MSB0.


> I mean, I understand the logic for cases like major.csv
> (where the OP is a simple integer) and for cases like minor_#.csv (where OP
> is # and XOP is determined by the entry). But how do we address other CSVs?
> Perhaps there's a simple way to get the "opcode" immediately in Python?

well, they're bit-pattern-matching, a mask plus a match.

in *theory* they could be turned into a single match plus start/end
because i don't believe there's any that are "---111---00101---"
i.e. all bits are contiguous.  i think.  have to check.

Comment 6 Luke Kenneth Casson Leighton 2022-07-31 22:22:51 BST

 739         Subdecoder(pattern=22, opcodes=get_csv("minor_22.csv"),
 740                    opint=False, bitsel=(0, 11), suffix=None, subdecoders=[]),

  20 0110001110-,ALU,OP_CPROP,
  21 ------10001,ALU,OP_BMASK,
  22 -----00011-,ALU,OP_FMVIS,
  23 -----01011-,ALU,OP_FISHMV,

so here you can parse the minor_22.cvs bitpattern -------10001
and go "oh, bitsel starts from 0 to 11 but actually we are only
interested in 0...4 because the rest is '-'s.  so actually
this turns into [0:4] == 0b10001

and for fmvis that would be:

"oh, "-----00011-" bitpattern starts at 1 so actually we want
1...5 this turns [1:5] == 0b00011

something like that.

Comment 7 Luke Kenneth Casson Leighton 2022-07-31 22:27:18 BST

yep none of these are "----1111----000" they are all contiguous

22.csv:-----11011-,
22.csv:-----011001,
22.csv:-----101001,
22.csv:-----111001,
22.csv:-----10011-,
22.csv:0111001110-,
22.csv:0011001110-,
22.csv:0101001110-,
22.csv:0001001110-,
22.csv:1101001110-,
22.csv:1011110110-,
22.csv:1001110110-,
22.csv:1111110110-,
22.csv:0111110110-,
22.csv:0110001110-,
22.csv:------10001,
22.csv:-----00011-,
22.csv:-----01011-,
59.csv:-----10010,F
59.csv:-----10100,F
59.csv:-----10101,F
59.csv:-----10110,F
59.csv:-----11000,F
59.csv:-----11001,F
59.csv:-----11010,F
59.csv:-----11100,F
59.csv:-----11101,F
59.csv:-----11110,F
59.csv:-----11111,F
59.csv:-----00100,F
59.csv:-----00101,F
59.csv:-----00110,F
59.csv:-----00111,F
59.csv:-----01101,F
59.csv:-----01111,F
5.csv:--------00-,S
5.csv:0010010110-,S
5.csv:-011010110-,S
5.csv:0010110110-,S
5.csv:0011110110-,S
63.csv:-----10010,F
63.csv:-----10100,F
63.csv:-----10101,F
63.csv:-----10110,F
63.csv:-----10111,F
63.csv:-----11000,F
63.csv:-----11001,F
63.csv:-----11010,F
63.csv:-----11100,F
63.csv:-----11101,F
63.csv:-----11110,F
63.csv:-----11111,F

Comment 8 Dmitry Selyutin 2022-08-01 05:56:35 BST

What really always pisses me off is that many our sources mix pure-Python concepts and stuff like Signal and similar crap. The pure-Python code which generates the C code for binutils should not really expect stuff like Signal or verilog or rtutil. This seriously complicates the options to use the code in a simple way. Why, for God's sake, if I want to generate binutils code, should I install verilog or nmigen!? I know, the reason is that nmigen and verilog needs were there way before; still we should care about it now.

But, putting this rage aside for a while (I'm not yet done with it, we'll return to this question eventually, because this is fundamentally wrong): divide_opcodes has most of what we need. I'll think of how to handle this. Either I'll create a PowerDecoder (duplicating some of create_pdecode stuff) or will handle it in a bit different way.

Comment 9 Luke Kenneth Casson Leighton 2022-08-01 18:24:34 BST

(In reply to Dmitry Selyutin from comment #8)
> What really always pisses me off is that many our sources mix pure-Python
> concepts and stuff like Signal and similar crap. The pure-Python code which
> generates the C code for binutils should not really expect stuff like Signal
> or verilog or rtutil. This seriously complicates the options to use the code
> in a simple way. Why, for God's sake, if I want to generate binutils code,
> should I install verilog or nmigen!? I know, the reason is that nmigen and
> verilog needs were there way before; still we should care about it now.

because when PowerDecoder was first written 2 years ago it was
initially only intended for HDL.

i then wanted to use it in the ISACaller Simulator and thought,
"ah HA! if i pass in an abstract class instance which has the exact
same capability as nmigen Signal, nmigen Switch, nmigen Case and nmigen
Mux, i can make a PowerDecoder which does NOT depend on nmigen"

it should be fairly obvious from the fact that this has not *yet*
happened that it was decided that (a) there wasn't time and (b) PowerDecoder
is already so horrendously abstract that it would cause people to run
away screaming...

> But, putting this rage aside for a while (I'm not yet done with it, we'll
> return to this question eventually, because this is fundamentally wrong):
> divide_opcodes has most of what we need. I'll think of how to handle this.
> Either I'll create a PowerDecoder (duplicating some of create_pdecode stuff)
> or will handle it in a bit different way.

no please REALLY do not duplicate the functionality of PowerDecoder,
it is extremely sophisticated and it would be a nightmare to
have two sets of code.  that's my first instinctive reaction.

the second reaction is: actually it may only be about 250 lines
so you might get away with it, but it is really complex data structures
so making me nervous.

my third reaction is, you've completely missed the point (aside from
the import, which could be solved by splitting out the appropriate
code) and it is this: you don't *actually* need to call any nmigen
at all, you only need to walk the data structures (the static hierarchy
of Named Tuples) to get the XOs.

in other words what you are seeing as a dependency on nmigen is ONLY
caused by these lines:

 767     return TopPowerDecoder(32, dec, name=name, col_subset=col_subset,
 768                            row_subset=row_subset,
 769                            conditions=conditions)
 770 

if you move the entire contents of create_pdecode except that one line
into a submodule and move Subdecoder (line 110) to the same file i think
you'll find the problem (for you) has completely gone away.

sorry i meant to spell that out.

yes v. likely that divide_opcodes() creates the mask needed for spotting
XO patterns, feel free to create a class that duplicates the recursive
walk of PowerDecoder but only after moving the majority of create_pdecode
and Subdecoder to a separate module, calling that new function from
power_decoder.py:create_pdecode(), and making damn sure everything still
works, first.

the last thing we need is two duplicate places of supported opcodes.

bottom line is, the information you are after is actually obtainable
and does not need nmigen, at all.

Comment 10 Dmitry Selyutin 2022-08-01 20:20:07 BST

(In reply to Luke Kenneth Casson Leighton from comment #9)
> i then wanted to use it in the ISACaller Simulator and thought,
> "ah HA! if i pass in an abstract class instance which has the exact
> same capability as nmigen Signal, nmigen Switch, nmigen Case and nmigen
> Mux, i can make a PowerDecoder which does NOT depend on nmigen"

Noooooo, this'd be mad. I meant quite the opposite: there's no need to implement the decoder here, at all. The opcode should simply be part of instruction data. More details below.

> it should be fairly obvious from the fact that this has not *yet*
> happened that it was decided that (a) there wasn't time and (b) PowerDecoder
> is already so horrendously abstract that it would cause people to run
> away screaming...

The thing I'm talking about is _not_ PowerDecoder. I'm talking of basic instruction-related information. Providing the way to quickly calculate the opcode (both with and without the operands, but at least without them) is natural. My point is, it's not expected to be buried that deep inside the decoder. The opcode is expected to be part of data which is associated with the instruction. When we parse the tables and output some stuff about the instructions, the basic opcode (operand-less) must be part of this information.

> if you move the entire contents of create_pdecode except that one line
> into a submodule and move Subdecoder (line 110) to the same file i think
> you'll find the problem (for you) has completely gone away.

Subdecoders are just named tuples; the logic is buried inside TopPowerDecoder, or, more correctly, inside its base class, PowerDecoder. Both are Elaboratable, both do way more than obtaining the simple information (building dynamic switches, creating some Signals and whatever stuff is under the hood). My point is, for code generation, there's no need to rely on any of these: basically the only stuff needed is what divide_opcodes produces. And my point is, this information should be associated with instruction, not be hidden.

> sorry i meant to spell that out.

No, this is actually perfect that you do it, and I'm really grateful you're doing this. First, we must discuss it; second, I'm really glad that you shed the light to many topics, because historical background and architectural details are priceless.

> yes v. likely that divide_opcodes() creates the mask needed for spotting
> XO patterns, feel free to create a class that duplicates the recursive
> walk of PowerDecoder but only after moving the majority of create_pdecode
> and Subdecoder to a separate module, calling that new function from
> power_decoder.py:create_pdecode(), and making damn sure everything still
> works, first.

I'm thinking of having the opcode coming as part of classes which are returned by get_svp64_csv wrapper. That is, for each instruction, it's perfectly doable. That is, think of it as "parse each and every CSV with the sacred knowledge of bitsels, suffices, etc., but don't create the decoder, just calculate all the information about the instructions that can be calculated statically, associating the data on per-instruction basis". Basically, yes, create_pdecode w/o the decoder, kinda like a big map of instructions (again, each instruction is a dict-like stuff, opcode included).

> bottom line is, the information you are after is actually obtainable
> and does not need nmigen, at all.

This exactly what I'm talking of. Good to know we're on the same page.

Comment 11 Dmitry Selyutin 2022-08-01 20:32:07 BST

So, TL;DR: what I've been thinking of is kinda like InstructionDatabase class, which parses all CSVs, a collection of instructions (likely searchable by both name/opcode). This instruction database parses all CSVs and also has some intimate knowledge about bitsels, suffixes, etc. InstructionDatabase would keep a collection of Instruction instances; each Instruction would provide the data, including stuff which needs some calculations.

For example, we calculate some SVP64 fields based on CSVs, but the calculations are hidden inside CSV parser (get_svp64_csv). Instead, this should be hidden inside Instruction class. Same applies to the opcode. Also, the logic with bitsels and similar stuff should be wrapped too, exactly like create_pdecode does, except that it doesn't need to create the actual decoder. :-)

Also, I should apologize, I think the overall tone of the first message was unnecessarily harsh. I had to dig into this for almost two consecutive days (sleep and food excluded), so it made me somewhat impatient and nervous. :-) I'm sorry, please don't consider it personal.

Comment 12 Luke Kenneth Casson Leighton 2022-08-01 20:35:50 BST

rright.

ok.

so the CSV files originally came from microwatt decode1.vhdl.

the missing information which is provided by create_decode() is,
"what the hell does column 1 (opcode) actually mean".

that's all.

and rhe information that is missing is:

1. bitsel (start,end)
2. opint

that's it.  that's all.  that's the *only* information that each CSV
file is missing. strictly speaking even opint can be determined "does this
opcode have minus-symbols in it, yes no".

now, about 6 weeks ago we added the ability to put comments into the
CSV files.

i have no problem *at all* with putting bitsel into the 1st line of the
CSV file as a comment:

# bitsel=0:5

and then:

1) adapting create_pdecode to read it
2) forgetting about create_pdecode() entirely for what you want.

once thar bitsel info is associated with the CSV file itself then
power_decode.py no longer is the canonical location for being able
to interpret the CSV files.

then get_csv_file() can add two extra columns on read, one is an XO
field, the other XO_mask.  it should be obvious what they are.

Comment 13 Luke Kenneth Casson Leighton 2022-08-01 20:49:50 BST

(In reply to Dmitry Selyutin from comment #11)
> So, TL;DR: what I've been thinking of is kinda like InstructionDatabase
> class, which parses all CSVs, a collection of instructions (likely
> searchable by both name/opcode). This instruction database parses all CSVs
> and also has some intimate knowledge about bitsels, suffixes, etc.
> InstructionDatabase would keep a collection of Instruction instances; each
> Instruction would provide the data, including stuff which needs some
> calculations.

i like it. a better version of get_csv().
 
> For example, we calculate some SVP64 fields based on CSVs, but the
> calculations are hidden inside CSV parser (get_svp64_csv). Instead, this
> should be hidden inside Instruction class. Same applies to the opcode. Also,
> the logic with bitsels and similar stuff should be wrapped too, exactly like
> create_pdecode does, except that it doesn't need to create the actual
> decoder. :-)

bear in mind the,stuff i wrote in comment #12 applies to scalar 32bit
ops, absolutely nothing to do with svp64 at all... but yes, i agree.


> Also, I should apologize, I think the overall tone of the first message was
> unnecessarily harsh. I had to dig into this for almost two consecutive days
> (sleep and food excluded), so it made me somewhat impatient and nervous. :-)
> I'm sorry, please don't consider it personal.

pffh i recognise a stress-out when i see one because i do it all the time.
if you'd said "stress stress blah blah oh and it's YOUR fault totally" then
my alarms would be hitting the defcon 4 button, but i saw nothing to get
concerned about at all.  just a perfectly normal "aaaaaaa!" moment.

Comment 14 Dmitry Selyutin 2022-08-14 15:36:28 BST

Good news folks! With the recent code which I implemented in openpower-isa, the opcode value and mask generation, we're able to retrieve the whole set of stuff needed to decode everything, including the corresponding `struct svp64_record`. I'm already marking the instructions as 8-byte, and output the correct name. The next step is to decode the operands values as needed.

Comment 15 Dmitry Selyutin 2022-08-17 20:31:09 BST

Adding new enum to sv_binutils is easy.

The patch...
https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=783d7c6c2ab159b4ca38d6fa6254697e8151939f

...the results:

enum svp64_function {
  SVP64_FUNCTION_NONE,
  SVP64_FUNCTION_ALU,
  SVP64_FUNCTION_LDST,
  SVP64_FUNCTION_SHIFT_ROT,
  SVP64_FUNCTION_LOGICAL,
  SVP64_FUNCTION_BRANCH,
  SVP64_FUNCTION_CR,
  SVP64_FUNCTION_TRAP,
  SVP64_FUNCTION_MUL,
  SVP64_FUNCTION_DIV,
  SVP64_FUNCTION_SPR,
  SVP64_FUNCTION_MMU,
  SVP64_FUNCTION_SV,
  SVP64_FUNCTION_VL,
  SVP64_FUNCTION_FPU,
};

struct svp64_desc {
  uint64_t function : 4;
  /* snip */
};

  {
    .name = "sv.ldx",
    .opcode = {
      .value = 0x7c00002a,
      .mask = 0xfc0007fe,
    },
    .desc = {
      .function = SVP64_FUNCTION_LDST,
      /* snip */
    },
  },

Comment 16 Dmitry Selyutin 2022-09-22 10:48:51 BST

As of today, the development version of binutils is already capable of producing this:

$ ./objdump -dr -Mlibresoc /tmp/test.o

/tmp/test.o:     file format elf64-powerpcle


Disassembly of section .text:

0000000000000000 <.text>:
   0:   e0 18 40 05     sv.add     r127,r1,*r67
   4:   14 82 e1 7f

Comment 17 Dmitry Selyutin 2022-09-22 23:01:42 BST

Completed extra and started working on specifiers. I already took the decision on how the modes handling would look, and fixed multiple issues in scope of it, the most important are:
1. We generated the Function emum instead SVMode enum. This is not what we want, we differ by SVMode categories. Since there already was Mode in sv_binutils, that one was renamed to ModeConst: this is a candidate to be deprecated anyway.
2. We'll need Rc value which binutils don't have (they simply have both "add" and "add." entries). We, however, need this to affect the mode matching. Therefore yet another bit from Desc was occupied, leaving us only 5 unused bits. Note, however, that this bit was freed by switch from Function (4 bits) to SVMode (3 bits).

The changes can be found in binutils branch at openpower-isa.

Comment 18 Dmitry Selyutin 2022-09-23 18:14:32 BST

I haven't checked everything yet, but normal modes seem to work.

   0:   e0 3f 4c 05     sv.add./dw=8 *r3,*r7,*r11
   4:   15 12 01 7c 
   8:   e0 3f 48 05     sv.add./dw=16 *r3,*r7,*r11
   c:   15 12 01 7c 
  10:   e0 3f 44 05     sv.add./dw=32 *r3,*r7,*r11
  14:   15 12 01 7c 
  18:   e0 3f 43 05     sv.add./sw=8 *r3,*r7,*r11
  1c:   15 12 01 7c 
  20:   e0 3f 42 05     sv.add./sw=16 *r3,*r7,*r11
  24:   15 12 01 7c 
  28:   e0 3f 41 05     sv.add./sw=32 *r3,*r7,*r11
  2c:   15 12 01 7c 
  30:   e0 3f 4e 05     sv.add./dw=8/sw=16 *r3,*r7,*r11
  34:   15 12 01 7c 
  38:   e0 3f 49 05     sv.add./dw=16/sw=32 *r3,*r7,*r11
  3c:   15 12 01 7c 
  40:   e0 3f 47 05     sv.add./dw=32/sw=8 *r3,*r7,*r11
  44:   15 12 01 7c 
  48:   e0 3f 45 05     sv.add./w=32 *r3,*r7,*r11
  4c:   15 12 01 7c 
  50:   e0 3f 4f 05     sv.add./w=8 *r3,*r7,*r11
  54:   15 12 01 7c 
  58:   e0 3f 4a 05     sv.add./w=16 *r3,*r7,*r11
  5c:   15 12 01 7c 
  60:   e0 7f 4d 05     sv.add./dw=8/sw=32/vec2 *r3,*r7,*r11
  64:   15 12 01 7c 
  68:   e0 3f 60 05     sv.add./m=r3 *r3,*r7,*r11
  6c:   15 12 01 7c 
  70:   e0 3f 50 05     sv.add./m=1<<r3 *r3,*r7,*r11
  74:   15 12 01 7c 
  78:   e0 3f d0 05     sv.add./m=~r10 *r3,*r7,*r11
  7c:   15 12 01 7c 
  80:   e0 3f e0 07     sv.add./m=so *r3,*r7,*r11
  84:   15 12 01 7c 
  88:   e0 3f d0 07     sv.add./m=ne *r3,*r7,*r11
  8c:   15 12 01 7c 
  90:   e0 3f 40 07     sv.add. *r3,*r7,*r11
  94:   15 12 01 7c 
  98:   e0 3f 40 05     sv.add. *r3,*r7,*r11
  9c:   15 12 01 7c

Comment 19 Dmitry Selyutin 2022-09-25 15:04:21 BST

0000000000000000 <vpx_get4x4sse_cs_svp64_real>:
   0:   00 00 e0 38     li      r7,0
   4:   a4 0f 84 78     sldi    r4,r4,1
   8:   a4 0f c6 78     sldi    r6,r6,1
   c:   b6 07 00 58     setvl   r0,r0,4,0,1,1
  10:   00 30 40 05     sv.lha  *r10,0(r3)
  14:   00 00 43 a8 
  18:   14 22 63 7c     add     r3,r3,r4
  1c:   00 30 40 05     sv.lha  *r14,0(r3)
  20:   00 00 63 a8 
  24:   14 22 63 7c     add     r3,r3,r4
  28:   00 30 40 05     sv.lha  *r18,0(r3)
  2c:   00 00 83 a8 
  30:   14 22 63 7c     add     r3,r3,r4
  34:   00 30 40 05     sv.lha  *r22,0(r3)
  38:   00 00 a3 a8 
  3c:   00 00 00 60     nop
  40:   00 30 40 05     sv.lha  *r26,0(r5)
  44:   00 00 c5 a8 
  48:   14 32 a5 7c     add     r5,r5,r6
  4c:   00 30 40 05     sv.lha  *r30,0(r5)
  50:   00 00 e5 a8 
  54:   14 32 a5 7c     add     r5,r5,r6
  58:   00 30 40 05     sv.lha  *r34,0(r5)
  5c:   00 00 05 a9 
  60:   14 32 a5 7c     add     r5,r5,r6
  64:   00 30 40 05     sv.lha  *r38,0(r5)
  68:   00 00 25 a9 
  6c:   b6 1f 00 58     setvl   r0,r0,16,0,1,1
  70:   c0 36 40 05     sv.subf *r42,*r10,*r26
  74:   50 30 42 7d 
  78:   c0 36 40 05     sv.mulld *r58,*r42,*r42
  7c:   d2 51 ca 7d 
  80:   04 06 40 05     sv.add/mr r7,*r58,r7
  84:   14 3a ee 7c 
  88:   78 3b e3 7c     mr      r3,r7
  8c:   20 00 80 4e     blr