Bug 1125 - split entire instruction into separate files so they can be [[!inline]]-ed into the wiki
Status: IN_PROGRESS
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: PC Linux
Importance: High enhancement
Assignee: Jacob Lifshay
URL:
Depends on: 1177
Blocks: 1048
Reported: 2023-08-01 06:16 BST by Jacob Lifshay
Modified: 2024-01-08 01:06 GMT

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
jacob=0


Description Jacob Lifshay 2023-08-01 06:16:19 BST
this task is to split out the entire instruction database to single
files.

this task is NOT "splitting out the pseudocode".

the parser then also needs SIMPLE modification to understand
that it must read subdirectories of fixedarith/ etc instead of
fixedarith.mdwn

this task is designed to take approximately 30-50 minutes and no longer.

the following written by jacob is **NOT** the task. jacob's task is
now to UNDO these unauthorized modifications and to complete the work
as originally specified.


I changed the pseudo-code parser so it can parse [[!inline]] directives, and demoed changing one instruction (minmax. in av.mdwn), I pushed those changes to master. I then used a script to split out all the rest of the instructions and put those changes in the split-insns branch, so they can be reviewed before I push them.

master branch:
https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=26e689f923636bfdae9098b8eb39088292211e2e

commit 26e689f923636bfdae9098b8eb39088292211e2e
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Jul 31 21:52:56 2023 -0700

    don't warn for directories

commit 60f9f523f78cae9e357b61e6bc55ca1b323dfa14
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Jul 31 21:49:57 2023 -0700

    ignore indented comments too

commit b5d9084971dd761683a3a164af24c673a608aa23
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Jul 31 19:52:05 2023 -0700

    demo moving pseudocode to separate file

commit 43152e91f4530ddaef5cef2614b41e022c57fced
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Jul 31 19:44:00 2023 -0700

    add support for pseudocode being a [[!inline]] directive

split-insns branch:
https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=b46d6c40c8fe81747c702da6f00e3fca6f162030

commit b46d6c40c8fe81747c702da6f00e3fca6f162030
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Jul 31 21:40:22 2023 -0700

    split out instructions from openpower/isa/system.mdwn

<snip out a whole bunch of similar commits>

commit 7743990753f18f6d9ada6d6d1644d0084ba25a9e
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Jul 31 21:40:22 2023 -0700

    split out instructions from openpower/isa/av.mdwn

commit a221880faa57441e129d171a3cbc2d72480445e5
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Jul 31 21:35:13 2023 -0700

    add split-insns.sh
Comment 1 Jacob Lifshay 2023-08-01 06:19:31 BST
we're out of funds from #775, so we should figure something else out...idk what else to put in see-also
Comment 2 Jacob Lifshay 2023-08-01 06:22:08 BST
one extra feature we get by using separate files is that we can now put empty lines in an instruction's pseudo-code, since the parser can correctly handle that now that the input for each instruction is a separate file.
Comment 3 Luke Kenneth Casson Leighton 2023-08-01 11:00:09 BST
(In reply to Jacob Lifshay from comment #1)
> we're out of funds from #775, so we should figure something else out.

sorted, mini-task on bug #1029
Comment 4 Luke Kenneth Casson Leighton 2023-08-01 14:40:56 BST
(In reply to Jacob Lifshay from comment #0)
> I changed the pseudo-code parser so it can parse [[!inline]] directives, and
> demoed changing one instruction (minmax. in av.mdwn),

ah sorry i wasn't clear enough: i meant the *entire* instruction.

not:

    # DRAFT Minimum/Maximum (Rc=1)

    MM-Form

    * minmax. RT,RA,RB,MMM (Rc=1)

    Pseudo-code:

    [[!inline pagenames="openpower/isa/av/minmax." raw="yes"]]

    Special Registers Altered:

        CR0                     (if Rc=1)

ACTUALLY have the ENTIRETY of the instruction (all of the above)
in minmax..mdwn

such that av.mdwn becomes:

<!-- DRAFT Instructions for PowerISA Version 3.0 B Book 1 -->
<!-- https://libre-soc.org/openpower/sv/bitmanip/ -->
<!-- https://libre-soc.org/openpower/sv/av_opcodes/ -->
[[!inline pagenames="openpower/isa/av/minmax." raw="yes"]]
[[!inline pagenames="openpower/isa/av/other" raw="yes"]]
[[!inline pagenames="openpower/isa/av/next" raw="yes"]]

the way that you chose to do it was a lot more work, wasn't it?

all you actually had to do by doing the entire instruction
was look for "#" in the split-insns.sh file then add a second
level of subdirectories, or read "av.mdwn" for example and
notice "av/minmax" in it.

that's about... 15-20 minutes of work.

what you are proposing requires far more work as it involves
massive intrusive changes to the instruction parser.

which we DO NOT have time for and i do not want time or money
wasted on.
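
A minimal sketch of the whole-instruction split described above, assuming every instruction starts at a column-0 "# " title line (pseudocode comments are indented, so they do not match). The function name split_page and the insnN file names are hypothetical placeholders; the real split would name each file after its mnemonic, and this is not the contents of split-insns.sh.

```python
def split_page(page_name, text):
    """Split one .mdwn page into per-instruction chunks at '# ' title lines.

    Returns (index_text, files): files maps a relative path to its content,
    index_text is what the original page (e.g. av.mdwn) would become.
    """
    header, chunks = [], []
    for line in text.splitlines():
        if line.startswith("# "):
            chunks.append([line])        # a new instruction starts here
        elif chunks:
            chunks[-1].append(line)      # body of the current instruction
        else:
            header.append(line)          # leading comments stay in the index
    files, index = {}, list(header)
    for i, chunk in enumerate(chunks):
        name = "%s/insn%d" % (page_name, i)   # placeholder name (assumption)
        files[name + ".mdwn"] = "\n".join(chunk).rstrip() + "\n"
        index.append(
            '[[!inline pagenames="openpower/isa/%s" raw="yes"]]' % name)
    return "\n".join(index) + "\n", files


index, files = split_page(
    "av", "<!-- c -->\n\n# A\n\nX-Form\n\n# B\n\n    RT <- 0\n")
```

The index keeps the leading HTML comments and gains one [[!inline]] line per instruction, matching the av.mdwn layout shown in this comment.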
Comment 5 Jacob Lifshay 2023-08-01 15:06:08 BST
(In reply to Luke Kenneth Casson Leighton from comment #4)
> (In reply to Jacob Lifshay from comment #0)
> > I changed the pseudo-code parser so it can parse [[!inline]] directives, and
> > demoed changing one instruction (minmax. in av.mdwn),
> 
> ah sorry i wasn't clear enough: i meant the *entire* instruction.

I thought about doing that, but decided against it because we often quote just the pseudo-code on the wiki and because the way instructions' different parts are organized in rfcs doesn't match how the parser likes them (e.g. we have the instruction form as a textual table, as well as having the full description text). if we do want the whole instruction split out per file, we still need just the pseudo-code in a separate file for the above reasons.


> the way that you chose to do it was a lot more work, wasn't it?

yes, but I think it is worth it.
> 
> all you actually had to do by doing the entire instruction
> was look for "#" in the split-insns.sh file then add a second
> level of subdirectories, or read "av.mdwn" for example and
> notice "av/minmax" in it.
> 
> that's about... 15-20 minutes of work.
> 
> what you are proposing requires far more work as it involves
> massive intrusive changes to the instruction parser.

actually that part is rather minor since it only supports [[!inline]] when the pseudocode is nothing but [[!inline]] (plus prefix comments)
> 
> which we DO NOT have time for and i do not want time or money
> wasted on.

well, i already completed all those changes...the simulator part took like 30min.
Comment 6 Luke Kenneth Casson Leighton 2023-08-01 16:02:26 BST
(In reply to Jacob Lifshay from comment #5)

> I thought about doing that, but decided against it because we often quote
> just the pseudo-code on the wiki 

you thought about it but then made an arbitrary ad-hoc decision without
further consultation.

> and because the way instructions' different
> parts are organized in rfcs don't match how the parser likes them (e.g. we
> have the instruction form as a textual table, as well as having the full
> description text). if we do want the whole instruction split out per file,
> we still need just the pseudo-code in a separate file for the above reasons.

no: the way i would like it done is that the RFCs *include the entire
instruction* and if the RFC is different the *instruction* is modified
(and the parser to match) to cope.

in this way when it comes to dropping the instructions into "patches
to the v3.2 or whatever Power ISA is available at the time" it's a
no-brainer task because the instructions *are already in the format that
pandoc can directly convert to latex* for inclusion *directly* in the
spec.

ultimately i want the Libre-SOC openpower-isa repo to become a git submodule
of the Power ISA repository which you can't access because (sigh) they
haven't made it available. sigh.
Comment 7 Jacob Lifshay 2023-08-05 01:27:37 BST
(In reply to Luke Kenneth Casson Leighton from comment #6)
> (In reply to Jacob Lifshay from comment #5)
> > and because the way instructions' different
> > parts are organized in rfcs don't match how the parser likes them (e.g. we
> > have the instruction form as a textual table, as well as having the full
> > description text). if we do want the whole instruction split out per file,
> > we still need just the pseudo-code in a separate file for the above reasons.
> 
> no: the way i would like it done is that the RFCs *include the entire
> instruction* and if the RFC is different the *instruction* is modified
> (and the parser to match) to cope.

ok, how about this:

we have the entire RFC instruction entry in e.g. openpower/isa/bitmanip/sadduw.mdwn and have the pseudo-code separated out into openpower/isa/bitmanip/sadduw_code.mdwn.

the pseudocode would be in a file by itself for those cases where we want just the pseudocode in the wiki, such as:
https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/bitmanip.mdwn;h=5c84e83e5912844d771488ac875921f787accf40;hb=4925f0c8f5a166c6ecb9528d60ab943280287c6e#l278

it would look like so (file contents indented by 4 extra spaces for clarity):

ls004.mdwn:
    <snip>

    \newpage{}

    [[!inline pagenames="openpower/isa/bitmanip/sadduw" raw="yes"]]

    # Instruction Formats

    <snip>

openpower/isa/bitmanip.mdwn:
    <!-- Draft Instructions here described in -->
    <!-- https://libre-soc.org/openpower/sv/bitmanip/ -->
    <!-- These instructions are *not yet official* -->

    <snip>
    [[!inline pagenames="openpower/isa/bitmanip/sadduw" raw="yes"]]

openpower/isa/bitmanip/sadduw.mdwn:
    # Shift-and-Add Unsigned Word

    `sadduw RT, RA, RB, SH`

    |  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
    |-------|------|-------|-------|-------|-------|----|----------|
    |  PO   | RT   |  RA   |  RB   |  SH   |  XO   | Rc | Z23-Form |

    Pseudocode:

    ```
    [[!inline pagenames="openpower/isa/bitmanip/sadduw_code" raw="yes"]]
    ```

    When `SH` is zero, the lower word contents of register RB <snip>

openpower/isa/bitmanip/sadduw_code.mdwn:
(note additional indent-4 that's actually in file, beyond indent-4 for this comment)
        shift <- SH + 1                  # Shift is between 1-4
        n <- (RB)[32:63]                 # Only use lower 32-bits of RB
        sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
        RT <- sum                        # Result stored in RT
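
For readers less used to the pseudocode notation, the four lines of sadduw_code.mdwn can be worked through in plain Python. This is only an illustration of the arithmetic, not the simulator's implementation; the function name and MASK64 are made up for the example. Power ISA numbers bits MSB0, so (RB)[32:63] is the low 32 bits.

```python
MASK64 = (1 << 64) - 1


def sadduw(ra, rb, sh):
    """Shift-and-Add Unsigned Word, following the pseudocode line by line."""
    shift = sh + 1                        # SH is a 2-bit field, so shift is 1..4
    n = rb & 0xFFFFFFFF                   # (RB)[32:63]: low 32 bits of RB
    return ((n << shift) + ra) & MASK64   # sum[0:63], truncated to 64 bits


# upper half of RB is ignored: (3 << 1) + 1
print(hex(sadduw(ra=1, rb=0xFFFFFFFF00000003, sh=0)))  # 0x7
```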
Comment 8 Jacob Lifshay 2023-08-05 01:34:49 BST
(In reply to Jacob Lifshay from comment #7)
> (In reply to Luke Kenneth Casson Leighton from comment #6)
> > no: the way i would like it done is that the RFCs *include the entire
> > instruction* and if the RFC is different the *instruction* is modified
> > (and the parser to match) to cope.
> 
> ok, how about this:
> 
> we have the entire RFC instruction entry in e.g.
> openpower/isa/bitmanip/sadduw.mdwn and have the pseudo-code separated out
> into openpower/isa/bitmanip/sadduw_code.mdwn.

because both our latest proposals involve having the full RFC text for each instruction, the parser will need a decent amount of modification to parse that, so this bug needs some budget assigned. since this makes RFCs automatically stay in sync with the pseudocode we actually test, would taking it from the "write RFCs" budget be good? (I'll find the bug number later)
Comment 9 Luke Kenneth Casson Leighton 2023-08-05 04:35:13 BST
(reminder again: please take extra care to trim context, it is
 easy to see replies above and wastes time and resources when reading
 duplicate copies of already-read comments)

(In reply to Jacob Lifshay from comment #7)

> ok, how about this:
> 
> we have the entire RFC instruction entry in e.g.
> openpower/isa/bitmanip/sadduw.mdwn and have the pseudo-code separated out
> into openpower/isa/bitmanip/sadduw_code.mdwn.

that works really well, the pandoc plugin already does the job.
if "code" is in a fixed location very little modification to
the machine-reading parser is needed.

freeform text is not appropriate to support.

4-space indentation can be supported by the pandoc plugin
(easily added because the plugin knows for a fact anything
 ending "_code.mdwn" gets indented 4 spaces).

pick a bugreport, this is a relatively small task...
bug #1013 would do well.
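
A rough sketch of the inclusion step being described, assuming pages are looked up by name in a dictionary (a stand-in for reading .mdwn files) and that anything ending in "_code" is indented by 4 spaces when included. The names expand, PREFIX and SUFFIX are hypothetical; this is not the existing pandoc plugin or ikiwiki, only an illustration of the rule.

```python
PREFIX = '[[!inline pagenames="'
SUFFIX = '" raw="yes"]]'


def expand(text, pages):
    """Recursively replace [[!inline]] lines with the named page's text.

    Pages whose name ends in "_code" are indented by 4 spaces on inclusion,
    so the pseudocode file itself can be stored without extra indentation.
    """
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(PREFIX) and stripped.endswith(SUFFIX):
            name = stripped[len(PREFIX):-len(SUFFIX)]
            body = expand(pages[name], pages).splitlines()
            if name.endswith("_code"):
                body = ["    " + l if l else l for l in body]
            out.extend(body)
        else:
            out.append(line)
    return "\n".join(out)


pages = {
    "a/x": '# X\n\n[[!inline pagenames="a/x_code" raw="yes"]]\n',
    "a/x_code": "RT <- 0\n",
}
result = expand('[[!inline pagenames="a/x" raw="yes"]]', pages)
```

Plain string matching is used on purpose, since the directive has one fixed form in these files.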
Comment 10 Luke Kenneth Casson Leighton 2023-08-05 04:36:36 BST
(In reply to Jacob Lifshay from comment #8)
> the parser will need a decent amount of modification

NO.
Comment 11 Jacob Lifshay 2023-08-06 07:34:04 BST
(In reply to Luke Kenneth Casson Leighton from comment #10)
> (In reply to Jacob Lifshay from comment #8)
> > the parser will need a decent amount of modification
> 
> NO.

we *do* need to parse the file that has the free-form text, because that's where the instruction operands, mnemonics, form, and Special-registers are. only the pseudocode is in a separate *_code.mdwn file.

by decent amount, i mean rewriting a decent part of page_reader.ISA.read_file, almost nothing else will change. The idea is I'm modifying enough that I should get paid for the work, since it's more than writing 5 lines of straightforward code.

my general idea is to skip over the free-form text sections by looking for the next line that is recognized, e.g. looking for "Special Registers Altered:"
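
The skipping idea could look roughly like this, assuming the recognized lines are exactly the section markers already used in the .mdwn files. The MARKERS tuple and the helper name are assumptions for illustration, not code from page_reader.

```python
MARKERS = ("Pseudo-code:", "Special Registers Altered:")


def skip_to_marker(lines, start):
    """Return the index of the next recognised section line at or after
    start, or len(lines) if there is none. Everything in between is
    treated as free-form text and ignored."""
    i = start
    while i < len(lines) and lines[i].strip() not in MARKERS:
        i += 1
    return i


lines = ["Pseudo-code:", "", "    RT <- 0", "", "some prose",
         "Special Registers Altered:", "", "    None"]
next_section = skip_to_marker(lines, 1)   # index of "Special Registers Altered:"
```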
Comment 12 Luke Kenneth Casson Leighton 2023-08-06 08:27:08 BST
(In reply to Jacob Lifshay from comment #11)
> (In reply to Luke Kenneth Casson Leighton from comment #10)
> > (In reply to Jacob Lifshay from comment #8)
> > > the parser will need a decent amount of modification
> > 
> > NO.
> 
> we *do* need to parse the file that has the free-form text, 

no we do not.

see, read, and take as top priority comment #6.

we mistakenly duplicated the instruction details into the
RFCs, modifying them *inappropriately* in a freeform fashion.
this mistake needs correction and given the size of that manual task
it needs a LARGE budget far bigger than what you envisage.
(see below)

therefore the work needed to be done is the MINIMUM amount
of work to put those modifications BACK into the mdwn
and then include the ENTIRE mdwn in the RFC... WITHOUT
attempting to support any freeform arbitrary text.

the format of instructions is very very specific and regular
in the Power ISA.

two modifications are needed:

1. a "Fields Form" that goes in the same place as can be seen in
Power ISA Specs.

2. a new "Description in English Language" section is needed that
has a hash in front of it just after the pseudocode and before
"Special Registers"

that is ALL THAT IS NEEDED.

this is about 8-10 lines of code.

please stop and think ahead rather than rush into writing vast amounts
of code that then has to be ripped out.

the above is then dead-easy to write a pandoc plugin that autogenerates
the latex for insertion into the Power ISA spec.

identification and insertion of (2) can be done using an insndb
walker but that *is* another separate task with its own budget
Comment 13 Luke Kenneth Casson Leighton 2023-08-06 09:10:53 BST
(In reply to Luke Kenneth Casson Leighton from comment #12)

> 1. a "Fields Form" that goes in the same place as can be seen in
> Power ISA Specs.

temporarily this will have to be optional as during the transition
(see bug #1133) the two machine-readable formats are needed so as
not to have an all-or-nothing conversion period.  this should
increase the number of lines needed by about... 3 possibly 4.

> 2. a new "Description in English Language" section is needed that
> has a hash in front of it just after the pseudocode and before
> "Special Registers"

*later* (as part of the pandoc-latex generation, which is ANOTHER
separate task), this text may be parsed to look for "Programmer's note"
and appropriate latex incantations outputted
Comment 14 Jacob Lifshay 2023-08-06 16:49:30 BST
(In reply to Luke Kenneth Casson Leighton from comment #12)
> two modifications are needed:
> 
> 1. a "Fields Form" that goes in the same place as can be seen in
> Power ISA Specs.

I'm planning on making the parser have the |-table be optional, so we can add it later.

inserting the "X-Form" text at the end of the title -- that should be trivial to make the shell script move to the correct place since it's just on the next non-blank line. (though obtaining that form name from fields.txt will be more complex, so i'm just planning on moving the existing form name to the correct place and the python parser can just check for consistency, to be added later).
> 
> 2. a new "Description in English Language" section is needed that
> has a hash in front of it just after the pseudocode and before
> "Special Registers"

this section *is* the free-form text i was talking about. perhaps describing it as free-form was misleading.

this needs to be a html comment since there is no text "Description in English language" in the v3.1B pdf (unless v3.2 changed that?).

likewise, there is no "Pseudocode" text so that also needs to be a comment.

both of those changes are trivial to do with the shell script (though the english description section will be empty, ready to be filled in)

> identification and insertion of (2) can be done using an insndb
> walker but that *is* another separate task with its own budget

using insndb to insert that section is unnecessarily complex, the shell script can easily do that.
Comment 15 Jacob Lifshay 2023-08-06 17:02:40 BST
(In reply to Luke Kenneth Casson Leighton from comment #13)
> *later* (as part of the pandoc-latex generation which is ANOTHER
> separate task, this text may be parsed to look for "Programmer's note"
> and appropriate latex incantations outputted)

this needs to not be part of the "description" section, since that is *before* "special registers", but all architecture/engineering/programmer's notes appear *after* "special registers", so need to be their own optional sections each of which can appear multiple times.
Comment 16 Luke Kenneth Casson Leighton 2023-08-06 17:07:32 BST
(In reply to Jacob Lifshay from comment #14)
> (In reply to Luke Kenneth Casson Leighton from comment #12)
> > two modifications are needed:
> > 
> > 1. a "Fields Form" that goes in the same place as can be seen in
> > Power ISA Specs.
> 
> I'm planning on making the parser have the |-table be optional, so we can
> add it later.

awesome. comment #13.

> inserting the "X-Form" text at the end of the title -- that should be
> trivial to make the shell script move to the correct place since it's just
> on the next non-blank line. (though obtaining that form name from fields.txt
> will be more complex,

again: read comment #13.  do NOT attempt to duplicate insndb.
you ignored that i requested you write it in python, by using
one of the existing parser-readers: now there are consequences.

> so i'm just planning on moving the existing form name
> to the correct place and the python parser can just check for consistency,
> to be added later).

don't do that please. leave it where it is.

> > 
> > 2. a new "Description in English Language" section is needed that
> > has a hash in front of it just after the pseudocode and before
> > "Special Registers"
> 
> this section *is* the free-form text i was talking about. perhaps describing
> it as free-form was misleading,

indeed.  again: see comment #13.

> this needs to be a html comment since there is no text "Description in
> English language" in the v3.1B pdf (unless v3.2 changed that?).

solvable at the planned pandoc-plugin level by stripping out the
words "Description in English language".

trivial and NOT today's task.
 
> likewise, there is no "Pseudocode" text so that also needs to be a comment.

all solvable with the pandoc plugin. and also NOT today's task.

> both of those changes are trivial to do with the shell script (though the
> english description section will be empty, ready to be filled in)
> 
> > identification and insertion of (2) can be done using an insndb
> > walker but that *is* another separate task with its own budget
> 
> using insndb to insert that section is unnecessarily complex,

no, it really is not trivial, {}-Forms are NOT UNIQUE, they are
hopelessly mixed up (thank you IBM) and require a lookup of the
frickin OPERANDS in order to identify the correct frickin fields.txt
entry.

if you have duplicated that in shell script you have just wasted your
time doing something that i specifically advised you not to do.
(and i am not authorising payment for things that you have wasted
time on by not listening to my advice)

> the shell
> script can easily do that.

the shell script should never have been written, you ignored my
advice.

please abandon the shell script as quickly as possible and do not
increase its complexity any further.  it was intended to be a one-shot
purpose to create the split and then abandoned immediately.

you have already spent far too long on this task through not listening.
Comment 17 Luke Kenneth Casson Leighton 2023-08-06 17:09:01 BST
(In reply to Jacob Lifshay from comment #15)

> this needs to not be part of the "description" section, since that is
> *before* "special registers", but all architecture/engineering/programmer's
> notes appear *after* "special registers", so need to be their own optional
> sections each of which can appear multiple times.

unfortunately it varies. we get to decide what it is, but again
this is the *pandoc plugin* writer's problem to solve, not the
problem to deal with right now.

please focus.
Comment 18 Jacob Lifshay 2023-08-06 17:17:13 BST
(In reply to Luke Kenneth Casson Leighton from comment #16)
> (In reply to Jacob Lifshay from comment #14)
> > (In reply to Luke Kenneth Casson Leighton from comment #12)
> > > identification and insertion of (2) can be done using an insndb
> > > walker but that *is* another separate task with its own budget
> > 
> > using insndb to insert that section is unnecessarily complex,
> 
> no, it really is not trivial, {}-Forms are NOT UNIQUE, they are
> hopelessly mixed up (thank you IBM) and require a lookup of the
> frickin OPERANDS In order to identify the correct frickin fields.text
> entry.

you referred to (2) (insertion of "english description") and then proceed to complain about (1)? we're talking past each other here...
Comment 19 Jacob Lifshay 2023-08-06 17:35:38 BST
(In reply to Luke Kenneth Casson Leighton from comment #16)
> please abandon the shell script as quickly as possible and do not
> increase its complexity any further.  it was intended to be a one-shot
> purpose to create the split and then abandoned immediately.

the split needs to be re-done in order to split out the whole instruction and not just the pseudocode, so that's why I wanted to modify the shell script since I was planning on modifying it anyway to do the full split.
Comment 20 Jacob Lifshay 2023-08-06 17:37:16 BST
(In reply to Jacob Lifshay from comment #19)
> the split needs to be re-done in order to split out the whole instruction
> and not just the pseudocode, so that's why I wanted to modify the shell
> script since I was planning on modifying it anyway to do the full split.

so, do you want me to rewrite it in python or do the extra modifications to the shell script (afaict less work)?
Comment 21 Luke Kenneth Casson Leighton 2023-08-06 19:43:11 BST
(In reply to Jacob Lifshay from comment #20)

> so, do you want me to rewrite it in python 

it's too late, now, isn't it? that's *even more* work for what
should have been a simple task using insndb and a pre-existing
parser.

> or do the extra modifications to
> the shell script (afaict less work)?

different purpose, different script. more time wasted on rewrite.

1. splitter script leave as-is. run then abandon
2. insndb-based enumerator script

OR

3. *USE* insndb command-line tool to *OUTPUT* the opcode
   format FROM the shellscript.

outputting the opcode format is one of the options to one of the
listing tools, please use it instead of duplicating work.

3 *may* turn out to be dependent on 1 because the insndb tool can
list the mdwn file (it is part of insn Record). if that is now the
individual instruction then it is a near-trivial task.

basically you should not have used complex shell script, they are
unintelligible, user-hostile and impossible to understand except
by people with IQs well over 150.

but we are stuck with the decision, now, but the priority is
"get the job done and move on" not "do the job nicely"
Comment 22 Jacob Lifshay 2023-08-06 20:00:03 BST
(In reply to Luke Kenneth Casson Leighton from comment #21)
> 3 *may* turn out to be dependent on 1 because the insndb tool can
> list the mdwn file (it is part of insn Record). if that is now the
> individual instruction then it is a near-trivial task.

it is not an individual insn because the shell script currently only splits out the pseudocode, leaving the rest of the insn in the original mdwn -- hence why i want to change the script to split out the whole insn too. the script would remain one-time use (well, technically two times because last time was the first, ignoring the times i ran it to debug it).
Comment 23 Luke Kenneth Casson Leighton 2023-08-07 01:29:22 BST
(In reply to Jacob Lifshay from comment #22)

> it is not an individual insn because the shell script currently only splits
> out the pseudocode, leaving the rest of the insn in the original mdwn --
> hence why i want to change the script to split out the whole insn

yes - sorry i assumed that as given-context, given that it is supposed
to be the sole purpose of this bugreport and no other purpose

>  too. 

no really please *do not* add yet more coding time,
this was supposed to have been completed in under 30
minutes, it has been over 7 days and 22 round trip
discussions, far in excess of the actual coding time
has been wasted on it.

i am getting very fed up.

please complete the inline conversion to mdwn as originally
requested, no other "and another thing to add to the scope"
then close this bugreport.
Comment 24 Jacob Lifshay 2023-08-08 00:23:06 BST
(In reply to Luke Kenneth Casson Leighton from comment #23)
> (In reply to Jacob Lifshay from comment #22)
> please complete the inline conversion to mdwn as originally
> requested, no other "and another thing to add to the scope"
> then close this bugreport.

I completed the conversion to split out markdown files as well as doing the corresponding parser changes to make tests pass (having *nearly everything* broken isn't helpful).

since this is a large automated change, I force-pushed it to the split-insns branch until it can be reviewed by someone else, at which point it can be pushed/merged to master and this bug closed.

commits that aren't just splitting instructions:

https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=6c7e8c6c1462578c28bc1b207a17a6ecaeadd9c0

commit 6c7e8c6c1462578c28bc1b207a17a6ecaeadd9c0
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Aug 7 16:02:51 2023 -0700

    change to only support fully-split instructions (split pseudocode is still optional)

commit 9ce09207e08ff8638241bece31c1814c4e1ce554
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Aug 7 15:28:52 2023 -0700

    change split-insns.sh to split out both whole insn and pseudocode into separate mdwn files

commit f24c440c9059fe2decf68603bf5494c7b1b129df
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Aug 7 14:08:07 2023 -0700

    Revert "demo moving pseudocode to separate file"
    
    reverting because split-insns.sh won't support [[!inline]] already existing
    
    This reverts commit b5d9084971dd761683a3a164af24c673a608aa23.

commit a221880faa57441e129d171a3cbc2d72480445e5
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Jul 31 21:35:13 2023 -0700

    add split-insns.sh


all the automated instruction splitting commits:

https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=aff782105c86c1aae4807d7e5577c3c8c37a70f1

commit aff782105c86c1aae4807d7e5577c3c8c37a70f1
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Aug 7 16:04:00 2023 -0700

    split out instructions from openpower/isa/system.mdwn

<snip>

commit 8bd302d4376ce7859f617cc6fc528e4e454811aa
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Mon Aug 7 16:04:00 2023 -0700

    split out instructions from openpower/isa/av.mdwn
Comment 25 Luke Kenneth Casson Leighton 2023-09-30 14:51:08 BST
jacob i am now attempting to add "Description" parsing and i find
this gets in the way and the correct task has not been completed,
which is extremely irritating.

sorry to have to remind you of the Charter (3.7)
https://libre-soc.org/charter/discussion/
by not listening to me, this is your mess, therefore you own it,
and it is therefore also your responsibility to sort out.

the entire task was *literally* to move the entirety of every
single instruction into 2nd-level subdirectories, whereby the
job of collating them all could be done by enumerating the
*directories* (with dir()) instead of files ending ".mdwn",
then a 2nd use of dir(subdir) seamlessly collates all
individual instructions.
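
In Python the two-level enumeration would be done with os.listdir rather than dir(). A minimal sketch, assuming the layout openpower/isa/<page>/<insn>.mdwn; the function name is hypothetical and this is not the existing page reader.

```python
import os


def collect_instruction_files(isa_dir):
    """Return sorted paths of per-instruction .mdwn files found exactly one
    directory level below isa_dir (e.g. isa_dir/av/minmax..mdwn)."""
    found = []
    for group in sorted(os.listdir(isa_dir)):
        subdir = os.path.join(isa_dir, group)
        if not os.path.isdir(subdir):
            continue                      # skip av.mdwn and other plain files
        for name in sorted(os.listdir(subdir)):
            if name.endswith(".mdwn"):
                found.append(os.path.join(subdir, name))
    return found
```

If the pseudocode were also kept in sibling *_code.mdwn files, as proposed in comment 7, those would need filtering out of the result.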

even parsing the words "inline" is completely unnecessary as
that job is handled by ikiwiki (and by the pandoc plugin).

can you please get rid of *all* of this, it's just crap.

            if inline_line is not None:
                assert not other, \
                    "can't use [[!inline]] directive with other content"

                re_match = re.fullmatch(
                    r'\[\[!inline pagenames="openpower/isa/([^" ]*[^"/ ])" '
                    r'raw="yes"]]', inline_line)
                assert re_match, (
                    'invalid [[!inline]] directive, must be of the form:\n'
                    '[[!inline pagenames="openpower/isa/foo/bar" '
                    'raw="yes"]]')
Comment 26 Luke Kenneth Casson Leighton 2023-10-01 11:00:29 BST
i've reverted the majority of the unauthorized work - you really do need
to get into the habit of following that process which you also need to
document (https://bugs.libre-soc.org/show_bug.cgi?id=1126#c21) precisely
because you don't follow it, and it's leaving a trail of debris (incomplete
work) behind across an alarmingly-high number of tasks now.

the idea of matching line numbers was great but you implemented it by
damaging the output from the parser. you *could* have implemented a
separate lookup dictionary that matched against the line numbers. if you
had asked i would have advised of this.
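
One possible reading of the "separate lookup dictionary" suggestion, sketched under the assumption that instructions are identified by their "# " title line. The function name is hypothetical; the point is only that line numbers live beside the parsed data instead of being encoded into it.

```python
def title_line_numbers(text):
    """Map each instruction title ('# ...' line) to its 1-based line number
    in the source file, leaving the parsed instruction data untouched."""
    lookup = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith("# "):
            lookup[line[2:].strip()] = lineno
    return lookup


lookup = title_line_numbers("<!-- c -->\n# Add\n\nX-Form\n\n# Subtract\n")
```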

i will revert some more of the damage you did, because you are blocking
my task and the use of regexes in this type of "drastic simple" parser
was - as i explained a year ago - unauthorized.

i can then get on with the addition of "Description" detection, and
i expect you to work in branches and request a review from now on.
Comment 27 Luke Kenneth Casson Leighton 2023-10-01 11:17:05 BST
four commits from this point that undo the damage caused.
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=2b144dd

the damage includes destroying the strategic requirement that the
pagereader be capable of being used to perform morphing of the data.
the insertion of arbitrary blank lines was not authorized.

jacob under the Charter you need to "own responsibility" for this mess
that you have made, and clean it up.  the goals that you set were
absolutely fantastic: improving the line-number matching is really
valuable: it's just that you rushed ahead without consulting (again).

this is a long-term habit and you have to cease doing it. please work
*only* in branches from now on forward.
Comment 28 Jacob Lifshay 2023-10-04 08:12:16 BST
(In reply to Luke Kenneth Casson Leighton from comment #27)
> four commits from this point that undo the damage caused.
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=2b144dd
> 
> the damage includes destroying the strategic requirement that the
> pagereader be capable of being used to perform morphing of the data.
> the insertion of arbitrary blank lines was not authorized.

iirc I described how I did it to you quite a while ago and you seemed happy at the time.

> 
> jacob under the Charter you need to "own responsibility" for this mess
> that you have made, and clean it up.  the goals that you set were
> absolutely fantastic: improving the line-number matching is really
> valuable: it's just that you rushed ahead without consulting (again).

I will try to work on getting this bug fixed soon.
Comment 29 Luke Kenneth Casson Leighton 2023-10-04 10:06:18 BST
(In reply to Jacob Lifshay from comment #28)
> (In reply to Luke Kenneth Casson Leighton from comment #27)
> > four commits from this point that undo the damage caused.
> > https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=2b144dd
> > 
> > the damage includes destroying the strategic requirement that the
> > pagereader be capable of being used to perform morphing of the data.
> > the insertion of arbitrary blank lines was not authorized.
> 
> iirc I described how I did it to you quite a while ago and you seemed happy
> at the time.

i loved the *idea*, which is a fantastic goal. i didn't in any way
realise at the time the full implications of the non-reproducibility
(pretty-print | diff -u | wc == 0)
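
The reproducibility check in parentheses can be expressed as a small Python helper, assuming the parser can re-emit ("pretty-print") what it read. The helper name is made up; difflib is from the standard library.

```python
import difflib


def roundtrip_diff(original, regenerated):
    """Unified diff between the source text and the parser's re-emitted
    text; an empty list means the round trip reproduced the input."""
    return list(difflib.unified_diff(
        original.splitlines(), regenerated.splitlines(),
        fromfile="original", tofile="regenerated", lineterm=""))


same = roundtrip_diff("a\nb\n", "a\nb\n")        # no differences
changed = roundtrip_diff("a\nb\n", "a\n\nb\n")   # an inserted blank line
```

An inserted blank line shows up as a lone "+" entry in the diff, which is the kind of difference this check is meant to catch.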

> I will try to work on getting this bug fixed soon.

appreciated. as there is now support for [currently-optional] english
language "descriptions" i'll cherry-pick those over as otherwise we
end up with branch conflicts.
Comment 30 Luke Kenneth Casson Leighton 2023-10-04 10:13:40 BST
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=f60a76730

that i *think* was the last (main) difference, it's for shriya to track
down where/when errors in the shriya_add_descriptions branch get introduced
Comment 31 Luke Kenneth Casson Leighton 2024-01-08 01:06:11 GMT
see bug #1048 comment 24