1039 – add hardware-cycle-accurate stastistical modelling to ISACaller for an in-order core

Bug 1039 - add hardware-cycle-accurate stastistical modelling to ISACaller for an in-order core

Summary: add hardware-cycle-accurate stastistical modelling to ISACaller for an in-ord...

Status:	IN_PROGRESS

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code (show other bugs)
Version:	unspecified
Hardware:	Other Linux

Importance:	High enhancement
Assignee:	Luke Kenneth Casson Leighton

URL:	https://libre-soc.org/3d_gpu/architec...

Depends on:
Blocks:

Reported:	2023-03-22 16:49 GMT by Luke Kenneth Casson Leighton
Modified:	2024-03-09 11:14 GMT (History)
CC List:	4 users (show)

See Also:	939
NLnet milestone:	NLnet.2022-08-107.ongoing
total budget (EUR) for completion of task and all subtasks:	3000
budget (EUR) for this task, excluding subtasks' budget:	3000
parent task for budget allocation:	961
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	lkcl=1500 donated=1300 ghostmansd={amount=200, submitted=2024-02-26, paid=2024-03-08}

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Luke Kenneth Casson Leighton 2023-03-22 16:49:40 GMT

hardware-cycle-accurate statistics are needed but it will take
some effort to add to cavatools. therefore a model needs to be
added to ISACaller which allows some basic performance estimates

---

* TODO: cavatools power consumption needs to be extracted and
        implemented in python
* TODO: tidy up the corrections to the wiki page

Comment 1 Luke Kenneth Casson Leighton 2023-05-02 11:02:05 BST

https://libre-soc.org/irclog/%23libre-soc.2023-05-02.log.html#t2023-05-02T10:51:45

Comment 2 Luke Kenneth Casson Leighton 2023-05-02 18:54:12 BST

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=e74dfbf1
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=c8290fce

begin first version with a bit of explanation

Comment 3 Luke Kenneth Casson Leighton 2023-05-07 14:16:31 BST

ghostmansd, these two instructions are where insndb comes into play:
first decoding the instruction and then identifying its registers.
the register nomenclature will need to be unique: it doesn't have
to be a string, it can be a tuple.  so add 1,2,3 could be converted
to ("GPR", 1) and 2 and 3 or it could be converted to "r1" "r2" "r3"
it doesn't matter which as long as, later, back in reads_possible()
and writes_possible() it is easy to identify which register is
associated with which regfile.

  99         # get the read and write regs
 100         writeregs = get_input_regs(insn)
 101         readregs = get_output_regs(insn)

this assumes a *global* regfile which is clearly false.

 164         while len(possible) < 3 and len(r) > 0:

it is here that the number of *GPRs* has to be less than (e.g.) 3
but if there happens also to be cr0 read then obviously that is
permitted *at the same time* as 3 GPR reads, as it is a totally
separate regfile.

so perhaps a spec of ("GPR", n) and ("CR", m) would be better?

Comment 4 Luke Kenneth Casson Leighton 2023-05-09 18:29:14 BST

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=10d93c409853eb0c7994b43e13eee0560437d058

split out ISAPages so it can be instantiated without needing
entire ISACaller

Comment 5 Dmitry Selyutin 2023-05-10 12:42:27 BST

It seems that I've managed to migrate to new laptop, at least process described at HDL devscripts completed successfully:
https://libre-soc.org/HDL_workflow/devscripts/

I'll start checking the code soon.

Comment 6 Dmitry Selyutin 2023-05-10 16:33:30 BST

insndb needs to be extended with support for keeping some information after the assembly process took place. Right now we construct instruction on-the-fly; once the instruction is assembled, we simply have 32 or 64 bits (depending on prefix).

Comment 7 Luke Kenneth Casson Leighton 2023-05-12 12:22:23 BST

i added a log output incredibbly simple, no need for insndb

format: instruction args rGPR=n/offset/elwidth wGPR...

Comment 8 Dmitry Selyutin 2023-05-12 13:27:33 BST

Since I already started changing insndb to make it reusable for any module which needs either assembly or disassembly, I decided to complete these changes. Regardless of the current task, the changes seem to be good for other future needs.

These works seem to be completed now: https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=refs/heads/insndb-asm-dis. As we discussed, I won't use these changes in scope of this task, unless necessary later (certainly not now).

Comment 9 Dmitry Selyutin 2023-05-12 13:29:00 BST

I'll perhaps revisit the trace format to make it more usable for lookup/split, but at this stage insndb is not needed. Again, the former commits are for future purposes only.

Comment 10 Luke Kenneth Casson Leighton 2023-05-12 18:51:16 BST

(In reply to Dmitry Selyutin from comment #9)
> I'll perhaps revisit the trace format to make it more usable for
> lookup/split,

yes it should easily be parseable it 2-3 lines, nothing sophisticated
needed. i know you like (and importantly understand) regexes, do try
to keep it simple :)

format

"r or w"
"target" (regfile name GPR FR CR CRf XER MSR FPSCR SPRf SPRs" or MEM)
"reg or memaddr"
"elwidth or memwidth"
"element offset" (not for rMEM or wMEM)

> but at this stage insndb is not needed. Again, the former
> commits are for future purposes only.

ack appreciated.

CR is the whole CR, that will be one port on its own, assume max 1R1W
CRf is one 4-bit port, assume 3R3W of these
GPR is int regfile assume 4R1W
FPR is FP regfile assume 3R1W
XER is special assume 1R1W
FPSCR is special assume 1R1W
SPRf is FAST spr, assume 3R3W
SPRs is SLOW spr, assume 1R1W

MEM ignore for now.

so here:

 175         while len(possible) < 3 and

becomes:

     while possible_cr < 3

and

     while possible_gpr < 4

etc. i.e. the instruction can only proceed once all regs have been read,
but there is limited ports so it can take several cycles, but each one
read you can take it out the set.

when the set is empty then all operands are read, it can issue.

real simple.

Comment 11 Dmitry Selyutin 2023-05-14 22:09:41 BST

Jacob, Luke, I've re-checked the discussion[0][1], and came to conclusion that you both have valid points, so I decided to go with something intermediate. The branch tracefile contains the following functionality:
1. Every ISACaller instance can be provided with `tracefile` argument; this argument must be any object which has a callable `write` attribute or None.
2. If `ISACaller.tracefile` is None, then we create a temporary file.
3. However, we don't save these temporary files, unless TRACEFILE environment variable is set to anything. By default, these files are __transient__.
4. If and only if the user sets TRACEFILE environment variable, the tracefiles are kept.
5. The files have names starting with "trace_" and ending with ".trace". Everything inbetween obeys tempfile.NamedTemporaryFile rules.
6. test_runner has extended names, starting with either "trace_testXX_" or "trace_subtestXX_", where XX stands for global test counter (if launched via run_tst) or for test_case unique counter (nmutil.get_test_path.Counter). Everything inbetween obeys tempfile.NamedTemporaryFile rules.

This way:
1. We avoid conditional code which checks whether `self.tracefile` is None or not.
2. We can turn tracefiles on/off via environment (I think it'd be a bad idea to create these by default).
3. Unique per-test and per-subtest trace files are still created.

[0] https://libre-soc.org/irclog/%23libre-soc.2023-05-13.log.html#t2023-05-13T19:56:50
[1] https://libre-soc.org/irclog/%23libre-soc.2023-05-14.log.html#t2023-05-14T06:17:45

Comment 12 Dmitry Selyutin 2023-05-14 22:15:40 BST

I'll wait for CI, because ISACaller is a common code, so I don't want to end up in a situation when I broke something. Please let me know your opinions and suggestions; feel free to give this functionality a shot. Here's what I had for the first three tests in test_caller.py:

$ TRACEFILE=true python3 src/openpower/decoder/isa/test_caller.py | grep tracefile
tracefile /tmp/trace_test0_iap01fbe.trace (permanent)
tracefile /tmp/trace_test1__ee5vb3b.trace (permanent)
tracefile /tmp/trace_test2_s3xvvg2d.trace (permanent)
$ cat /tmp/trace_test0_iap01fbe.trace
add 1, 3, 2 rGPR:3.0/64 rGPR:2.0/64 wGPR:1.0/64
$ cat /tmp/trace_test1__ee5vb3b.trace
addi 3, 0, 0x1234 rGPR:0.0/64 wGPR:3.0/64
addi 2, 0, 0x4321 rGPR:0.0/64 wGPR:2.0/64
add  1, 3, 2 rGPR:3.0/64 rGPR:2.0/64 wGPR:1.0/64
$ cat /tmp/trace_test2_s3xvvg2d.trace
addi 1, 0, 0x0010 rGPR:0.0/64 wGPR:1.0/64
addi 2, 0, 0x1234 rGPR:0.0/64 wGPR:2.0/64
stw 2, 0(1) rGPR:1.0/64 rGPR:2.0/64
lwz 3, 0(1) rGPR:1.0/64 wGPR:3.0/64

Comment 13 Luke Kenneth Casson Leighton 2023-05-16 02:09:32 BST

this needs to be *brutally* simple.  we have full control over the format
and it is auto-generated.  therefore there is absolutely no need for
any error-checking of any kind, and as it is single-purpose there is
absolutely no need whatsoever for dataclasses, classes, or anything more
complex than lists tuples and dicts.  even namedtuple is a stretch.

most of the work here actually needs to go into the unit tests, and
making sure that the model is "correct", followed by adding in power-estimates
(which will be done later).


https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=365af65074431ccba0483dbd6cf618d135421fd6

added namedtuple for the hazards (nothing more complex needed)

+# trace file entries are lists of these.
+Hazards = namedtuple("Hazards", ["action", "target", "ident", "offs", "elwid"])

added profiles (as a dict - nothing more complex needed)

+# key: readport, writeport (per clock cycle)
+HazardProfiles = {
+    "GPR": (4, 1),  # GPR allows 4 reads 1 write possible in 1 cycle...
+    "FPR": (3, 1),

added read_file function: nothing more complex needed, no error-checking required

+def read_file(fname):
+    """reads a trace file in the format "[r:FILE:regnum:offset:width]* # insn"
+    """
+    ...

Comment 14 Luke Kenneth Casson Leighton 2023-05-16 02:15:49 BST

(In reply to Dmitry Selyutin from comment #12)

> I'll wait for CI, because ISACaller is a common code, so I don't want to end
> up in a situation when I broke something. 

it is a little early to focus on the output that ISACaller creates.
the primary focus is on the inorder.py model code, using independent
unit tests that have absolutely no direct connection whatsover to ISACaller.

however when it is done there should be no use of temp files (at all),
under any circumstances.  there should be no presence of temp files,
no attempt by ISACaller to create temp files, at all.

ISACaller should explicitly be given an input (which is either None,
a filename, or an object with a "write" function) and should detect
that.

if the argument is None then absolutely nothing happens.

in this way you can easily ensure that you do absolutely no damage
because under no circumstances should you have modified any code
in any *existing* unit tests or anywhere at all in existing functionality
that sets that argument.

and thus a branch is *NOT* necessary, at all.

Comment 15 Luke Kenneth Casson Leighton 2023-05-16 02:22:12 BST

(In reply to Dmitry Selyutin from comment #11)

> Jacob, Luke, I've re-checked the discussion[0][1], and came to conclusion
> that you both have valid points, so I decided to go with something
> intermediate. The branch tracefile contains the following functionality:

there should be no branch.  it is unnecessary.

> 1. Every ISACaller instance can be provided with `tracefile` argument; this
> argument must be any object which has a callable `write` attribute or None.

yes.

> 2. If `ISACaller.tracefile` is None, then we create a temporary file.

*NO*.  please do not do this.  there is no such requirement.

> 3. However, we don't save these temporary files, unless TRACEFILE
> environment variable is set to anything. By default, these files are
> __transient__.

overkill.

> 4. If and only if the user sets TRACEFILE environment variable, the
> tracefiles are kept.

overkill.

> 5. The files have names starting with "trace_" and ending with ".trace".

overkill.

> Everything inbetween obeys tempfile.NamedTemporaryFile rules.

unnecessary.

> 6. test_runner has extended names, starting with either "trace_testXX_" or
> "trace_subtestXX_", where XX stands for global test counter (if launched via
> run_tst) or for test_case unique counter (nmutil.get_test_path.Counter).
> Everything inbetween obeys tempfile.NamedTemporaryFile rules.

complete overkill.

> This way:
> 1. We avoid conditional code which checks whether `self.tracefile` is None
> or not.

there is no requirement to create temporary tracefiles.  if there was
such a requirement i would have mentioned it.

the requirement was added by jacob who is not the project manager.

this task is drastically and brutally simple.  it is being made or is
envisioned to be far more complex than it actually is or needs to be.

absolute bare minimum and extreme simplicity is required - no error-checking
is required (at all) because the output from ISACaller is completely
under our control and is intended as machine-readable input of the absolute
simplest format and level.

Comment 16 Luke Kenneth Casson Leighton 2023-05-16 02:35:50 BST

converted to explicit argument for trace-log-writing

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=140c9b38e3f011ac216ede4b69f25aa9796b93ee

there is no need to focus on ISACaller - at all - for now - until
the inorder.py model has been confirmed with unit tests that it
is reasonably functional.

the inorder.py model as a completely standalone independent program
may be worked on *without* needing a branch because it has absolutely
no connection whatsoever of any kind to any other code.

inorder.py should in no way be modified to import any other code

under no circumstances should any other code be modified to import inorder.py

Comment 17 Andrey Miroshnikov 2023-05-28 21:43:42 BST

Looked at the inorder.py code today. 

Started adding code for decode stage to determine read/write registers.
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=2a35755d34df6aedb05eac2287fcd2913c6ac957

If this is not at all appropriate, I'll stop working on it.

Comment 18 Andrey Miroshnikov 2023-05-29 12:11:05 BST

It is not obvious in what format 'writeregs' and 'readregs' are supposed to be. Are 'get_input_regs()' and 'get_output_regs()' supposed to produce a set of Hazard objects (with actions 'r' and 'w' respectively)?

I ask this because on line #216, you're using the 'difference_update' method, which is part of a set() object.
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/cyclemodel/inorder.py;hb=999c11ee9159222343945e3850b1e2c4a64a8623#l216


Also, Fetch's 'process_instructions()' has to be run twice initially. First to store the trace into stages[0], and then again for stages[0] to propagate to the Decode (via add_instruction). I'm assuming this is expected behaviour because fetch and decode of the same instruction cannot happen at the same time.

Comment 19 Dmitry Selyutin 2023-06-02 13:57:28 BST

I guess readeregs and writeregs are just like this:

def regfilter(regs, type):
    return filter(lambda reg: reg.type == type, regs)

regs = tuple(...)
readregs = tuple(regfilter(regs, "r"))
writeregs = tuple(regfilter(regs, "w"))

Comment 20 Dmitry Selyutin 2023-06-02 14:13:43 BST

Andrey, I've reassigned this task to you, since you're working on it, any objections?

Comment 21 Andrey Miroshnikov 2023-06-02 18:19:04 BST

(In reply to Dmitry Selyutin from comment #19)
> I guess readeregs and writeregs are just like this:
> 
> def regfilter(regs, type):
>     return filter(lambda reg: reg.type == type, regs)
> 
> regs = tuple(...)
> readregs = tuple(regfilter(regs, "r"))
> writeregs = tuple(regfilter(regs, "w"))

Thanks, my code was a bit more naive (if hazard.action = 'r' return hazard etc.). Will update when I have time.

(In reply to Dmitry Selyutin from comment #20)
> Andrey, I've reassigned this task to you, since you're working on it, any
> objections?

Sure. When I get the model to work, I may need more guidance however.

Comment 22 Andrey Miroshnikov 2023-08-21 17:44:16 BST

Made several changes which now allow the code to run the test case (not yet correct, but at least the code actually runs).

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ed7797d8288cc9701d070a82884ebe74816b0232
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ed7797d8288cc9701d070a82884ebe74816b0232

The results are wrong, and stall isn't occurring, but that's because I
intentionally only pushed the necessary changes to get the code to run.

What is the issue stage really meant to be?
On online examples of a pipelined processor, there's usually only the decode
stage, which performs both decoding and issuing.
In your code comments Luke, you mentioned that the register read/writes should
really be moved to the issue pipeline.

I added a change to the code to ensure the initial fetch stage[0] instruction
isn't wiped by the tick() before it is passed to the decode object (1 clock
later). Similar changes were also made to decode, issue, and execute.

The decode, issue, and execute pipelines also don't run if there's nothing in
their corresponding stage[0] entries (checks for either None or zero-length for
execute).

My understanding was that an instruction must at least remain at each stage for
one clock cycle (fetch -> decode -> issue -> execute), which means there's a
4-cycle latency before any results get out (assuming all insn execute in one
cycle for now).

The code however, allows the instruction to go straight from decode to execute
within the same cycle. I have noticed this, but haven't fixed it yet,
because the execute object's 'add_stage' method seems to add the extra 2 cycle
delay.

You can run the code with -t option to go through the 8 instructions in the unit
test. Obviously doesn't properly work yet, but at least committing what I've
done so far. The output is a table (which will be converted to markdown later),
showing the path of the instruction over time (in clock cycles).

Here's the table for the test case (so far incorrect, no stalls, etc.):

| clk # | fetch | decode | issue | exec |
| 0 | addi 1, 0, 0x0010|       |      |     |
| 1 | addi 2, 0, 0x1234| addi 1, 0, 0x0010| addi 1, 0, 0x0010|     |
| 2 | stw 2, 0(1)| addi 2, 0, 0x1234| addi 2, 0, 0x1234|     |
| 3 | lwz 3, 0(1)| stw 2, 0(1)| stw 2, 0(1)| addi 1, 0, 0x0010|
| 4 | add 1, 3, 2| lwz 3, 0(1)| lwz 3, 0(1)| addi 2, 0, 0x1234|
| 5 | addi 3, 0, 0x1234| add 1, 3, 2| add 1, 3, 2| stw 2, 0(1)|
| 6 | addi 2, 0, 0x4321| addi 3, 0, 0x1234| addi 3, 0, 0x1234| lwz 3, 0(1)|
| 7 | add  1, 3, 2| addi 2, 0, 0x4321| addi 2, 0, 0x4321| add 1, 3, 2|

Comment 23 Andrey Miroshnikov 2023-08-21 17:46:48 BST

(In reply to Andrey Miroshnikov from comment #22)
> 
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=ed7797d8288cc9701d070a82884ebe74816b0232

Missed the second commit:
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=755b918d6dad6fc2d1a2a75beb1fdd39223679c7

Comment 24 Luke Kenneth Casson Leighton 2023-08-21 23:26:10 BST

(In reply to Andrey Miroshnikov from comment #22)


> because the execute object's 'add_stage' method seems

not "seems": DOES.

>  to add the extra 2
> cycle
> delay.

this is 100% correct and 100% intended. do not alter it, fix the
previous stage so that the transfer from decode to issue is on
a 1 cycle delay. that is the source of the error.

later on, DIV will not be 2 cycles, will it?

A DIV operation is variable-length, isn't it?


(In reply to Andrey Miroshnikov from comment #23)

> Missed the second commit:
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=755b918d6dad6fc2d1a2a75beb1fdd39223679c7

please revert.

execute must have multiple operations outstanding (and an additional piece of output indicating the countdown)

this is a superscalar architecture.

otherwise when DIV operations take 128 clock cycles to complete, 128 execute columns will be needed, won't they?

Comment 25 Luke Kenneth Casson Leighton 2023-08-21 23:31:06 BST

+        # Flag which indicates if fetch should be pushing insn to decode,
+        # else there won't be an initial one-clock delay between fetch and dec
+        self.pushed_to_decode = False
 
and another one from decode to issue.  the lack of which is the ource of
the duplication problem.

Comment 26 Luke Kenneth Casson Leighton 2023-08-21 23:33:52 BST

 158 
 159     def process_instructions(self, stall):
 160         if len(self.stages) > 0:

deindent 161-173 and invert the above.

 159     def process_instructions(self, stall):
 160         if len(self.stages) == 0:
                  return stall
             ....
             ....


don't waste precious indentation space when there is an 80
char (79 char) limit.

Comment 27 Luke Kenneth Casson Leighton 2023-08-21 23:35:17 BST

 227         if self.stages[0] is not None:

same.

if self.stages[0] is None:
    return Stall
and deindent.

Comment 28 Luke Kenneth Casson Leighton 2023-08-21 23:36:45 BST

 261         if self.stages[0] is not None:

same

Comment 29 Luke Kenneth Casson Leighton 2023-08-21 23:38:41 BST

 303         stall = self.fetch.process_instructions(stall, insn_trace)
 304         stall = self.decode.process_instructions(stall)
 305         stall = self.issue.process_instructions(stall)
 306         stall = self.exe.process_instructions(stall)

if you invert the calling order of these 4 functions you will NOT need
the flags added.  see comment #25 and disregard it, and REMOVE
that flag.

Comment 30 Luke Kenneth Casson Leighton 2023-08-21 23:41:42 BST

(In reply to Andrey Miroshnikov from comment #22)
intentionally only pushed the necessary changes to get the code to run.
 
> What is the issue stage really meant to be?

in a single-issue Scalar core? a dummy placeholder MODELLING the
fact that the HDL issued a single instruction.

i did tell you that all of this is far simpler than everyone tasked
with it has made it out to be.

well done getting to the absolutely critical "iterative feedback"
phase. didn't take much, did it?

Comment 31 Andrey Miroshnikov 2023-08-22 20:59:21 BST

(In reply to Luke Kenneth Casson Leighton from comment #24)
> 
> please revert.
> 

Reverted:
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=e99cc80eba4a854a9c98ae3d831f901fe7c5d272



(In reply to Luke Kenneth Casson Leighton from comment #26)
> don't waste precious indentation space when there is an 80
> char (79 char) limit.

Fixed:
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=f746c31efca189fb09aec2e661dba686616a846f


(In reply to Luke Kenneth Casson Leighton from comment #29)
> if you invert the calling order of these 4 functions you will NOT need
> the flags added.  see comment #25 and disregard it, and REMOVE
> that flag.

Order inverted:
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=cf1e1acdf0c79d13c478935e195c4c33c4c132a3

Comment 32 Luke Kenneth Casson Leighton 2023-08-23 00:41:52 BST

(In reply to Andrey Miroshnikov from comment #31)

> Reverted:
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=e99cc80eba4a854a9c98ae3d831f901fe7c5d272

so sorry, it was fine, i thought what you were doing was altering
the format of the data (which would not be ok)

Comment 33 Luke Kenneth Casson Leighton 2023-08-23 08:57:49 BST

+        if self.stages[0] is None: return stall
+
+        insn, writeregs = self.stages[0]
+        self.cpu.exe.add_instruction(insn, writeregs)

generally if you're going to do that you might as well do:

+        if self.stages[0] is None:
+             return stall
+        insn, writeregs = self.stages[0]
+        self.cpu.exe.add_instruction(insn, writeregs)

personally i quite like the style of putting single-line ifs,
but pep8 bitches about it, and autopep8 will change it, which
gives you some indication of how uncommon it is, and therefore
it's risky from a maintenance perspective
(Principle of Least Surprise)

Comment 34 Luke Kenneth Casson Leighton 2023-09-05 04:22:21 BST

(In reply to Luke Kenneth Casson Leighton from comment #0)

> * TODO: tidy up the corrections to the wiki page

Andrey you need to sort out the corrections made,
read them, tidy them up, and generally get a better
understanding and appreciation that there has been an
InOrder core in soc/ for over 12 months, and provide
links to it that demonstrate that you know this.

as written the page gave the false impression to the
wider world that the InOrder Core was nonexistent,
which is quite serious from the perspective of NLnet
funding and promotion of the EU's trust in NLnet,
and much more besides.

you can find the commit history outlining the corrections
that were made which you need to tidy up, here:

https://git.libre-soc.org/?p=libreriscv.git;a=history;f=3d_gpu/architecture/inorder_model.mdwn;h=e61c7a938eac023889ff579018a374746e18f4c7;hb=773391b0eb427c9d61bef169c8583c9922a1f417

Comment 35 Luke Kenneth Casson Leighton 2023-09-05 04:45:15 BST

(In reply to Andrey Miroshnikov from comment #31)

> (In reply to Luke Kenneth Casson Leighton from comment #29)
> > if you invert the calling order of these 4 functions you will NOT need
> > the flags added.  see comment #25 and disregard it, and REMOVE
> > that flag.
> 
> Order inverted:
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=cf1e1acdf0c79d13c478935e195c4c33c4c132a3

did it have the desired effect? as a general rule rather than blindly
do something "as if being ordered/demanded", you should assume that
the person with the experience and knowledge but extreme-limited time
is simply too busy to do anything other than give very terse guidance,
that it is your responsibility to "intelligenrly think through the
missing pieces of what they are trying to say", and to give feedback
that saves that more senior and extremely-busy person less work to do.

so rather than *you* being terse as a "reaction" to me being terse
*by necessity*, you need to be much more verbose, much more responsive,
and say "Order inverted, unit tests showed better/worse results
as shown below, my next planned step is to do XYZ" ok?

also due to timeconstraints as you can see i made a mistake in reading
the diffs (comment #32), so it is critically important that you
actually think "hm is this request making any sense?" and if the unit
tests don't pass for example *don't just "blindly do it", query it!

i.e. you need to work on providing feedback, compensating for the
project manager simply being too busy, and thus you don't end up
down blind alleys that waste your time, nor make their life difficult
by being unresponsive, such that they spend considerable time to provide
long explanations of the expected interactions, such as this one, as
well as spending time having to chase you up.