hardware-cycle-accurate statistics are needed but it will take some effort to add to cavatools. therefore a model needs to be added to ISACaller which allows some basic performance estimates --- * TODO: cavatools power consumption needs to be extracted and implemented in python * TODO: tidy up the corrections to the wiki page
https://libre-soc.org/irclog/%23libre-soc.2023-05-02.log.html#t2023-05-02T10:51:45
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=e74dfbf1 https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=c8290fce begin first version with a bit of explanation
ghostmansd, these two instructions are where insndb comes into play: first decoding the instruction and then identifying its registers. the register nomenclature will need to be unique: it doesn't have to be a string, it can be a tuple. so add 1,2,3 could be converted to ("GPR", 1) and 2 and 3 or it could be converted to "r1" "r2" "r3" it doesn't matter which as long as, later, back in reads_possible() and writes_possible() it is easy to identify which register is associated with which regfile. 99 # get the read and write regs 100 writeregs = get_input_regs(insn) 101 readregs = get_output_regs(insn) this assumes a *global* regfile which is clearly false. 164 while len(possible) < 3 and len(r) > 0: it is here that the number of *GPRs* has to be less than (e.g.) 3 but if there happens also to be cr0 read then obviously that is permitted *at the same time* as 3 GPR reads, as it is a totally separate regfile. so perhaps a spec of ("GPR", n) and ("CR", m) would be better?
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=10d93c409853eb0c7994b43e13eee0560437d058 split out ISAPages so it can be instantiated without needing entire ISACaller
It seems that I've managed to migrate to new laptop, at least process described at HDL devscripts completed successfully: https://libre-soc.org/HDL_workflow/devscripts/ I'll start checking the code soon.
insndb needs to be extended with support for keeping some information after the assembly process took place. Right now we construct instruction on-the-fly; once the instruction is assembled, we simply have 32 or 64 bits (depending on prefix).
i added a log output incredibbly simple, no need for insndb format: instruction args rGPR=n/offset/elwidth wGPR...
Since I already started changing insndb to make it reusable for any module which needs either assembly or disassembly, I decided to complete these changes. Regardless of the current task, the changes seem to be good for other future needs. These works seem to be completed now: https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=refs/heads/insndb-asm-dis. As we discussed, I won't use these changes in scope of this task, unless necessary later (certainly not now).
I'll perhaps revisit the trace format to make it more usable for lookup/split, but at this stage insndb is not needed. Again, the former commits are for future purposes only.
(In reply to Dmitry Selyutin from comment #9) > I'll perhaps revisit the trace format to make it more usable for > lookup/split, yes it should easily be parseable it 2-3 lines, nothing sophisticated needed. i know you like (and importantly understand) regexes, do try to keep it simple :) format "r or w" "target" (regfile name GPR FR CR CRf XER MSR FPSCR SPRf SPRs" or MEM) "reg or memaddr" "elwidth or memwidth" "element offset" (not for rMEM or wMEM) > but at this stage insndb is not needed. Again, the former > commits are for future purposes only. ack appreciated. CR is the whole CR, that will be one port on its own, assume max 1R1W CRf is one 4-bit port, assume 3R3W of these GPR is int regfile assume 4R1W FPR is FP regfile assume 3R1W XER is special assume 1R1W FPSCR is special assume 1R1W SPRf is FAST spr, assume 3R3W SPRs is SLOW spr, assume 1R1W MEM ignore for now. so here: 175 while len(possible) < 3 and becomes: while possible_cr < 3 and while possible_gpr < 4 etc. i.e. the instruction can only proceed once all regs have been read, but there is limited ports so it can take several cycles, but each one read you can take it out the set. when the set is empty then all operands are read, it can issue. real simple.
Jacob, Luke, I've re-checked the discussion[0][1], and came to conclusion that you both have valid points, so I decided to go with something intermediate. The branch tracefile contains the following functionality: 1. Every ISACaller instance can be provided with `tracefile` argument; this argument must be any object which has a callable `write` attribute or None. 2. If `ISACaller.tracefile` is None, then we create a temporary file. 3. However, we don't save these temporary files, unless TRACEFILE environment variable is set to anything. By default, these files are __transient__. 4. If and only if the user sets TRACEFILE environment variable, the tracefiles are kept. 5. The files have names starting with "trace_" and ending with ".trace". Everything inbetween obeys tempfile.NamedTemporaryFile rules. 6. test_runner has extended names, starting with either "trace_testXX_" or "trace_subtestXX_", where XX stands for global test counter (if launched via run_tst) or for test_case unique counter (nmutil.get_test_path.Counter). Everything inbetween obeys tempfile.NamedTemporaryFile rules. This way: 1. We avoid conditional code which checks whether `self.tracefile` is None or not. 2. We can turn tracefiles on/off via environment (I think it'd be a bad idea to create these by default). 3. Unique per-test and per-subtest trace files are still created. [0] https://libre-soc.org/irclog/%23libre-soc.2023-05-13.log.html#t2023-05-13T19:56:50 [1] https://libre-soc.org/irclog/%23libre-soc.2023-05-14.log.html#t2023-05-14T06:17:45
I'll wait for CI, because ISACaller is a common code, so I don't want to end up in a situation when I broke something. Please let me know your opinions and suggestions; feel free to give this functionality a shot. Here's what I had for the first three tests in test_caller.py: $ TRACEFILE=true python3 src/openpower/decoder/isa/test_caller.py | grep tracefile tracefile /tmp/trace_test0_iap01fbe.trace (permanent) tracefile /tmp/trace_test1__ee5vb3b.trace (permanent) tracefile /tmp/trace_test2_s3xvvg2d.trace (permanent) $ cat /tmp/trace_test0_iap01fbe.trace add 1, 3, 2 rGPR:3.0/64 rGPR:2.0/64 wGPR:1.0/64 $ cat /tmp/trace_test1__ee5vb3b.trace addi 3, 0, 0x1234 rGPR:0.0/64 wGPR:3.0/64 addi 2, 0, 0x4321 rGPR:0.0/64 wGPR:2.0/64 add 1, 3, 2 rGPR:3.0/64 rGPR:2.0/64 wGPR:1.0/64 $ cat /tmp/trace_test2_s3xvvg2d.trace addi 1, 0, 0x0010 rGPR:0.0/64 wGPR:1.0/64 addi 2, 0, 0x1234 rGPR:0.0/64 wGPR:2.0/64 stw 2, 0(1) rGPR:1.0/64 rGPR:2.0/64 lwz 3, 0(1) rGPR:1.0/64 wGPR:3.0/64
this needs to be *brutally* simple. we have full control over the format and it is auto-generated. therefore there is absolutely no need for any error-checking of any kind, and as it is single-purpose there is absolutely no need whatsoever for dataclasses, classes, or anything more complex than lists tuples and dicts. even namedtuple is a stretch. most of the work here actually needs to go into the unit tests, and making sure that the model is "correct", followed by adding in power-estimates (which will be done later). https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=365af65074431ccba0483dbd6cf618d135421fd6 added namedtuple for the hazards (nothing more complex needed) +# trace file entries are lists of these. +Hazards = namedtuple("Hazards", ["action", "target", "ident", "offs", "elwid"]) added profiles (as a dict - nothing more complex needed) +# key: readport, writeport (per clock cycle) +HazardProfiles = { + "GPR": (4, 1), # GPR allows 4 reads 1 write possible in 1 cycle... + "FPR": (3, 1), added read_file function: nothing more complex needed, no error-checking required +def read_file(fname): + """reads a trace file in the format "[r:FILE:regnum:offset:width]* # insn" + """ + ...
(In reply to Dmitry Selyutin from comment #12) > I'll wait for CI, because ISACaller is a common code, so I don't want to end > up in a situation when I broke something. it is a little early to focus on the output that ISACaller creates. the primary focus is on the inorder.py model code, using independent unit tests that have absolutely no direct connection whatsover to ISACaller. however when it is done there should be no use of temp files (at all), under any circumstances. there should be no presence of temp files, no attempt by ISACaller to create temp files, at all. ISACaller should explicitly be given an input (which is either None, a filename, or an object with a "write" function) and should detect that. if the argument is None then absolutely nothing happens. in this way you can easily ensure that you do absolutely no damage because under no circumstances should you have modified any code in any *existing* unit tests or anywhere at all in existing functionality that sets that argument. and thus a branch is *NOT* necessary, at all.
(In reply to Dmitry Selyutin from comment #11) > Jacob, Luke, I've re-checked the discussion[0][1], and came to conclusion > that you both have valid points, so I decided to go with something > intermediate. The branch tracefile contains the following functionality: there should be no branch. it is unnecessary. > 1. Every ISACaller instance can be provided with `tracefile` argument; this > argument must be any object which has a callable `write` attribute or None. yes. > 2. If `ISACaller.tracefile` is None, then we create a temporary file. *NO*. please do not do this. there is no such requirement. > 3. However, we don't save these temporary files, unless TRACEFILE > environment variable is set to anything. By default, these files are > __transient__. overkill. > 4. If and only if the user sets TRACEFILE environment variable, the > tracefiles are kept. overkill. > 5. The files have names starting with "trace_" and ending with ".trace". overkill. > Everything inbetween obeys tempfile.NamedTemporaryFile rules. unnecessary. > 6. test_runner has extended names, starting with either "trace_testXX_" or > "trace_subtestXX_", where XX stands for global test counter (if launched via > run_tst) or for test_case unique counter (nmutil.get_test_path.Counter). > Everything inbetween obeys tempfile.NamedTemporaryFile rules. complete overkill. > This way: > 1. We avoid conditional code which checks whether `self.tracefile` is None > or not. there is no requirement to create temporary tracefiles. if there was such a requirement i would have mentioned it. the requirement was added by jacob who is not the project manager. this task is drastically and brutally simple. it is being made or is envisioned to be far more complex than it actually is or needs to be. absolute bare minimum and extreme simplicity is required - no error-checking is required (at all) because the output from ISACaller is completely under our control and is intended as machine-readable input of the absolute simplest format and level.
converted to explicit argument for trace-log-writing https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=140c9b38e3f011ac216ede4b69f25aa9796b93ee there is no need to focus on ISACaller - at all - for now - until the inorder.py model has been confirmed with unit tests that it is reasonably functional. the inorder.py model as a completely standalone independent program may be worked on *without* needing a branch because it has absolutely no connection whatsoever of any kind to any other code. inorder.py should in no way be modified to import any other code under no circumstances should any other code be modified to import inorder.py
Looked at the inorder.py code today. Started adding code for decode stage to determine read/write registers. https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=2a35755d34df6aedb05eac2287fcd2913c6ac957 If this is not at all appropriate, I'll stop working on it.
It is not obvious in what format 'writeregs' and 'readregs' are supposed to be. Are 'get_input_regs()' and 'get_output_regs()' supposed to produce a set of Hazard objects (with actions 'r' and 'w' respectively)? I ask this because on line #216, you're using the 'difference_update' method, which is part of a set() object. https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/cyclemodel/inorder.py;hb=999c11ee9159222343945e3850b1e2c4a64a8623#l216 Also, Fetch's 'process_instructions()' has to be run twice initially. First to store the trace into stages[0], and then again for stages[0] to propagate to the Decode (via add_instruction). I'm assuming this is expected behaviour because fetch and decode of the same instruction cannot happen at the same time.
I guess readeregs and writeregs are just like this: def regfilter(regs, type): return filter(lambda reg: reg.type == type, regs) regs = tuple(...) readregs = tuple(regfilter(regs, "r")) writeregs = tuple(regfilter(regs, "w"))
Andrey, I've reassigned this task to you, since you're working on it, any objections?
(In reply to Dmitry Selyutin from comment #19) > I guess readeregs and writeregs are just like this: > > def regfilter(regs, type): > return filter(lambda reg: reg.type == type, regs) > > regs = tuple(...) > readregs = tuple(regfilter(regs, "r")) > writeregs = tuple(regfilter(regs, "w")) Thanks, my code was a bit more naive (if hazard.action = 'r' return hazard etc.). Will update when I have time. (In reply to Dmitry Selyutin from comment #20) > Andrey, I've reassigned this task to you, since you're working on it, any > objections? Sure. When I get the model to work, I may need more guidance however.
Made several changes which now allow the code to run the test case (not yet correct, but at least the code actually runs). https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ed7797d8288cc9701d070a82884ebe74816b0232 https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ed7797d8288cc9701d070a82884ebe74816b0232 The results are wrong, and stall isn't occurring, but that's because I intentionally only pushed the necessary changes to get the code to run. What is the issue stage really meant to be? On online examples of a pipelined processor, there's usually only the decode stage, which performs both decoding and issuing. In your code comments Luke, you mentioned that the register read/writes should really be moved to the issue pipeline. I added a change to the code to ensure the initial fetch stage[0] instruction isn't wiped by the tick() before it is passed to the decode object (1 clock later). Similar changes were also made to decode, issue, and execute. The decode, issue, and execute pipelines also don't run if there's nothing in their corresponding stage[0] entries (checks for either None or zero-length for execute). My understanding was that an instruction must at least remain at each stage for one clock cycle (fetch -> decode -> issue -> execute), which means there's a 4-cycle latency before any results get out (assuming all insn execute in one cycle for now). The code however, allows the instruction to go straight from decode to execute within the same cycle. I have noticed this, but haven't fixed it yet, because the execute object's 'add_stage' method seems to add the extra 2 cycle delay. You can run the code with -t option to go through the 8 instructions in the unit test. Obviously doesn't properly work yet, but at least committing what I've done so far. The output is a table (which will be converted to markdown later), showing the path of the instruction over time (in clock cycles). Here's the table for the test case (so far incorrect, no stalls, etc.): | clk # | fetch | decode | issue | exec | | 0 | addi 1, 0, 0x0010| | | | | 1 | addi 2, 0, 0x1234| addi 1, 0, 0x0010| addi 1, 0, 0x0010| | | 2 | stw 2, 0(1)| addi 2, 0, 0x1234| addi 2, 0, 0x1234| | | 3 | lwz 3, 0(1)| stw 2, 0(1)| stw 2, 0(1)| addi 1, 0, 0x0010| | 4 | add 1, 3, 2| lwz 3, 0(1)| lwz 3, 0(1)| addi 2, 0, 0x1234| | 5 | addi 3, 0, 0x1234| add 1, 3, 2| add 1, 3, 2| stw 2, 0(1)| | 6 | addi 2, 0, 0x4321| addi 3, 0, 0x1234| addi 3, 0, 0x1234| lwz 3, 0(1)| | 7 | add 1, 3, 2| addi 2, 0, 0x4321| addi 2, 0, 0x4321| add 1, 3, 2|
(In reply to Andrey Miroshnikov from comment #22) > > https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff; > h=ed7797d8288cc9701d070a82884ebe74816b0232 Missed the second commit: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=755b918d6dad6fc2d1a2a75beb1fdd39223679c7
(In reply to Andrey Miroshnikov from comment #22) > because the execute object's 'add_stage' method seems not "seems": DOES. > to add the extra 2 > cycle > delay. this is 100% correct and 100% intended. do not alter it, fix the previous stage so that the transfer from decode to issue is on a 1 cycle delay. that is the source of the error. later on, DIV will not be 2 cycles, will it? A DIV operation is variable-length, isn't it? (In reply to Andrey Miroshnikov from comment #23) > Missed the second commit: > https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff; > h=755b918d6dad6fc2d1a2a75beb1fdd39223679c7 please revert. execute must have multiple operations outstanding (and an additional piece of output indicating the countdown) this is a superscalar architecture. otherwise when DIV operations take 128 clock cycles to complete, 128 execute columns will be needed, won't they?
+ # Flag which indicates if fetch should be pushing insn to decode, + # else there won't be an initial one-clock delay between fetch and dec + self.pushed_to_decode = False and another one from decode to issue. the lack of which is the ource of the duplication problem.
158 159 def process_instructions(self, stall): 160 if len(self.stages) > 0: deindent 161-173 and invert the above. 159 def process_instructions(self, stall): 160 if len(self.stages) == 0: return stall .... .... don't waste precious indentation space when there is an 80 char (79 char) limit.
227 if self.stages[0] is not None: same. if self.stages[0] is None: return Stall and deindent.
261 if self.stages[0] is not None: same
303 stall = self.fetch.process_instructions(stall, insn_trace) 304 stall = self.decode.process_instructions(stall) 305 stall = self.issue.process_instructions(stall) 306 stall = self.exe.process_instructions(stall) if you invert the calling order of these 4 functions you will NOT need the flags added. see comment #25 and disregard it, and REMOVE that flag.
(In reply to Andrey Miroshnikov from comment #22) intentionally only pushed the necessary changes to get the code to run. > What is the issue stage really meant to be? in a single-issue Scalar core? a dummy placeholder MODELLING the fact that the HDL issued a single instruction. i did tell you that all of this is far simpler than everyone tasked with it has made it out to be. well done getting to the absolutely critical "iterative feedback" phase. didn't take much, did it?
(In reply to Luke Kenneth Casson Leighton from comment #24) > > please revert. > Reverted: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=e99cc80eba4a854a9c98ae3d831f901fe7c5d272 (In reply to Luke Kenneth Casson Leighton from comment #26) > don't waste precious indentation space when there is an 80 > char (79 char) limit. Fixed: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=f746c31efca189fb09aec2e661dba686616a846f (In reply to Luke Kenneth Casson Leighton from comment #29) > if you invert the calling order of these 4 functions you will NOT need > the flags added. see comment #25 and disregard it, and REMOVE > that flag. Order inverted: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=cf1e1acdf0c79d13c478935e195c4c33c4c132a3
(In reply to Andrey Miroshnikov from comment #31) > Reverted: > https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff; > h=e99cc80eba4a854a9c98ae3d831f901fe7c5d272 so sorry, it was fine, i thought what you were doing was altering the format of the data (which would not be ok)
+ if self.stages[0] is None: return stall + + insn, writeregs = self.stages[0] + self.cpu.exe.add_instruction(insn, writeregs) generally if you're going to do that you might as well do: + if self.stages[0] is None: + return stall + insn, writeregs = self.stages[0] + self.cpu.exe.add_instruction(insn, writeregs) personally i quite like the style of putting single-line ifs, but pep8 bitches about it, and autopep8 will change it, which gives you some indication of how uncommon it is, and therefore it's risky from a maintenance perspective (Principle of Least Surprise)
(In reply to Luke Kenneth Casson Leighton from comment #0) > * TODO: tidy up the corrections to the wiki page Andrey you need to sort out the corrections made, read them, tidy them up, and generally get a better understanding and appreciation that there has been an InOrder core in soc/ for over 12 months, and provide links to it that demonstrate that you know this. as written the page gave the false impression to the wider world that the InOrder Core was nonexistent, which is quite serious from the perspective of NLnet funding and promotion of the EU's trust in NLnet, and much more besides. you can find the commit history outlining the corrections that were made which you need to tidy up, here: https://git.libre-soc.org/?p=libreriscv.git;a=history;f=3d_gpu/architecture/inorder_model.mdwn;h=e61c7a938eac023889ff579018a374746e18f4c7;hb=773391b0eb427c9d61bef169c8583c9922a1f417
(In reply to Andrey Miroshnikov from comment #31) > (In reply to Luke Kenneth Casson Leighton from comment #29) > > if you invert the calling order of these 4 functions you will NOT need > > the flags added. see comment #25 and disregard it, and REMOVE > > that flag. > > Order inverted: > https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff; > h=cf1e1acdf0c79d13c478935e195c4c33c4c132a3 did it have the desired effect? as a general rule rather than blindly do something "as if being ordered/demanded", you should assume that the person with the experience and knowledge but extreme-limited time is simply too busy to do anything other than give very terse guidance, that it is your responsibility to "intelligenrly think through the missing pieces of what they are trying to say", and to give feedback that saves that more senior and extremely-busy person less work to do. so rather than *you* being terse as a "reaction" to me being terse *by necessity*, you need to be much more verbose, much more responsive, and say "Order inverted, unit tests showed better/worse results as shown below, my next planned step is to do XYZ" ok? also due to timeconstraints as you can see i made a mistake in reading the diffs (comment #32), so it is critically important that you actually think "hm is this request making any sense?" and if the unit tests don't pass for example *don't just "blindly do it", query it! i.e. you need to work on providing feedback, compensating for the project manager simply being too busy, and thus you don't end up down blind alleys that waste your time, nor make their life difficult by being unresponsive, such that they spend considerable time to provide long explanations of the expected interactions, such as this one, as well as spending time having to chase you up.