a visitor-walker of the instruction database is needed.

1. 1st priority is disassembly (binary-to-{visitor})
2. 2nd priority is assembly (ASCII-instruction-to-{visitor})

* TODO read https://medium.com/design-patterns-in-python/visitor-pattern-b9227759d6be
* TODO evaluate StreamBasedExpressionVisitor https://pyomo.readthedocs.io/en/stable/_modules/pyomo/core/expr/visitor.html
* TODO a drop-in replacement for PowerDecoder (bug #737) is a good candidate.
* TODO also ISACaller needs a *separate* decoder that is not yield-based
* TODO c-based Power ISA decoder bug #979
* TODO c-based Power ISA instructions bug #980
* TODO ISACaller parser (instruction-walking) bug #NNNN
* TODO ...
* TODO ...

links:

* https://libre-soc.org/irclog/%23libre-soc.2023-06-01.log.html#t2023-06-01T15:00:46
* https://libre-soc.org/irclog/%23libre-soc.2023-05-29.log.html#t2023-05-29T22:11:41
* https://docs.python.org/3/library/contextlib.html#contextlib.contextmanager
* https://docs.python.org/3/library/html.parser.html
* https://github.com/realistschuckle/pyvisitor/blob/master/src/visitor.py
* https://github.com/python/cpython/blob/2.7/Lib/ast.py
* https://github.com/python/cpython/blob/2.7/Lib/HTMLParser.py

<programmerjake> with visitor.db(db):
<programmerjake>     for insn in db:
<programmerjake>         with visitor.insn(insn):
<programmerjake>             for operand in insn:

<lkcl> class MyInsnDBParser:
<lkcl>     def handle_db_start(self, ....)
<lkcl>     def handle_db_end(self, ....)
<lkcl>     def handle_record_start(self, ....)
<lkcl>     def handle_record_end(self, ....)

<lkcl> if hasattr(visitor, "handle_db"): visitor->handle_db(db)
<lkcl> visitor->handle_db_start(insn)
<lkcl> for each insn do
<lkcl>     visitor->handle_insn_start(insn)
<lkcl>     for each record in insn do
<lkcl>         visitor->handle_record(record)
<lkcl>     visitor->handle_insn_end(insn)
<lkcl> visitor->handle_db_end(insn)
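a minimal runnable sketch of the handler-style walker from the IRC excerpt
above (the handle_* names come from the log; the db/insn/record shapes are
assumptions for illustration only):

    # sketch: the walker calls optional handler methods; visitors only react.
    # db is assumed to be an iterable of insns, each insn an iterable of
    # records - this mirrors the pseudocode in the IRC log, nothing more.
    def walk(db, visitor):
        if hasattr(visitor, "handle_db_start"):
            visitor.handle_db_start(db)
        for insn in db:
            if hasattr(visitor, "handle_insn_start"):
                visitor.handle_insn_start(insn)
            for record in insn:
                if hasattr(visitor, "handle_record"):
                    visitor.handle_record(record)
            if hasattr(visitor, "handle_insn_end"):
                visitor.handle_insn_end(insn)
        if hasattr(visitor, "handle_db_end"):
            visitor.handle_db_end(db)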
an idea i had for making it easier -- use with and contextmanager:

    from contextlib import contextmanager

    def visit(db, visitor):
        with visitor.db(db):
            for i, insn in enumerate(db):
                with visitor.insn(i, insn):
                    for i, operand in enumerate(insn):
                        visitor.operand(i, operand)

    class MyDemoVisitor:
        @contextmanager
        def db(self, db):
            print("before insns")
            yield
            print("after insns")

        @contextmanager
        def insn(self, insn_index, insn):
            print("before operands")
            yield
            print("after operands")

        def operand(self, operand_index, operand):
            print("operand")

generates output like:

    before insns
    before operands
    operand
    operand
    after operands
    before operands
    operand
    operand
    operand
    after operands
    after insns
https://pyomo.readthedocs.io/en/stable/_modules/pyomo/core/expr/visitor.html

this is really good (StreamBasedExpressionVisitor). it's generic, and looks
dead-easy to use. i will create a "visitor_test" branch adding that and see
what it does
https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=refs/heads/visitor_test

ok added, and "fixed" some of the unit tests (as best i can). i had to create
dummy classes, they're not perfect, but seem to convince (most of) the unit
tests that "all is good".

the idea that occurred to me is to actually add these functions (beforeChild
etc.) *actually to the insndb classes*. the reason for that is that the
context about "what to do" is actually maintained *in* the class itself.
(In reply to Luke Kenneth Casson Leighton from comment #3)
> the idea that occurred to me is to actually add these functions
> (beforeChild etc.) *actually to the insndb classes*.

part of that will be: "what sub-things do you want me to call a visitor on?"
I checked the visitor branch, and I have some thoughts to share.

1. If we take 3rd-party code, please keep it as is. Yes, this includes
   formatting. No need to adapt it.
2. I don't really like what I see in pyomo. It seems that the overall
   implementation is bloated, complex and too pyomo-way-centric
   (unsurprisingly). This seems way more complex than it should have been.
3. The APIs you refer to (beforeChild, afterChild, etc.) are actually
   deprecated.

What I like, in contrast:

1. Something dead simple like the html.parser visitors. Yes, they are not as
   generic as pyomo, but there's no need to be generic in this exact case. We
   handle insns, not anything in the world.
2. The html.parser approach can be augmented with contextlib, like Jacob
   suggested. I have to check and play around with the code, because I have
   rarely used context managers before.

Now on the order of the subtasks. I'd start with binutils. I know you are
already tired of this never-ending journey, but here's the rationale.

1. The binutils code must be updated anyway with respect to assembly (and, to
   a lesser degree, disassembly). You remember that I started binutils when
   we had the old assembler, so binutils assembly follows the old assembler's
   principles. The disassembler, in contrast, relies much more on
   autogenerated code; but still not enough. Too much manually-written code.
   With visitors, we can generate much more code to avoid manual
   synchronization between these code bases.
2. Despite the fact that we added the missing specifiers, we never had a
   chance to sync and especially to check whether the manually written code
   works everywhere (quite unlikely). Fixing these scenarios, especially for
   assembly, will simply waste time unless we rely more on the generated
   code.
3. binutils is the most active insndb user, and I know its structure (almost)
   perfectly.
4. If implemented carefully, the same principles from C-generator-visitors
   (encoding/decoding) can be reused for the C-based Power ISA decoder.
   Perhaps even to the point that we can generate the same code and use it in
   different places.
5. It's a good candidate considering the size (about 700 LOC).

In an ideal world, even assembly/disassembly *inside* insndb could be
implemented via visitors. However, this is too fragile and complex a target
to start with. The C-generator-visitors are much better targets, and binutils
seems to be the best place to sharpen the skill and test the overall
approach.

What do you think?
(In reply to Dmitry Selyutin from comment #5)
> I checked the visitor branch, and I have some thoughts to share.
> 1. If we take 3rd-party code, please keep it as is. Yes, this includes
> formatting. No need to adapt it.

it's too big - normally i would agree. so i had to cut it back

> 2. I don't really like what I see in pyomo. It seems that the overall
> implementation is bloated, complex and too pyomo-way-centric

not surprising

> (unsurprisingly). This seems way more complex than it should have been.
> 3. The APIs you refer to (beforeChild, afterChild, etc.) are actually
> deprecated.

i didn't actually look closely! just "got it running"

> What I like, in contrast:
> 1. Something dead simple like the html.parser visitors. Yes, they are not
> as generic as pyomo, but there's no need to be generic in this exact case.
> We handle insns, not anything in the world.

i don't mind that at all.

> 2. The html.parser approach can be augmented with contextlib, like Jacob
> suggested. I have to check and play around with the code, because I have
> rarely used context managers before.

ack

> Now on the order of the subtasks. I'd start with binutils.

because you know it well
I've almost completed a hierarchy reorganization for insndb. Now it's really
insndb:

    $ ls src/openpower/insndb/
    __init__.py  __pycache__  asm.py  dis.py  types.py

    pysvp64.py -> asm.py
    pysvp64dis.py -> dis.py
    power_insn.py -> types.py

This is done in order to keep the structure clean and obvious. The setup.py
script is updated as well. Visitors will reside here too. Next, I'll
implement a script which simply dumps the instruction database based on
visitors: it came to my mind that this is much simpler to do for starters
than binutils. Since this change might affect other uses outside of
openpower-isa, it will be pushed to a branch and tested with CI (thanks
again, Jacob, I cannot even stress how grateful I am).
(In reply to Dmitry Selyutin from comment #7)
> Since this change might affect other uses outside of openpower-isa, it will
> be pushed to a branch and tested with CI (thanks again, Jacob, I cannot
> even stress how grateful I am).

well, technically openpower-isa ci just runs pytest in openpower-isa, having
it run more tests is something we should probably put in an auto-updated
integration repo with submodules for all our repos.
(In reply to Dmitry Selyutin from comment #7)
> I've almost completed a hierarchy reorganization for insndb. Now it's
> really insndb:
> $ ls src/openpower/insndb/
> __init__.py  __pycache__  asm.py  dis.py  types.py
>
> pysvp64.py -> asm.py
> pysvp64dis.py -> dis.py
> power_insn.py -> types.py

db.py instead of types.py? "insn/db.py" "insn/asm.py"? maybe types.py works
better, dunno.

> This is done in order to keep the structure clean and obvious. The setup.py
> script is updated as well. Visitors will reside here too.

suggest a separate "visitor.py" file.

> Next, I'll implement a script which simply dumps the instruction database
> based on visitors: it came to my mind that this is much simpler to do for
> starters than binutils.

yes very much so, iirc it's kinda "the canonical thing to do" with
visitor-walkers. insndb/asciidump.py? insndb/print.py?

btw if you want a hint on doing indentation: the "child_begin" function
increments "self.tabdepth", and the "child_end" function decrements it.

real simple.
(In reply to Luke Kenneth Casson Leighton from comment #9)
>
> db.py instead of types.py? "insn/db.py" "insn/asm.py"?

Ah sorry, I totally missed this. Well, I also thought about db.py, but came
to the conclusion that the db.py module would provide a special script
itself. Now it's in the repository:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=78231140441402bec1dedbe4e97238cb9597b96f

This commit provides the pysvp64db script, which can now list all
instructions. I'll play more with this script and its possible commands.

> maybe types.py works better, dunno.

I was thinking about core.py or something like that. But for now I'm keeping
types.py to concentrate on visitors.

> suggest a separate "visitor.py" file.

Perhaps yes, if we have some concrete visitor implementations to be reused.
For now I just added a "default" Visitor class into types.py, and that,
combined with the approach suggested by Jacob, was already sufficient to
write a simple script which dumps all instructions.

Another part of "visitors" is that we actually have more stuff to be
"visited". Not only the database itself, but, for example, all possible
fields in RM. Or all possible specifiers. And so on. But at this point it
seems that all these chunks might just be separate context managers (like
the db and record methods).

> insndb/asciidump.py? insndb/print.py?
>
> btw if you want a hint on doing indentation: the "child_begin" function
> increments "self.tabdepth", and the "child_end" function decrements it.
>
> real simple.

I think this should be handled in concrete visitors: not all of them need
tabulation. Perhaps the inherited visitor could simply track this on its own?
I'll think about it when I move to binutils.
A new command is available: pysvp64db opcodes INSN. Example:

    $ pysvp64db opcodes svshape
    010110---------------0000-011001
    010110---------------0001-011001
    010110---------------0010-011001
    010110---------------0011-011001
    010110---------------0100-011001
    010110---------------0101-011001
    010110---------------0110-011001
    010110---------------0111-011001
    010110---------------1010-011001
    010110---------------1011-011001
    010110---------------1100-011001
    010110---------------1101-011001
    010110---------------1110-011001
    010110---------------1111-011001

    $ pysvp64db opcodes svshape2
    010110---------------100--011001
(In reply to Dmitry Selyutin from comment #10)
> (In reply to Luke Kenneth Casson Leighton from comment #9)
> >
> > db.py instead of types.py? "insn/db.py" "insn/asm.py"?
>
> Ah sorry, I totally missed this. Well, I also thought about db.py,

i ran *immediately* into import problems due to the common names ("import
dis"), and the same would happen on "import types". so *temporarily* i
invented some names

    commit aaaecc47a8bbe21bc047e589e34bec6e18f928f6 (HEAD -> master, origin/master)
    Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
    Date:   Sat Jun 3 12:47:30 2023 +0100

        using names of modules that are identical to commonly-used python
        modules (even at the leaf-node) is causing import problems

    commit 9546d80ef122b4f53b61dc1f3997a2ad70c139bd
    Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
    Date:   Sat Jun 3 12:45:35 2023 +0100

        import dis overloads naming of modules already in python3, and stops
        functions from importing

> but came to the conclusion that the db.py module would provide a special
> script itself. Now it's in the repository:
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=78231140441402bec1dedbe4e97238cb9597b96f

like it

> This commit provides the pysvp64db script, which can now list all
> instructions. I'll play more with this script and its possible commands.
>
> > maybe types.py works better, dunno.
>
> I was thinking about core.py or something like that. But for now I'm
> keeping types.py to concentrate on visitors.
>
> > suggest a separate "visitor.py" file.
>
> Perhaps yes, if we have some concrete visitor implementations to be reused.
> For now I just added a "default" Visitor class into types.py, and that,
> combined with the approach suggested by Jacob, was already sufficient to
> write a simple script which dumps all instructions.

> Another part of "visitors" is that we actually have more stuff to be
> "visited". Not only the database itself, but, for example, all possible
> fields in RM.

aaabsolutely. it will extend down further.

> Or all possible specifiers. And so on. But at this point it seems that all
> these chunks might just be separate context managers (like the db and
> record methods).

with Visitor.record.rm.extra(blah)

> I think this should be handled in concrete visitors: not all of them need
> tabulation.

ok so we are basically subdividing into Model-View-Controller here. i *think*
- i am not certain - that you might be conflating/merging two of those
together.

> Perhaps the inherited visitor could simply track this on its own? I'll
> think about it when I move to binutils.

the idea of MVC is that Visitors are passed *both* the data *and* "what to do
on it". the "what to do on it" has an API (child_before, child, child_end)
which the *Visitor* knows about.

the idea of "tabulation" is *not* for "All Visitors". it's for *that one
Visitor* to keep, as a *local variable*, the "current tab depth".
    from contextlib import contextmanager

    def visit(db, visitor):
        with visitor.db(db):
            for i, insn in enumerate(db):
                with visitor.insn(i, insn):
                    for i, operand in enumerate(insn):
                        visitor.operand(i, operand)

    class MyDemoVisitor:
        def __init__(self):
            self.tab = 0

        @property
        def tabs(self):
            return " " * 4 * self.tab

        @contextmanager
        def db(self, db):
            self.tab += 1
            print(self.tabs + "before insns")
            yield
            print(self.tabs + "after insns")
            self.tab -= 1

        @contextmanager
        def insn(self, insn_index, insn):
            self.tab += 1
            print(self.tabs + "before operands")
            yield
            print(self.tabs + "after operands")
            self.tab -= 1

        def operand(self, operand_index, operand):
            self.tab += 1
            print(self.tabs + "operand", operand_index, operand)
            self.tab -= 1

    x = MyDemoVisitor()
    visit([["op1", "op2"], ["insn1_op2", "insn2_op2"]], x)

output: (note extra indent, this is because self.tab increases *before*
print)

```
    before insns
        before operands
            operand 0 op1
            operand 1 op2
        after operands
        before operands
            operand 0 insn1_op2
            operand 1 insn2_op2
        after operands
    after insns
```
Created attachment 191: visitor demo
Luke, could you please clarify where and how you observe the importing
issues? I don't quite get the name collision here, since I import stuff like
`from openpower.insndb.dis` and haven't had any problems.
As for tabs, that's exactly what I meant: tabs go into the concrete visitor
implementation.
(In reply to Dmitry Selyutin from comment #14)
> Luke, could you please clarify where and how you observe the importing
> issues? I don't quite get the name collision here, since I import stuff
> like `from openpower.insndb.dis` and haven't had any problems.

    $ history | more
    ...
    1787  python3 insndb/db.py
    ...

(python3.7 - the project default version)

note that's from that directory

(In reply to Dmitry Selyutin from comment #15)
> As for tabs, that's exactly what I meant: tabs go into the concrete visitor
> implementation.

yes. and as an example it's a good one, because it shows that it's the
visitor that contains its own context. (we need examples)
(In reply to Luke Kenneth Casson Leighton from comment #16)
> (In reply to Dmitry Selyutin from comment #14)
>
> > Luke, could you please clarify where and how you observe the importing
> > issues? I don't quite get the name collision here, since I import stuff
> > like `from openpower.insndb.dis` and haven't had any problems.
>
> $ history | more
> ...
> 1787  python3 insndb/db.py
> ...

I'd say that's not the intended usage anyway: the module is used via the
pysvp64db script (cf. setup.py). Also, Python has _so_ many modules that,
regardless of the name used, you end up with a collision anyway (unless
perhaps you drop directories and use underscores in names instead). I don't
see dbtypes as a clear name. But neither is types. How about core?
Also, the corresponding PEP acknowledges the issue:
https://peps.python.org/pep-0328/#rationale-for-absolute-imports

So, TL;DR: there's always a chance a naming collision happens, since we don't
control which modules Python brings with new versions. To solve this, there
are correct import statements (which are used throughout our code base). So
there is no need for this change other than "types.py simply doesn't fit this
module well, its contents are not limited to types". I suggest renaming the
module to core.py, but never _ever_ bothering with the "donnerwetter, we used
a standard module name" problem (as long as we import correctly). Objections?
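To make the point concrete, a tiny illustration (module names taken from this
thread):

    import dis                        # bare name: which "dis" you get depends
                                      # on sys.path ordering when the module
                                      # is run directly as a script
    from openpower.insndb import dis  # absolute import: always our module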
I see that dis got renamed too. I like the dis name, it's used as "something
completely opposite to asm", and it's present in the tests and the
corresponding executable script. Again, it's not a problem as long as the
correct imports are used (and setup.py develop does things right, compared to
the `python3 src/openpower/insndb/dis.py` command). The behavior of a shitty
interpreter, which puts the module's directory first in imports, is not our
issue. The module is either used with correct imports or as a standalone
script, in which case it must be installed. After all, we don't get ashamed
when some code runs correctly only after it's installed (example: binaries
which are installed with libraries, and cannot be run unless rpath is changed
or they are installed to well-known places the linker is aware of). Please
roll back these changes, that's not the intended use scenario.
OK I rolled them back so that I could continue developing, and renamed types into core.
Introduced the operands command:

    $ SILENCELOG=true pysvp64db operands add.
    RT
    RA
    RB
    OE=0
    Rc=1

Notice the completely redundant and useless SILENCELOG incantation here. I'll
add the same hack we did to pysvp64dis (setenv), but still: do we have any
plans to switch to the logging module instead of prints?
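For reference, a minimal sketch of what the standard logging module would
look like here (the logger name is an assumption):

    import logging

    log = logging.getLogger("openpower.insndb")

    def dump(record):
        log.debug("visiting %s", record)  # hidden unless DEBUG is enabled

    # default: only warnings and above are shown, no SILENCELOG needed
    logging.basicConfig(level=logging.WARNING)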
(In reply to Dmitry Selyutin from comment #20)
> OK I rolled them back so that I could continue developing, and renamed
> types into core.

dmitry! please don't make that kind of decision without an acknowledgement! i
now cannot do any work! i am trying to track down the RA_OR_ZERO bug and have
found it's not RA_OR_ZERO at all, but now have to stop working on this.

i know "dis" is short - i like it as well, because it matches with "asm"
which is also 3 letters. but if there's a serious problem of global namespace
clashes those names have to be considered "off-limits"

generally, function/object names and keywords such as "type" are also
off-limits.

    lkcl@fizzy:~/src/libresoc/openpower-isa/src/openpower$ python3 insndb/db.py
    Traceback (most recent call last):
      File "insndb/db.py", line 8, in <module>
        from openpower.insndb.core import (
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 4, in <module>
        import dataclasses as _dataclasses
      File "/usr/lib/python3.7/dataclasses.py", line 5, in <module>
        import inspect
      File "/usr/lib/python3.7/inspect.py", line 35, in <module>
        import dis
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/dis.py", line 9, in <module>
        from openpower.insndb.core import (
    ImportError: cannot import name 'Style' from 'openpower.insndb.core' (/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py)
Luke, please just use pysvp64db instead, as mentioned in the comments above.
I gave the rationale before reverting this; it's just misuse. $(setup.py
develop) installs these scripts correctly. Using them from the source tree is
not a valid use case.

https://bugs.libre-soc.org/show_bug.cgi?id=1094#c18
https://bugs.libre-soc.org/show_bug.cgi?id=1094#c19
One might argue that this is an issue related to the implementation (Python
should not bring the module's internals atop of the sys.path list); I
personally think so. However, Python has tools and guidelines for doing this
correctly. Our setup.py also does the correct thing and syncs the script path
too. So I suggest using the script as intended by the PEP and setup.py and
the overall logic.
(In reply to Dmitry Selyutin from comment #24)
> One might argue that this is an issue related to the implementation (Python
> should not bring the module's internals atop of the sys.path list);

they likely feel they had to get it to the front of sys.path so as not to
have user-imports cause problems. if it is at the back it is still a problem
but one that makes the user's code break (because when using the module they
get the *system* module) rather than "python wudnt import mi fing" errors
which look like *python* is broken. either way it's still a problem, but one
way avoids the python devs getting blamed for it.

> I personally think so. However, Python has tools and guidelines for doing
> this correctly. Our setup.py also does the correct thing and syncs the
> script path too. So I suggest using the script as intended by the PEP and
> setup.py and the overall logic.

this i feel would be reasonable if it wasn't for running the module itself as
a way to check for syntax errors and import errors - on a standalone basis. i
actually do this, and running a program (/usr/local/bin/...) which imports
that script isn't the same thing.

also under some circumstances, to put unit tests in. example:

    insndb/db.py:

    import unittest

    class Visitor:
        ...

    # create some Visitor unit tests
    class unittest.Blah(....)
        ...

    # the main import entry-point for the script pysvp64db
    def main():
        ....

    if __name__ == '__main__':
        unittest.main()

now that would be a good reason to run that script directly: to fire the unit
tests for it, which as long as they are not massive, and are more like
"use-cases", i feel are kinda ok to have right next to the actual class
itself.

so in this case do we need to put (small, demo) unit tests in there?
unlikely. will i try running the script to check syntax-errors and
import-errors when writing it? yes.
Unlike docs, where this is fine for documenting the classes and routines, I
have a strong vomit reflex against putting tests in the place where the code
lives, especially when tests are mixed with the code. This just complicates
the major case, where everything works and you just want to understand how it
works, and rarely helps when something doesn't work and you want to know why
(because no test can replace debugging, even via prints).

Also, I'd rather have this:

    if __name__ == "__main__":
        main()

I find it exceptionally confusing that stuff under "main" executes something
other than the main function. And it's especially strange if the script
called as a standalone executable does a totally different thing than the
same script called via python. It doesn't make any sense to do one thing via
shebang and another via calling almost the same stuff directly. Many other
modules follow this principle too (all these test_XXX.py).
(In reply to Dmitry Selyutin from comment #26)
> Unlike docs, where this is fine for documenting the classes and routines, I
> have a strong vomit reflex against putting tests in the place where the
> code lives, especially when tests are mixed with the code.

:) if they're small enough (and intended as "documented demos") they can be
very useful. but not if they are dozens and dominate the module.

> Many other modules follow this principle too (all these test_XXX.py).

those are not intended as external programs, that's for sure.
Provided two new visitors: pcode and extras. Supported sv.-prefixed
instructions. Example:

    $ pysvp64db pcode addi
    RT <- (RA|0) + EXTS(SI)

    $ pysvp64db extras sv.setvl
    in1
        sel RA0
        reg RA
        seltype SRC
        idx EXTRA1
    in2
        sel NONE
        reg NONE
        seltype NONE
        idx NONE
    in3
        sel NONE
        reg NONE
        seltype NONE
        idx NONE
    cr_in
        sel NONE
        reg NONE
        seltype NONE
        idx NONE
    cr_in2
        sel NONE
        reg NONE
        seltype NONE
        idx NONE
    out
        sel RT0
        reg RT
        seltype DST
        idx EXTRA0
    out2
        sel NONE
        reg NONE
        seltype NONE
        idx NONE
    cr_out
        sel CR0
        reg CR0
        seltype NONE
        idx NONE

Note that the latter command only works with sv.-prefixed instructions, since
the extras make sense only with these.
Other commands simply discard the sv. prefix where it makes sense (e.g.
`pysvp64db pcode sv.addi` does the same as `pysvp64db pcode addi` would).
(In reply to Dmitry Selyutin from comment #28)
> Provided two new visitors: pcode and extras.

let's not go down the rabbithole of walking the entire pcode AST quite just
yet :) (there's python-ply for that)

> Supported sv.-prefixed instructions. Example:
>
> $ pysvp64db pcode addi
> RT <- (RA|0) + EXTS(SI)

cool!

> $ pysvp64db extras sv.setvl
> in1
>     sel RA0
>     reg RA
>     seltype SRC
>     idx EXTRA1

this is going to be incredibly useful. i can see scenarios where latex
auto-conversion is possible, and much more.
Supported operand spans:

    $ pysvp64db operands setvl
    RT 6,7,8,9,10
    RA 11,12,13,14,15
    SVi 16,17,18,19,20,21,22
    vf 25
    vs 24
    ms 23
    Rc=0 31

Note that this does not support inline extras, since these operate on
concrete operand values to determine whether extras must be present or not.
No idea how to handle this case yet.
(In reply to Luke Kenneth Casson Leighton from comment #30)
> (In reply to Dmitry Selyutin from comment #28)
> > Provided two new visitors: pcode and extras.
>
> let's not go down the rabbithole of walking the entire pcode AST
> quite just yet :) (there's python-ply for that)

Yeah, this should also be a separate visitor. This should make the generation
simpler. Anyway, not in this task's scope. :-)

> > $ pysvp64db extras sv.setvl
> > in1
> >     sel RA0
> >     reg RA
> >     seltype SRC
> >     idx EXTRA1
>
> this is going to be incredibly useful. i can see scenarios where
> latex auto-conversion is possible, and much more.

Now that you mentioned this, I thought I should provide a separate .visit
method for handling these, like the Database and Record classes have. These
should come in handy for C code generation too.
Introduced "visitable" extra object and the corresponding handler. This, however, got tricky, since visitors had to track what exact instructions they visit. Please check the recent commits, I'd like have some ideas on improvements to make it less messy.
see https://bugs.libre-soc.org/show_bug.cgi?id=1056#c52

what is needed here is an enumerator of the instruction, *re-creating* and
back-annotating the SVP64 operands *into* the markdown spec page. aka *a
visitor*!

it will need to provide clear visual tables showing how to decode SVP64. it
is pretty much identical to the "verbose decode disasm debug print" option,
but it needs to be much more compact.
(In reply to Dmitry Selyutin from comment #32)
> Now that you mentioned this, I thought I should provide a separate .visit
> method for handling these, like the Database and Record classes have. These
> should come in handy for C code generation too.

i believe you may be getting to the point where my suggestion to have the
visitor-walking be much more part of each DB-class itself makes 0 sense.

in this way *at the class definition* it describes its "sub-things"

Database:
* declares to all Visitors that it has 32-bit instructions
* declares to all Visitors that it has SVP64 instructions
* declares to all Visitors that it has Fields and Forms

(in the python-html DOM/SAX visitor these are all identified by the function
names).

Record:
* declares that it has {CSV-properties}

and the Visitors walk that *entire lot* looking for corresponding
function/object-names and calling them with the data.

(In reply to Dmitry Selyutin from comment #33)
> Introduced a "visitable" extra object and the corresponding handler. This,
> however, got tricky, since visitors had to track which exact instructions
> they visit.

yes they do. it is never easy to do visitors, because the *declaration* of
the database defines how to walk it, but that declaration never contains the
data. (we are back to things like BNF, SemanticWeb, XSLT etc *shudder*)

> Please check the recent commits; I'd like to hear some ideas on
> improvements to make it less messy.

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=7ffad8f7202

    class BaseVisitor(Visitor):
    -    def __init__(self, **_):
    -        pass
    +    def __init__(self, **arguments):
    +        self.__arguments = types.MappingProxyType(arguments)
    +        self.__current_db = None
    +        self.__current_record = None
    +        self.__current_extra = None
    +        return super().__init__()

ahhh.... i need to check: a Visitor *itself* may store state-information, but
*not all visitors need to do so*. the *walking* aspect ("how to walk") is
intended to be utterly separate from the "tracking" aspect, and many
Visitor-instances simply hold an open file-handle, and the visited-functions
simply call "write" on it, using their arguments.

so the *walker* simply ends up with stuff on the stack. lots of it. as it
drills down into more child-data. if the Visitor *actually needed* any prior
state (a parent), it is the responsibility *of that Visitor instance on its
begin function* to make a note of it.

so i need to check: are you intending *ALL* Visitors to critically rely on
the existence of self.__current_db and so on? because i am concerned that all
three of these have been added

    + self.__current_db = None
    + self.__current_record = None
    + self.__current_extra = None

record is part of *Record* visiting. extra is part of *Extra* visiting. there
should not even really be **arguments. (and certainly no use of the python
types module)

python's standard module "types" - types.MappingProxyType(arguments) - really
should not be used.
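a minimal sketch of the "declare your sub-things at the class definition"
idea above (the attribute names are assumptions, not the real insndb API):

    class Database:
        # the class *declares* its sub-things; it does not walk them
        subnode_names = ("word_insns", "svp64_insns", "fields", "forms")

    class Record:
        subnode_names = ("csv_properties",)

    def children(node):
        # a generic walker reads the declaration and does the iteration
        for name in getattr(node, "subnode_names", ()):
            yield from getattr(node, name)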
Sorry, on the phone, so briefly.

On **arguments: this is just a way to pass CLI arguments, nothing else. Some
commands operate on a single instruction and I need to know _which_ one.

On tracking db, record and extra: yes, the code is, at best, shitty.
Basically what I want is "only filter some specific records to be visited,
where the record name matches the CLI arguments". I don't know how to do it,
because the visitor expects all records to be visited. Another option would
be to pass the "parent" as part of the arguments for each visit method (e.g.
Record.visit might get the db too, and VisitableExtra would have db and
record in its visit method). But this can be somewhat clumsy too.

I'd like to keep the walking logic inside insndb and provide a way for the
caller to adjust how to handle the fact that something is traversed. The
callback-based way suffers from the same issue, though, so it's not specific
to context managers.
Ah yes, on MappingProxyType: that's the standard way of declaring a dict which cannot be changed in Python. Kinda RO-wrapper around dict.
> --- Comment #36 from Dmitry Selyutin <ghostmansd@gmail.com> ---
> Sorry, on the phone, so briefly.
>
> On **arguments: this is just a way to pass CLI arguments, nothing else.
> Some commands operate on a single instruction and I need to know _which_
> one.

ok so here's a really important insight into Visitors: the pattern that they
use has to be completely independent, such that a third-party does not have
to inherit from *any* of the classes used or created by anyone else.

> On tracking db, record and extra: yes, the code is, at best, shitty.
> Basically what I want is "only filter some specific records to be visited,
> where the record name matches the CLI arguments".

ok so you:

* create an entirely-new class, not having *any* inheritance from the
  *entire* set of code created so far;
* have its constructor store the records;

      def __init__(self, *args):
          self.args = args

* create an instance of that object
* pass it to the walker-function which performs the traversal.

> I don't know how to do it, because the visitor expects all records to be
> visited. Another option would be to pass the "parent" as part of the
> arguments for each visit method (e.g. Record.visit might get the db too,
> and VisitableExtra would have db and record in its visit method). But this
> can be somewhat clumsy too.

again: you create an entirely-new class, which has *no* inheritance at all
from the entire set of code created so far.

> I'd like to keep the walking logic inside insndb and provide a way for the
> caller to adjust how to handle the fact that something is traversed.

i think what you're describing is not a Visitor pattern, it's something else.

> The callback-based way suffers from the same issue, though, so it's not
> specific to context managers.

and yet they are still used/developed, and python programmers the world over
continue to use/develop the visitor pattern. context-manager is actually a
bit of a problem because it

(In reply to Dmitry Selyutin from comment #37)
> Ah yes, on MappingProxyType: that's the standard way of declaring a dict
> which cannot be changed in Python. Kinda RO-wrapper around dict.

so if that's envisaged as needed then there is something very wrong with the
Visitor API. Visitor APIs *have* to be completely independent of the data.
now, if the *use* of the Visitor API needs that, then that's their problem,
but there literally should be no types, no data, absolutely nothing, in the
*API itself*.

the *only* thing it should do is:

* call a function/method at the start, with the object at current-depth
* in a for-loop:
  - call a child-begin-function with the sub-object (and index)
  - call a child-do function with the sub-object (and index)
  - call a child-end function with the sub-object (and index)
* call a function/method at the end, before dropping out to prev-depth

we can look at doing "polish-notation walking" and
"reverse-polish-notation walking" later.

can i suggest going back again to the pyomo visitor code, as the complexities
of Visitor-Walking are actually really hard to express?

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/visitor.py;h=7ad025cd096

and no, "using some other project's code as-is" is not going to work for this
project, for many reasons. one of which is "we do not want additional
dependencies" and another is "they make assumptions about *their* code and
*their* structure which bleeds into the modules".

additionally: Visitor API walker patterns are very compact but extremely
complex code, usually being developed over many months.
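putting the above together, a minimal sketch (walk() and the record
attributes are assumptions, for illustration only): a filter-visitor with
*no* inheritance from insndb, keeping its own state:

    class NameFilterVisitor:
        # entirely standalone: no inheritance from any insndb class
        def __init__(self, *names):
            self.names = set(names)  # the CLI arguments, stored here
            self.matches = []

        def record(self, record):
            # the visitor only reacts; it never drives the traversal
            if record.name in self.names:
                self.matches.append(record)

    def walk(db, visitor):
        # the walker owns the traversal, completely separate from the visitor
        for record in db:
            visitor.record(record)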
(In reply to Dmitry Selyutin from comment #36)
> Basically what I want is "only filter some specific records to be visited,
> where the record name matches the CLI arguments".

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_decoder.py;h=9389dd3a#l328

i pass in a col_subset (a set - by CSV-name - of which columns) and a
row_subsetfn. row_subsetfn receives the row and returns True/False.

however the *visitor* would simply go:

    def child_data_function(self, child_index, child_data):
        if some_filter_criteria(child_data):
            return do_something_with_child_data(child_data)

ahhh i know what you've missed. look again at the pyomo Visitor API:

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/visitor.py;h=7ad025cd0#l97

the "enterNode()" function is where the Visitor API allows for "results" to
be accumulated - partially - and keeps all necessary state information.

    83 initializeWalker(expr) -> walk, result
    84 enterNode(N1) -> args, data
    85 {for N2 in args:}
    86   beforeChild(N1, N2) -> descend, child_result
    87   enterNode(N2) -> N2_args, N2_data
    88   [...]
    89   exitNode(N2, n2_data) -> child_result
    90   acceptChildResult(N1, data, child_result) -> data
    91   afterChild(N1, N2) -> None
    92 exitNode(N1, data) -> N1_result
    93 finalizeWalker(result) -> result

so the Visitor-callbacks receive all the information they need to "keep
track". one of the things they can "keep track" of is: an object with the
filter arguments. anything else can also be passed "up the stack" of the
Visitor-callbacks.
(In reply to Luke Kenneth Casson Leighton from comment #39)
> ahhh i know what you've missed. look again at the pyomo Visitor API:

It's not that I missed it; that's a deliberate omission. With context
managers, everything before yield is beforeNode; everything after is
afterNode. The way I currently implemented it is that folks override the
manager instead of overriding three methods.

We must choose one approach, either Jacob's or html-parser-like. Jacob's
looks simpler, but requires the user to know that "yield" acts kinda like
onStuff, everything before yield is beforeStuff, and everything after is
afterStuff. On the other hand, if we don't allow overriding the manager, we
can hide the walking logic, so I'm unsure what to choose. Ideas?
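A tiny illustration of that before/after split (a sketch, not the actual
insndb code):

    from contextlib import contextmanager

    @contextmanager
    def record(node):
        print("beforeNode", node)  # runs on entry: plays beforeNode
        yield node                 # subnodes are visited while suspended here
        print("afterNode", node)   # runs on exit: plays afterNode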
(In reply to Dmitry Selyutin from comment #40)
> (In reply to Luke Kenneth Casson Leighton from comment #39)
> > ahhh i know what you've missed. look again at the pyomo Visitor API:
>
> It's not that I missed it; that's a deliberate omission. With context
> managers, everything before yield is beforeNode; everything after is
> afterNode.

contextmanagers - which only have a constructor, __enter__ and __exit__ - are
actually too simple.

now, it turns out that *in some cases* (particularly things that write
directly to file-streams) you just don't need and don't care about "ongoing
state", and *in those cases* a contextmanager is perfectly sufficient. (these
are effectively "one-pass" in compiler terminology)

but for things that i am envisioning, contextmanager is nowhere near enough.
imagine that you want to do a full language-translation (pseudocode-to-c),
and that (aieee) we do actually put iterative-node-walking down into the RTL.

but the big advantage of pyomo's Visitor walker-function is: it does *both*.

> On the other hand, if we don't allow overriding the manager, we can hide
> the walking logic, so I'm unsure what to choose. Ideas?

these two should in no way be mutually-exclusive. as i have used Visitors for
such a long time i have absolutely no problem at all with the "there's these
functions you provide and they get called" idea.

the walking-logic should be its own completely separate and independent
function.

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/visitor.py;h=7ad025#l506

note, sigh, ha ha: pyomo designed the code so that you *can* inherit from the
VisitorClass. but look carefully at the notes:

    168 Clients interact with this class by either deriving from it and
    169 implementing the necessary callbacks (see above), assigning callable
    170 functions to an instance of this class, or passing the callback
    171 functions as arguments to this class' constructor.

and another way would be to pass in an "instance which has those same
methods"
I'd like something intermediate between context managers (which are simple
but not flexible) and pyomo (which is perhaps powerful but overkill). How
about three methods to be overridden?
(In reply to Dmitry Selyutin from comment #42)
> I'd like something intermediate between context managers (which are simple
> but not flexible) and pyomo (which is perhaps powerful but overkill).

trust me, it only looks that way because it is a new concept and use-cases
haven't been envisaged/understood yet :) pyomo's code can actually call
contextmanagers.

> How about three methods to be overridden?

ngggh, the Visitor-pattern doesn't quite work that way (the
python-html-walker doesn't either: the functions all conform to
"handle_{database_sub-type}_{start/end}")

okok one way to *make* it conform is for the functions to be:

    def handle_start(DATABASE_TYPE, node, data):
    def handle_child(DATABASE_TYPE, node, child_idx, child_data):
    def handle_end(DATABASE_TYPE, node, data):

instead of

    handle_db_start
    handle_db_child
    handle_db_end

it is

    handle_start("db", ...)
    handle_child("db", ...)
    handle_end("db", ...)

but this way also lies another form of madness: a massive suite of "if
DATABASE_TYPE == 'db' do this else do that" in every visitor-instance
I meant three methods per context manager (three for db, three for record,
etc.). OK, if there are no objections, I'll switch to this way.
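For the record, a sketch of what "three methods per node type" could look
like (the method names are assumptions, not the final API):

    class Visitor:
        # three methods for db
        def db_start(self, db): pass
        def db_child(self, index, insn): pass
        def db_end(self, db): pass
        # three methods for record
        def record_start(self, record): pass
        def record_child(self, index, operand): pass
        def record_end(self, record): pass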
(In reply to Dmitry Selyutin from comment #44)
> I meant three methods per context manager (three for db, three for record,
> etc.).

yes, i understood. didn't say it, but understood it.

> OK, if there are no objections, I'll switch to this way.

none here - the only thing i will say is: please please make absolutely
certain to separate "the act of walking" from "the visitor which is called by
the walker". (i know that pyomo has the walking-function as part of the
Visitor class)

the next question to solve is: how do you "stack" them? how do you have db
know about record, how do you have record know about span/extra/etc.? i don't
mean the *data*.

and the answer to that is where the "three functions" paradigm breaks down,
and i _believe_ you're trying to solve it by placing local variables down at
the BaseVisitor... and trust me when i say that's not the right way to write
a Visitor API.

really, when you have time, please do go over StreamBasedExpressionVisitor,
bearing in mind it can do both DOM *and* SAX - and we will need both. (DOM:
in-place in-memory tree modifying, used for compiler AST modifying. SAX:
streaming-only. think "files", zero context. good for e.g. turning certain
words into CAPITALS)
ah! i think i know how to describe Visitor-walking.

* imagine that the original data (and its format) has been lost. there is no
  chance of recovery.
* imagine that you investigate and find a "trace log" - a print-out - of the
  ENTIRE database contents (a good example would be a "SQL DUMP")
* each new line in the trace-log has:
  - the TYPE of the thing
  - its CONTENTs
  - what its PARENT is

now write two COMPLETELY SEPARATE programs:

* program 1) can PRODUCE that trace-log
* program 2) can RECREATE the database - both its format and its data -
  purely FROM the trace-log

that's how Visitor-walking works.

------------------------------------

ah. i found another one, which says it is "so small you might as well
cut/paste it into your project"

https://github.com/mbr/visitor

here's the tree-walk repro example:

https://github.com/mbr/visitor/blob/9b7dbe83ed2163b2cfc4424926b555bb636492b0/tests/example.py#L18

    def visit_list(self, node):
        self.indent += 1
        s = '[\n' + ' ' * self.indent
        s += (',\n' + ' ' * self.indent).join(self.visit(item)
                                              for item in node)
        self.indent -= 1
        s += '\n' + ' ' * self.indent + ']'
        return s

ahh yuk. notice "self.visit(child_item)"? that's exactly the kind of thing
that good Visitor APIs *don't* do: impose on the caller to explicitly
continue the recursive-descent.

(that is probably why you believe that pyomo's Visitor API is "complex":
because they have abstracted-out the "tree-walking" part entirely from the
"taking action" part)

yes, it really is this simple!

https://github.com/mbr/visitor/blob/master/visitor/__init__.py

that's literally one function!

    class Visitor(object):
        """Base class for visitors."""

        def visit(self, node):
            if isinstance(node, type):
                mro = node.mro()
            else:
                mro = type(node).mro()

            for cls in mro:
                meth = getattr(self, 'visit_' + cls.__name__, None)
                if meth is not None:
                    return meth(node)

            raise NotImplementedError('No visitation method visit_{}'
                                      .format(node.__class__.__name__))

note absolutely zero use of typing!
another example:

https://stackoverflow.com/questions/1515357/simple-example-of-how-to-use-ast-nodevisitor/26374977#26374977

which is the same programming-concept as this

https://github.com/mbr/visitor/blob/master/visitor/__init__.py

but made much more generic. that leads me to ast.py in python itself:

https://github.com/python/cpython/blob/1237fb6a4b177ce8f750949b9006c58f9f22942e/Lib/ast.py#L1667

BLECH!

    def visit_Lambda(self, node):
        with self.require_parens(_Precedence.TEST, node):
            self.write("lambda")
            with self.buffered() as buffer:
>>>>>>          self.traverse(node.args)    <<< EUURRRRRRG!
            if buffer:
                self.write(" ", *buffer)
            self.write(": ")
            self.set_precedence(_Precedence.TEST, node.body)
            self.traverse(node.body)

NOOOO! :) i have a horrible feeling that stackoverflow answer is referring to
an older version of python.... ah yes it was

https://github.com/python/cpython/blob/2.7/Lib/ast.py

that's MUCH better. ah HA! see iter_fields(node). it is *self-describing*
thanks to fields, and performs "yield"...

    def iter_fields(node):
        for field in node._fields:
            try:
                yield field, getattr(node, field)
            except AttributeError:
                pass

and... ah HA!

    class NodeVisitor(object):
        def visit(self, node):
            """Visit a node."""
            method = 'visit_' + node.__class__.__name__
            visitor = getattr(self, method, self.generic_visit)
            return visitor(node)

        def generic_visit(self, node):
            """Called if no explicit visitor function exists for a node."""
            for field, value in iter_fields(node):
                if isinstance(value, list):
                    for item in value:
                        if isinstance(item, AST):
                            self.visit(item)
                elif isinstance(value, AST):
                    self.visit(value)

that's... that's awesome. it's abstracting-out the field-definitions and
allowing for walking of them. ah ha! we have _fields already in the insndb!
ta-daaaa!
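a small usage example of that stdlib pattern (python3 syntax; the behaviour
matches the 2.7 code quoted above):

    import ast

    class NameLister(ast.NodeVisitor):
        def visit_Name(self, node):
            print(node.id)
            self.generic_visit(node)  # ask the base class to keep walking

    # prints a, b and c: the recursive-descent is done *for* us
    # by generic_visit, the visitor only reacts to Name nodes
    NameLister().visit(ast.parse("a = b + c"))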
found another one that looks extremely nice

https://github.com/realistschuckle/pyvisitor

except that on close investigation it again relies on visitors having the
"control" over the recursion/nesting-depth, by forcing the visitor
class-instance to call an "accept" method on child-objects... which i don't
think is a good idea. however, overriding __call__ i think has a lot of
potential

https://github.com/realistschuckle/pyvisitor/blob/master/src/visitor.py

notice again how brutally-short that code is. and how it does *not* "import
typing" or "import types". that's really important!
https://github.com/python/cpython/blob/2.7/Lib/HTMLParser.py

forget the actual html-parsing; search for instances of "handle_" in that
code, and notice that HTMLParser *itself* performs the recursive-descent
walking, but does not expect the *handler* functions to do - or have any
control over, or get involved with - the *act* of walking.

there is no need at all in the *visitor* functions to perform
"self.visit(somesubnode)" or even worse "for child in thing:
self.visit(child)".

i know i am emphasising it a lot, it's hard to express, but i feel that
"actions taken on {thing}" should be completely separate from "act of
tree-walking".
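a minimal runnable example of that style (standard-library html.parser; the
parser drives the walk, the handlers only react):

    from html.parser import HTMLParser

    class TitlePrinter(HTMLParser):
        def __init__(self):
            super().__init__()
            self.in_title = False

        def handle_starttag(self, tag, attrs):
            if tag == "title":
                self.in_title = True

        def handle_endtag(self, tag):
            if tag == "title":
                self.in_title = False

        def handle_data(self, data):
            if self.in_title:  # note: no self.visit(child) anywhere
                print(data)

    TitlePrinter().feed("<head><title>insndb</title></head>")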
https://medium.com/design-patterns-in-python/visitor-pattern-b9227759d6be

ah this is really clear and very nice: it clearly accentuates and illustrates
the difference between *visiting* and *accepting*. they call it:

    class IVisitor(metaclass=ABCMeta):
        "An interface that custom Visitors should implement"

    class IVisitable(metaclass=ABCMeta):
        """
        An interface the concrete objects should implement that allows
        the visitor to traverse a hierarchical structure of objects
        """

Visitor and Visitable. ha! those are the phrases i was looking for.

* at the base class of *all* the insndb classes should be an
  accept(self, visitor) function
* that function should, like python 2.7 ast.py, be capable of doing *full*
  recursive self-introspection, based on the insndb fields, *not* based on
  explicit implementation of accept() in *every* single insndb class (of
  which we have... 100?)

the only thing "bad" about the example there is that IVisitor.visit() is a
static method. but there is probably a good reason for it?
I've come up with an idea based on the AST example you posted, and updated
the code. I think it takes the best from the ideas you posted and Jacob's
managers.

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=5f07a672ec7eb1cf3206f87ddc00ea10f4430a24

The idea is: all this stuff inherits from some generic Node. Each
Node-derived class can specify what subnodes it has (also of class Node). The
visiting algorithm is then described as:

    class Visitor:
        @_contextlib.contextmanager
        def Node(self, node, depth):
            yield node
            for subnode in node.subnodes:
                manager = subnode.__class__.__name__
                manager = getattr(self, manager, self.Node)
                with manager(node=subnode, depth=(depth + 1)):
                    pass

        def __getattr__(self, attr):
            return self.Node

        def __call__(self, node, depth):
            manager = node.__class__.__name__
            manager = getattr(self, manager, self.Node)
            return manager(node=node, depth=depth)

    def visit(visitor, node):
        with visitor(node=node, depth=0):
            pass

Note the way it's called: __call__ yields either the overridden context
manager or the default Node context manager (and the latter just goes over
subnodes). Then anyone can specify how the specific types are handled. Here's
the visitor which filters the concrete record in the database and prints its
opcodes. Note that I also have __setitem__ in this _concrete_ visitor (not on
the top level) which is used to access command line arguments (I could've
used just **kwargs here, or called setattr(concrete_visitor, cmd_arg), or
whatever).

    class InstructionVisitor(BaseVisitor):
        @contextlib.contextmanager
        def Database(self, node, depth):
            yield node
            for subnode in node.subnodes:
                if subnode.name == self["insn"]:
                    with self(node=subnode, depth=(depth + 1)):
                        pass

    class OpcodesVisitor(InstructionVisitor):
        @contextlib.contextmanager
        def Record(self, node, depth):
            for opcode in node.opcodes:
                print(opcode)
            yield node
Note that __getattr__ is kinda redundant: we always feed the default
self.Node in getattr calls. This, however, is more explicit. Just pointing
out that it can be reduced even more.
(In reply to Dmitry Selyutin from comment #52)
> Note that __getattr__ is kinda redundant: we always feed the default
> self.Node in getattr calls. This, however, is more explicit. Just pointing
> out that it can be reduced even more.

it looks really neat. the amount of code really should be this short:
anything beyond 80 lines, there's something off.

i'd really like to see "Visitor" separate from "Visiting", let me drop a copy
of the medium.com/sbcode.net code here

https://sbcode.net/python/visitor/#visitorvisitor_conceptpy

the design that you have, the "Visitor" and "Visiting" are still merged into
one: in the current code it is the *Visitor* that is performing (explicitly)
the walking, which it should not be doing.

    def Database(self, node, depth):
        yield node
        for subnode in node.subnodes:   <- this is merging "Visitor"
                                           with "Visiting"

    def Record(self, node, depth):
        for opcode in node.opcodes:     <- again, "Visitor" is merged
            print(opcode)                  with "Visiting"

that will get complex *real* fast, approximately... 5 to 6 nested-depths of
for-loops.

here, they have the concept "accept()", which instead we want the ability to
yield from a contextmanager (i like that - "accept" sounds kinda dumb)

    # pylint: disable=too-few-public-methods
    # pylint: disable=arguments-differ
    "The Visitor Pattern Concept"

    from abc import ABCMeta, abstractmethod

    class IVisitor(metaclass=ABCMeta):
        "An interface that custom Visitors should implement"

        @staticmethod
        @abstractmethod
        def visit(element):
            "Visitors visit Elements/Objects within the application"

    class IVisitable(metaclass=ABCMeta):
        """
        An interface the concrete objects should implement that allows
        the visitor to traverse a hierarchical structure of objects
        """

        @staticmethod
        @abstractmethod
        def accept(visitor):
            """
            The Visitor traverses and accesses each object through this
            method
            """

    class Element(IVisitable):
        "An Object that can be part of any hierarchy"

        def __init__(self, name, value, parent=None):
            self.name = name
            self.value = value
            self.elements = set()
            if parent:
                parent.elements.add(self)

        def accept(self, visitor):
            "required by the Visitor that will traverse"
            for element in self.elements:
                element.accept(visitor)
            visitor.visit(self)

    # The Client
    # Creating an example object hierarchy.
    Element_A = Element("A", 101)
    Element_B = Element("B", 305, Element_A)
    Element_C = Element("C", 185, Element_A)
    Element_D = Element("D", -30, Element_B)

    # Now Rather than changing the Element class to support custom
    # operations, we can utilise the accept method that was
    # implemented in the Element class because of the addition of
    # the IVisitable interface

    class PrintElementNamesVisitor(IVisitor):
        "Create a visitor that prints the Element names"

        @staticmethod
        def visit(element):
            print(element.name)

    # Using the PrintElementNamesVisitor to traverse the object hierarchy
    Element_A.accept(PrintElementNamesVisitor)

    class CalculateElementTotalsVisitor(IVisitor):
        "Create a visitor that totals the Element values"
        total_value = 0

        @classmethod
        def visit(cls, element):
            cls.total_value += element.value
            return cls.total_value

    # Using the CalculateElementTotalsVisitor to traverse the
    # object hierarchy
    TOTAL = CalculateElementTotalsVisitor()
    Element_A.accept(CalculateElementTotalsVisitor)
    print(TOTAL.total_value)
to make Instruction "conform" to the IVisitable pattern:

    class Instruction(str):
        def __new__(cls, string):
            svp64 = False
            if string.startswith("sv."):
                string = string[len("sv."):]
                svp64 = True
            self = super().__new__(cls, string)
            self.__svp64 = svp64
            return self

        @property
        def svp64(self):
            return self.__svp64

add this:

    def accept(self, visitor):
        "required by the Visitor that will traverse"
        for opcode in self.opcodes:
            opcode.accept(visitor)
        visitor.visit(self)

and if we want to extend it to include "filtering":

    def accept(self, visitor, accept_filter=lambda x: x):
        "required by the Visitor that will traverse"
        for opcode in self.opcodes:
            if accept_filter(opcode):
                opcode.accept(visitor)
        visitor.visit(self)

then, OpcodesVisitor becomes:

    class OpcodesPrintVisitor:
        def Record(self, opcode):
            print(opcode)

and that's all!!

for Records:

    class Record:
        .....
        def accept(self, visitor, accept_filter=lambda x: x):
            "required by the Visitor that will traverse"
            for operand in self.dynamic_operands:
                if accept_filter(operand):
                    operand.accept(visitor)
            for operand in self.static_operands:
                if accept_filter(operand):
                    operand.accept(visitor)
            visitor.visit(self)

at which point it becomes possible to do this:

    class Record:
        .....
        def accept(self, visitor, accept_filter=lambda x: x):
            "required by the Visitor that will traverse"
            for field in dataclasses.fields(self):
                subthing = getattr(self, field.name)
                if accept_filter(subthing):
                    subthing.accept(visitor)
            visitor.visit(self)
OK, so basically the idea is that visitor calls special magic method. I have no problems with that, that's just another option to reach the same goal.
and the Record filter for what is currently in OperandsVisitor:

    accept_filter = lambda operand: operand.name not in ("PO", "XO")
(In reply to Dmitry Selyutin from comment #55)
> OK, so basically the idea is that visitor calls special magic method.

there i was copying the "accept()" method from here:
https://sbcode.net/python/visitor/#visitorvisitor_conceptpy

i don't actually like the name of the function, but the *separation* of the
actual-walking, that's the important bit. i really like the contextmanager
and __call__() here, much better than "accept()".

> I have no problems with that, that's just another option to reach the same
> goal.

it's very subtle, it's part of the "Model View Controller" paradigm. what you
have done is {ModelViewController-all-in-one}
Another option. We visit everything always, but can now filter based on
another visitor:

    def visit(node, handler, matcher=Matcher()):
        node.visit(handler=handler, matcher=matcher, depth=0)

    class Database:
        def visit(self, handler, matcher, depth):
            if matcher(node=self, depth=depth):
                with handler(node=self, depth=depth):
                    for record in self:
                        record.visit(depth=(depth + 1),
                                     handler=handler, matcher=matcher)

Here's how instruction matching looks:

    class InstructionMatcher(Matcher):
        def Record(self, node, depth):
            return (node.name == self["insn"])

The trick is that we have two different visitors: one calls the check and
another calls the corresponding context manager. Both resort to `return True`
and `yield node` respectively.

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=84b7982235332e51ca157e8bbe500a3f1e51910e
(In reply to Dmitry Selyutin from comment #58)
> Another option. We visit everything always, but can now filter based on
> another visitor:

i initially went "ah ha!" but then realised that the idea is missing the
fundamental, and it's that the "walking" is still not separate from the
"action". until the *act of walking* is separated from the "action taken on
the walk" there will be serious issues later.

here is how it is separated in ast.py:
https://github.com/python/cpython/blob/2.7/Lib/ast.py#L203

there are three functions involved: walk, iter_child_nodes, and iter_fields.
by creating a Base Class from which all aspects of insndb derive, it will be
possible to replace "if isinstance(field, AST):" (at line 179 and others)
with "if isinstance(field, InsnDB_BaseClass)" and it will be possible to lift
that recursive-walk code from ast.py pretty much as-is.

> def visit(node, handler, matcher=Matcher()):
>     node.visit(handler=handler, matcher=matcher, depth=0)

(btw the depth argument needs to be removed. it is the Visitor instance that
must keep its *own* track of "depth")

> class InstructionMatcher(Matcher):
>     def Record(self, node, depth):
>         return (node.name == self["insn"])
>
> The trick is that we have two different visitors: one calls the check and
> another calls the corresponding context manager. Both resort to `return
> True` and `yield node` respectively.

unfortunately, whilst on the face of it i thought "ah ha! excellent", i
realised this just moves the problem of recursive-iteration from the Visitor
into the Matcher.

> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=84b7982235332e51ca157e8bbe500a3f1e51910e

yep, it's increasing code-size rather than decreasing it. the fundamental
code *really* should be no greater than around 100 lines; this is the "litmus
test".
Decoupled visitors from walking:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=7de83fc89028c55a36af2210b80c426e24e6b5b9

The record filtering is now implemented in terms of a concrete visitor.
That's the whole code:

    class Visitor:
        def __call__(self, node):
            method = node.__class__.__name__
            method = getattr(self, method, self.Node)
            return method(node=node)

        @_contextlib.contextmanager
        def Node(self, node):
            for subnode in node.subnodes:
                with self(subnode):
                    pass
            yield node

    class Record(Node):
        @property
        def subnodes(self):
            for (name, fields) in self.extras.items():
                yield Extra(name=name, **fields)

    class Database(Node):
        @property
        def subnodes(self):
            yield from self

Obviously there's more in Record and Database to be visited; this is
just a practical example. FWIW, here's how the filtering now looks:

    class RecordNameVisitor(Visitor):
        def __init__(self, name):
            self.__name = name
            self.__records = set()
            return super().__init__()

        @contextlib.contextmanager
        def Record(self, node):
            if node.name == self.__name:
                self.__records.add(node)
            yield node

        def __iter__(self):
            yield from self.__records

    match = RecordNameVisitor(name=args["insn"])
    with match(node=db):
        nodes = frozenset(match)
(In reply to Dmitry Selyutin from comment #60) > Decoupled visitors from walking: > > https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff; > h=7de83fc89028c55a36af2210b80c426e24e6b5b9 > > The record filtering is now implemented in terms of concrete visitor. That's > the whole code: > > > class Visitor: > def __call__(self, node): > method = node.__class__.__name__ > method = getattr(self, method, self.Node) > return method(node=node) > > @_contextlib.contextmanager > def Node(self, node): > for subnode in node.subnodes: > with self(subnode): > pass > yield node > > class Record(Node): > @property > def subnodes(self): > for (name, fields) in self.extras.items(): > yield Extra(name=name, **fields) > > class Database(Node): > @property > def subnodes(self): > yield from self > > > Obviously there's more in Record and Database to be visited, just a > practical example. that's a *massive* amount of work, adding for-loops on every single object on every single class. all of it unnecessary. here's the thing: they *all* can be done by introspection of the fields. you're already using dataclasses.fields(): https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/insndb/core.py;h=2876e787#l127 in ast.py simply replace "for field in node._fields" with "for field in dataclasses.fields(self)" https://github.com/python/cpython/blob/2.7/Lib/ast.py#L166 also in iter_child_nodes, "if isinstance(field, AST):" replace with "if isinstance(field, Node) https://github.com/python/cpython/blob/2.7/Lib/ast.py#L179 then make sure that *all* InsnDb classes inherit from Node. a possible exception is Instruction because it inherits from str. you're either going to need to bite the bullet on use of "isinstance", (please don't consider it "crap", it is pretty normal in this kind of code, and in python in general), or come up with an awful-hack of having a hidden identifier function/property of Node that is unique and is *not* present in str, list, tuple, bool, int, or any other base datatype *or class instance*, then using "hasattr()" to detect it. and that will violate the "Principle of Least Surprise", it is using a non-standard pattern for a *known* expected pattern (isinstance). it will become a maintenance and usage nightmare. (EDIT: removed a bit (you can see it in the history) i need to re-think as the RecordVisitorExample *might* be ok, i need to re-read it again.
https://github.com/python/cpython/blob/2.7/Lib/ast.py#L203

ah - i know how to make this do breadth-first(?) order:

    def walk(node):
        from collections import deque
        todo = deque([node])
        todonext = deque()
        while todo:
            node = todo.popleft()
            todonext.extend(iter_child_nodes(node))
            yield node
            if not todo:
                todo = todonext
                todonext = deque()

and that should do breadth-first node-walking. i am not entirely sure
how to do depth-first.. is it that you just move the "yield node" up?
i leave it with you to work out?
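(for the record: moving "yield node" around changes pre-order versus
post-order, not breadth-first versus depth-first. a minimal depth-first
sketch, assuming the same iter_child_nodes helper from ast.py, is just
recursion:)

    def walk_depth_first(node):
        # pre-order depth-first: yield the node, then recurse into
        # children; move the yield below the loop for post-order instead
        yield node
        for child in iter_child_nodes(node):
            yield from walk_depth_first(child)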
(In reply to Luke Kenneth Casson Leighton from comment #61)
> that's a *massive* amount of work, adding for-loops on every single
> object on every single class. all of it unnecessary.
> here's the thing: they *all* can be done by introspection of the
> fields.

You _again_ skip the rationale I posted. I'm worried this has become a
_fixed_ pattern. I already gave the rationale for why there's a
property; please finally _read_ it. TL;DR: this is still too low-level
to be used with dataclasses.fields. If you have budget for adapting
insndb for the higher level -- fine, I'm OK to do it. Meanwhile even
this task doesn't have the budget allocated.

> you're already using dataclasses.fields():

Yes I do, because I know the low-level details of insndb. The users of
visitors shouldn't.

> https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/insndb/
> core.py;h=2876e787#l127
>
> in ast.py, simply replace "for field in node._fields" with
> "for field in dataclasses.fields(self)"
>
> https://github.com/python/cpython/blob/2.7/Lib/ast.py#L166
>
> also, in iter_child_nodes, replace "if isinstance(field, AST):" with
> "if isinstance(field, Node)"
>
> https://github.com/python/cpython/blob/2.7/Lib/ast.py#L179
>
> then make sure that *all* InsnDb classes inherit from Node. a possible
> exception is Instruction, because it inherits from str.
>
> you're either going to need to bite the bullet on use of "isinstance"
> (please don't consider it "crap": it is pretty normal in this kind
> of code, and in python in general)

No it isn't for many scenarios, other than restricting the inputs.
Almost any scenario can be solved either by pattern matching or
polymorphism. Please don't break duck typing unless really unavoidable.

> then using "hasattr()" to detect it

No need for hasattr. Again: duck typing and polymorphism. You already
have a Node class. If all fields of insndb have this type, they are
traversable for free. Refer to the above.

> and that will violate the "Principle of Least Surprise": it is
> using a non-standard pattern for a *known* expected pattern
> (isinstance). it will become a maintenance and usage nightmare.

That's basically the same thing inverted. "Properly-named" methods are
no way better than isinstance in this regard.

> > FWIW, here's how the filtering now looks:
> >
> >     class RecordNameVisitor(Visitor):
> >         def __init__(self, name):
> >             self.__name = name
> >             self.__records = set()
> >             return super().__init__()
> >
> >         @contextlib.contextmanager
> >         def Record(self, node):
> >             if node.name == self.__name:
> >                 self.__records.add(node)
> >             yield node
> >
> >         def __iter__(self):
> >             yield from self.__records
>
> ok so this is storing all the records then replaying them?
> that's not necessary to do because the database can be
> considered static (it's not going to change).

Yes, one can walk the db, collect all relevant entries, then visit
those. That'd be better.

> also it can be replaced with a single-line lambda function for
> a fixed match:
>
>     matcher = lambda node: node.name == "XO"
>
> and for one that is more generic:
>
>     class RecordMatcher:  # no derivation from any other class
>         def __init__(self, name):
>             self.name = name
>         def __call__(self, node):
>             return self.name == node.name

There's no difference. You can affect the visitor's call either way.
TL;DR. I'm really extremely concerned that you keep avoiding _reading_
my comments. This is a severe issue; it complicates the development.

1. I'm leaving a standalone entry to __call__ where anyone can happily
   insert these isinstance calls.
2. I can make all (most?) fields inherit from Node, but beware that
   these details are way too low-level. Any sane caller won't likely
   use them for anything other than enumeration; the users are expected
   to use the high-level stuff.
(In reply to Dmitry Selyutin from comment #60)
>     @contextlib.contextmanager
>     def Record(self, node):
>         if node.name == self.__name:
>             self.__records.add(node)
>         ---> yield node <---- REMOVE THIS

"yield node" should be removed, and then the walking *which is now
entirely decoupled* uses walk() from ast.py, and it becomes:

    v = RecordVisitor("XO")
    for node in walk(db):
        v(node)

or, more to the point, those two lines "for node in walk(db): v(node)"
are moved behind a function which RecordVisitorExample inherits from,
just like in pyomo and in that IVisitor pattern:

    class IVisitor:
        @staticmethod
        def walk_fn(db, visitor):
            for node in walk(db):
                visitor(node)

then it is:

    v = RecordVisitor("XO")  # inherits from IVisitor
    v.walk_fn(db, v)

*no* Visitor should perform, be involved in, or be responsible for
child-node-walking *at all*, at *any* time.

i appreciate this is... complex. it's some of the most
ridiculously-short python code, but unless it is short it very much
gets out-of-hand very quickly.
(In reply to Luke Kenneth Casson Leighton from comment #65)
> "yield node" should be removed, and then the walking *which is now
> entirely decoupled* uses walk() from ast.py

nope, scratch that: worked it out
https://libre-soc.org/irclog/%23libre-soc.2023-06-08.log.html#t2023-06-08T12:39:16
I've added some new types, and also refactored the Dataclass type. From now on, there's no need to manually use the dataclasses.dataclass decorator: we always use frozen dataclasses, since the database is static, and allow checking for equality, since we indeed compare many of them.
I think the users of the API will have some difficulties differentiating
which stuff comes from where, especially with nested data types (e.g.
dataclasses or tuple-like or similar). Options are:

1. For dataclasses or dict-like, yield field names along with nodes.
   For tuples, yield the index. Something else for others?
2. Have the types decoupled, so that by type it's possible to determine
   the origin. Seems kinda dumb and overkill.
3. Yield the node parent along with the node itself.

Ideas and suggestions are welcome.
Option 4, which is, I think, the best one: have a full element path. That is, for example: db.record.ppc, db.record.fields[0], db.record.mdwn.operands[0]... and so on.
Meanwhile I've added functionality to walk over classes in a generic
way. Something intermediate between the classmethod decorator and a
usual method, with the exception that the newly added decorator can
accept either a class or an instance. Example:

    >>> print(tuple(Record.walk()))
    (<class 'openpower.insndb.core.String'>,
     <class 'openpower.insndb.core.Section'>,
     <class 'openpower.insndb.core.PPCRecord'>,
     <class 'openpower.insndb.core.Fields'>,
     <class 'openpower.insndb.core.MarkdownRecord'>,
     <class 'openpower.insndb.core.SVP64Record'>)
The overall usage looks kinda insane, so any ideas on improvements are welcome: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=61e53d70045e090620d6acccb92d344baaae0b37
(In reply to Dmitry Selyutin from comment #68)
> I think the users of the API will have some difficulties differentiating
> which stuff comes from where, especially with nested data types (e.g.
> dataclasses or tuple-like or similar).

well, we are the users, and if we understand it (and, importantly,
create enough examples), then it's fine. but also of course,
documentation comments etc. are also crucial. this is compiler /
language-translator technology, basically: data-transformation. it is
always very tricky.

> Options are:
> 1. For dataclasses or dict-like, yield field names along with nodes.
>    For tuples, yield the index. Something else for others?

(for tuples and lists, and anything else iterable, yield the index, yes)

> 2. Have the types decoupled, so that by type it's possible to determine
>    the origin. Seems kinda dumb and overkill.
> 3. Yield the node parent along with the node itself.

(3) i recall seeing in pyomo.

oh. i know a trick that can help: create the following:

* a "database printer to file-handle" Visitor
* a "database reader from file-handle" Visitor
* a "database *comparator*" Visitor

then a unit test can be written which:

* prints out the entire database to a file
* reads the entire database from the file
* compares the database contents against the original

through the decoupling to a file we *will* know:

* firstly, if the API looks sane,
* secondly, if everything works, and
* thirdly, we will have some absolutely fantastic (and complete)
  examples demonstrating how it all works.

then run it on:

* an instance of the Database class
* the Database class itself.

the latter might need a different comparator-Visitor. (a sketch of the
round-trip test is below.)
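(to make that concrete, a self-contained toy sketch of the round-trip
test; the classes here are illustrative stand-ins, not the real insndb
Database:)

    import contextlib
    import dataclasses
    import io
    import unittest

    # toy stand-ins: the point is the print / re-read / compare shape,
    # not the actual database format

    @dataclasses.dataclass(eq=True, frozen=True)
    class Record:
        name: str

    @dataclasses.dataclass(eq=True, frozen=True)
    class Database:
        records: tuple

    class PrinterVisitor:
        def __init__(self, stream):
            self.stream = stream

        @contextlib.contextmanager
        def __call__(self, node):
            if isinstance(node, Record):
                print(node.name, file=self.stream)
            yield node

    def walk(db):
        # trivially-flat walker for the toy database
        yield db
        yield from db.records

    def parse(stream):
        names = stream.read().split()
        return Database(records=tuple(Record(name) for name in names))

    class RoundTripTest(unittest.TestCase):
        def test_round_trip(self):
            db = Database(records=(Record("tdi"), Record("twi")))
            stream = io.StringIO()
            printer = PrinterVisitor(stream)
            for node in walk(db):
                with printer(node):
                    pass
            db2 = parse(io.StringIO(stream.getvalue()))
            # frozen dataclasses with eq=True compare by value
            self.assertEqual(db, db2)

    if __name__ == "__main__":
        unittest.main()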
(In reply to Dmitry Selyutin from comment #70)
> Meanwhile I've added functionality to walk over classes in a generic
> way.
>
> >>> print(tuple(Record.walk()))
> (<class 'openpower.insndb.core.String'>, <class

NICE! (correction: frickin awesome)

is the hierarchy preserved? Record.walk() isn't going recursively down
the other classes, is it? hang on let me check...

    @_functools.total_ordering
    class Record(Dataclass):
        name: String
        section: Section
        ppc: PPCRecord
        fields: Fields
        mdwn: MarkdownRecord
        svp64: SVP64Record = None

ha! cool!
ahhh... this may have been a misunderstanding. current:

    class ListVisitor(Visitor):
        @contextlib.contextmanager
        def __call__(self, node):
            if isinstance(node, Record):
                print(node.name)
            yield node

the version below is much better and more readable, and it is a really
beneficial part of the API from yesterday(?):

    class ListVisitor(Visitor):
        @contextlib.contextmanager
        def Record(self, node):
            print(node.name)
            yield node

was there a technical reason to make this change? when i was referring
to the use of isinstance, i was referring to the (now decoupled)
database-walking-function (now called Dataclass.walk), not the visitors.
ah. just spotted this:

    class Dataclass(metaclass=DataclassMeta):
        @walkmethod
        def walk(clsself, match=None):
            if match is None:
                match = lambda subnode: True
            def field_type(field):
                return field.type
            def field_value(field):
                return getattr(clsself, field.name)
            field = (field_type if isinstance(clsself, type)
                     else field_value)  # <-- this check

ahhhmmm.... ahmhahm.... i'm not sure that this is needed. if you are
calling the Visitor on a class (not an instance), you *know* that the
visitor-function is going to receive types, not instances. so there
will be no "mixing"...

... but if we _did_ want to mix (have for example *both* visiting of
the database-types *and* visiting of the database-instances), then one
way to handle that would be.... [going back to the old Visitor API]

    class ListVisitor(Visitor):
        @contextlib.contextmanager
        def Record(self, node):
            print(node.name)
            yield node

        @contextlib.contextmanager
        def RecordType(self, node):    # <-- the new bit
            print(node.type)
            yield node

it's.... six of one and half a dozen of the other: one option would be
just to call two separate Visitors:

* the first one to print out a c header file,
* the second one to print out the c source file.

but i kinda like the idea of the Visitor being (optionally) able to
handle both (and the same Visitor instance being able to output both
the header and the source file).
(In reply to Dmitry Selyutin from comment #69)
> Option 4, which is, I think, the best one: have a full element path.
> That is, for example: db.record.ppc, db.record.fields[0],
> db.record.mdwn.operands[0]... and so on.

mmmm.... this kind of thing is what the Visitor should do (itself), by
having all of the functions:

    class BiggerVisitor(BaseVisitor):
        def __init__(self):
            self.stack_of_stuff = []

        @contextlib.contextmanager
        def Record(self, node, depth):
            self.stack_of_stuff.append(node.name)
            print(".".join(self.stack_of_stuff))
            yield node
            self.stack_of_stuff.pop(-1)  # i think

        @contextlib.contextmanager
        def Opcode(self, node, depth):
            self.stack_of_stuff.append(node.name)
            print(".".join(self.stack_of_stuff))
            yield node
            self.stack_of_stuff.pop(-1)  # i think

urrr... those are both the same. urrr.... ahhhh... i think i know why
you replaced def Record (etc) with def __call__, because the above is
just this (the two do the same thing):

    @contextlib.contextmanager
    def __call__(self, node):
        self.stack_of_stuff.append(node.name)
        print(".".join(self.stack_of_stuff))
        yield node
        self.stack_of_stuff.pop(-1)  # i think

hmmmm hmmm hmmm.... this is solved in python 2.7 ast.py with a
getattr(), isn't it?
https://github.com/python/cpython/blob/2.7/Lib/ast.py#L237

    def visit(self, node):
        """Visit a node."""
        method = 'visit_' + node.__class__.__name__
        visitor = getattr(self, method, self.generic_visit)
        return visitor(node)

the equivalent here (for us) would be:

    visitor = getattr(self, method, self.__call__)

ahhhh that's it. that's the solution. demonstrating back in the
original (from yesterday?):

    @_dataclasses.dataclass(eq=True, frozen=True)
    class Dataclass:
        def subnodes(self, match=None):
            if match is None:
                match = lambda subnode: True
            def subnode(field):
                return getattr(self, field.name, field.__call__)  # <--
            yield from filter(match,
                              map(subnode, _dataclasses.fields(self)))

that will cope with both... is that right?

    class Dataclass(metaclass=DataclassMeta):
        @walkmethod
        def walk(clsself, match=None):
            if match is None:
                match = lambda subnode: True
            def field_type(field):
                return field.type
            def field_value(field):
                return getattr(clsself, field.name, field.__call__)  # <--
            field = (field_type if isinstance(clsself, type)
                     else field_value)
            yield from filter(match,
                              map(field, _dataclasses.fields(clsself)))
I'm too tired today to check the comments in detail, I'll address all these later, OK (maybe tomorrow)? I've just finished a sketch for paths, but it's so ugly that it doesn't deserve to be put in master, cf. the paths branch. There's a `pysvp64db tree` command which shows what I want. Some entities are not of Node type yet, so the output is far from perfect, but it demonstrates the idea. Everything else later, cannot even think straight.
Caveat: don't treat anything you see in the paths branch seriously; this got so messy and fricking ugly that I'm thinking of some other approach. I'll check your ideas with stacks too; at first glance this should work, but only if the Node class has some field property ("name" in your example, but that doesn't fit, since some classes have such a field too). I'd like to avoid properties and reuse field.name from dataclasses, but I have no idea how to do that in a graceful way. The code in the paths branch should really be put out of its misery.
(In reply to Dmitry Selyutin from comment #77)
> I'm too tired today to check the comments in detail, I'll address all
> these later, OK (maybe tomorrow)?

i wasn't expecting you to be up!! yes, definitely not tonight.

> I've just finished a sketch for paths, but it's so ugly that it
> doesn't deserve to be put in master, cf. the paths branch.

:)

> There's a `pysvp64db tree` command which shows what I want.

likewise will look tomorrow :) but when you are awake i think you'll
find that what you want is likely achievable with the "stack_of_stuff"
from comment #76.

> Some entities are not of Node type yet, so the output is far from
> perfect, but it demonstrates the idea.

fantastic.

> Everything else later, cannot even think straight.

take it easy, ok? i know the lure of "LCD screen" substituting for
"sunlight" very well :)
(In reply to Luke Kenneth Casson Leighton from comment #79)
> likewise will look tomorrow :) but when you are awake i think
> you'll find that what you want is likely achievable with the
> "stack_of_stuff" from comment #76.

ok yes it was easy to check - took a look: definitely "no" :) it is
perfectly reasonable to have the Visitor itself be aware of the "order"
of things, and to do "stuff" such as push the node onto a stack that is
*local* to the Visitor. to reproduce what you did, i believe it is:

    def __call__(self, node):
        print("/".join(self.stack_of_stuff), node)
        self.stack_of_stuff.append(node.name)
        yield node
        self.stack_of_stuff.pop(-1)  # i think

as simple as that.

* no need for pathlib when "/".join() will do (complete overkill)
* the names are accumulated as the __call__ goes down the depth

basically, if there is a temptation to add any parameters (like the
depth parameter), it should be resisted :) the *visitor* function
becomes responsible for *using* the fact that it is being called. the
only good additional parameter, as jacob pointed out, would be the node
index:

    def __call__(self, node, node_idx):
    def Record(self, node, node_idx):
    ...
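(a self-contained toy of that, showing the stack staying *local* to the
Visitor; the Node class and visit() function here are illustrative, not
the insndb API:)

    import contextlib

    # toy node: just a name and children
    class Node:
        def __init__(self, name, *children):
            self.name = name
            self.children = children

    def visit(visitor, node):
        # the walker knows nothing about paths or depth
        with visitor(node):
            for child in node.children:
                visit(visitor, child)

    class PathVisitor:
        def __init__(self):
            self.stack = []   # local to the visitor

        @contextlib.contextmanager
        def __call__(self, node):
            self.stack.append(node.name)
            print("/".join(self.stack))
            yield node
            self.stack.pop()

    visit(PathVisitor(), Node("db", Node("record", Node("ppc"))))
    # prints:
    #   db
    #   db/record
    #   db/record/ppc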
    class String(Node, str):
        pass

    class PPCRecord(Dataclass):
    -    comment2: str = ''
    +    comment2: String = String()

i really don't think this is a good route to go down (converting _all_
types to inherit from Node): indirectly it implies ultimately that
SelectableInt and FieldSelectableInt must derive from Node, and that is
really not a good idea. (plus, bool cannot be inherited from, and we
need it.)

it's... if the basic types work (str, int, bool) and do the same job,
then creating a class to do the same job... yyeah :)
Long story short. I share your opinion on almost everything, but I had
to at least probe this route to understand its pros and cons.

The only exception is the generic __call__. This actually comes from
your isinstance hints (about another section?), but I realized that
relying on the class name is too fragile. isinstance refers to an
object (yes, types are objects too in Python); a name just refers to
some string. Whilst it looks simpler, it's indeed less intuitive than
explicit checks. If somebody wants this functionality (per-class
context managers), they can do it directly in concrete visitors:

    class WeirdVisitor(Visitor):
        def __call__(self, node):
            method = getattr(self, node.__class__.__name__, self.Node)
            return method(node)

        @contextlib.contextmanager
        def Node(self, node):
            raise Exception(666)

        @contextlib.contextmanager
        def Record(self, node):
            raise Exception(42)

TL;DR: I'm removing the paths branch. However, there's still one issue
with the approach you suggest:

    def __call__(self, node):
        print("/".join(self.stack_of_stuff), node)
        self.stack_of_stuff.append(node.name)
        yield node
        self.stack_of_stuff.pop(-1)  # i think

If we're going to drop the Node-inherited classes (Path, String and
similar stuff), there's no way we could call node.name (or, again, some
other generic field, because "name" is occupied). `str` or `bool` don't
have such an interface.

As for bool inheritance, that's not a problem: whilst bool isn't
inheritable, int is.
My point is, inheriting all stuff from Node gives this:

1. No need to check walk method availability via
   isinstance/try+except/hasattr. We can _assume_ everything "walkable"
   is Node-derived.
2. We can therefore extend Node-based stuff without problems. The first
   extension might be Node.path (or whatever name we invent for it).
3. Some things can be stubs or generic (cf. Nodes.walk, Dataclass.walk;
   perhaps I'll add some stuff for Tuple-like, because we have several
   tuple-based things which should also be walkable).
4. It's just logical that everything which can be used in the
   walk/visit API corresponds to some interface.
Ah, and one note about a single __call__. Even though it's an unlikely scenario, two classes might share the same name (e.g. one is at the global level and the other is class-nested). What would the callbacks be then? So, all in all, I think that, as much as I dislike isinstance checks, they're much better than checking class names.
(In reply to Dmitry Selyutin from comment #82)
> Long story short. I share your opinion on almost everything, but I had
> to at least probe this route to understand its pros and cons.
> The only exception is the generic __call__. This actually comes from
> your isinstance hints (about another section?),

this is where the misunderstanding is: i was talking at the time about
ast.py iter_child_nodes' use of isinstance:

    def iter_child_nodes(node):
        for name, field in iter_fields(node):
            if isinstance(field, AST):
                yield field
            elif isinstance(field, list):
                for item in field:
                    if isinstance(item, AST):
                        yield item

* you solved the first use there by doing contextmanager.
* you solved the second use by doing Node.subnodes (now Nodes.walk)
* the third instance could again use another contextmanager

whereas i was talking about the use in the *visitor*, which in ast.py
is this: https://github.com/python/cpython/blob/2.7/Lib/ast.py#L240

    def visit(self, node):
        method = 'visit_' + node.__class__.__name__
        visitor = getattr(self, method, self.generic_visit)
        return visitor(node)

which for us would be:

    def visit(self, node):
        method = 'visit_' + node.__class__.__name__
        visitor = getattr(self, method, self.__call__)
        return visitor(node)

> but I realized that relying on class name is too fragile.

it's not "fragile": it's the best aspect of the entire visitor pattern
when applied to python. plus, dropping it breaks the "Principle of
Least Surprise": *everyone* who has ever used Visitors in python
expects to have function-names in the Visitor named after the "data
type of the database object being walked". you are proposing to break
that expectation, in effect. i don't mind self.__call__ being the
default fallback, though.

> isinstance refers to an object (yes, types are objects too in Python);
> a name just refers to some string. Whilst it looks simpler, it's
> indeed less intuitive than explicit checks. If somebody wants this
> functionality (per-class context managers), they can do it directly in
> concrete visitors:
>
>     class WeirdVisitor(Visitor):
>         def __call__(self, node):
>             method = getattr(self, node.__class__.__name__, self.Node)
>             return method(node)

the *very* first thing that i will do on creating *any* visitors is
precisely and exactly that, copying that exactly and *only* using
that! and in all documentation i will strongly recommend using that
and not the Visitor!

> TL;DR: I'm removing the paths branch. However, there's still one issue
> with the approach you suggest:
>
>     def __call__(self, node):
>         print("/".join(self.stack_of_stuff), node)
>         self.stack_of_stuff.append(node.name)
>         yield node
>         self.stack_of_stuff.pop(-1)  # i think
>
> If we're going to drop the Node-inherited classes (Path, String and
> similar stuff), there's no way we could call node.name (or, again,
> some other generic field, because "name" is occupied). `str` or `bool`
> don't have such an interface.

then the *Visitor* should do
"if isinstance(node, str): self.stack.append(node)". again, like the
"depth" parameter, the walker-function should not be compromised by
what the *Visitor* does.

> As for bool inheritance, that's not a problem: whilst bool isn't
> inheritable, int is.

you see how going down that path is a danger? using bool is clear about
the intent.

additionally, i would really like this code to be useable elsewhere in
the project, and even a separate git repo created for it (and put onto
pypi).
in particular, at some point i would like to replace the existing
nmigen visitor-pattern (which is the usual "faulty" design) with this
one.

(In reply to Dmitry Selyutin from comment #84)
> Ah, and one note about a single __call__. Even though it's an unlikely
> scenario, two classes might share the same name (e.g. one is at the
> global level and the other is class-nested).

that would be completely confusing and seriously inadvisable from a
project maintainability perspective! as an outside possibility, that
will immediately get picked up by the user when they find that they
can't call a Visitor on that nested-class: they'll soon change the
name.

with Visitors being such a huge integral part of a good AST-walking /
Language-translator / Compiler API, "users" (us at the moment, but
other people in future if we do drop this onto pypi) *will* keep a
clean global namespace.

think about HTML/DOM: imagine the total chaos if HTML's DOM (Document
Object Model) had an object-type that was locally different from the
global type namespace, somewhere deep inside the structure. if
DOM.Window was different from DOM.Document.Window, for example. or if c
had a different meaning for "struct" just because it was inside a
"switch" statement?

part of a clean AST / language / compiler is verrrry much about picking
a clean global namespace - *definitely* don't do nested classes in
insndb!! again (just on nested-classes): "Principle of Least Surprise".

(i aim to propose using insndb for work inside the ISA Working Group,
for walking the specification document, which is now in latex)
I just realized. No need to inherit everything in code from Node, unless required. We can just yield a proxy object, which wraps the original one but is inherited from Node.
About methods named as classes. If you can propose something that relies on _objects_, not _names_, I'm fine with that. If not -- that's a strong no. No, this does not comply with the principle of least surprise; it breaks it, in fact, even more than isinstance does. The method generation should at least be hidden and not be explicit. I'll think about how to handle it, but this should not be based on names from the caller's perspective.
(In reply to Dmitry Selyutin from comment #86)
> I just realized. No need to inherit everything in code from Node,
> unless required. We can just yield a proxy object, which wraps the
> original one but is inherited from Node.

eeehmmm... ehmehmehm... i'd love to see that in action, i can sort-of
subconsciously / intuitively see it would work but i don't know how :)

(In reply to Dmitry Selyutin from comment #87)
> About methods named as classes. If you can propose something that
> relies on _objects_, not _names_, I'm fine with that.

think it through: if we publish this as a generic pypi package (and
assume that the users are intelligent), will they freak out at needing
to inherit everything from "class Node" in their project? (remember i
am considering replacing nmigen's Visitor-pattern with this code)

> If not -- that's a strong no. No, this does not comply with the
> principle of least surprise; it breaks it, in fact, even more than
> isinstance does. The method generation should at least be hidden and
> not be explicit. I'll think about how to handle it, but this should
> not be based on names from the caller's perspective.

yyeah, i think we start to see why pyomo has a list of "builtin types",
and also why they have "register this class type with the walker
system" functions:
https://github.com/Pyomo/pyomo/blob/main/pyomo/common/numeric_types.py#L70

native_types is used back in the visitor code (in their equivalent of
the tree-walk function) to decide what to yield to the Visitor:
https://github.com/Pyomo/pyomo/blob/main/pyomo/core/expr/visitor.py#L1013

    elif type(child) in native_types:
        return False, child

actually... "type(child) in native_types" might be the right thing to
do here:

    class Dataclass(metaclass=DataclassMeta):
        @walkmethod
        def walk(clsself, match=None):
            ...
            field = (field_type if isinstance(clsself, type)
                     else field_value)
            yield from filter(match,
                              map(field, _dataclasses.fields(clsself)))

becomes:

            field = (field_value if type(child) in native_types else
                     (field_type if isinstance(clsself, type)
                      else field_value))
            yield from filter(match,
                              map(field, _dataclasses.fields(clsself)))

would that work? i'm running on instinct here.
(In reply to Dmitry Selyutin from comment #87)
> About methods named as classes. If you can propose something that
> relies on _objects_, not _names_, I'm fine with that.

okaaay, i grasp this now. some clues: i wondered if it was possible to
just assume that dataclasses are iterable, because if they are then
Node.subnodes may not even be necessary!

https://stackoverflow.com/questions/74393947/make-python-dataclass-iterable
https://github.com/ericvsmith/dataclasses/issues/21

drat, that won't work (unless first doing "getattr(node, 'astuple')"),
plus __iter__ is unlikely to work on objects.

* bool is a singleton type so could be special-cased
  https://stackoverflow.com/a/2172204

      field = field_value if isinstance(field, bool) else ...

* list/tuple/str/int likewise... no, it really is looking like this is
  the right thing to do here:

      field = field_value if \
          type(field) in [bool, list, tuple, str, int,
                          float, complex, bytes] else ...

wait... no... not list/tuple, and not field_value:

      field = field if \
          type(field) in [bool, str, int, float, complex, bytes] else ...

and for list/tuple to be walked separately... ahhh, *that's* why python
ast.py did "elif isinstance(field, list):"
https://github.com/python/cpython/blob/2.7/Lib/ast.py#L181

so the order of "field = ...." should be something like:

* bool first (as a singleton type): yields... *field*, not field_value
* type(field) in [str, int, float, complex, bytes]: also yield field
* if the class-type is a dataclass, use the dataclasses.fields trick
  (field_type + field_value)
* otherwise just iterate on it.

something like this:

    class Dataclass(metaclass=DataclassMeta):
        @walkmethod
        # https://www.google.com/search?q=python+are+lambda+functions+hashable
        # lambda is a singleton, so there is no danger
        def walk(clsself, match=lambda subnode: True):
            def field_type(field):
                return field.type
            def field_value(field):
                return getattr(clsself, field.name)
            if type(clsself) in [bool, str, int, float, complex, bytes]:
                yield clsself
            # https://www.programiz.com/python-programming/methods/built-in/issubclass
            # this will work because "class is considered a subclass of itself"
            elif issubclass(clsself, Dataclass):
                # i prefer the "getattr(field, ...., field.__call__)" thing be used
                field = (field_type if isinstance(clsself, type)
                         else field_value)
                yield from filter(match,
                                  map(field, _dataclasses.fields(clsself)))
            else:
                yield from clsself  # assumes the object is iterable somehow

something like that? combined with the proxy-idea i *think* that code
becomes fully-independent of the class types... it's almost there :)
okok i have an idea. first, add visitor and match to the visit fn:

    def visit(visitor, node, match=None):
        if not match(node):
            return
        with visitor(node=node):
            if hasattr(node, "walk"):
                for subnode in node.walk(visitor, match):
                    visit(visitor=visitor, node=subnode, match=match)

(a missing match would have been spotted soon enough; the key is adding
visitor as an arg to Node.walk() and to def visit())

now add it to walk:

    class Dataclass(metaclass=DataclassMeta):
        @walkmethod
        def walk(clsself, visitor, match=None):

and now, ta-daaa! this can be done:

            if not match(clsself):
                return
            # not the clever bit yet...
            yield clsself
            yield from filter(match,
                              visitor.get_any_child_nodes(clsself))

ta-daaa! and the rest moves into Visitor:

    class Visitor:
        ....
        def get_any_child_nodes(self, node):
            if type(node) in [bool, str, float, complex, bytes]:
                yield node
            elif ...

and in this way anyone may override that, do whatever the hell they
like, we don't care :) ahh, this is good: it moves the existence of
Node out to where it belongs, in the Visitor, which is what knows about
the format of the Database. as long as nobody gives any Database class
a method called "get_any_child_nodes", it is fine :)

but we also start to see why python's htmllib used "do_" as a prefix,
and why ast.py has "visit_....." as a prefix. hmmm, that's probably a
good practice with *all* Visitors: otherwise, on a search, you wonder
"what the heck, def Record and class Record, moo??"
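(to make the shape concrete, a runnable miniature of that split, with
purely illustrative names; the real walkmethod/match machinery is
omitted:)

    import contextlib
    import dataclasses

    # the *Visitor* owns get_any_child_nodes; the walker just recurses

    ATOMS = (bool, str, int, float, complex, bytes)

    class Visitor:
        @contextlib.contextmanager
        def __call__(self, node):
            yield node

        def get_any_child_nodes(self, node):
            if isinstance(node, ATOMS):
                return                    # atoms have no children
            if dataclasses.is_dataclass(node):
                for field in dataclasses.fields(node):
                    yield getattr(node, field.name)
            elif isinstance(node, (list, tuple)):
                yield from node

    def visit(visitor, node):
        with visitor(node):
            for subnode in visitor.get_any_child_nodes(node):
                visit(visitor, subnode)

    # demo
    @dataclasses.dataclass(frozen=True)
    class Pair:
        left: int
        right: str

    class Printer(Visitor):
        @contextlib.contextmanager
        def __call__(self, node):
            print(repr(node))
            yield node

    visit(Printer(), Pair(left=1, right="x"))
    # prints: Pair(left=1, right='x'), then 1, then 'x'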
Luke, you again contradict your own words. Please *re-read* what you wrote before. Passing *visitors* is not part of walking.
That's what I suggest instead of all these "let's exclude special
types", "let's invent a magic method which has the class name", etc.

$ cat interface.py

    class Visitor:
        @contextlib.contextmanager
        def __call__(self, node):
            (visitorcls, nodecls) = map(type, (self, node))
            mgr = visitor.registry.get(visitorcls, {}).get(nodecls, None)
            if mgr is not None:
                with mgr(self, node=node) as ctx:
                    yield ctx
            else:
                yield node

    def visitor(visitorcls, nodecls):
        def decorator(fn):
            mgr = contextlib.contextmanager(fn)
            visitor.registry[visitorcls][nodecls] = mgr
            return Visitor.__call__
        return decorator

    visitor.registry = collections.defaultdict(dict)

$ cat implementation.py

    class StubVisitor(Visitor):
        pass

    @visitor(StubVisitor, str)
    def coco(self, node):
        print("BY GOD IT'S A STRING!!!", node)
        yield node

    @visitor(StubVisitor, int)
    def jamboo(self, node):
        print("DONNERWETTER WE HAVE AN INTEGER!", node)
        yield node

    stub = StubVisitor()

    with stub("hi Luke"):
        pass

    with stub(42):
        pass
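(for reference, if the registry wiring above is right, running
implementation.py should print something like:)

    BY GOD IT'S A STRING!!! hi Luke
    DONNERWETTER WE HAVE AN INTEGER! 42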
The only disadvantage is that such handlers cannot be defined inline in the class, and the visitor class must be passed along with the node class to be hooked. The decorator cannot deduce the wrapped class from the method itself, since by the time the decorator is called these methods are not bound yet. However, this is damn simple, straightforward, and never ever relies on class names; it relies on objects. And any type is an object in Python.
Oh, and by the way, if some weirdo ever wants to override the way the default __call__ behaves, they can override __call__ in the derived class. Exactly like those visit and generic_visit in AST.
Oh, and yes, if visitor class is passed into the decorator, there's no need for "registry" to be global: it can be per-visitor-class.
    class VisitorMeta(type):
        def __init__(cls, name, bases, ns):
            cls.__registry = {}
            return super().__init__(cls)

        def __contains__(self, nodecls):
            return self.__registry.__contains__(nodecls)

        def __getitem__(self, nodecls):
            return self.__registry.__getitem__(nodecls)

        def __setitem__(self, nodecls, call):
            return self.__registry.__setitem__(nodecls, call)

        def __iter__(self):
            yield from self.__registry.items()

    class Visitor(metaclass=VisitorMeta):
        @contextlib.contextmanager
        def __call__(self, node):
            (visitorcls, nodecls) = map(type, (self, node))
            if nodecls in visitorcls:
                with visitorcls[nodecls](self, node=node) as ctx:
                    yield ctx
            else:
                yield node

    class visitormethod:
        def __init__(self, visitorcls, nodecls):
            self.__visitorcls = visitorcls
            self.__nodecls = nodecls
            return super().__init__()

        def __call__(self, call):
            self.__visitorcls[self.__nodecls] = \
                contextlib.contextmanager(call)
            return Visitor.__call__
The patch above allows per-class registration, makes any visitor class object behave like a dict (e.g. StubVisitor[str] will give the hook), enables iteration, etc.
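(a short usage sketch against the code above; the handler name on_str
is arbitrary, and the hook itself is hypothetical example code, not
part of insndb:)

    class StubVisitor(Visitor):
        pass

    # register a per-type hook on StubVisitor
    @visitormethod(StubVisitor, str)
    def on_str(self, node):
        print("string:", node)
        yield node

    print(str in StubVisitor)   # True: VisitorMeta.__contains__
    with StubVisitor()("hello"):
        pass                    # prints "string: hello"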
(In reply to Dmitry Selyutin from comment #91)
> Luke, you again contradict your own words. Please *re-read* what you
> wrote before. Passing *visitors* is not part of walking.

that was me trying to say "when the visitor function was itself
performing the walk", rather than it being the separate function it is
now (not very well, but we got there). a "well-defined collaboration
with default behaviour" between Visitor and walker-function (def
visit()) is perfectly fine.... but... ha! don't need it! love this:

    @visitor(StubVisitor, str)
    def coco(self, node):
        print("BY GOD IT'S A STRING!!!", node)
        yield node

oo does that.... oo it separates the thing from the thing, oooo.
does this work?

    class StubVisitor(Visitor):
        def __init__(self):
            self.stuff = "something"

    @visitor(StubVisitor, str)
    def coco(self, node):
        print(self.stuff, "BY GOD IT'S A STRING!!!", node)
        yield node

can this be made to work without needing the 1st argument?

    @visitor(str)

registry as a global variable will break things; it needs to be part of
Visitor itself?

    class Visitor:
        registry = defaultdict()

    def visitor(cls, nodecls):
        registry(....)

but for inheritance, sigh, the registry dict should be separate?
ohhhh, you made it double-nested, a dict-of-dicts, niiice.

more later, need food.
I'll come up with the patch to inline methods later. Note that the
methods will need to have different names; the following won't work:

    class StubVisitor(Visitor):
        @visitormethod(str)
        def __call__(self, node):
            print("string", node)
            yield node

        @visitormethod(int)
        def __call__(self, node):
            print("integer", node)
            yield node

...but the following will:

    class StubVisitor(Visitor):
        @visitormethod(str)
        def coco(self, node):
            print("string", node)
            yield node

        @visitormethod(int)
        def jamboo(self, node):
            print("integer", node)
            yield node

The names can be whatever you like, they just need to be distinct.
(In reply to Dmitry Selyutin from comment #94)
> Oh, and by the way, if some weirdo ever wants to override the way the
> default __call__ behaves, they can override __call__ in the derived
> class. Exactly like those visit and generic_visit in AST.

as long as the entire codebase is not full of "isinstance" checks, or,
worse, "if isinstance elif isinstance elif isinstance", i don't mind.

ohh, i get it. the decorator and especially the defaultdict *do the
same job* as the isinstance(s). holy cow, that's powerful.

btw try this:

    mgr = visitor.registry[visitorcls].get(nodecls, None)
    ...
    visitor.registry = collections.defaultdict(dict)

(indexing the outer defaultdict rather than using .get() on it, so the
inner dict is auto-created)
(In reply to Dmitry Selyutin from comment #99)
> I'll come up with the patch to inline methods later. Note that the
> methods will need to have different names; the following won't work:
>
>     class StubVisitor(Visitor):
>         @visitormethod(str)
>         def __call__(self, node): ...
>         @visitormethod(int)
>         def __call__(self, node): ...

assert it is not in the dict already (overwrite not permitted).

> The names can be whatever you like, they just need to be distinct.

yeah, totally got it. love it.
from https://realpython.com/python-metaclasses/:

    class VisitorMeta(type):
        def __new__(cls, name, bases, dct):
            x = super().__new__(cls, name, bases, dct)
            x.registry = {}   # one fresh registry per class
            return x

    class Visitor(metaclass=VisitorMeta):
        pass

    class Visitor2(Visitor):
        pass

    Visitor.registry[5] = 2
    Visitor2.registry[9] = 7

    print(Visitor.registry)    # {5: 2}
    print(Visitor2.registry)   # {9: 7}

ta-daaaaa :)
(In reply to Dmitry Selyutin from comment #96)
>     class VisitorMeta(type):
>         def __init__(cls, name, bases, ns):
>             cls.__registry = {}
>             return super().__init__(cls)

this is placing the registry in *VisitorMeta*. it needed to be the
other way round:

    x = super().__new__(cls, name, bases, dct)
    x.registry = {}
    return x

(remember also to pass *all* the arguments to type(), otherwise it will
(a) have no base classes (b) have no functions... i mean *maybe* this
is good? doesn't seem like it, given that there are other functions in
VisitorMeta)

oh wait... you meta'd __init__ - that python-metaclasses article meta'd
__new__.
(In reply to Luke Kenneth Casson Leighton from comment #103)
> this is placing the registry in *VisitorMeta*.
> it needed to be the other way round:

No, it is not. The relationship between __new__ and __init__ in metaclasses is the same as between __new__ and __init__ in classes. In fact, metaclasses are classes too. Just use `print(cls)`.
Pushed the holy-cow implementation.
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=8a43ffad8d938f83cb4ac6c9963a404e0d8ab38b
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=61cc08f2d11fce0989db4fb4d502e14efc56d80a

Took more code than you'd like, but most of it is just convenience wrappers (e.g. some methods in VisitorMeta) and checks (e.g. checking `callable` or `isinstance(nodecls, type)`). This is just so that users don't shoot themselves in the foot and at the same time can use some cool tricks (listing all the hooks a visitor has, checking whether there's a hook for some type, etc.).
The proxy object I mentioned before is a bad option: it'll break our visitor checks (because a proxy object is a new type). A new "path" property is a bad option too: it needs awful hacks with frozen dataclasses, which don't allow setting attributes after instantiation, and, what's worse, it doesn't work with builtins (e.g. setting var.field fails if var is of str type). Not an option either. I'm still looking for options on how to yield fields which have names, without refactoring the walking process. But I'd say I don't have the inspiration yet.
"overriding visitor method:" -> "overwriting existing visitor method"
I had to introduce some stuff like the Tuple type (like the Dataclass class, it's just used to avoid duplicating the walk() method code). I've added the `pysvp64db tree` command. We're almost there. It doesn't handle all fields yet, but it handles many of them. The depth tracking is indeed easy; but I'm not sure how to handle paths. If only we had something in each node to be accessed...
OK, take 2: I committed a simplified version of the walkers which yield paths too. But, this time, the path of the parent is not passed to the children's walk algorithm. Each component just yields the paths to its child nodes along with the nodes themselves, and passes these to the visitors. Luke, please take a look; I have no idea how to do it better, but this at least seems cleaner than the nonsense I posted yesterday. You can try it with the `pysvp64db tree` command.
If this commit still looks like nonsense, we can revert it. But I think this is at least acceptable.
Meanwhile I've made tree more beautiful (output truncated a bit to keep
the comment simple; it can be reproduced via `pysvp64db tree`):

    /0
    /0/name
        'tdi'
    /0/section
    /0/section/csv
        PosixPath('major.csv')
    /0/section/bitsel
        [0:5]
    /0/section/suffix
        None
    /0/section/mode
        <Mode.INTEGER: 1>
    /0/section/opcode
        None
    /0/section/priority
        <Priority.NORMAL: 0>
    /0/ppc
    /0/ppc/0
    /0/ppc/0/opcode
    /0/ppc/0/opcode/value
        00000000000000000000000000000010
    /0/ppc/0/opcode/mask
        00000000000000000000000000111111
    /0/ppc/0/comment
        'tdi'
    /0/ppc/0/flags
    /0/ppc/0/flags/0
        '32b'
    ...
    /0/fields
        {'BF': (6, 7, 8),
         'D': (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31),
         'FRS': (6, 7, 8, 9, 10), 'FRT': (6, 7, 8, 9, 10), 'L': (10,),
         'RA': (11, 12, 13, 14, 15), 'RS': (6, 7, 8, 9, 10),
         'SI': (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31),
         'TO': (6, 7, 8, 9, 10),
         'UI': (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31)}
    /0/mdwn
    /0/mdwn/pcode
    /0/mdwn/operands
        {'TO': (<class 'openpower.insndb.core.DynamicOperand'>, {'name': 'TO'}),
         'RA': (<class 'openpower.insndb.core.GPROperand'>, {'name': 'RA'}),
         'SI': (<class 'openpower.insndb.core.SignedOperand'>, {'name': 'SI'})}
    /0/svp64
    /0/svp64/name
        'tdi'
    /0/svp64/ptype
        1P
    /0/svp64/etype
        <SVEType.EXTRA2: 1>
    /0/svp64/msrc
        <SVMaskSrc.NO: 0>
    /0/svp64/in1
        <In1Sel.RA: 1>
    /0/svp64/in2
        <In2Sel.NONE: 0>
    /0/svp64/in3
        <In3Sel.NONE: 0>
    /0/svp64/out
        <OutSel.NONE: 0>
    /0/svp64/out2
        <OutSel.NONE: 0>
    /0/svp64/cr_in
        <CRInSel.NONE: 0>
    /0/svp64/cr_in2
        <CRIn2Sel.NONE: 0>
    /0/svp64/cr_out
        <CROutSel.NONE: 0>
    /0/svp64/extra
        {0: [], 1: [], 2: [], 3: []}
    /0/svp64/conditions
        ''
    /0/svp64/mode
        <SVMode.NORMAL: 1>
(In reply to Dmitry Selyutin from comment #109)
> OK, take 2: I committed a simplified version of the walkers which yield
> paths too. But, this time, the path of the parent is not passed to the
> children's walk algorithm. Each component just yields the paths to its
> child nodes along with the nodes themselves, and passes these to the
> visitors.

yep: no on paths. the visitor instance should be tracking the path, not
the walker. the only thing the walker might also pass is the index (as
an integer, not a string): enumerate(subnodes).

this was ok: https://bugs.libre-soc.org/show_bug.cgi?id=1094#c82

sidenote: the treewalker does not need depth, because
self.depth == len(self.paths).
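(a tiny sketch of that sidenote, with illustrative names: the visitor
derives depth from its own path stack, so the walker passes nothing:)

    import contextlib

    class TreeVisitor:
        def __init__(self):
            self.paths = []   # the visitor's own record of where it is

        @property
        def depth(self):
            # no depth parameter from the walker: derive it from the stack
            return len(self.paths)

        @contextlib.contextmanager
        def __call__(self, node):
            print("  " * self.depth, node)   # indent by current depth
            self.paths.append(node)
            yield node
            self.paths.pop()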
(In reply to Dmitry Selyutin from comment #111)
> Meanwhile I've made tree more beautiful (output truncated a bit to keep
> the comment simple; it can be reproduced via `pysvp64db tree`):

awesome.

> /0
> /0/name
>     'tdi'

so little code, so many possibilities...
(In reply to Luke Kenneth Casson Leighton from comment #112)
> this was ok: https://bugs.libre-soc.org/show_bug.cgi?id=1094#c82

This was OK interface-wise, but our nodes lack names. If you have an idea how to augment them with a name -- I'm totally fine with updating it. Until then, I see no better choice. Details here:
https://bugs.libre-soc.org/show_bug.cgi?id=1094#c106
Switched operands and fields to the Dict class (like dict, but "frozen"
and with a default walk method). Example output:

    /6/fields
    /6/fields/RA
        (11, 12, 13, 14, 15)
    /6/fields/RB
        (16, 17, 18, 19, 20)
    /6/fields/RC
        (21, 22, 23, 24, 25)
    /6/fields/Rc
        (31,)
    /6/fields/RT
        (6, 7, 8, 9, 10)
    /6/fields/XO
        (26, 27, 28, 29, 30)
    /6/mdwn/operands
    /6/mdwn/operands/RT/class
        'GPROperand'
    /6/mdwn/operands/RA/class
        'GPROperand'
    /6/mdwn/operands/RB/class
        'GPROperand'
    /6/mdwn/operands/RC/class
        'GPROperand'
    /6/mdwn/operands/Rc/class
        'StaticOperand'
    /6/mdwn/operands/Rc/value
        0
whoops...

    $ python3 decoder/isa/test_caller_svp64_fp.py >& /tmp/f

    ======================================================================
    ERROR: test_sv_fpload (__main__.DecoderTestCase)
    >>> lst = ["sv.lfsx *2, 0, *0"
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "decoder/isa/test_caller_svp64_fp.py", line 23, in test_sv_fpload
        lst = list(lst)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/asm.py", line 61, in __iter__
        yield from self.trans
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/asm.py", line 114, in translate
        yield from self.translate_one(insn)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/asm.py", line 106, in translate_one
        arguments=fields, specifiers=opmodes)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 3520, in assemble
        insn = super().assemble(record=record, arguments=arguments)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 1900, in assemble
        arguments=arguments, operands=cls.dynamic_operands(record=record))
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 930, in __new__
        raise ValueError("operands count mismatch")
    ValueError: operands count mismatch
(In reply to Luke Kenneth Casson Leighton from comment #116)
> whoops...
>
> $ python3 decoder/isa/test_caller_svp64_fp.py >& /tmp/f

    commit 963fa4fcadf76740083d9fce3ffe986a81f247fe
    Author: Dmitry Selyutin <ghostmansd@gmail.com>
    Date:   Sat Jun 10 23:08:15 2023 +0300

        insndb/core: switch Fields to Dict class

> raise ValueError("operands count mismatch")
> ValueError: operands count mismatch
Luke, thank you for your investigation! I'm sorry for missing this issue, felt myself kinda invincible. :-) I've fixed it; this is again the same error I always make: using isinstance instead of issubclass.
(In reply to Dmitry Selyutin from comment #118)
> Luke, thank you for your investigation! I'm sorry for missing this
> issue, felt myself kinda invincible. :-)

:)

> I've fixed it; this is again the same error I always make: using
> isinstance instead of issubclass.

still some work to do:

    $ python3 decoder/isa/test_caller_svp64_fp.py >& /tmp/f

    ======================================================================
    ERROR: test_sv_fpmadds (__main__.DecoderTestCase)
    >>> lst = ["sv.fmadds *6, *2, *4, 8"
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "decoder/isa/test_caller_svp64_fp.py", line 161, in test_sv_fpmadds
        lst = list(lst)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/asm.py", line 61, in __iter__
        yield from self.trans
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/asm.py", line 114, in translate
        yield from self.translate_one(insn)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/asm.py", line 107, in translate_one
        yield from insn.disassemble(record=record, style=Style.LEGACY)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 3556, in disassemble
        specifiers = sorted(rm.specifiers(record=record))
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 3486, in __getattr__
        return getattr(self.rm, key)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 3422, in rm
        search = ((mode << 1) | self.record.Rc)
      File "/usr/local/lib/python3.7/dist-packages/cached_property.py", line 36, in __get__
        value = obj.__dict__[self.func.__name__] = self.func(obj)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 1125, in Rc
        return self["Rc"].value
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 1119, in __getitem__
        return cls(record=self, **kwargs)
    TypeError: type object argument after ** must be a mapping, not tuple
ok, i put the "paths" branch at the former master, reverted some
things, then re-patched them back in by hand. nothing is lost; pytest
works at:

    commit 37e33e6c5ee1c4e3e97704b5b76bf13c3665470c (HEAD -> master, origin/master)

paths branch:
https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=refs/heads/paths
ok, i have some (small) changes that got mixed up with the broken-ness:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=37e33e6c5
    reduce down one more line in walkmethod.__get__

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=bcab1f6
    lambda is a singleton (hashable), therefore it can safely be set as
    the default option to a function parameter
    https://bugs.libre-soc.org/show_bug.cgi?id=1094#c89

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=67a3df94
    reduce the number of lines slightly by using
    yield from filter(match, ...)

with this being so low-level / critical now, and with things having (i
think) stabilised, let's move to the usual
"branch-and-test-and-review-and-rebase"?
oh, one thing: path needs renaming to nodepath, otherwise it gets utterly confused with "import path". likewise, all python keywords and builtins need avoiding ("type=..." and "id=..." especially). i like short as well, but short-and-conflicting is extremely bad for all sorts of reasons.
(In reply to Luke Kenneth Casson Leighton from comment #122)
> oh, one thing: path needs renaming to nodepath, otherwise it gets
> utterly confused with "import path".
>
> likewise, all python keywords and builtins need avoiding ("type=..."
> and "id=..." especially)
>
> i like short as well, but short-and-conflicting is extremely bad for
> all sorts of reasons.

We already had this debate. No, it doesn't. Path is short and sweet, and Python developers use all kinds of keyword-like names; it's up to the caller to distinguish our path from somebody else's. As I said, upon using practically any word there's a chance of a collision with Python builtins or modules or whatever. EWONTFIX.
(In reply to Dmitry Selyutin from comment #123)
> We already had this debate. No, it doesn't. Path is short and sweet,
> and Python developers use all kinds of keyword-like names,

dmitry: sorry to have to do this: as the *project leader* responsible
for ensuring that the code developed is maintainable, and having over
20 years of experience in writing and maintaining large software
projects, i am setting a *project policy*: the use of python keywords
and python module names as local variable-names, class names or local
modules is prohibited.

i will update the developer guide to reflect this over the next couple
of days, as i have an international guest staying for a week and need
to leave for the airport in 5 hours.
Ok, if we're choosing this path, then you need to specify the exact way to check that we don't have a collision. Then, whenever we update the Python version, we need to check whether the entire code base matches the criteria you established. Then, obviously, you have to update all code (obviously including yours) according to the mentioned high standards. As project maintainer, you absolutely have the will. The way you dictate your will restricts others' willingness to contribute to the project. I hope that 20 years of expertise can address this aspect too.
Updated the task accordingly.
Fixed the error in paths branch; will wait for CI to see whether something else is broken too.
Tests are completed, same failures as in master. https://salsa.debian.org/Kazan-team/mirrors/openpower-isa/-/jobs/4299105
(In reply to Dmitry Selyutin from comment #128)
> Tests are completed, same failures as in master.
> https://salsa.debian.org/Kazan-team/mirrors/openpower-isa/-/jobs/4299105

thank you dmitry. confirmed: ran locally here, all good. the branch can rebase fine. so sorry about my extremely-limited communication.
Please check the task changes; they are pretty self-explanatory. It is impossible to develop efficiently when I constantly have a spoke in the wheel. Each and every technical solution is met with an extremely negative attitude, dislike, distaste. You basically force me to implement any recent code according to your personal preferences. Arguments and rationale behind my choices are ignored or met with the "I'm maintainer" argument. If this is the path you choose to follow, you can end up maintaining the code with only one developer. Not a single line of code until there's a formal document with a valid technical rationale explaining why and how the choice is made. Yes, this document should be discussed and accepted before it is enforced. If the rationale is "I'm the maintainer with 20 years of experience", then, please, don't even bother writing this document. If you treat the people around the project as a team, this should be discussed and accepted by the team members. I'm really fed up with this nonsense. So much energy down the drain.
https://libre-soc.org/irclog/%23libre-soc.2023-06-12.log.html#t2023-06-12T12:07:42

Two ideas:

1. Paths or indices can be nodes as well, and be visitable. This allows us to solve the issue of how and what to pass to a visitor method.
2. We should support loose matches for visitor handlers, likely traversing the MRO. The visitor below handles any Dict in a generic way but has special logic for Operands:

class MagicVisitor(Visitor):
    @visitmethod(Operands)
    def Operands(self, node):
        print("Oh and this is Operands node!", node)
        yield node

    @visitmethod(Dict)
    def Dict(self, node):
        print(node)
        yield node

And basically the default __call__ becomes @visitmethod(object)!
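A minimal sketch of what that loose-match dispatch could look like (visitmethod and the registry layout here are illustrative, not committed code):

# illustrative only: loose-match dispatch over the MRO.
# visitmethod tags a handler with the types it accepts; __call__
# walks type(node).__mro__ so a handler for a base class (e.g. dict)
# also catches subclasses (e.g. a custom Dict).
def visitmethod(*types):
    def decorate(method):
        method.visit_types = types
        return method
    return decorate

class Visitor:
    def __call__(self, node):
        for typeid in type(node).__mro__:
            for name in dir(self):
                method = getattr(self, name)
                if typeid in getattr(method, "visit_types", ()):
                    return method(node)
        raise TypeError(f"no handler for {type(node)}")

With a handler registered via @visitmethod(object) as the fallback, any type is eventually caught, since object is the last entry in every MRO.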
I liked the idea so much that I decided to play more with it. I think I finally found a way to handle everything gracefully. Perhaps eventually I'll provide a standalone module, maybe even on PyPI. As this is a demo, and as I remember these debates about underscores and other extremely important stuff, I took the liberty of writing this code as I like. Since our project does not welcome an "as I like" approach, I posted it to an anonymous repository at repo.or.cz. Keep in mind that's just a concept; it's unlikely this will satisfy personal preferences unless modified.

https://repo.or.cz/pydispatch.git

Example:

from pydispatch import visitor
from pydispatch import walker

class SpecialDict(dict):
    pass

class CustomWalker(walker.Walker):
    @walker.hook(dict)
    def dict(self, instance):
        for (key, value) in instance.items():
            yield (value, key)
            yield from self((value, key))

class CustomVisitor(visitor.Visitor):
    @visitor.hook(object)
    def object(self, instance):
        print("enter")
        yield instance
        print("leave")

visitor = CustomVisitor()
walker = CustomWalker()
for item in walker(SpecialDict({"a": 1, "b": 2})):
    with visitor(item) as node:
        print("hook", repr(item))
The output:

enter
hook (1, 'a')
leave
enter
hook 1
leave
enter
hook 'a'
leave
enter
hook (2, 'b')
leave
enter
hook 2
leave
enter
hook 'b'
leave
Again, this is a concept. I'm simply tired of all this discussion and needed some sort of distraction. This concept demonstrates several aspects:

1. The key entity is the Dispatcher class. It's used to wrap methods so that the call is dispatched based on the argument type.
2. Two classes, Visitor and Walker, inherit from Dispatcher (with some metaclass magic involved). The Walker describes the way the objects are traversed; the Visitor just enables the context managers.
3. The code supports loose type matches. In the example above, SpecialDict inherits dict, but is caught loosely by the CustomWalker handler for dict.
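A condensed sketch of the Dispatcher idea in item 1 (a paraphrase, not the pydispatch code itself; hook and the table layout are simplified):

# simplified paraphrase: a decorator records the handled types, and a
# metaclass collects the decorated methods into a type->method table
def hook(*types):
    def decorate(method):
        method.types = types
        return method
    return decorate

class DispatcherMeta(type):
    def __new__(metacls, name, bases, namespace):
        cls = super().__new__(metacls, name, bases, namespace)
        table = {}
        # walk the MRO in reverse so subclass hooks override base hooks
        for base in reversed(cls.__mro__):
            for attr in vars(base).values():
                for typeid in getattr(attr, "types", ()):
                    table[typeid] = attr
        cls.table = table
        return cls

class Dispatcher(metaclass=DispatcherMeta):
    def __call__(self, instance):
        # loose match: the first MRO entry with a registered hook wins
        for typeid in type(instance).__mro__:
            if typeid in self.table:
                return self.table[typeid](self, instance)
        raise TypeError(f"no hook for {type(instance)}")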
What I like here:

1. There's no need to update classes with magic methods. If the walker's handler is registered, the type will be caught.
2. It's loosely typed; inheritance is taken into account.
3. This has no problems with builtins and their "walkability".
4. The use is dead simple.

What I don't like:

1. I don't know if it's possible to make super() work with it (that is, call the superclass handler). Not sure if we need it, though: common things can be moved into usual methods or functions, and these can be called from hooks.
2. Perhaps this is more complex than it should've been. I tried to make it generic, though, and somewhat flexible.
(In reply to Dmitry Selyutin from comment #130)
> You basically force me to implement any recent code
> according to your personal preferences.

dmitry: they really aren't. i am the one who has to think through the implications: "when these few lines of code are integrated into the remaining 200,000+ lines of code that i am responsible for, how will that go? what problems will arise?" which is *very* different from the (small, focussed) task that you are doing, which at the moment is something that you need to be enjoying (and *not* worrying about the bigger picture, which is my responsibility).

now, with the insndb, before it was integrated deeply into the rest of the code there was no problem: it was self-contained. but now it is becoming an absolutely critical part of the entire codebase - through the Walker API - and i now have to think of the ramifications.

so i have a potential solution / proposition: if *i* make any minor corrections (adding comments, renaming of variables), which are superficial but clearly project-management-related, you don't object to them. if we *do* have some disagreement - if there is something that you genuinely don't like - *please speak up*, and we move it to a branch and discuss further. i already put comments into insndb.py before the restructuring (over the past few months) and you didn't object, so i know you're fine with me doing that.

this way, you are satisfied by being able to design something you love doing, and i can also make sure that what you are doing fits with the wider project for which *i* am responsible (not you). how does that sound?
I already gave my opinion. If you have certain criteria on naming and the use of particular constructs, and want to permit or ban some techniques — this should be documented and, if you respect the team, at least be brought to discussion (the form "I'm responsible and I have 20 years of expertise" certainly lacks the discussion bits). As I said: the problem is not in the particular decisions per se, the problem resides in the lack of technical rationale behind them. I don't ask to document each and every aspect of using Python or any other programming language, that'd be rather impossible. But those cases where we had disagreement must be documented. Stating that you are responsible explains nothing; it lacks the rationale and concrete criteria.

The current communication format just turns development into quite a stressful process. I don't even know how to write the code anymore, because you keep interfering and, especially, make confusing statements which, in the form they are made, contradict each other. I'm already afraid to make any further step, because you have pushed the situation to the degree where I simply think that any code I publish will be rejected.

Many projects start with a coding style and certain guidelines. I think it's good for us to have those. Not that I think now is the right time. But you said in many similar situations that you don't have time to address these issues yet, and I begin to worry whether you're going to address them at all.
Again: I'm OK with the strategy, but I need concrete criteria and rationale. And I'd much rather we discussed all these questions.
(In reply to Dmitry Selyutin from comment #137)
> make confusing statements which, in the
> form they are made, contradict each other.

yes. i have a combination of "writer's block", a form of dyslexia, *and* Asperger's. it means i can *recognise* something when i see (or hear) it but can in no way properly *express* it. often it takes over three weeks to clearly answer just one "simple" question to which anyone else would find it perfectly reasonable to expect an answer, right now, now, now. overall, going back 30 years, it has been terribly frustrating for me, and very difficult for everyone else i have ever worked with or for.

but also there is this: it isn't "me being difficult", the entire project is difficult - a level so far above standard computer science that only last week Mitch Alsup on comp.arch said "it is a pity that you have to study an entire lifetime before understanding enough to design a processor, and by then you are already at retirement age". i am barely keeping up holding all the knowledge in my head (hence the "writer's block"), and to expect every member of the team to do the same, in order to be able to make properly-informed project management and development decisions, is completely unreasonable.

but forget all of that: it's my responsibility to deal with. my primary goal is making sure that you have things that you can enjoy doing, that are also self-contained tasks. there's actually one that's high-priority but should be very short (bear in mind that notes for an "internet-blog" writeup of the algorithm are part of the task):

https://bugs.libre-soc.org/show_bug.cgi?id=1085

and another one:

https://bugs.libre-soc.org/show_bug.cgi?id=1044

and a third one that would be useful as a standard vector demo: daxpy

https://bugs.libre-soc.org/show_bug.cgi?id=953

all of those should only be between 8-12 lines of SVP64 assembler. there are now three of us (me, jacob and konstantinos/markos) who can help with questions, if you're interested?
(In reply to Luke Kenneth Casson Leighton from comment #139)
> overall, going back 30 years, it has been terribly
> frustrating for me, and very difficult for everyone else i have ever
> worked with or for.

I've seen many people, including, but not limited to, people who develop, and believe me, you're far from being an unbearable and difficult person. My particular issue is that I'm somewhat technology-centric; I can be annoyed with some technical solutions rather than with the persons who stand behind them.

> but also there is this: it isn't "me being difficult", the entire
> project is difficult - a level so far above standard computer science
> that only last week Mitch Alsup on comp.arch said "it is a pity that
> you have to study an entire lifetime before understanding enough to
> design a processor, and by then you are already at retirement age".

Couldn't agree more. This statement can be applied to many areas, though.

> but forget all of that: it's my responsibility to deal with.

It might be your responsibility, but you don't have to suffer and deal with it *alone*. Despite the fact that sometimes I express things in a quite irritating and annoying manner, like all of us, I want to do the best and want to help. I hope that the very fact that I raised this whole discussion already clearly demonstrates how much I care about the project and the team. And "the team" I mention obviously includes you.

> my primary goal is making sure that you have things that you can
> enjoy doing, that are also self-contained tasks.
> there are now three of us (me, jacob and konstantinos/markos) who can
> help with questions, if you're interested?

I like this task, actually, and would like to complete it. I have problems with some technical aspects, but the task is nice and lovely, despite how many debates it involves. Also, I wouldn't say I'm a suitable person for math-related algorithms (you do remember that I'm a philologist, right?). Most of all I like architecture-related tasks (architecture in a broad sense, like designing APIs and the relationships between data structures). Perhaps the whole problem is that, since I like designing architectures, I treat some moments in a somewhat personal way, and our definitions of a good architecture differ, so this naturally leads to debates. That's why I suggested establishing some common understanding in the form of documentation (you haven't addressed this yet, if I'm not mistaken).
I had to change the name of the pydispatch project, since this name is already taken on PyPI. I renamed the project to mdis and published it to PyPI. That's the second thing I've published to PyPI, so this might be done wrong.

https://repo.or.cz/mdis.git

Again, to clarify: this is a self-contained implementation, and for our needs it perhaps mostly serves as an illustration. I don't insist we should use this module; I can duplicate its functionality, or make some excerpts of it. I like the approach, though, and I think I'll try maintaining and developing this. You might look at it as "what Dmitry thinks is fine". I haven't yet addressed some of Jacob's comments and suggestions, though.
dmitry, hi, my guest is going to be here until at least the weekend, i can only be brief - much briefer than i want to be.

1) i have enough money from the OPF grant.
2) API is great. my suggestion: replace "path" with "nodeid".
3) separating as an independent repo: my suggestion, make Node an optional argument.

def visit(...., basetreeclass=Node):
    if isinstance(node, basetreeclass):
        ...

4) prior to separation: my suggestion, create a file named visitor.py in insndb/

all i can manage at the moment, will be free next week.
Meanwhile I've updated the module, incorporating suggestions from Jacob as well. Changes:

1. From now on we keep the original names for the functions and no longer bother constructing new ones on the fly. This actually proved to be difficult, because I had to reconsider how and where to keep the established hooks.
2. There is no longer an explicit separation between call hooks and typeid hooks corresponding to different decoration stages. Since both need class information, we construct the call hook dynamically, and have only one class: Hook. This type migrated into the dispatcher module.
3. If there's a conflict in hooks, that is, a user attempted to hook the same type several times, the exception will show all conflicting functions.
4. Dropped the "wrapper" argument. As Jacob suggested, if somebody needs context managers, they should add these explicitly.
5. The representation for hooks is greatly simplified; this affects the help text too. The example for a CustomWalker installing a hook for dict (irrelevant details snipped):

class CustomWalker(mdis.walker.Walker)
 |  Data and other attributes defined here:
 |  dispatch_dict = <dict>
 |
 |  Data and other attributes inherited from mdis.walker.Walker:
 |  dispatch_mapping = <dict>
 |  dispatch_object = <object>
 |  dispatch_sequence = <tuple, list, set, frozenset>

Note that the base Walker already has a method hooked at dict (dispatch_mapping); the subclass's dispatch_dict will be called instead, due to MRO.

Again, until we decide on how to store the module, I published it to PyPI and repo.or.cz (the reply from Luke considered an outdated state of affairs). I tend to think we might have a standalone repository here, but at the same time have this module on PyPI. On the other hand, the users would probably like something more HW-agnostic; but please not Github (I still have an account there, but would like something more liberal and open).

https://pypi.org/project/mdis
https://repo.or.cz/mdis.git
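A rough sketch of the conflict check described in item 3 (the exception type and message format here are guesses, not the exact mdis code):

# guesswork illustration: refuse to register two hooks for one type,
# naming every conflicting function in the error message
def register(table, typeid, hook):
    existing = table.get(typeid)
    if existing is not None:
        names = ", ".join(fn.__qualname__ for fn in (existing, hook))
        raise ValueError(f"conflicting hooks for {typeid!r}: {names}")
    table[typeid] = hook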
(In reply to Dmitry Selyutin from comment #143)
> Again, until we decide on how to store the module, I published it to PyPI
> and repo.or.cz (the reply from Luke considered an outdated state of
> affairs).

the entire project has to be self-sufficient. i've created mdis on gitolite3.
Luke, it seems there was something strange with the commits: the "initial implementation" commit was repeated twice. That's clearly wrong. I've done a force push for everything; now everything is fine.
I've updated the project description to reflect our git and our bug tracker, and also incorporated some minor updates (introduced a base class for visitors which has its object hook implemented in terms of context managers). A new version is released. I'm wondering what'd be the right way to use this module in our project. Should I simply update setup.py install_requires? If so, should I just use mdis (since we're unique on PyPI), or do I need to follow the way it's done with cached_property?
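For the record, the two variants being asked about would look roughly like this (a sketch: the project name, the pin "mdis" and the git URL are placeholders, not decisions):

# sketch only: plain PyPI dependency vs. explicit-git dependency
from setuptools import setup

setup(
    name="openpower-isa",
    install_requires=[
        "mdis",  # variant 1: plain PyPI name
        # variant 2: explicit git source, cached_property-style
        # (PEP 508 direct reference; URL is a placeholder):
        # "mdis@git+https://git.libre-soc.org/git/mdis.git",
    ],
)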
The roadmap is:

1. Update setup.py and install the module. Perhaps we should install it with devscripts?
2. Deprecate Node, walk() and similar stuff.
3. Switch db.py to the new approach.

Once it's done, should we consider this task closed, and switch to the task for implementing binutils on top of the visitor API?
I forgot to mention one step in the roadmap: we miss support for dataclasses. I'll add it too. I think this will be a bit trickier to make generic, though, since dataclasses do not inherit from some special class; ours, however, do.
(In reply to Dmitry Selyutin from comment #148)
> I forgot to mention one step in the roadmap: we miss support for
> dataclasses. I'll add it too. I think this will be a bit trickier to make
> generic, though, since dataclasses do not inherit from some special class;
> ours, however, do.

it'll need dynamic detection: i'm fairly certain you can do better than isinstance(dataclass), probably by registering with @thing just like @register(object) - i suspect @register(dataclass) would work just as well.

oh btw one thing: i do strongly suggest removing all "import x as _x", it is non-standard practice across the entire python community. i recommend using the standard python practice, established for over 20 years, of using "__all__" instead. aside from anything else it reduces the number of words and symbols, and reduces complexity.
(In reply to Dmitry Selyutin from comment #148)
> I forgot to mention one step in the roadmap: we miss support for
> dataclasses. I'll add it too. I think this will be a bit trickier to make
> generic, though, since dataclasses do not inherit from some special class;
> ours, however, do.

just use:

https://docs.python.org/3.7/library/dataclasses.html#dataclasses.is_dataclass

and then call dataclasses.fields, since that avoids needing to raise a TypeError, which is kinda slow iirc.

also you can check for plain_data instances (feel free to add a plain_data.is_plain_data() function based off plain_data.fields(), since creating exceptions is kinda slow):

https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/plain_data.py;h=7bde6ba2f27ff3786beab9c036019337e36bbfe7;hb=4bf2f20bddc057df1597d14e0b990c0b9bdeb10e#l298

nmutil should probably be an optional import:

try:
    # renamed to not conflict with dataclasses
    from nmutil.plain_data import (is_plain_data as _is_plain_data,
                                   fields as _plain_data_fields)
except ImportError:
    def _is_plain_data(v):
        return False
    # _plain_data_fields not needed since _is_plain_data always returns False
(In reply to Luke Kenneth Casson Leighton from comment #149)
> (In reply to Dmitry Selyutin from comment #148)
> > I forgot to mention one step in the roadmap: we miss support for
> > dataclasses. I'll add it too. I think this will be a bit trickier to
> > make generic, though, since dataclasses do not inherit from some
> > special class; ours, however, do.
>
> it'll need dynamic detection: i'm fairly certain you can do better than
> isinstance(dataclass), probably by registering with @thing just like
> @register(object) - i suspect @register(dataclass) would work just as well.

there is no dataclass class
Sorry, typing from my phone. I know that we can use is_dataclass in Hook(object). What I was trying to express is that we cannot do Hook(dataclass), because there's no dataclass class. However, there is a Dataclass class in our code, so we can just hook this one: all our dataclasses inherit it.

Jacob, sorry, I'm not sure why you brought up nmutil. I assumed we don't need nmutil in the decoupled mdis module. I imagined something like this (pay attention to the file name):

$ cat insndb/db.py
class Walker(mdis.walker.Walker):
    @mdis.dispatcher.Hook(core.Dataclass)
    def dispatch_dataclass(self, instance):
        for field in dataclasses.fields(instance):
            yield (field.name, getattr(instance, field.name))

Same for similar cases, if that's what you mean by bringing in nmutil. As for underscores — OK, I'll remove underscores.
Ah, I think I got what you mean by nmutil. No, there's no place for nmutil in generic code, and never will be. This is too specific. It's like bringing special support for PyQt into generic code. Hard no. If some code using nmutil needs this — it can do it in an nmutil-related module, perhaps by providing its own base walker or visitor with its own hook twists.
Hm. Luke, I re-read your comment, and I think I now understand what you mean. You likely mean that we can detect dataclasses dynamically... Yes, I think this can be arranged, perhaps just via dataclasses.is_dataclass. I'm a bit concerned that this kinda contradicts the other code — passing classes and iterating over the MRO. But I'll check whether these two options can coexist in a graceful way. Perhaps we can fork on isinstance(thing, type) vs callable(thing). I'll check it; I like that it allows some flexibility. What I mean is:

@dispatcher.Hook(dataclasses.is_dataclass)
def dispatch_dataclass(...):
    whatever()

If the Hook is capable of handling callables, that'd work too. Just not dataclasses.dataclass, because that one just wraps the class, and I'd like the code to be somewhat generic and not tied to special constructs (except for builtins perhaps; those are fine). I'll investigate the ideas with callables. I think it's doable.
Now that I think about it, any hook can be expressed via a callable; it's just a bit more verbose:

@mdis.dispatcher.Hook(lambda typeid: issubclass(typeid, object))
def dispatch_mapping(self, instance):
    pass

I think we can keep the short form, but internally we can implement everything as callables. Holy cow, that's really powerful!
(In reply to Dmitry Selyutin from comment #155)
> I think we can keep the short form, but internally we can implement
> everything as callables. Holy cow, that's really powerful!

Nah, implementing everything as callables loses one benefit: with types we can have hash lookup, with callables we're limited to linear search. It's not that it's critical (who's crazy enough to establish thousands of hooks?), but it hurts the beauty of the code. I'm currently checking something more graceful.
Funny. Classes are callables. The thing turns out to be simpler.
(In reply to Dmitry Selyutin from comment #156)
> (In reply to Dmitry Selyutin from comment #155)
> > I think we can keep the short form, but internally we can implement
> > everything as callables. Holy cow, that's really powerful!
>
> Nah, implementing everything as callables loses one benefit: with types we
> can have hash lookup, with callables we're limited to linear search. It's
> not that it's critical (who's crazy enough to establish thousands of
> hooks?), but it hurts the beauty of the code. I'm currently checking
> something more graceful.

if you're worried about speed, you could cache lookups -- something like:

class Visitor:
    def __init__(self):
        self.__cache = {}

    def run_hook(self, v):
        t = v if isinstance(v, type) else type(v)
        f = self.__cache.get(t)
        if f is None:
            for name, check in self.hooks():
                if check(t):
                    self.__cache[t] = f = getattr(self, name)
                    break
        if f is None:
            raise TypeError("no hooks for type", t)
        return f(v)
(In reply to Dmitry Selyutin from comment #157)
> Funny. Classes are callables. The thing turns out to be simpler.

except calling classes doesn't do what we want here, which is to check if the passed-in value is an instance of the class, instead of attempting to construct a new instance of that class.
import dataclasses

import mdis.dispatcher
import mdis.walker

class CustomWalker(mdis.walker.Walker):
    @mdis.dispatcher.Hook(dataclasses.is_dataclass)
    def dispatch_dataclass(self, instance):
        for field in dataclasses.fields(instance):
            key = field.name
            value = getattr(instance, key)
            yield (key, value)
            yield from self((key, value))

@dataclasses.dataclass
class Thing:
    a: int
    b: str

walker = CustomWalker()
thing = Thing(a=1, b="cocojamboo")
for item in walker(thing):
    print(repr(item))

('a', 1)
'a'
1
('b', 'cocojamboo')
'b'
'cocojamboo'
(In reply to Jacob Lifshay from comment #159)
> (In reply to Dmitry Selyutin from comment #157)
> > Funny. Classes are callables. The thing turns out to be simpler.
>
> except calling classes doesn't do what we want here, which is to check if
> the passed-in value is an instance of the class, instead of attempting to
> construct a new instance of that class.

Jacob, of course I know that. My point is that I can just check callable() upon Hook instantiation and raise a ValueError if it's not callable.
(In reply to Dmitry Selyutin from comment #160)
> @mdis.dispatcher.Hook(dataclasses.is_dataclass)
> def dispatch_dataclass(self, instance):
>     for field in dataclasses.fields(instance):
>         key = field.name
>         value = getattr(instance, key)
>         yield (key, value)
>         yield from self((key, value))

After thinking about it for a while, I'm keeping this at the base Walker level. dataclasses is a standard module; hell, let's not force users to invent that again.
(In reply to Dmitry Selyutin from comment #161)
> (In reply to Jacob Lifshay from comment #159)
> > except calling classes doesn't do what we want here, which is to check if
> > the passed-in value is an instance of the class, instead of attempting to
> > construct a new instance of that class.
>
> Jacob, of course I know that.

yeah, people occasionally forget.

> My point is that I can just check callable()
> upon Hook instantiation and raise a ValueError if it's not callable.

that works, though it may be better to do:

def hook(cond):
    if isinstance(cond, type):
        def cond(ty, __class=cond):
            return issubclass(ty, __class)  # edit: fix issubclass name
    if not callable(cond):
        raise TypeError
    ...
OK, here's yet another set of updates (therefore the PyPI version is bumped):

1. A hook can be injected as a callable which receives the type. First we try to look up the typeid as is, then we patiently iterate over the registered entries which look like callables.
2. We now can walk over dataclasses, yay! Implemented in terms of @dispatcher.Hook(dataclasses.is_dataclass) in walker.Walker.
3. Underscores were replaced with __all__; this adds +5 points to charisma.
4. The calls to dispatch are cached via functools.lru_cache.
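Item 4 presumably amounts to something along these lines (a sketch: find_hook and the hooks table are invented for illustration, not the actual mdis internals):

# illustration only: cache the (dispatcher class, type) -> hook lookup,
# so the linear scan over callable predicates runs once per type
import functools

@functools.lru_cache(maxsize=None)
def find_hook(dispatcher_cls, typeid):
    for (match, hook) in dispatcher_cls.hooks:
        if isinstance(match, type):
            if issubclass(typeid, match):   # exact or MRO match
                return hook
        elif match(typeid):                 # callable predicate
            return hook
    return None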
Next steps are:

1. Update setup.py with mdis. Again, should we use the same form as with cached_property, with explicit git, just to be safe? I'm thinking of explicit git because it also allows getting the latest changes.
2. Drop Node, Dataclass, the classmethod walk() and similar stuff. The new approach seems to cover everything we currently need without relying on inheritance.
3. Switch db.py to the mdis module. Perhaps I'll also add a few commands to pysvp64db (this is nice for debugging AND providing useful information simultaneously).
Oh, by the way, I just recalled that tuples et al. simply yield stuff from the collection upon walking by default:

class Walker(dispatcher.Dispatcher, metaclass=WalkerMeta):
    @dispatcher.Hook(tuple, list, set, frozenset)
    def dispatch_sequence(self, instance):
        for item in instance:
            yield item
            yield from self(item)

dicts and dataclasses, on the other hand, yield the key too. Should the dispatcher look like this instead?

@dispatcher.Hook(tuple, list, set, frozenset)
def dispatch_sequence(self, instance):
    for (index, item) in enumerate(instance):
        yield (index, item)
        yield from self(item)

Also, whilst we're at it, we could perhaps support a default namedtuple hook (yielding keys too, I think, as sketched below). I'd like to hear your opinions, folks, on these two.
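A possible shape for that namedtuple hook (a sketch only: namedtuples have no dedicated base class, so the _fields check below is an assumed detection strategy, not settled design; it relies on Hook accepting callables):

# sketch: detect namedtuples structurally via the _fields attribute,
# then yield (field-name, value) pairs the same way dicts do
def is_namedtuple(typeid):
    return (isinstance(typeid, type) and issubclass(typeid, tuple)
            and hasattr(typeid, "_fields"))

class NamedTupleWalker(walker.Walker):
    @dispatcher.Hook(is_namedtuple)
    def dispatch_namedtuple(self, instance):
        for (key, value) in zip(instance._fields, instance):
            yield (key, value)
            yield from self((key, value))

(One open question with this: a namedtuple is also a tuple, so the dispatch order between this hook and the plain dispatch_sequence hook would need to be defined.)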
(In reply to Dmitry Selyutin from comment #166)
> dicts and dataclasses,

dataclasses/namedtuples should just visit the field values and not tuples of name/value combinations (except for path tracking)

imho dicts can recursively walk key/value pairs or just values -- at the choice of the visitor/walker?
(In reply to Jacob Lifshay from comment #167)
> dataclasses/namedtuples should just visit the field values and not tuples of
> name/value combinations (except for path tracking)
>
> imho dicts can recursively walk key/value pairs or just values -- at the
> choice of the visitor/walker?

Yielding the name along with the value is exactly path tracking. Plus, we iterate over collections, and just the values do not really represent a dataclass, in the same way that just the keys or just the values don't represent a dict. This is not the same with set/list/similar: you don't really need an index to construct those collections. That said, I've been thinking of yielding the index exactly for path tracking (though this is of no use with sets).
(In reply to Dmitry Selyutin from comment #168)
> Yielding the name along with the value is exactly path tracking. Plus, we
> iterate over collections, and just the values do not really represent a
> dataclass, in the same way that just the keys or just the values don't
> represent a dict.

i guess my point is that for dicts you'd sometimes want to recursively walk inside the keys too, not just inside the values. this is unlike dataclasses, namedtuples, standard python classes, etc.
(In reply to Jacob Lifshay from comment #169)
> i guess my point is that for dicts you'd sometimes want to recursively walk
> inside the keys too, not just inside the values. this is unlike dataclasses,
> namedtuples, standard python classes, etc.

I'd not say so. From the perspective of visiting the stuff, for example printing the hierarchy, these are not too different to me. If we don't yield key names, the only other option is to yield something else, e.g. an index, which can be mapped back. And inventing this backward mapping is, well, quite tedious to handle in all dataclasses. So I don't think that we should drop this path information, which we get for free. My point is that we should do it for tuples too, because the index is actually a path here.
(In reply to Dmitry Selyutin from comment #170)

i'm imagining that visiting {("a", 1): 3.4, 2: {5, 6}} would do the following visitor calls, more or less:

with visitor.visit_dict(item={("a", 1): 3.4, 2: {5, 6}}, path=[]):
    with visitor.visit_dict_entry(item=dict_entry(("a", 1), 3.4), path=[("a", 1)]):
        with visitor.visit_tuple(item=("a", 1), path=[("a", 1), 0]):
            with visitor.visit_str(item="a", path=[("a", 1), 0, 0]):
                pass
            with visitor.visit_int(item=1, path=[("a", 1), 0, 1]):
                pass
        with visitor.visit_float(item=3.4, path=[("a", 1), 1]):
            pass
    with visitor.visit_dict_entry(item=dict_entry(2, {5, 6}), path=[2]):
        with visitor.visit_int(item=2, path=[2, 0]):
            pass
        with visitor.visit_set(item={5, 6}, path=[2, 1]):
            with visitor.visit_int(item=5, path=[2, 1, 5]):
                pass
            with visitor.visit_int(item=6, path=[2, 1, 6]):
                pass
@dataclass()
class A:
    a: int
    b: "list[int]"

visiting {3: A(5, [12, 3])} would do:

with visitor.visit_dict(item={3: A(5, [12, 3])}, path=[]):
    with visitor.visit_dict_entry(item=dict_entry(3, A(5, [12, 3])), path=[3]):
        with visitor.visit_int(item=3, path=[3, 0]):
            pass
        with visitor.visit_A(item=A(5, [12, 3]), path=[3, 1]):  # edit: fix typo
            with visitor.visit_int(item=5, path=[3, 1, "a"]):
                pass
            with visitor.visit_list(item=[12, 3], path=[3, 1, "b"]):
                with visitor.visit_int(item=12, path=[3, 1, "b", 0]):
                    pass
                with visitor.visit_int(item=3, path=[3, 1, "b", 1]):
                    pass
Jacob, I'm having difficulties reading and understanding this massive cascade of with statements. But I suspect that you misinterpret how the API is used, because the caller doesn't need to explicitly mark what's visited; the hooks do it. Could you please check this README and adapt the code so that I could better understand the idea?

https://git.libre-soc.org/?p=mdis.git;a=blob;f=README.md;h=aa540581aa598fcb34d91aaff4234c2213e3413e;hb=HEAD
(In reply to Jacob Lifshay from comment #172)
> with visitor.visit_int(item=3, path=[3, 0]):
>     pass
> with visitor.visit_A(item=A(5, [12, 3]), path=[3, 1]):  # edit: fix typo
>     with visitor.visit_int(item=5, path=[3, 1, "a"]):
(In reply to Dmitry Selyutin from comment #173)
> Jacob, I'm having difficulties reading and understanding this massive
> cascade of with statements.

i was listing the different visitor functions that would be automatically called (with assumed @hook decorations) by the walker when walking the given python value, so you can see what i think should be recursively visited and what should only go in the path. basically, an execution trace.

> But I suspect that you misinterpret how the API is used,
> because the caller doesn't need to explicitly mark what's visited; the
> hooks do it. Could you please check this README and adapt the code so
> that I could better understand the idea?

i'm not saying that's how a user will call it (it would be too complex of an API if that were how users had to call it), but which functions, with which argument values, will be called by the walker automatically.

> https://git.libre-soc.org/?p=mdis.git;a=blob;f=README.md;
> h=aa540581aa598fcb34d91aaff4234c2213e3413e;hb=HEAD
If the method just prints the value with the relevant indent, what'd be the output?
(In reply to Dmitry Selyutin from comment #166)
> dicts and dataclasses, on the other hand, yield the key too. Should the
> dispatcher look like this instead?
>
> @dispatcher.Hook(tuple, list, set, frozenset)
> def dispatch_sequence(self, instance):
>     for (index, item) in enumerate(instance):
>         yield (index, item)
>         yield from self(item)

like it. corresponding dispatch.py change:

 84     def __call__(self, instance, *args):
 85         for typeid in instance.__class__.__mro__:
 86             hook = self.__class__.dispatch(typeid=typeid)
 87             if hook is not None:
 88                 break
 89         if hook is None:
 90             hook = self.__class__.dispatch()
 91         return hook(dispatcher=self, instance=instance, *args)

which will need this instead:

@dispatcher.Hook(tuple, list, set, frozenset)
def dispatch_sequence(self, instance):
    for (index, item) in enumerate(instance):
        yield (item, index)   # <-- swapped: item first
        yield from self(item)

> Also, whilst we're at it, we could perhaps support a default namedtuple hook

awesome!
and (in walker.py):

 22     @dispatcher.Hook(dict)
 23     def dispatch_mapping(self, instance):
 24         for (key, value) in instance.items():
 25             yield (value, key)   # <-- value (instance) comes first
 26             yield from self((key, value))

the value comes first so that it is picked up as instance by dispatch.__call__:

 84     def __call__(self, instance, *args):
(In reply to Dmitry Selyutin from comment #176)
> If the method just prints the value with the relevant indent, what'd be the
> output?

ok, i'll have it print before the contextmanager yield. i'll include the path too, so you can see all the relevant inputs.

(In reply to Jacob Lifshay from comment #171)
> i'm imagining that visiting {("a", 1): 3.4, 2: {5, 6}} would do the
> following visitor calls, more or less:

{("a", 1): 3.4, 2: {5, 6}} path=[]
dict_entry(("a", 1), 3.4) path=[("a", 1)]
("a", 1) path=[("a", 1), 0]
"a" path=[("a", 1), 0, 0]
1 path=[("a", 1), 0, 1]
3.4 path=[("a", 1), 1]
dict_entry(2, {5, 6}) path=[2]
2 path=[2, 0]
{5, 6} path=[2, 1]
5 path=[2, 1, 5]
6 path=[2, 1, 6]

(In reply to Jacob Lifshay from comment #172)
> @dataclass()
> class A:
>     a: int
>     b: "list[int]"
>
> visiting {3: A(5, [12, 3])} would do:

{3: A(5, [12, 3])} path=[]
dict_entry(3, A(5, [12, 3])) path=[3]
3 path=[3, 0]
A(5, [12, 3]) path=[3, 1]
5 path=[3, 1, "a"]
[12, 3] path=[3, 1, "b"]
12 path=[3, 1, "b", 0]
3 path=[3, 1, "b", 1]
(In reply to Luke Kenneth Casson Leighton from comment #178)
> and (in walker.py):
>
>  22     @dispatcher.Hook(dict)
>  23     def dispatch_mapping(self, instance):
>  24         for (key, value) in instance.items():
>  25             yield (value, key)   # <-- value (instance) comes first
>  26             yield from self((key, value))
>
> the value comes first so that it is picked up as instance by
> dispatch.__call__:
>
>  84     def __call__(self, instance, *args):

from comment #177 (!!) if the above is not done (passing instance 1st, and *args which can accept a non-argument - rather than a mandatory argument defaulting to e.g. None) you have to do this awful/broken-ness:

 83 class Dispatcher(metaclass=DispatcherMeta):
 84     def __call__(self, instance):
            if isinstance(instance, tuple):
                # extract the index
                (instance, index) = instance
            else:
                index = ...  # errr, some value indicating "non-iterable leaf node"
 85         for typeid in instance.__class__.__mro__:
 86             hook = self.__class__.dispatch(typeid=typeid)
 87             if hook is not None:

which doesn't work because there is no way to discern an *actual* tuple from the case where a tuple of (instance, index) is being passed. whatever is passed to Dispatcher *has* to be orthogonal, but also *has* to be able to discern "iterable" from "non-iterable". None as a path *does not do this*, jacob:

{3: A(5, [12, 3])} path=[]

no, that is not okay (path=[]). ints do not have and do not need paths.

there are two solutions i can think of:

1) instance, *args - and for non-iterables *args would be empty
2) always yield a (new) "Instance" object, which contains a
   mandatory "instance" and an **OPTIONAL** index (which you
   call "path", jacob)

i greatly prefer (1) because (2) forces walkers to always have "node.instance"
I like option 1, because it's short and sweet, and users can just declare the method as `method(self, instance, *_)`. Also, yielding some object which wraps the original instance looks strange, because the call gets not what was actually passed, but something else.
(In reply to Luke Kenneth Casson Leighton from comment #180)
> {3: A(5, [12, 3])} path=[]
>
> no, that is not okay (path=[]). ints do not have and do not need paths.

you misunderstood this: path is the list of all indexes/keys/field-names needed to get from the root object to the current value. here, we're visiting the root object (not an int, it's a dict), so obviously no indexes/keys need to be accessed to get from the root object to the root object, hence why path=[].

and, yes, ints *need* paths, so you can tell which int you're referring to. the path is not some intrinsic property of the int, but instead tells you how to get to the int from the root object. this is important for things like e.g. SelectableInt(123, 45) (or csv row dicts), where it will visit both the 123 and the 45, and the path is *essential* to not confuse bit-width with value.

> there are two solutions i can think of:
>
> 1) instance, *args - and for non-iterables *args would be empty
> 2) always yield a (new) "Instance" object, which contains a
>    mandatory "instance" and an **OPTIONAL** index (which you
>    call "path", jacob)

no, path is the full list of indexes/keys/field-names needed to get from the root object to the current value. it is important that nothing is skipped in the path, so you don't get 2 different values with the same path and can't easily distinguish which one does what, hence why it includes dict/set keys when visiting some entry in the dict/set and field names when visiting a python class.

note that the same list can have items pushed/popped from the end -- treating it as a stack, so visiting stuff doesn't need to allocate a new list every function call. a sketch of that is below.
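A minimal sketch of the push/pop idea, under stated assumptions: walk() and the children() helper are invented for illustration, and only dicts, lists and tuples are treated as containers here:

# sketch: one list reused as a stack for the path, instead of
# allocating a new path object on every recursive call
def children(node):
    if isinstance(node, dict):
        return node.items()          # (key, value) pairs
    if isinstance(node, (list, tuple)):
        return enumerate(node)       # (index, item) pairs
    return ()                        # leaf node: nothing to recurse into

def walk(node, visit, path=None):
    if path is None:
        path = []
    visit(node, path)
    for (key, child) in children(node):
        path.append(key)             # push
        walk(child, visit, path)
        path.pop()                   # pop: the same list is reused

walk({3: [12, 3]}, lambda node, path: print(node, path))
# prints:
# {3: [12, 3]} []
# [12, 3] [3]
# 12 [3, 0]
# 3 [3, 1]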
The last comment saves me some effort, since I actually wanted to ask quite similar questions. But Jacob was first. :-) I had difficulty understanding how we could reconstruct the object path if indices and keys were not part of the yields.
Ah, yes, to clarify: yielding paths applies to the walker only. But the visitor should have *args, **kwargs too in its call.
Let's consider the example below:

import contextlib

import mdis.dispatcher
import mdis.visitor
import mdis.walker

class Walker(mdis.walker.Walker):
    pass

class ContextVisitor(mdis.visitor.ContextVisitor):
    @mdis.dispatcher.Hook(object)
    @contextlib.contextmanager
    def dispatch_object(self, instance, *, path):
        print(path)
        print(f"    {instance}")
        yield instance

walker = Walker()
visitor = ContextVisitor()
items = (
    {("a", 1): 3.4, 2: {5, 6}},
)
for (item, path) in walker(items):
    with visitor(item, path=path):
        pass

With the recent update (not yet pushed), here's what I get:

[0]
    {('a', 1): 3.4, 2: {5, 6}}
[0, ('a', 1)]
    3.4
[0, 2]
    {5, 6}
[0, 2, 5]
    5
[0, 2, 6]
    6

Jacob, that's less than you posted, because I didn't handle the dict_items specifically. That seemed excessive, since dict_items is not actually accessible via items[0][0]. And, to be honest, yielding the element itself in set/frozenset doesn't look valid either; None perhaps? But None in the middle is strange too.
^^^ Updated the code and printouts
I think sets and frozensets paths are simply hash(item). That is:

@dispatcher.Hook(set, frozenset)
def dispatch_unordered_sequence(self, instance, path=[]):
    for item in instance:
        subpath = (path + [hash(item)])
        yield (item, subpath)
        yield from self(item, path=subpath)
Oh, by the way, just so this is not missed: combining paths like this gives us the depth for free.
With a minor tweak in the visitor, which accepts an additional argument:

class Visitor(mdis.visitor.ContextVisitor):
    @mdis.dispatcher.Hook(object)
    @contextlib.contextmanager
    def dispatch_object(self, instance, *, path=[]):
        indent = (" " * (len(path) * 4))
        path = "".join(map(lambda part: f"[{part}]", path))
        print(indent, path, "=>", instance)
        yield instance

items = (
    {("a", 1): 3.4, 2: {5, 6}},
)
walker = Walker()
visitor = Visitor()
for (item, path) in walker(items):
    with visitor(item, path=path):
        pass

The output is:

     [0] => {('a', 1): 3.4, 2: {5, 6}}
         [0][('a', 1)] => 3.4
         [0][2] => {5, 6}
             [0][2][5] => 5
             [0][2][6] => 6

Notice that path is just an additional argument to the visitor's __call__. Anything can be passed here; it's up to the caller. Paths are obtained from the walker, not the visitor.
(In reply to Dmitry Selyutin from comment #187)
> I think sets and frozensets paths are simply hash(item). That is:

I strongly disagree, because hash(item) can't be used to select the proper item -- both because it's a pain to access a set entry via hash and because the hash may not be unique -- you can have a set {a, b} with len == 2 and hash(a) == hash(b).
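for example, in CPython:

# -1 and -2 are distinct set members but share one hash, because
# hash(-1) is remapped to -2 (-1 is the C-level error sentinel)
a, b = -1, -2
print(hash(a), hash(b))   # -2 -2
print(len({a, b}))        # 2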
(In reply to Dmitry Selyutin from comment #185)
> With the recent update (not yet pushed), here's what I get:
>
> [0]
>     {('a', 1): 3.4, 2: {5, 6}}
> [0, ('a', 1)]
>     3.4

that's fine when the walker never recurses into dict keys.

however, when you want the walker to recurse into keys, then you do need the additional index to tell if you're visiting the key or the value. this treats a dict as a collection of key-value pairs rather than just a collection of values.
And, with a dataclass added...

@dataclasses.dataclass()
class A:
    a: int
    b: "list[int]"

items = (
    {("a", 1): 3.4, 2: {5, 6}},
    {3: A(5, [12, 3])},
)
for (item, path) in walker(items):
    with visitor(item, path=path):
        pass

...that's what we get:

[0] => {('a', 1): 3.4, 2: {5, 6}}
[0][('a', 1)] => 3.4
[0][2] => {5, 6}
[0][2][5] => 5
[0][2][6] => 6
[1] => {3: A(a=5, b=[12, 3])}
[1][3] => A(a=5, b=[12, 3])
[1][3]['a'] => 5
[1][3]['b'] => [12, 3]
[1][3]['b'][0] => 12
[1][3]['b'][1] => 3

That said, I still think that we must somehow designate that in the case of a dataclass we access by attribute, not by index. With that in mind, perhaps the path should be somewhat more flexible than it is, and have different types: AttributePath, DictItemPath, SetPath and IndexPath. This way the repr can be handled properly.
(In reply to Jacob Lifshay from comment #190)
> I strongly disagree, because hash(item) can't be used to select the proper
> item -- both because it's a pain to access a set entry via hash and because
> the hash may not be unique -- you can have a set {a, b} with len == 2 and
> hash(a) == hash(b).

My point is that the item itself cannot be used either. You simply don't have syntax where a set can be accessed by an index. cf. above about different reprs for different path types, so that we can handle .attr vs [index] vs dict_item(a, b) vs ??? (no idea how we can express sets).
Moar tweaks, with in-place constructed path classes:

[0] => {('a', 1): 3.4, 2: {5, 6}}
[0][('a', 1)] => 3.4
[0][2] => {5, 6}
[0][2]{5} => 5
[0][2]{6} => 6
[1] => {3: A(a=5, b=[12, 3])}
[1][3] => A(a=5, b=[12, 3])
[1][3].a => 5
[1][3].b => [12, 3]
[1][3].b[0] => 12
[1][3].b[1] => 3
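Reading between the lines, the "in-place constructed path classes" are presumably something like this (a guess borrowing the names suggested above, not the committed code):

# guesswork: one tiny class per access kind, differing only in repr
class IndexPath:
    def __init__(self, index):
        self.index = index
    def __repr__(self):
        return f"[{self.index!r}]"

class AttributePath:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return f".{self.name}"

class SetPath:
    def __init__(self, item):
        self.item = item
    def __repr__(self):
        return f"{{{self.item!r}}}"

path = (IndexPath(1), IndexPath(3), AttributePath("b"), IndexPath(0))
print("".join(map(repr, path)))   # [1][3].b[0]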
(In reply to Jacob Lifshay from comment #191)
> that's fine when the walker never recurses into dict keys.
>
> however, when you want the walker to recurse into keys, then you do need the
> additional index to tell if you're visiting the key or the value. this
> treats a dict as a collection of key-value pairs rather than just a
> collection of values.

Ah, yes, that's what I missed. Right, keys should be handled too.
How about {key} and {value} notation? I'm not sure how to express it better in the representation.
(In reply to Dmitry Selyutin from comment #196)
> How about {key} and {value} notation? I'm not sure how to express it better
> in the representation.

i just treated it as dict entries being a tuple-like thing, like what's returned by dict.items
(In reply to Dmitry Selyutin from comment #193)
> My point is that the item itself cannot be used either. You simply don't
> have syntax where a set can be accessed by an index. cf. above about
> different reprs for different path types, so that we can handle .attr vs
> [index] vs dict_item(a, b) vs ??? (no idea how we can express sets).

well, for the root object [[{1, (5, 6)}]], we could use:

item=6
path: [0][0];(5, 6)[1]

where a set entry is just ;<key>, plus parentheses as needed.
(In reply to Dmitry Selyutin from comment #181)
> I like option 1, because it's short and sweet, and users can just declare
> the method as `method(self, instance, *_)`.

indeed. or, it can be detected: functools can find out how many arguments there are, and whether there are *args and **kwargs. we use it already somewhere, in soc or ah no nmigen, for detecting if one of the function parameters is "name". on detecting the number of arguments the *caller* can omit the missing parameter.

> Also, yielding some object which
> wraps the original instance looks strange, because the call gets not what
> was actually passed, but something else.

yes, counter-intuitive.

(In reply to Dmitry Selyutin from comment #189)
> With a minor tweak in the visitor, which accepts an additional argument:
>
> class Visitor(mdis.visitor.ContextVisitor):
>     @mdis.dispatcher.Hook(object)
>     @contextlib.contextmanager
>     def dispatch_object(self, instance, *, path=[]):
                                                  ^^
(edit: drat, different fonts. arrows point to the [])

this is a serious error: it is the shared-mutable-default trap. it MUST be a tuple, NOT a list. (basically anything hashable is fine, but unhashable is catastrophic. this is a very common mistake - see the demonstration below.)

i really do not like path as a mandatory argument for all visitors (particularly leaf-nodes and especially the primitives int bool etc). if it is optional (see above, including *_) i have no problem with it. and if using the name "instance" i recommend correspondingly using "instanceid".

(In reply to Jacob Lifshay from comment #182)
> (In reply to Luke Kenneth Casson Leighton from comment #180)
> > {3: A(5, [12, 3])} path=[]
> >
> > no, that is not okay (path=[]). ints do not have and do not need paths.
>
> you misunderstood this: path is the list of all indexes/keys/field-names
> needed to get from the root object to the current value,

ahh ok. good explanation.

> and, yes, ints *need* paths, so you can tell which int you're referring to,

no, you really don't. primitive types (bool int float complex str bytes) definitely do not need a path...

> like e.g. SelectableInt(123, 45)

... but types *derived* from int might: this depends on the implementor, if there are any additional properties. i.e. SelectableInt is no longer a leaf-node type, it is a tree-type due to itself having two leaf-nodes: the value and the bit-width.
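the trap in question, in two lines (the function name f is arbitrary):

# the default list is created once, at function-definition time, and
# is then shared by every call that relies on the default
def f(path=[]):
    path.append(1)
    return path

print(f())   # [1]
print(f())   # [1, 1] - the same list, mutated across calls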
(In reply to Luke Kenneth Casson Leighton from comment #199)
> indeed. or, it can be detected: functools can find out how many
> arguments there are, and whether there are *args and **kwargs.
> we use it already somewhere, in soc or ah no nmigen, for detecting
> if one of the function parameters is "name".

No. This is _incorrect_. You shouldn't do this, because it's completely unreliable and can break at any time. The best bet would perhaps be inspect. This is just like people used to detect "unbound methods", then came Guido and dropped those once and for all. There is no justification for such tricks other than hacking, or avoiding traversing a complex class hierarchy manually (which can bring even more problems than inspect). Resorting to such constructs, other than for self-aware code or truly dynamic construction of some meta object, is almost certainly a sign of bad architecture.

> this is a serious error: it is the shared-mutable-default trap.
> it MUST be a tuple, NOT a list. (basically anything hashable
> is fine, but unhashable is catastrophic. this is a very common
> mistake)

Please don't be so dramatic. You condemn an artificial example, where you didn't even check the code. If you check it, you'll see that the default value is never used, nor is it ever write-accessed. But this is not the code to be committed. The walkers, however, use tuples.

> i really do not like path as a mandatory argument for all visitors
> (particularly leaf-nodes and especially the primitives int bool etc).
> if it is optional (see above, including *_) i have no problem with it.

Luke. You don't read the code, _again_. Which part of it says it's mandatory? Which part binds it to any visitor other than the visitor from this example? I specifically and explicitly said that walkers yield it, and showed it in the example itself. It's up to users whether they use this information in visitors or not.

> and if using the name "instance" i recommend correspondingly using
> "instanceid".

Is it reserved, or do you want to have it parallel with typeid? I named it instance because it's exactly an instance of some object, and because we use type(instance) for a double dispatch. I'd have used type for typeid, but that one is a built-in. And, frankly, our typeid is even no longer a typeid, because sometimes it's just a callable (like dataclasses.is_dataclass). I haven't come up with a good name since I implemented this; ideas are appreciated. :-)

> ahh ok. good explanation.
>
> > and, yes, ints *need* paths, so you can tell which int you're referring to,
>
> no, you really don't. primitive types (bool int float complex str bytes)
> definitely do not need a path...

They do need paths if you want to differentiate which int comes from where. Jacob gave the nested-collections example for a reason.

> > like e.g. SelectableInt(123, 45)
>
> ... but types *derived* from int might: this depends on the implementor,
> if there are any additional properties.
>
> i.e. SelectableInt is no longer a leaf-node type, it is a tree-type
> due to itself having two leaf-nodes: the value and the bit-width.

This actually depends on how you treat it. In some cases you might opt to treat it as a sequence of bits as well. This might be handy sometimes. But I don't really believe this should be part of visiting. This is not the same with fields, though. These are entirely different beasts.
And especially with the fields which mark insn fields: they are also integers in some sense, but they should be walked with the exact bits in mind.
s/part of visiting/part of walking/
https://www.fluentpython.com/extra/function-introspection/ "Example 6 shows the values of __defaults__, __code__.co_varnames, and __code__.co_argcount for the clip function listed in Example 5."
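Concretely, on a toy function those attributes look like this:

# the attributes named in the linked article, on a trivial function
def clip(text, max_len=80):
    pass

print(clip.__defaults__)          # (80,)
print(clip.__code__.co_varnames)  # ('text', 'max_len')
print(clip.__code__.co_argcount)  # 2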
briefly:

* sorry, i am only able to pay brief attention, so i saw the [] and assumed it had been committed.
* __code__ and co_* etc. have been part of python since around 2.0. they're not going away, not with 25+ years of code in the world that would break.
* forcing visitors of leaf nodes to have to ignore a parameter that is meaningless i do not think is a good idea, and i am looking for solutions.
* another option: the decorator-of-visitors stores in the registry whether the path option is wanted or not.
* alternative option: leaf-nodes on basic types never receive a path argument. they're just not necessary, so why have it?
* alternative option: there are *two* functions: one for leaf-nodes (mostly primitive types or any non-iterable class) that never need a path, and one for those that are iterable.
* refinement: auto-identification of leaf-nodes vs tree-nodes: if the tree-iteration yields something of zero length, then it's pretty obvious that it's a leaf-node, and the alternative function dedicated to leaf-nodes may be called.
* walking down individual bits is unnecessary, and if someone ever wanted it they should sub-class int and provide a bit-enumerator (__iter__). (nmigen does actually genuinely do this, so it is not hypothetical)
(In reply to Luke Kenneth Casson Leighton from comment #203)
> * forcing visitors of leaf nodes to have to ignore a parameter that
>   is meaningless i do not think is a good idea, and i am looking for
>   solutions.
> * another option: the decorator-of-visitors stores in the registry
>   whether the path option is wanted or not.
> * alternative option: leaf-nodes on basic types never receive a
>   path argument. they're just not necessary, so why have it?
> * alternative option: there are *two* functions: one for leaf-nodes
>   (mostly primitive types or any non-iterable class) that never need a
>   path, and one for those that are iterable.
> * refinement: auto-identification of leaf-nodes vs tree-nodes:
>   if the tree-iteration yields something of zero length, then it's
>   pretty obvious that it's a leaf-node, and the alternative function
>   dedicated to leaf-nodes may be called.

I think you have a misunderstanding of how path and leaf nodes (like int, float, bool, str) interact: critically, for each function call, path is the instructions for how to get from the root node to the current node; path doesn't include any indexes or anything describing the insides/children of the current node. Therefore, being a leaf node with no insides/children does *not* imply that the path is ignored, since the path was never describing the insides/children anyway, and the insides/children are the only part we know to be empty due to being a leaf node.

a good example of why we do *not* want to ignore the path even on leaf nodes is when visiting the CSV dict for lhau (in major.csv):

{'opcode': '43', 'unit': 'LDST', <snip>
 'BR': '0', 'sgn ext': '1', 'upd': '1',
 'rsrv': '0', '32b': '0', 'sgn': '0', <snip>}

the nodes and paths that would be visited:

<snip>
'0' path=['BR']
'1' path=['sgn ext']
'1' path=['upd']
'0' path=['rsrv']
'0' path=['32b']
'0' path=['sgn']
<snip>

as you can see, the strings '0' and '1' get visited multiple times each; the only way the visitor can tell each '1' apart is by looking at path, which is where the critical information of *what the '1' means* is (does it mean this instruction sign-extends? or that it updates?). the only other slightly reasonable method of getting that information is relying on dict ordering, which is obviously not what we want to do.
(In reply to Luke Kenneth Casson Leighton from comment #203)
> briefly:
>
> * sorry, i am only able to pay brief attention, so i saw the [] and
>   assumed it had been committed.

If you intend to enter the discussion, and especially to use wordings like "catastrophic", then please DO read the code and the discussion around it.

> * __code__ and co_* etc. have been part of python since around 2.0.
>   they're not going away, not with 25+ years of code in the world
>   that would break.

Having something for 25 years does not automatically mean it will be there forever. Python developers are a notorious example of breaking users' code. They broke strings. They broke iterators. They broke a lot of code which thought distinguishing unbound methods was a clever idea. You are not considering the existing experience; just stubbornly repeating that they won't break it won't affect their choices at all. Again: you should not rely on this unless there are no other options. Even if you do — prefer the inspect module.

> * forcing visitors of leaf nodes to have to ignore a parameter that
>   is meaningless i do not think is a good idea, and i am looking for
>   solutions.

You don't understand what path is. Please re-read the explanations. Leaf nodes _have_ paths; the root node _doesn't_, or, well, has it empty. Path means "how do I get here upon walking". But you also don't understand how the code in the example works, which is even worse. You don't read my comments, so I have to repeat them several times.

1. Walkers walk over some object recursively and yield everything they find there. Basically they walk over a node, which yields subnodes, and do it recursively.
2. When the walker has something to yield from the object, it also yields where that node was found. It's as if it yields a string which shows paths like "left.right.left.right". Since our nodes can have an arbitrary number of subnodes, and can be walked differently, the paths are different too.
3. Unless you want to use this information in a visitor — you don't need it. I intentionally used a custom hook for object in my example because I _wanted_ to pass the path, to print the hierarchy. And if you look at how I call the visitor, you'll see that I pass the argument explicitly. But nothing forces anyone to do it. It's the walker which emits this information, not the visitor. And visitors are for installing handlers on how to visit stuff.

> * walking down individual bits is unnecessary, and if someone
>   ever wanted it they should sub-class int and provide a bit-enumerator
>   (__iter__). (nmigen does actually genuinely do this, so it is not
>   hypothetical)

You don't even read which example I bring up. What's the point of this discussion then? Again: I'm talking about the situation when the bits are scattered, like in instruction fields.

Luke, if you enter the discussion, especially with quite strong wordings, please take care to read and analyze what others have already said. I'm a patient person and can repeat things several times, but I'm really concerned this has become a repeating pattern.
Jacob, about paths et al.: shouldn't such an approach yield the parent too, like os.walk() does with dirpath vs dirnames/filenames?
(In reply to Dmitry Selyutin from comment #206)
> Jacob, about paths et al.: shouldn't such an approach yield the parent too,
> like os.walk() does with dirpath vs dirnames/filenames?

yeah, that would be useful too!
OK, I found a way to handle paths elegantly and keep the original identifiers as well. The trick is to yield both the original key/index/element and something which designates how to format it. https://git.libre-soc.org/?p=mdis.git;a=commitdiff;h=c025a35e2d0f1fc3701d266aea52a0cb00abd8b7
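A minimal sketch of that trick (hypothetical Index/Hash wrapper classes; the real commit linked above differs in detail): the walker yields each node together with a path whose parts remember both the original key/index and how to render it.

    class Index:
        # hypothetical: remembers a sequence index / dict value lookup,
        # renders as [part]
        def __init__(self, part):
            self.part = part
        def __str__(self):
            return f"[{self.part!r}]"

    class Hash:
        # hypothetical: remembers a dict/set key, renders as {part}
        def __init__(self, part):
            self.part = part
        def __str__(self):
            return f"{{{self.part!r}}}"

    def walk(node, path=()):
        # yield (node, path); each path part keeps the original identifier
        # (sets and dataclasses omitted for brevity)
        yield (node, path)
        if isinstance(node, (list, tuple)):
            for (index, item) in enumerate(node):
                yield from walk(item, path + (Index(index),))
        elif isinstance(node, dict):
            for (key, value) in node.items():
                yield from walk(key, path + (Hash(key),))
                yield from walk(value, path + (Index(key),))

Joining str(part) for every part of a path then reproduces strings like [0]{('a', 1)}[1].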
Folks, based on the example below (A is a dataclass, cut for brevity):

    items = (
        {("a", 1): 3.4, 2: {5, 6}},
        {3: A(5, [12, 3])},
    )
    for (item, path) in walker(items):
        with visitor(item, path=path):
            pass

Here are the results:

    [0] => {('a', 1): 3.4, 2: {5, 6}}
    [0]{('a', 1)} => ('a', 1)
    [0]{('a', 1)}[0] => a
    [0]{('a', 1)}[1] => 1
    [0][('a', 1)] => 3.4
    [0]{2} => 2
    [0][2] => {5, 6}
    [0][2]{5} => 5
    [0][2]{6} => 6
    [1] => {3: A(a=5, b=[12, 3])}
    [1]{3} => 3
    [1][3] => A(a=5, b=[12, 3])
    [1][3].a => 5
    [1][3].b => [12, 3]
    [1][3].b[0] => 12
    [1][3].b[1] => 3

Does it meet your expectations? As you can see, I chose {} for hashes, and what gets passed is the actual object that was used for the lookup. Currently these things cannot be accessed directly. I think PartId, if modified to be used with the parent, could not only format, but also access the parent in a correct way (via some lambda or whatever). Do we need such functionality? Perhaps we might yield something which is correctly formatted and whose __call__ gives access to the entity. Thoughts and opinions? I think it's somewhat counter-intuitive; as we discussed, I'd like to keep the original value. However, perhaps this is fine too:

    for item in walker(items):
        print(item.path)
        print(item.parent)
        with visitor(item.node, path=item.path):
            pass

...instead of this:

    for (node, parent, path) in walker(items):
        print(path)
        print(parent)
        with visitor(node, path=path):
            pass
(In reply to Dmitry Selyutin from comment #209)
> Does it meet your expectations? As you can see, I chose {} for hashes, and
> what gets passed is the actual object that was used for the lookup.

lgtm, though imho Hash is a misnomer; maybe name it GetKey?

> for (node, parent, path) in walker(items):

imho either this or the version returning objects with node, parent, and path fields is good. I do think we should pick whichever option is faster, since the walker is likely hot code. If we can guarantee callers won't keep the returned object around, the fastest would probably be to return the same object every time and just update its node etc. fields.
(In reply to Jacob Lifshay from comment #210)
> imho either this or the version returning objects with node, parent, and
> path fields is good. I do think we should pick whichever option is faster,
> since the walker is likely hot code.

I think yielding a tuple would be better then. Plus I kinda dislike the wrapping here, though I cannot clearly formulate why (even to myself).

Also, on paths: if we yield the parent (which I assume is the original instance), perhaps the path should point to this parent, and not to the root? We currently keep the whole path.
(In reply to Dmitry Selyutin from comment #211)
> (In reply to Jacob Lifshay from comment #210)
> > imho either this or the version returning objects with node, parent, and
> > path fields is good. I do think we should pick whichever option is
> > faster, since the walker is likely hot code.
>
> I think yielding a tuple would be better then. Plus I kinda dislike the
> wrapping here, though I cannot clearly formulate why (even to myself).
>
> Also, on paths: if we yield the parent (which I assume is the original
> instance), perhaps the path should point to this parent, and not to the
> root? We currently keep the whole path.

you still need to know which child of the parent you are...

also, if you use a list for path, and just push and pop indexes from the end rather than making a new object every time, then you don't need to constantly allocate as much, which is a bit faster.

also, i think the library should provide a function that has the top-level for loop running both walker and visitor
That's what I mean:

    nodes = (
        {("a", 1): 3.4, 2: {5, 6}},
        {3: A(5, [12, 3])},
    )
    for (node, parent, pathid, path) in walker(nodes):
        print(parent, pathid(path), "=>", node)

    ({('a', 1): 3.4, 2: {5, 6}}, {3: A(a=5, b=[12, 3])}) [0] => {('a', 1): 3.4, 2: {5, 6}}
    {('a', 1): 3.4, 2: {5, 6}} {('a', 1)} => ('a', 1)
    ('a', 1) [0] => a
    ('a', 1) [1] => 1
    {('a', 1): 3.4, 2: {5, 6}} [('a', 1)] => 3.4
    {('a', 1): 3.4, 2: {5, 6}} {2} => 2
    {('a', 1): 3.4, 2: {5, 6}} [2] => {5, 6}
    {5, 6} {5} => 5
    {5, 6} {6} => 6
    ({('a', 1): 3.4, 2: {5, 6}}, {3: A(a=5, b=[12, 3])}) [1] => {3: A(a=5, b=[12, 3])}
    {3: A(a=5, b=[12, 3])} {3} => 3
    {3: A(a=5, b=[12, 3])} [3] => A(a=5, b=[12, 3])
    A(a=5, b=[12, 3]) .a => 5
    A(a=5, b=[12, 3]) .b => [12, 3]
    [12, 3] [0] => 12
    [12, 3] [1] => 3
(In reply to Jacob Lifshay from comment #212)
> also, if you use a list for path, and just push and pop indexes from the
> end rather than making a new object every time, then you don't need to
> constantly allocate as much, which is a bit faster.

e.g.:

    class Walker(...):
        def __init__(self):
            self.path = []
            self.path_kind = []

        @Hook(list, tuple)
        def walk_seq(self, value):
            for i, v in enumerate(value):
                self.path.append(i)
                self.path_kind.append(Index)
                yield (v, self.path, self.path_kind, value)
                yield from self(v)
                self.path.pop()
                self.path_kind.pop()

        @Hook(dict)
        def walk_dict(self, value):
            for k, v in value.items():
                self.path.append(k)
                self.path_kind.append(GetKey)
                yield (k, self.path, self.path_kind, value)
                yield from self(k)
                self.path_kind[-1] = Index
                yield (v, self.path, self.path_kind, value)
                yield from self(v)
                self.path.pop()
                self.path_kind.pop()

        ...
(In reply to Dmitry Selyutin from comment #213)
> That's what I mean:

ah, so just the last element of the path
(In reply to Jacob Lifshay from comment #212)
> (In reply to Dmitry Selyutin from comment #211)
> > Also, on paths: if we yield the parent (which I assume is the original
> > instance), perhaps the path should point to this parent, and not to the
> > root? We currently keep the whole path.
>
> you still need to know which child of the parent you are...

cf. below (#213, I think?). I think if we have both the parent and the path to this parent, we could reconstruct the whole path... Or not?

> also, if you use a list for path, and just push and pop indexes from the
> end rather than making a new object every time, then you don't need to
> constantly allocate as much, which is a bit faster.

I just don't like APIs which modify builtins in-place. This can sometimes strike back.

> also, i think the library should provide a function that has the top-level
> for loop running both walker and visitor

I'm intentionally leaving this to the caller because:
1. It's up to the caller what to pass from walker to visitor.
2. It's up to the caller what else to pass.
3. It's up to the caller which kind of visitor to choose.
4. It's up to the caller which action to take after the hook was called or yielded back to the caller.

The best form I could think of is:

    def visit(root, walker, visitor, wrapper=lambda rv: rv, *args, **kwargs):
        if isinstance(visitor, ContextVisitor):
            for (node, parent, path) in walker(root):
                with visitor(node, parent=parent, path=path,
                             *args, **kwargs) as rv:
                    wrapper(rv)
        else:
            for (node, parent, path) in walker(root):
                wrapper(visitor(node, parent=parent, path=path,
                                *args, **kwargs))

This seems complex and clumsy to use. Perhaps you have other ideas?
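One possible simplification, purely a sketch: assuming non-context visitors return a plain value, Python 3.7+ (for contextlib.nullcontext), and a ContextVisitor whose __call__ returns a context manager, both visitor kinds can share a single loop:

    import contextlib

    def visit(root, walker, visitor, wrapper=lambda rv: rv, *args, **kwargs):
        # single code path: wrap plain return values in a no-op context
        # manager, so context-manager visitors and plain visitors behave
        # identically from the driver's point of view
        for (node, parent, path) in walker(root):
            rv = visitor(node, parent=parent, path=path, *args, **kwargs)
            if not hasattr(rv, "__enter__"):
                rv = contextlib.nullcontext(rv)
            with rv as value:
                wrapper(value)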
The code which yields only the last path component looks much better. Given that, can we ensure that users are capable of reconstructing the whole path if needed?

    for (node, parent, path, kind) in walker(root):
        magic(...)

How can the magic call reconstruct the whole tree, the way it could when we kept the whole paths?
am on a train. late. brief.

(In reply to Dmitry Selyutin from comment #208)
> OK, I found a way to handle paths elegantly and keep the original
> identifiers as well. The trick is to yield both the original
> key/index/element and something which designates how to format it.

ahh i love that idea, just be careful the "formatter" does not expand out to "and we need another feature, and another".

> https://git.libre-soc.org/?p=mdis.git;a=commitdiff;
> h=c025a35e2d0f1fc3701d266aea52a0cb00abd8b7

         @dispatcher.Hook(dataclasses.is_dataclass)
    -    def dispatch_dataclass(self, instance):
    +    def dispatch_dataclass(self, instance, path=[]):    <- path=()
             for field in dataclasses.fields(instance):
                 key = field.name
                 value = getattr(instance, key)
    -            yield (key, value)
    -            yield from self((key, value))
    +            part = (PartId.Attribute, key)
    +            parts = (path + (part,))
    +            yield (value, parts)
    +            yield from self(value, path=parts)
    +
    +    @dispatcher.Hook(dict)
    +    def dispatch_mapping(self, instance, path=[]):    <- path=()
    +        for (key, value) in instance.items():
Ah yes, the full path sequence is as good as dropped, provided someone can suggest how to solve #217 in an elegant fashion. As for the formatter: yes, I also have a feeling this will be extended; perhaps the enum values should be replaced with separate classes.
(In reply to Luke Kenneth Casson Leighton from comment #218)
> ahh i love that idea, just be careful the "formatter" does not
> expand out to "and we need another feature, and another".

Updated the code with separate classes. These can be extended as we wish. The walking process stays the same:

    for (node, parent, path, pathcls) in walker(nodes):
        print(parent, pathcls(path), "=>", node)

The main idea is that the path appears as it was; if the users need to format it, or need anything else not covered by the "pure" object we walked, they can use pathcls.

> + def dispatch_dataclass(self, instance, path=[]):    <- path=()

These are already dropped: we yield only the last path component. However, I'm still not sure whether we're able to build the full path on our own if we have all the parents for each node. I'd really like the path to be simple and relative to the parent (not to the top-level object). Ideas?
(In reply to Dmitry Selyutin from comment #220)
> Updated the code with separate classes. These can be extended as we wish.

I forgot to post the commit, here it is:
https://git.libre-soc.org/?p=mdis.git;a=commitdiff;h=f4c50f8ab5bcd4ead21759ea49840f549ac442d2
(In reply to Dmitry Selyutin from comment #219)
> Ah yes, the full path sequence is as good as dropped, provided someone can
> suggest how to solve #217 in an elegant fashion.

not a problem.

> As for the formatter: yes, I also have a feeling this will be extended;

yeahno, then it is a bad idea. us "dictating" the format is a nonstarter.

> perhaps the enum values should be replaced with separate classes.

as markos said, this is taking too long.

path is not a necessary argument to pass, given that the walker itself is supposed to perform the tracking.

can we please move off of "hypothetical" development and move on to "practical use" and see how it works out, then revisit if it turns out to be necessary based on actual real-world usage?

(In reply to Dmitry Selyutin from comment #217)
> The code which yields only the last path component looks much better.

yes. we went over that last week. it should be instanceid.

> Given that, can we ensure that users are capable of reconstructing the
> whole path if needed?
>
>     for (node, parent, path, kind) in walker(root):
>         magic(...)
>
> How can the magic call reconstruct the whole tree, the way it could when we
> kept the whole paths?

not the API's problem, that is the walker's problem. we went over this last week already (the walker performs stack push and pop). this is standard practice for walker/visitor APIs.
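For completeness, one caller-side way to rebuild full paths from (node, parent, last-component, kind) tuples; this full_paths helper is hypothetical, not part of mdis, and assumes every node is yielded after its parent:

    def full_paths(walker, root):
        # Hypothetical helper: accumulate full paths from last components.
        # Caveat: keyed by id(), so shared/interned leaf objects (e.g. the
        # same '0' string appearing in several places) may overwrite each
        # other's entries; harmless here, since leaves are never parents.
        paths = {id(root): ()}
        for (node, parent, path, kind) in walker(root):
            full = paths[id(parent)] + ((kind, path),)
            paths[id(node)] = full
            yield (node, full)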
(In reply to Luke Kenneth Casson Leighton from comment #222)
> yeahno, then it is a bad idea. us "dictating" the format is a nonstarter.

No, I mean not dictating the formatter, but other functionality. Anyway, that turned out to be simpler (cf. my previous message).

> > perhaps the enum values should be replaced with separate classes.

Already done.

> as markos said, this is taking too long.
>
> can we please move off of "hypothetical" development and move on
> to "practical use" and see how it works out, then revisit if it
> turns out to be necessary based on actual real-world usage?

Well, handling paths is real-world usage. We have the `pysvp64db tree` command, which needs exactly that. Yes, this _is_ taking so long, because I want to cover the cases I have in mind, and also because [recall the communication we had so far]. Don't think I don't want to finish this task; the sooner it's finished, the sooner I can start with binutils.

> > The code which yields only the last path component looks much better.
>
> yes. we went over that last week. it should be instanceid.

You're mixing two different, unrelated things: the object itself, and the way we reached it. Anyway, as for naming, I think "node" is a perfect name. Are you fine with it?

> > How can the magic call reconstruct the whole tree, the way it could when
> > we kept the whole paths?
>
> not the API's problem, that is the walker's problem.

This is not about who yields the path; it is about what's yielded. You're probably speaking of something else. My question is whether the path should be full (back to the root upon which we started walking) or relative (to the parent). Anyway, I'm mostly fine with how it looks. I'll return to the paths issue when I handle the `pysvp64db tree` command.
(In reply to Dmitry Selyutin from comment #223)
> Already done.

awesome.

> Well, handling paths is real-world usage. We have the `pysvp64db tree`
> command, which needs exactly that. Yes, this _is_ taking so long, because I
> want to cover the cases I have in mind,

can you do it as a sub-class or mix-in class? not part of the primary API but an "example that someone might commonly use"? and if it becomes common enough it gets moved *into* mdis?

> You're mixing two different, unrelated things.

almost certainly. i'm not able to keep up fully at the moment, sorry.

> The object itself, and the way we reached it. Anyway, as for naming, I
> think "node" is a perfect name. Are you fine with it?

yes.

> My question is whether the path should be full (back to the root upon which
> we started walking) or relative (to the parent).

relative.

> Anyway, I'm mostly fine with how it looks. I'll return to the paths issue
> when I handle the `pysvp64db tree` command.

subclass or mixin.
Wow. Just wow. Implementation of "pysvp64db operands". The main issue with operands is that their instantiation is delayed: the record itself just contains the Operands tuple, which holds an operand class to be instantiated plus the necessary arguments. Each operand then gets the record instance and kinda "binds" it to itself. With the old API, we had to cheat: we established hooks on Record instead, then traversed the operands manually. With the new API, it's simple:

    class OperandsVisitor(InstructionVisitor):
        def __init__(self):
            self.__record = None
            return super().__init__()

        @mdis.dispatcher.Hook(Record)
        @contextlib.contextmanager
        def dispatch_record(self, node):
            self.__record = node
            yield node

        @mdis.dispatcher.Hook(Operands)
        @contextlib.contextmanager
        def dispatch_operands(self, node):
            for (cls, kwargs) in node:
                operand = cls(record=self.__record, **kwargs)
                print(operand.name, ", ".join(map(str, operand.span)))
            yield node

The trick here is the MRO: Operands is simply declared as "class Operands(tuple)", and the first hook to match is picked. Previously we had to create classes like Dict, Tuple and similar; now we just match on the MRO. And, since we know Operands will eventually be visited from Record (actually Record.MarkdownRecord.Operands), we simply capture the record which was visited before and use it for instantiation. No custom Tuple classes, no cheat hooks on Records.
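A toy sketch of the MRO-based matching described above (the find_hook helper is hypothetical, not the actual mdis dispatcher): the registry is scanned in method-resolution order, so a hook registered on Operands shadows a generic tuple hook.

    def find_hook(hooks, node):
        # hooks: {type: handler}; walk the node's MRO so that the most
        # derived registered class wins (Operands before tuple).
        for cls in type(node).__mro__:
            if cls in hooks:
                return hooks[cls]
        raise LookupError(f"no hook for {type(node).__name__}")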
Here's what the visiting looks like:

    walker = Walker()
    for (node, *_) in walker(root):
        with visitor(node):
            pass

As you can see, the visitors don't need anything but the node. I had to temporarily deprecate TreeVisitor, though; it's the only one which needs paths.
(In reply to Dmitry Selyutin from comment #147)
> Roadmap is:
> 1. Update setup.py and install the module. Perhaps we should install it
>    with devscripts?
> 2. Deprecate Node, walk() and similar stuff.
> 3. Switch db.py to new approach.

1. Introduced the dependency in hdl-dev-repos instead:
https://git.libre-soc.org/?p=dev-env-setup.git;a=commitdiff;h=31539e2fdbce1ac5f16792440e6b1acf8a5f1a32
2. Node, walk, Dict et al. are deprecated:
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=89958b87cedb6dda485e00049f82703ce19f739a
3. db.py is switched to the new approach (ibid.). I had to drop TreeVisitor for a while, though.

Although this task is formally completed, considering the steps mentioned above, I'd like to practice a bit with other pysvp64db commands. I can handle that in the scope of this task. I'm going to add a few commands, because pysvp64db seems to be quite a useful tool. After this we can switch to other tasks which use visitors (I suggest codegen again, but this time with the visitor API). Any objections?
(In reply to Dmitry Selyutin from comment #227)
> 1. Introduced the dependency in hdl-dev-repos instead:
> https://git.libre-soc.org/?p=dev-env-setup.git;a=commitdiff;
> h=31539e2fdbce1ac5f16792440e6b1acf8a5f1a32

perfect. beyond that, if people don't run the scripts they are on their own.

> 2. Node, walk, Dict et al. are deprecated:
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=89958b87cedb6dda485e00049f82703ce19f739a

looks waay better.

> Although this task is formally completed, considering the steps mentioned
> above, I'd like to practice a bit with other pysvp64db commands. I can
> handle that in the scope of this task. I'm going to add a few commands,
> because pysvp64db seems to be quite a useful tool.

er yes? :) i will mention it on OPF discuss.

> After this we can switch to other tasks which use visitors (I suggest
> codegen again, but this time with the visitor API). Any objections?

go for it, just remember there are numerous budgets kicking around, do a quick review:
https://bugs.libre-soc.org/buglist.cgi?email1=ghostman&emailassigned_to1=1&emailtype1=substring&list_id=4672&query_format=advanced&resolution=---
drat, not so useful, must have forgotten to set you as assignee for some of the visitor ones.
I've refactored the extras command. Now it's purely class-based and demonstrates some powerful tricks with visitors. The output is perhaps not that pretty, but now we don't cheat and, more importantly, we demonstrate how to use visitors.

    $ pysvp64db extras sv.ldu
    in1 RA0
    in2 CONST_DS
    in3 NONE
    out RT
    cr_in NONE
    cr_in2 NONE
    cr_out NONE
    in1 RA0
    in2 NONE
    in3 NONE
    out RT
    out RA
    cr_in NONE
    cr_in2 NONE
    cr_out NONE
    EXTRA0 d:RT
    EXTRA1 d:RA
    EXTRA1 s:RA

Here's the relevant commit:
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=bf208c7a6548cf6ae6ce33977a2fb8e3a463114a
The implementation is somewhat less obvious than when we cheat, but is interesting for demonstration purposes.
The next commit also introduces the selectors command; unlike extras, this isn't specific to svp64 instructions. Since extras are quite useless without selectors, the extras visitor simply inherits the selectors visitor!
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=348016ff0f96d051e74d78e3dfd97d5f5606ceea
(In reply to Dmitry Selyutin from comment #229)
> Here's the relevant commit:
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=bf208c7a6548cf6ae6ce33977a2fb8e3a463114a
> The implementation is somewhat less obvious than when we cheat, but is
> interesting for demonstration purposes.

yeah, it's really short. suggestion: initialise the index to 0, then increment *after* the yield. setting it to -1 then incrementing before the yield is counter-intuitive.
(In reply to Luke Kenneth Casson Leighton from comment #231)
> yeah, it's really short. suggestion: initialise the index to 0, then
> increment *after* the yield. setting it to -1 then incrementing before
> the yield is counter-intuitive.

I chose -1 as an alternative to accessing (index - 1). But both approaches are somewhat annoying. The thing is that _all_ code in the parent is called _before_ the child's code. Here's how it looks, simplified:

    for (node, *_) in walk(nodes):  # recursively
        node.__enter__()
        node.code()
        node.__exit__()

If we iterate over some child node, we've already called __enter__ and __exit__ for its parent. Frankly, I'm not sure how to circumvent this. I like the semantics you assumed, though. This would be kinda similar to `for` + `with` combined.
(In reply to Dmitry Selyutin from comment #232)
> Frankly, I'm not sure how to circumvent this.

Hm. I think if we only enter one level at a time, we should resolve it:

    def visit(root, visitor, walker):
        with visitor(root):
            for node in walker(root):
                return visit(node, visitor, walker)

The prerequisite is that the walker doesn't go deeper than 1 level.

UPD: now considers the root node.
Yes, this should do the trick. Ignore the `return visit`; it should be just `visit`.
Yes, this works! I'm waiting for CI to make sure nothing is broken, but with the relevant changes in mdis, I can increment the index after the yield. The code therefore becomes:

    def traverse(root, visitor, walker):
        with visitor(root):
            for (node, *_) in walker(root):
                traverse(root=node, visitor=visitor, walker=walker)

    traverse(root=root, visitor=visitor, walker=Walker())
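To illustrate the nesting this produces, here is a self-contained toy run; PrintVisitor and shallow_walker are hypothetical stand-ins for the real mdis classes (the walker yields direct children only, and traverse() does the recursion):

    import contextlib

    class PrintVisitor:
        # toy visitor: __call__ returns a context manager, so a parent's
        # enter/exit brackets all of its children's visits
        @contextlib.contextmanager
        def __call__(self, node):
            print("enter", node)
            yield node
            print("exit", node)

    def shallow_walker(node):
        # yields direct children only; recursion is traverse()'s job
        if isinstance(node, (list, tuple)):
            for child in node:
                yield (child,)

    def traverse(root, visitor, walker):
        with visitor(root):
            for (node, *_) in walker(root):
                traverse(root=node, visitor=visitor, walker=walker)

    traverse(root=[1, [2, 3]], visitor=PrintVisitor(), walker=shallow_walker)
    # enter [1, [2, 3]]
    # enter 1
    # exit 1
    # enter [2, 3]
    # enter 2
    # exit 2
    # enter 3
    # exit 3
    # exit [2, 3]
    # exit [1, [2, 3]]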
OK, same 5 failures as before:
https://salsa.debian.org/Kazan-team/mirrors/openpower-isa/-/jobs/4353802

I've merged the updated pysvp64db and also updated the mdis code. Several new commands were added; they can be checked with `pysvp64db --help`. I suppose the approach is viable and extensible. I therefore suggest marking this task as completed and moving to something more practical (read: "code generation"). Any objections?
Based on the discussion in IRC, I'm marking this as resolved. Strangely, it has never been marked as "in progress". :-)
After discussion, I've decided to reimplement the arguments detection. From now on, we always expect at least 2 positional arguments: a dispatcher instance and a node. However, the caller can pass any additional arguments, be they positional or keyword. The detection is done via inspect. Also, the concrete hook now wraps the original function; therefore ConcreteHook.__call__ refers to the original method we wrapped. Example for Walker.dispatch_unordered_sequence (irrelevant details snipped):

    class ConcreteHook(Hook)
     |  ConcreteHook(*typeids)
     |
     |  Methods defined here:
     |
     |  __call__ = dispatch_unordered_sequence(self, node)

https://git.libre-soc.org/?p=mdis.git;a=commitdiff;h=02ef49949991a739662a663ebe64ef00215c4e4d
https://git.libre-soc.org/?p=mdis.git;a=commitdiff;h=0364578c804080d24f4e8b656c58c83f9d0abb51
https://git.libre-soc.org/?p=mdis.git;a=commitdiff;h=e132cae998a7dce910efe4c2f160e7cc3c93e5ee
https://git.libre-soc.org/?p=mdis.git;a=commitdiff;h=f91c0f7cfeb121688b26191f7a1cadcfaa1ceea2

Is it sufficient to mark this task as completed, or do we have other ideas?
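A minimal sketch of how such inspect-based detection can work; the accepts_extra helper is hypothetical, not mdis's actual code, and assumes the hook is an unbound function whose first two parameters are (self, node):

    import inspect

    def accepts_extra(hook):
        # True if the hook can take arguments beyond (self, node): either
        # extra named parameters, or *args/**kwargs catch-alls.
        params = list(inspect.signature(hook).parameters.values())
        if len(params) > 2:
            return True
        return any(p.kind in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
                   for p in params)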
What'd be the correct task for "reimplement the code generation based on visitors so that it's usable for both binutils and anything else"?
(In reply to Dmitry Selyutin from comment #239)
> What'd be the correct task for "reimplement the code generation based on
> visitors so that it's usable for both binutils and anything else"?

that's a massive task that has to go on an entirely new grant. bug #1211 and bug #1212.