a visitor-walker of the instruction database is needed.

1. 1st priority is disassembly (binary-to-{visitor})
2. 2nd priority is assembly (ASCII-instruction-to-{visitor})

* TODO read https://medium.com/design-patterns-in-python/visitor-pattern-b9227759d6be
* TODO evaluate StreamBasedExpressionVisitor https://pyomo.readthedocs.io/en/stable/_modules/pyomo/core/expr/visitor.html
* TODO a drop-in replacement for PowerDecoder (bug #737) is a good candidate.
* TODO also ISACaller needs a *separate* decoder that is not yield-based
* TODO c-based Power ISA decoder bug #979
* TODO c-based Power ISA instructions bug #980
* TODO ISACaller parser (instruction-walking) bug #NNNN
* TODO ...
* TODO ...

links:

* https://libre-soc.org/irclog/%23libre-soc.2023-06-01.log.html#t2023-06-01T15:00:46
* https://libre-soc.org/irclog/%23libre-soc.2023-05-29.log.html#t2023-05-29T22:11:41
* https://docs.python.org/3/library/contextlib.html#contextlib.contextmanager
* https://docs.python.org/3/library/html.parser.html
* https://github.com/realistschuckle/pyvisitor/blob/master/src/visitor.py
* https://github.com/python/cpython/blob/2.7/Lib/ast.py
* https://github.com/python/cpython/blob/2.7/Lib/HTMLParser.py

<programmerjake> with visitor.db(db):
<programmerjake>     for insn in db:
<programmerjake>         with visitor.insn(insn):
<programmerjake>             for operand in insn:

<lkcl> class MyInsnDBParser:
<lkcl>     def handle_db_start(self, ....)
<lkcl>     def handle_db_end(self, ....)
<lkcl>     def handle_record_start(self, ....)
<lkcl>     def handle_record_end(self, ....)

<lkcl> if hasattr(visitor, "handle_db"): visitor->handle_db(db)
<lkcl> visitor->handle_db_start(insn)
<lkcl> for each insn do
<lkcl>     visitor->handle_insn_start(insn)
<lkcl>     for each record in insn do
<lkcl>         visitor->handle_record(record)
<lkcl>     visitor->handle_insn_end(insn)
<lkcl> visitor->handle_db_end(insn)
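a minimal runnable sketch of the handler-style walker from the IRC excerpt
above (the handle_* names come from the log; the db/insn/record shapes are
assumptions for illustration only):

    # sketch: the walker calls optional handler methods; visitors only react.
    # db is assumed to be an iterable of insns, each insn an iterable of
    # records - this mirrors the pseudocode in the IRC log, nothing more.
    def walk(db, visitor):
        if hasattr(visitor, "handle_db_start"):
            visitor.handle_db_start(db)
        for insn in db:
            if hasattr(visitor, "handle_insn_start"):
                visitor.handle_insn_start(insn)
            for record in insn:
                if hasattr(visitor, "handle_record"):
                    visitor.handle_record(record)
            if hasattr(visitor, "handle_insn_end"):
                visitor.handle_insn_end(insn)
        if hasattr(visitor, "handle_db_end"):
            visitor.handle_db_end(db)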
an idea i had for making it easier -- use with and contextmanager:

    from contextlib import contextmanager

    def visit(db, visitor):
        with visitor.db(db):
            for i, insn in enumerate(db):
                with visitor.insn(i, insn):
                    for i, operand in enumerate(insn):
                        visitor.operand(i, operand)

    class MyDemoVisitor:
        @contextmanager
        def db(self, db):
            print("before insns")
            yield
            print("after insns")

        @contextmanager
        def insn(self, insn_index, insn):
            print("before operands")
            yield
            print("after operands")

        def operand(self, operand_index, operand):
            print("operand")

generates output like:

    before insns
    before operands
    operand
    operand
    after operands
    before operands
    operand
    operand
    operand
    after operands
    after insns
https://pyomo.readthedocs.io/en/stable/_modules/pyomo/core/expr/visitor.html

this is really good (StreamBasedExpressionVisitor). it's generic, and looks
dead-easy to use. i will create a "visitor_test" branch adding that and see
what it does
https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=refs/heads/visitor_test

ok added, and "fixed" some of the unit tests (as best i can). i had to create
dummy classes, they're not perfect, but seem to convince (most of) the unit
tests that "all is good".

the idea that occurred to me is to actually add these functions (beforeChild
etc.) *actually to the insndb classes*. the reason for that is that the
context about "what to do" is actually maintained *in* the class itself.
(In reply to Luke Kenneth Casson Leighton from comment #3)
> the idea that occurred to me is to actually add these functions
> (beforeChild etc.) *actually to the insndb classes*.

part of that will be: "what sub-things do you want me to call a visitor on?"
I checked the visitor branch, and I have some thoughts to share.

1. If we take 3rd-party code, please keep it as is. Yes, this includes
   formatting. No need to adapt it.
2. I don't really like what I see in pyomo. It seems that the overall
   implementation is bloated, complex and too pyomo-way-centric
   (unsurprisingly). This seems way more complex than it should have been.
3. The APIs you refer to (beforeChild, afterChild, etc.) are actually
   deprecated.

What I like, in contrast:

1. Something dead simple like the html.parser visitors. Yes, they are not as
   generic as pyomo, but there's no need to be generic in this exact case. We
   handle insns, not anything in the world.
2. The html.parser approach can be augmented with contextlib, like Jacob
   suggested. I have to check and play around with the code, because I have
   rarely used context managers before.

Now on the order of the subtasks. I'd start with binutils. I know you are
already tired of this never-ending journey, but here's the rationale.

1. The binutils code must be updated anyway with respect to assembly (and, to
   a lesser degree, disassembly). You remember that I started binutils when
   we had the old assembler, so binutils assembly follows the old assembler's
   principles. The disassembler, in contrast, relies much more on
   autogenerated code; but still not enough. Too much manually-written code.
   With visitors, we can generate much more code to avoid manual
   synchronization between these code bases.
2. Despite the fact that we added the missing specifiers, we never had a
   chance to sync and especially to check whether the manually written code
   works everywhere (quite unlikely). Fixing these scenarios, especially for
   assembly, will simply waste time unless we rely more on the generated
   code.
3. binutils is the most active insndb user, and I know its structure (almost)
   perfectly.
4. If implemented carefully, the same principles from C-generator-visitors
   (encoding/decoding) can be reused for the C-based Power ISA decoder.
   Perhaps even to the point that we can generate the same code and use it in
   different places.
5. It's a good candidate considering the size (about 700 LOC).

In an ideal world, even assembly/disassembly *inside* insndb could be
implemented via visitors. However, this is too fragile and complex a target
to start with. The C-generator-visitors are much better targets, and binutils
seems to be the best place to sharpen the skill and test the overall
approach.

What do you think?
(In reply to Dmitry Selyutin from comment #5)
> I checked the visitor branch, and I have some thoughts to share.
> 1. If we take 3rd-party code, please keep it as is. Yes, this includes
> formatting. No need to adapt it.

it's too big - normally i would agree. so i had to cut it back

> 2. I don't really like what I see in pyomo. It seems that the overall
> implementation is bloated, complex and too pyomo-way-centric

not surprising

> (unsurprisingly). This seems way more complex than it should have been.
> 3. The APIs you refer to (beforeChild, afterChild, etc.) are actually
> deprecated.

i didn't actually look closely! just "got it running"

> What I like, in contrast:
> 1. Something dead simple like the html.parser visitors. Yes, they are not
> as generic as pyomo, but there's no need to be generic in this exact case.
> We handle insns, not anything in the world.

i don't mind that at all.

> 2. The html.parser approach can be augmented with contextlib, like Jacob
> suggested. I have to check and play around with the code, because I have
> rarely used context managers before.

ack

> Now on the order of the subtasks. I'd start with binutils.

because you know it well
I've almost completed a hierarchy reorganization for insndb. Now it's really
insndb:

    $ ls src/openpower/insndb/
    __init__.py  __pycache__  asm.py  dis.py  types.py

    pysvp64.py -> asm.py
    pysvp64dis.py -> dis.py
    power_insn.py -> types.py

This is done in order to keep the structure clean and obvious. The setup.py
script is updated as well. Visitors will reside here too. Next, I'll
implement a script which simply dumps the instruction database based on
visitors: it came to my mind that this is much simpler to do for starters
than binutils. Since this change might affect other uses outside of
openpower-isa, it will be pushed to a branch and tested with CI (thanks
again, Jacob, I cannot even stress how grateful I am).
(In reply to Dmitry Selyutin from comment #7)
> Since this change might affect other uses outside of openpower-isa, it will
> be pushed to a branch and tested with CI (thanks again, Jacob, I cannot
> even stress how grateful I am).

well, technically openpower-isa ci just runs pytest in openpower-isa, having
it run more tests is something we should probably put in an auto-updated
integration repo with submodules for all our repos.
(In reply to Dmitry Selyutin from comment #7)
> I've almost completed a hierarchy reorganization for insndb. Now it's
> really insndb:
> $ ls src/openpower/insndb/
> __init__.py  __pycache__  asm.py  dis.py  types.py
>
> pysvp64.py -> asm.py
> pysvp64dis.py -> dis.py
> power_insn.py -> types.py

db.py instead of types.py? "insn/db.py" "insn/asm.py"? maybe types.py works
better, dunno.

> This is done in order to keep the structure clean and obvious. The setup.py
> script is updated as well. Visitors will reside here too.

suggest a separate "visitor.py" file.

> Next, I'll implement a script which simply dumps the instruction database
> based on visitors: it came to my mind that this is much simpler to do for
> starters than binutils.

yes very much so, iirc it's kinda "the canonical thing to do" with
visitor-walkers. insndb/asciidump.py? insndb/print.py?

btw if you want a hint on doing indentation: the "child_begin" function
increments "self.tabdepth", and the "child_end" function decrements it.

real simple.
(In reply to Luke Kenneth Casson Leighton from comment #9)
>
> db.py instead of types.py? "insn/db.py" "insn/asm.py"?

Ah sorry, I totally missed this. Well, I also thought about db.py, but came
to the conclusion that the db.py module would provide a special script
itself. Now it's in the repository:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=78231140441402bec1dedbe4e97238cb9597b96f

This commit provides the pysvp64db script, which can now list all
instructions. I'll play more with this script and its possible commands.

> maybe types.py works better, dunno.

I was thinking about core.py or something like that. But for now I'm keeping
types.py to concentrate on visitors.

> suggest a separate "visitor.py" file.

Perhaps yes, if we have some concrete visitor implementations to be reused.
For now I just added a "default" Visitor class into types.py, and that,
combined with the approach suggested by Jacob, was already sufficient to
write a simple script which dumps all instructions.

Another part of "visitors" is that we actually have more stuff to be
"visited". Not only the database itself, but, for example, all possible
fields in RM. Or all possible specifiers. And so on. But at this point it
seems that all these chunks might just be separate context managers (like
the db and record methods).

> insndb/asciidump.py? insndb/print.py?
>
> btw if you want a hint on doing indentation: the "child_begin" function
> increments "self.tabdepth", and the "child_end" function decrements it.
>
> real simple.

I think this should be handled in concrete visitors: not all of them need
tabulation. Perhaps the inherited visitor could simply track this on its own?
I'll think about it when I move to binutils.
A new command is available: pysvp64db opcodes INSN. Example:

    $ pysvp64db opcodes svshape
    010110---------------0000-011001
    010110---------------0001-011001
    010110---------------0010-011001
    010110---------------0011-011001
    010110---------------0100-011001
    010110---------------0101-011001
    010110---------------0110-011001
    010110---------------0111-011001
    010110---------------1010-011001
    010110---------------1011-011001
    010110---------------1100-011001
    010110---------------1101-011001
    010110---------------1110-011001
    010110---------------1111-011001

    $ pysvp64db opcodes svshape2
    010110---------------100--011001
(In reply to Dmitry Selyutin from comment #10)
> (In reply to Luke Kenneth Casson Leighton from comment #9)
> >
> > db.py instead of types.py? "insn/db.py" "insn/asm.py"?
>
> Ah sorry, I totally missed this. Well, I also thought about db.py,

i ran *immediately* into import problems due to the common names ("import
dis"), and the same would happen on "import types". so *temporarily* i
invented some names

    commit aaaecc47a8bbe21bc047e589e34bec6e18f928f6 (HEAD -> master, origin/master)
    Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
    Date:   Sat Jun 3 12:47:30 2023 +0100

        using names of modules that are identical to commonly-used python
        modules (even at the leaf-node) is causing import problems

    commit 9546d80ef122b4f53b61dc1f3997a2ad70c139bd
    Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
    Date:   Sat Jun 3 12:45:35 2023 +0100

        import dis overloads naming of modules already in python3, and stops
        functions from importing

> but came to the conclusion that the db.py module would provide a special
> script itself. Now it's in the repository:
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=78231140441402bec1dedbe4e97238cb9597b96f

like it

> This commit provides the pysvp64db script, which can now list all
> instructions. I'll play more with this script and its possible commands.
>
> > maybe types.py works better, dunno.
>
> I was thinking about core.py or something like that. But for now I'm
> keeping types.py to concentrate on visitors.
>
> > suggest a separate "visitor.py" file.
>
> Perhaps yes, if we have some concrete visitor implementations to be reused.
> For now I just added a "default" Visitor class into types.py, and that,
> combined with the approach suggested by Jacob, was already sufficient to
> write a simple script which dumps all instructions.

> Another part of "visitors" is that we actually have more stuff to be
> "visited". Not only the database itself, but, for example, all possible
> fields in RM.

aaabsolutely. it will extend down further.

> Or all possible specifiers. And so on. But at this point it seems that all
> these chunks might just be separate context managers (like the db and
> record methods).

with Visitor.record.rm.extra(blah)

> I think this should be handled in concrete visitors: not all of them need
> tabulation.

ok so we are basically subdividing into Model-View-Controller here. i *think*
- i am not certain - that you might be conflating/merging two of those
together.

> Perhaps the inherited visitor could simply track this on its own? I'll
> think about it when I move to binutils.

the idea of MVC is that Visitors are passed *both* the data *and* "what to do
on it". the "what to do on it" has an API (child_before, child, child_end)
which the *Visitor* knows about.

the idea of "tabulation" is *not* for "All Visitors". it's for *that one
Visitor* to keep, as a *local variable*, the "current tab depth".
    from contextlib import contextmanager

    def visit(db, visitor):
        with visitor.db(db):
            for i, insn in enumerate(db):
                with visitor.insn(i, insn):
                    for i, operand in enumerate(insn):
                        visitor.operand(i, operand)

    class MyDemoVisitor:
        def __init__(self):
            self.tab = 0

        @property
        def tabs(self):
            return " " * 4 * self.tab

        @contextmanager
        def db(self, db):
            self.tab += 1
            print(self.tabs + "before insns")
            yield
            print(self.tabs + "after insns")
            self.tab -= 1

        @contextmanager
        def insn(self, insn_index, insn):
            self.tab += 1
            print(self.tabs + "before operands")
            yield
            print(self.tabs + "after operands")
            self.tab -= 1

        def operand(self, operand_index, operand):
            self.tab += 1
            print(self.tabs + "operand", operand_index, operand)
            self.tab -= 1

    x = MyDemoVisitor()
    visit([["op1", "op2"], ["insn1_op2", "insn2_op2"]], x)

output: (note extra indent, this is because self.tab increases *before*
print)

```
    before insns
        before operands
            operand 0 op1
            operand 1 op2
        after operands
        before operands
            operand 0 insn1_op2
            operand 1 insn2_op2
        after operands
    after insns
```
Created attachment 191: visitor demo
Luke, could you please clarify where and how you observe the importing
issues? I don't quite get the name collision here, since I import stuff like
`from openpower.insndb.dis` and haven't had any problems.
As for tabs, that's exactly what I meant: tabs go into the concrete visitor
implementation.
(In reply to Dmitry Selyutin from comment #14)
> Luke, could you please clarify where and how you observe the importing
> issues? I don't quite get the name collision here, since I import stuff
> like `from openpower.insndb.dis` and haven't had any problems.

    $ history | more
    ...
    1787  python3 insndb/db.py
    ...

(python3.7 - the project default version)

note that's from that directory

(In reply to Dmitry Selyutin from comment #15)
> As for tabs, that's exactly what I meant: tabs go into the concrete visitor
> implementation.

yes. and as an example it's a good one, because it shows that it's the
visitor that contains its own context. (we need examples)
(In reply to Luke Kenneth Casson Leighton from comment #16)
> (In reply to Dmitry Selyutin from comment #14)
>
> > Luke, could you please clarify where and how you observe the importing
> > issues? I don't quite get the name collision here, since I import stuff
> > like `from openpower.insndb.dis` and haven't had any problems.
>
> $ history | more
> ...
> 1787  python3 insndb/db.py
> ...

I'd say that's not the intended usage anyway: the module is used via the
pysvp64db script (cf. setup.py). Also, Python has _so_ many modules that,
regardless of the name used, you end up with a collision anyway (unless
perhaps you drop directories and use underscores in names instead). I don't
see dbtypes as a clear name. But neither is types. How about core?
Also, the corresponding PEP acknowledges the issue:
https://peps.python.org/pep-0328/#rationale-for-absolute-imports

So, TL;DR: there's always a chance a naming collision happens, since we don't
control which modules Python brings with new versions. To solve this, there
are correct import statements (which are used throughout our code base). So
there is no need for this change other than "types.py simply doesn't fit this
module well, its contents are not limited to types". I suggest renaming the
module to core.py, but never _ever_ bothering with the "donnerwetter, we used
a standard module name" problem (as long as we import correctly). Objections?
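To make the point concrete, a tiny illustration (module names taken from this
thread):

    import dis                        # bare name: which "dis" you get depends
                                      # on sys.path ordering when the module
                                      # is run directly as a script
    from openpower.insndb import dis  # absolute import: always our module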
I see that dis got renamed too. I like the dis name, it's used as "something
completely opposite to asm", and it's present in the tests and the
corresponding executable script. Again, it's not a problem as long as the
correct imports are used (and setup.py develop does things right, compared to
the `python3 src/openpower/insndb/dis.py` command). The behavior of a shitty
interpreter, which puts the module's directory first in imports, is not our
issue. The module is either used with correct imports or as a standalone
script, in which case it must be installed. After all, we don't get ashamed
when some code runs correctly only after it's installed (example: binaries
which are installed with libraries, and cannot be run unless rpath is changed
or they are installed to well-known places the linker is aware of). Please
roll back these changes, that's not the intended use scenario.
OK I rolled them back so that I could continue developing, and renamed types into core.
Introduced the operands command:

    $ SILENCELOG=true pysvp64db operands add.
    RT
    RA
    RB
    OE=0
    Rc=1

Notice the completely redundant and useless SILENCELOG incantation here. I'll
add the same hack we did to pysvp64dis (setenv), but still: do we have any
plans to switch to the logging module instead of prints?
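For reference, a minimal sketch of what the standard logging module would
look like here (the logger name is an assumption):

    import logging

    log = logging.getLogger("openpower.insndb")

    def dump(record):
        log.debug("visiting %s", record)  # hidden unless DEBUG is enabled

    # default: only warnings and above are shown, no SILENCELOG needed
    logging.basicConfig(level=logging.WARNING)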
(In reply to Dmitry Selyutin from comment #20)
> OK I rolled them back so that I could continue developing, and renamed
> types into core.

dmitry! please don't make that kind of decision without an acknowledgement! i
now cannot do any work! i am trying to track down the RA_OR_ZERO bug and have
found it's not RA_OR_ZERO at all, but now have to stop working on this.

i know "dis" is short - i like it as well, because it matches with "asm"
which is also 3 letters. but if there's a serious problem of global namespace
clashes those names have to be considered "off-limits"

generally, function/object names and keywords such as "type" are also
off-limits.

    lkcl@fizzy:~/src/libresoc/openpower-isa/src/openpower$ python3 insndb/db.py
    Traceback (most recent call last):
      File "insndb/db.py", line 8, in <module>
        from openpower.insndb.core import (
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 4, in <module>
        import dataclasses as _dataclasses
      File "/usr/lib/python3.7/dataclasses.py", line 5, in <module>
        import inspect
      File "/usr/lib/python3.7/inspect.py", line 35, in <module>
        import dis
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/dis.py", line 9, in <module>
        from openpower.insndb.core import (
    ImportError: cannot import name 'Style' from 'openpower.insndb.core' (/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py)
Luke, please just use pysvp64db instead, as mentioned in the comments above.
I gave the rationale before reverting this; it's just misuse. $(setup.py
develop) installs these scripts correctly. Using them from the source tree is
not a valid use case.

https://bugs.libre-soc.org/show_bug.cgi?id=1094#c18
https://bugs.libre-soc.org/show_bug.cgi?id=1094#c19
One might argue that this is an issue related to the implementation (Python
should not bring the module's internals atop of the sys.path list); I
personally think so. However, Python has tools and guidelines for doing this
correctly. Our setup.py also does the correct thing and syncs the script path
too. So I suggest using the script as intended by the PEP and setup.py and
the overall logic.
(In reply to Dmitry Selyutin from comment #24)
> One might argue that this is an issue related to the implementation (Python
> should not bring the module's internals atop of the sys.path list);

they likely feel they had to get it to the front of sys.path so as not to
have user-imports cause problems. if it is at the back it is still a problem
but one that makes the user's code break (because when using the module they
get the *system* module) rather than "python wudnt import mi fing" errors
which look like *python* is broken. either way it's still a problem, but one
way avoids the python devs getting blamed for it.

> I personally think so. However, Python has tools and guidelines for doing
> this correctly. Our setup.py also does the correct thing and syncs the
> script path too. So I suggest using the script as intended by the PEP and
> setup.py and the overall logic.

this i feel would be reasonable if it wasn't for running the module itself as
a way to check for syntax errors and import errors - on a standalone basis. i
actually do this, and running a program (/usr/local/bin/...) which imports
that script isn't the same thing.

also under some circumstances, to put unit tests in. example:

    insndb/db.py:

    import unittest

    class Visitor:
        ...

    # create some Visitor unit tests
    class unittest.Blah(....)
        ...

    # the main import entry-point for the script pysvp64db
    def main():
        ....

    if __name__ == '__main__':
        unittest.main()

now that would be a good reason to run that script directly: to fire the unit
tests for it, which as long as they are not massive, and are more like
"use-cases", i feel are kinda ok to have right next to the actual class
itself.

so in this case do we need to put (small, demo) unit tests in there?
unlikely. will i try running the script to check syntax-errors and
import-errors when writing it? yes.
Unlike docs, where this is fine for documenting the classes and routines, I
have a strong vomit reflex against putting tests in the place where the code
lives, especially when tests are mixed with the code. This just complicates
the major case, where everything works and you just want to understand how it
works, and rarely helps when something doesn't work and you want to know why
(because no test can replace debugging, even via prints).

Also, I'd rather have this:

    if __name__ == "__main__":
        main()

I find it exceptionally confusing that stuff under "main" executes something
other than the main function. And it's especially strange if the script
called as a standalone executable does a totally different thing than the
same script called via python. It doesn't make any sense to do one thing via
shebang and another via calling almost the same stuff directly. Many other
modules follow this principle too (all these test_XXX.py).
(In reply to Dmitry Selyutin from comment #26)
> Unlike docs, where this is fine for documenting the classes and routines, I
> have a strong vomit reflex against putting tests in the place where the
> code lives, especially when tests are mixed with the code.

:) if they're small enough (and intended as "documented demos") they can be
very useful. but not if they are dozens and dominate the module.

> Many other modules follow this principle too (all these test_XXX.py).

those are not intended as external programs, that's for sure.
Provided two new visitors: pcode and extras. Supported sv.-prefixed
instructions. Example:

    $ pysvp64db pcode addi
    RT <- (RA|0) + EXTS(SI)

    $ pysvp64db extras sv.setvl
    in1
        sel RA0
        reg RA
        seltype SRC
        idx EXTRA1
    in2
        sel NONE
        reg NONE
        seltype NONE
        idx NONE
    in3
        sel NONE
        reg NONE
        seltype NONE
        idx NONE
    cr_in
        sel NONE
        reg NONE
        seltype NONE
        idx NONE
    cr_in2
        sel NONE
        reg NONE
        seltype NONE
        idx NONE
    out
        sel RT0
        reg RT
        seltype DST
        idx EXTRA0
    out2
        sel NONE
        reg NONE
        seltype NONE
        idx NONE
    cr_out
        sel CR0
        reg CR0
        seltype NONE
        idx NONE

Note that the latter command only works with sv.-prefixed instructions, since
the extras make sense only with these.
Other commands simply discard the sv. prefix where it makes sense (e.g.
`pysvp64db pcode sv.addi` does the same as `pysvp64db pcode addi` would).
(In reply to Dmitry Selyutin from comment #28)
> Provided two new visitors: pcode and extras.

let's not go down the rabbithole of walking the entire pcode AST quite just
yet :) (there's python-ply for that)

> Supported sv.-prefixed instructions. Example:
>
> $ pysvp64db pcode addi
> RT <- (RA|0) + EXTS(SI)

cool!

> $ pysvp64db extras sv.setvl
> in1
>     sel RA0
>     reg RA
>     seltype SRC
>     idx EXTRA1

this is going to be incredibly useful. i can see scenarios where latex
auto-conversion is possible, and much more.
Supported operand spans:

    $ pysvp64db operands setvl
    RT 6,7,8,9,10
    RA 11,12,13,14,15
    SVi 16,17,18,19,20,21,22
    vf 25
    vs 24
    ms 23
    Rc=0 31

Note that this does not support inline extras, since these operate on
concrete operand values to determine whether extras must be present or not.
No idea how to handle this case yet.
(In reply to Luke Kenneth Casson Leighton from comment #30)
> (In reply to Dmitry Selyutin from comment #28)
> > Provided two new visitors: pcode and extras.
>
> let's not go down the rabbithole of walking the entire pcode AST
> quite just yet :) (there's python-ply for that)

Yeah, this should also be a separate visitor. This should make the generation
simpler. Anyway, not in this task's scope. :-)

> > $ pysvp64db extras sv.setvl
> > in1
> >     sel RA0
> >     reg RA
> >     seltype SRC
> >     idx EXTRA1
>
> this is going to be incredibly useful. i can see scenarios where
> latex auto-conversion is possible, and much more.

Now that you mentioned this, I thought I should provide a separate .visit
method for handling these, like the Database and Record classes have. These
should come in handy for C code generation too.
Introduced "visitable" extra object and the corresponding handler. This, however, got tricky, since visitors had to track what exact instructions they visit. Please check the recent commits, I'd like have some ideas on improvements to make it less messy.
see https://bugs.libre-soc.org/show_bug.cgi?id=1056#c52

what is needed here is an enumerator of the instruction, *re-creating* and
back-annotating the SVP64 operands *into* the markdown spec page. aka *a
visitor*!

it will need to provide clear visual tables showing how to decode SVP64. it
is pretty much identical to the "verbose decode disasm debug print" option,
but it needs to be much more compact.
(In reply to Dmitry Selyutin from comment #32)
> Now that you mentioned this, I thought I should provide a separate .visit
> method for handling these, like the Database and Record classes have. These
> should come in handy for C code generation too.

i believe you may be getting to the point where my suggestion to have the
visitor-walking be much more part of each DB-class itself makes 0 sense.

in this way *at the class definition* it describes its "sub-things"

Database:
* declares to all Visitors that it has 32-bit instructions
* declares to all Visitors that it has SVP64 instructions
* declares to all Visitors that it has Fields and Forms

(in the python-html DOM/SAX visitor these are all identified by the function
names).

Record:
* declares that it has {CSV-properties}

and the Visitors walk that *entire lot* looking for corresponding
function/object-names and calling them with the data.

(In reply to Dmitry Selyutin from comment #33)
> Introduced a "visitable" extra object and the corresponding handler. This,
> however, got tricky, since visitors had to track which exact instructions
> they visit.

yes they do. it is never easy to do visitors, because the *declaration* of
the database defines how to walk it, but that declaration never contains the
data. (we are back to things like BNF, SemanticWeb, XSLT etc *shudder*)

> Please check the recent commits; I'd like to hear some ideas on
> improvements to make it less messy.

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=7ffad8f7202

    class BaseVisitor(Visitor):
    -    def __init__(self, **_):
    -        pass
    +    def __init__(self, **arguments):
    +        self.__arguments = types.MappingProxyType(arguments)
    +        self.__current_db = None
    +        self.__current_record = None
    +        self.__current_extra = None
    +        return super().__init__()

ahhh.... i need to check: a Visitor *itself* may store state-information, but
*not all visitors need to do so*. the *walking* aspect ("how to walk") is
intended to be utterly separate from the "tracking" aspect, and many
Visitor-instances simply hold an open file-handle, and the visited-functions
simply call "write" on it, using their arguments.

so the *walker* simply ends up with stuff on the stack. lots of it. as it
drills down into more child-data. if the Visitor *actually needed* any prior
state (a parent), it is the responsibility *of that Visitor instance on its
begin function* to make a note of it.

so i need to check: are you intending *ALL* Visitors to critically rely on
the existence of self.__current_db and so on? because i am concerned that all
three of these have been added

    + self.__current_db = None
    + self.__current_record = None
    + self.__current_extra = None

record is part of *Record* visiting. extra is part of *Extra* visiting. there
should not even really be **arguments. (and certainly no use of the python
types module)

python's standard module "types" - types.MappingProxyType(arguments) - really
should not be used.
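a minimal sketch of the "declare your sub-things at the class definition"
idea above (the attribute names are assumptions, not the real insndb API):

    class Database:
        # the class *declares* its sub-things; it does not walk them
        subnode_names = ("word_insns", "svp64_insns", "fields", "forms")

    class Record:
        subnode_names = ("csv_properties",)

    def children(node):
        # a generic walker reads the declaration and does the iteration
        for name in getattr(node, "subnode_names", ()):
            yield from getattr(node, name)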
Sorry, on the phone, so briefly.

On **arguments: this is just a way to pass CLI arguments, nothing else. Some
commands operate on a single instruction and I need to know _which_ one.

On tracking db, record and extra: yes, the code is, at best, shitty.
Basically what I want is "only filter some specific records to be visited,
where the record name matches the CLI arguments". I don't know how to do it,
because the visitor expects all records to be visited. Another option would
be to pass the "parent" as part of the arguments for each visit method (e.g.
Record.visit might get the db too, and VisitableExtra would have db and
record in its visit method). But this can be somewhat clumsy too.

I'd like to keep the walking logic inside insndb and provide a way for the
caller to adjust how to handle the fact that something is traversed. The
callback-based way suffers from the same issue, though, so it's not specific
to context managers.
Ah yes, on MappingProxyType: that's the standard way of declaring a dict which cannot be changed in Python. Kinda RO-wrapper around dict.
> --- Comment #36 from Dmitry Selyutin <ghostmansd@gmail.com> ---
> Sorry, on the phone, so briefly.
>
> On **arguments: this is just a way to pass CLI arguments, nothing else.
> Some commands operate on a single instruction and I need to know _which_
> one.

ok so here's a really important insight into Visitors: the pattern that they
use has to be completely independent, such that a third-party does not have
to inherit from *any* of the classes used or created by anyone else.

> On tracking db, record and extra: yes, the code is, at best, shitty.
> Basically what I want is "only filter some specific records to be visited,
> where the record name matches the CLI arguments".

ok so you:

* create an entirely-new class, not having *any* inheritance from the
  *entire* set of code created so far;
* have its constructor store the records;

      def __init__(self, *args):
          self.args = args

* create an instance of that object
* pass it to the walker-function which performs the traversal.

> I don't know how to do it, because the visitor expects all records to be
> visited. Another option would be to pass the "parent" as part of the
> arguments for each visit method (e.g. Record.visit might get the db too,
> and VisitableExtra would have db and record in its visit method). But this
> can be somewhat clumsy too.

again: you create an entirely-new class, which has *no* inheritance at all
from the entire set of code created so far.

> I'd like to keep the walking logic inside insndb and provide a way for the
> caller to adjust how to handle the fact that something is traversed.

i think what you're describing is not a Visitor pattern, it's something else.

> The callback-based way suffers from the same issue, though, so it's not
> specific to context managers.

and yet they are still used/developed, and python programmers the world over
continue to use/develop the visitor pattern. context-manager is actually a
bit of a problem because it

(In reply to Dmitry Selyutin from comment #37)
> Ah yes, on MappingProxyType: that's the standard way of declaring a dict
> which cannot be changed in Python. Kinda RO-wrapper around dict.

so if that's envisaged as needed then there is something very wrong with the
Visitor API. Visitor APIs *have* to be completely independent of the data.
now, if the *use* of the Visitor API needs that, then that's their problem,
but there literally should be no types, no data, absolutely nothing, in the
*API itself*.

the *only* thing it should do is:

* call a function/method at the start, with the object at current-depth
* in a for-loop:
  - call a child-begin-function with the sub-object (and index)
  - call a child-do function with the sub-object (and index)
  - call a child-end function with the sub-object (and index)
* call a function/method at the end, before dropping out to prev-depth

we can look at doing "polish-notation walking" and
"reverse-polish-notation walking" later.

can i suggest going back again to the pyomo visitor code, as the complexities
of Visitor-Walking are actually really hard to express?

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/visitor.py;h=7ad025cd096

and no, "using some other project's code as-is" is not going to work for this
project, for many reasons. one of which is "we do not want additional
dependencies" and another is "they make assumptions about *their* code and
*their* structure which bleeds into the modules".

additionally: Visitor API walker patterns are very compact but extremely
complex code, usually being developed over many months.
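putting the above together, a minimal sketch (walk() and the record
attributes are assumptions, for illustration only): a filter-visitor with
*no* inheritance from insndb, keeping its own state:

    class NameFilterVisitor:
        # entirely standalone: no inheritance from any insndb class
        def __init__(self, *names):
            self.names = set(names)  # the CLI arguments, stored here
            self.matches = []

        def record(self, record):
            # the visitor only reacts; it never drives the traversal
            if record.name in self.names:
                self.matches.append(record)

    def walk(db, visitor):
        # the walker owns the traversal, completely separate from the visitor
        for record in db:
            visitor.record(record)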
(In reply to Dmitry Selyutin from comment #36)
> Basically what I want is "only filter some specific records to be visited,
> where the record name matches the CLI arguments".

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/power_decoder.py;h=9389dd3a#l328

i pass in a col_subset (a set - by CSV-name - of which columns) and a
row_subsetfn. row_subsetfn receives the row and returns True/False.

however the *visitor* would simply go:

    def child_data_function(self, child_index, child_data):
        if some_filter_criteria(child_data):
            return do_something_with_child_data(child_data)

ahhh i know what you've missed. look again at the pyomo Visitor API:

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/visitor.py;h=7ad025cd0#l97

the "enterNode()" function is where the Visitor API allows for "results" to
be accumulated - partially - and keeps all necessary state information.

    83 initializeWalker(expr) -> walk, result
    84 enterNode(N1) -> args, data
    85 {for N2 in args:}
    86   beforeChild(N1, N2) -> descend, child_result
    87   enterNode(N2) -> N2_args, N2_data
    88   [...]
    89   exitNode(N2, n2_data) -> child_result
    90   acceptChildResult(N1, data, child_result) -> data
    91   afterChild(N1, N2) -> None
    92 exitNode(N1, data) -> N1_result
    93 finalizeWalker(result) -> result

so the Visitor-callbacks receive all the information they need to "keep
track". one of the things they can "keep track" of is: an object with the
filter arguments. anything else can also be passed "up the stack" of the
Visitor-callbacks.
(In reply to Luke Kenneth Casson Leighton from comment #39)
> ahhh i know what you've missed. look again at the pyomo Visitor API:

It's not that I missed it; that's a deliberate omission. With context
managers, everything before yield is beforeNode; everything after is
afterNode. The way I currently implemented it is that folks override the
manager instead of overriding three methods.

We must choose one approach, either Jacob's or html-parser-like. Jacob's
looks simpler, but requires the user to know that "yield" acts kinda like
onStuff, everything before yield is beforeStuff, and everything after is
afterStuff. On the other hand, if we don't allow overriding the manager, we
can hide the walking logic, so I'm unsure what to choose. Ideas?
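A tiny illustration of that before/after split (a sketch, not the actual
insndb code):

    from contextlib import contextmanager

    @contextmanager
    def record(node):
        print("beforeNode", node)  # runs on entry: plays beforeNode
        yield node                 # subnodes are visited while suspended here
        print("afterNode", node)   # runs on exit: plays afterNode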
(In reply to Dmitry Selyutin from comment #40)
> (In reply to Luke Kenneth Casson Leighton from comment #39)
> > ahhh i know what you've missed. look again at the pyomo Visitor API:
>
> It's not that I missed it; that's a deliberate omission. With context
> managers, everything before yield is beforeNode; everything after is
> afterNode.

contextmanagers - which only have a constructor, __enter__ and __exit__ - are
actually too simple.

now, it turns out that *in some cases* (particularly things that write
directly to file-streams) you just don't need and don't care about "ongoing
state", and *in those cases* a contextmanager is perfectly sufficient. (these
are effectively "one-pass" in compiler terminology)

but for things that i am envisioning, contextmanager is nowhere near enough.
imagine that you want to do a full language-translation (pseudocode-to-c),
and that (aieee) we do actually put iterative-node-walking down into the RTL.

but the big advantage of pyomo's Visitor walker-function is: it does *both*.

> On the other hand, if we don't allow overriding the manager, we can hide
> the walking logic, so I'm unsure what to choose. Ideas?

these two should in no way be mutually-exclusive. as i have used Visitors for
such a long time i have absolutely no problem at all with the "there's these
functions you provide and they get called" idea.

the walking-logic should be its own completely separate and independent
function.

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/visitor.py;h=7ad025#l506

note, sigh, ha ha: pyomo designed the code so that you *can* inherit from the
VisitorClass. but look carefully at the notes:

    168 Clients interact with this class by either deriving from it and
    169 implementing the necessary callbacks (see above), assigning callable
    170 functions to an instance of this class, or passing the callback
    171 functions as arguments to this class' constructor.

and another way would be to pass in an "instance which has those same
methods"
I'd like something intermediate between context managers (which are simple
but not flexible) and pyomo (which is perhaps powerful but overkill). How
about three methods to be overridden?
(In reply to Dmitry Selyutin from comment #42)
> I'd like something intermediate between context managers (which are simple
> but not flexible) and pyomo (which is perhaps powerful but overkill).

trust me, it only looks that way because it is a new concept and use-cases
haven't been envisaged/understood yet :) pyomo's code can actually call
contextmanagers.

> How about three methods to be overridden?

ngggh, the Visitor-pattern doesn't quite work that way (the
python-html-walker doesn't either: the functions all conform to
"handle_{database_sub-type}_{start/end}")

okok one way to *make* it conform is for the functions to be:

    def handle_start(DATABASE_TYPE, node, data):
    def handle_child(DATABASE_TYPE, node, child_idx, child_data):
    def handle_end(DATABASE_TYPE, node, data):

instead of

    handle_db_start
    handle_db_child
    handle_db_end

it is

    handle_start("db", ...)
    handle_child("db", ...)
    handle_end("db", ...)

but this way also lies another form of madness: a massive suite of "if
DATABASE_TYPE == 'db' do this else do that" in every visitor-instance
I meant three methods per context manager (three for db, three for record,
etc.). OK, if there are no objections, I'll switch to this way.
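For the record, a sketch of what "three methods per node type" could look
like (the method names are assumptions, not the final API):

    class Visitor:
        # three methods for db
        def db_start(self, db): pass
        def db_child(self, index, insn): pass
        def db_end(self, db): pass
        # three methods for record
        def record_start(self, record): pass
        def record_child(self, index, operand): pass
        def record_end(self, record): pass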
(In reply to Dmitry Selyutin from comment #44)
> I meant three methods per context manager (three for db, three for record,
> etc.).

yes, i understood. didn't say it, but understood it.

> OK, if there are no objections, I'll switch to this way.

none here - the only thing i will say is: please please make absolutely
certain to separate "the act of walking" from "the visitor which is called by
the walker". (i know that pyomo has the walking-function as part of the
Visitor class)

the next question to solve is: how do you "stack" them? how do you have db
know about record, how do you have record know about span/extra/etc.? i don't
mean the *data*.

and the answer to that is where the "three functions" paradigm breaks down,
and i _believe_ you're trying to solve it by placing local variables down at
the BaseVisitor... and trust me when i say that's not the right way to write
a Visitor API.

really, when you have time, please do go over StreamBasedExpressionVisitor,
bearing in mind it can do both DOM *and* SAX - and we will need both. (DOM:
in-place in-memory tree modifying, used for compiler AST modifying. SAX:
streaming-only. think "files", zero context. good for e.g. turning certain
words into CAPITALS)
ah! i think i know how to describe Visitor-walking.

* imagine that the original data (and its format) has been lost. there is no
  chance of recovery.
* imagine that you investigate and find a "trace log" - a print-out - of the
  ENTIRE database contents (a good example would be a "SQL DUMP")
* each new line in the trace-log has:
  - the TYPE of the thing
  - its CONTENTs
  - what its PARENT is

now write two COMPLETELY SEPARATE programs:

* program 1) can PRODUCE that trace-log
* program 2) can RECREATE the database - both its format and its data -
  purely FROM the trace-log

that's how Visitor-walking works.

------------------------------------

ah. i found another one, which says it is "so small you might as well
cut/paste it into your project"

https://github.com/mbr/visitor

here's the tree-walk repro example:

https://github.com/mbr/visitor/blob/9b7dbe83ed2163b2cfc4424926b555bb636492b0/tests/example.py#L18

    def visit_list(self, node):
        self.indent += 1
        s = '[\n' + ' ' * self.indent
        s += (',\n' + ' ' * self.indent).join(self.visit(item)
                                              for item in node)
        self.indent -= 1
        s += '\n' + ' ' * self.indent + ']'
        return s

ahh yuk. notice "self.visit(child_item)"? that's exactly the kind of thing
that good Visitor APIs *don't* do: impose on the caller to explicitly
continue the recursive-descent.

(that is probably why you believe that pyomo's Visitor API is "complex":
because they have abstracted-out the "tree-walking" part entirely from the
"taking action" part)

yes, it really is this simple!

https://github.com/mbr/visitor/blob/master/visitor/__init__.py

that's literally one function!

    class Visitor(object):
        """Base class for visitors."""

        def visit(self, node):
            if isinstance(node, type):
                mro = node.mro()
            else:
                mro = type(node).mro()

            for cls in mro:
                meth = getattr(self, 'visit_' + cls.__name__, None)
                if meth is not None:
                    return meth(node)

            raise NotImplementedError('No visitation method visit_{}'
                                      .format(node.__class__.__name__))

note absolutely zero use of typing!
another example:

https://stackoverflow.com/questions/1515357/simple-example-of-how-to-use-ast-nodevisitor/26374977#26374977

which is the same programming-concept as this

https://github.com/mbr/visitor/blob/master/visitor/__init__.py

but made much more generic. that leads me to ast.py in python itself:

https://github.com/python/cpython/blob/1237fb6a4b177ce8f750949b9006c58f9f22942e/Lib/ast.py#L1667

BLECH!

    def visit_Lambda(self, node):
        with self.require_parens(_Precedence.TEST, node):
            self.write("lambda")
            with self.buffered() as buffer:
>>>>>>          self.traverse(node.args)    <<< EUURRRRRRG!
            if buffer:
                self.write(" ", *buffer)
            self.write(": ")
            self.set_precedence(_Precedence.TEST, node.body)
            self.traverse(node.body)

NOOOO! :) i have a horrible feeling that stackoverflow answer is referring to
an older version of python.... ah yes it was

https://github.com/python/cpython/blob/2.7/Lib/ast.py

that's MUCH better. ah HA! see iter_fields(node). it is *self-describing*
thanks to fields, and performs "yield"...

    def iter_fields(node):
        for field in node._fields:
            try:
                yield field, getattr(node, field)
            except AttributeError:
                pass

and... ah HA!

    class NodeVisitor(object):
        def visit(self, node):
            """Visit a node."""
            method = 'visit_' + node.__class__.__name__
            visitor = getattr(self, method, self.generic_visit)
            return visitor(node)

        def generic_visit(self, node):
            """Called if no explicit visitor function exists for a node."""
            for field, value in iter_fields(node):
                if isinstance(value, list):
                    for item in value:
                        if isinstance(item, AST):
                            self.visit(item)
                elif isinstance(value, AST):
                    self.visit(value)

that's... that's awesome. it's abstracting-out the field-definitions and
allowing for walking of them. ah ha! we have _fields already in the insndb!
ta-daaaa!
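a small usage example of that stdlib pattern (python3 syntax; the behaviour
matches the 2.7 code quoted above):

    import ast

    class NameLister(ast.NodeVisitor):
        def visit_Name(self, node):
            print(node.id)
            self.generic_visit(node)  # ask the base class to keep walking

    # prints a, b and c: the recursive-descent is done *for* us
    # by generic_visit, the visitor only reacts to Name nodes
    NameLister().visit(ast.parse("a = b + c"))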
found another one that looks extremely nice

https://github.com/realistschuckle/pyvisitor

except that on close investigation it again relies on visitors having the
"control" over the recursion/nesting-depth, by forcing the visitor
class-instance to call an "accept" method on child-objects... which i don't
think is a good idea. however, overriding __call__ i think has a lot of
potential

https://github.com/realistschuckle/pyvisitor/blob/master/src/visitor.py

notice again how brutally-short that code is. and how it does *not* "import
typing" or "import types". that's really important!
https://github.com/python/cpython/blob/2.7/Lib/HTMLParser.py

forget the actual html-parsing; search for instances of "handle_" in that
code, and notice that HTMLParser *itself* performs the recursive-descent
walking, but does not expect the *handler* functions to do - or have any
control over, or get involved with - the *act* of walking.

there is no need at all in the *visitor* functions to perform
"self.visit(somesubnode)" or even worse "for child in thing:
self.visit(child)".

i know i am emphasising it a lot, it's hard to express, but i feel that
"actions taken on {thing}" should be completely separate from "act of
tree-walking".
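a minimal runnable example of that style (standard-library html.parser; the
parser drives the walk, the handlers only react):

    from html.parser import HTMLParser

    class TitlePrinter(HTMLParser):
        def __init__(self):
            super().__init__()
            self.in_title = False

        def handle_starttag(self, tag, attrs):
            if tag == "title":
                self.in_title = True

        def handle_endtag(self, tag):
            if tag == "title":
                self.in_title = False

        def handle_data(self, data):
            if self.in_title:  # note: no self.visit(child) anywhere
                print(data)

    TitlePrinter().feed("<head><title>insndb</title></head>")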
https://medium.com/design-patterns-in-python/visitor-pattern-b9227759d6be

ah this is really clear and very nice: it clearly accentuates and illustrates
the difference between *visiting* and *accepting*. they call it:

    class IVisitor(metaclass=ABCMeta):
        "An interface that custom Visitors should implement"

    class IVisitable(metaclass=ABCMeta):
        """
        An interface the concrete objects should implement that allows
        the visitor to traverse a hierarchical structure of objects
        """

Visitor and Visitable. ha! those are the phrases i was looking for.

* at the base class of *all* the insndb classes should be an
  accept(self, visitor) function
* that function should, like python 2.7 ast.py, be capable of doing *full*
  recursive self-introspection, based on the insndb fields, *not* based on
  explicit implementation of accept() in *every* single insndb class (of
  which we have... 100?)

the only thing "bad" about the example there is that IVisitor.visit() is a
static method. but there is probably a good reason for it?
I've come up with an idea based on the AST example you posted, and updated
the code. I think it takes the best from the ideas you posted and Jacob's
managers.

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=5f07a672ec7eb1cf3206f87ddc00ea10f4430a24

The idea is: all this stuff inherits from some generic Node. Each
Node-derived class can specify what subnodes it has (also of class Node). The
visiting algorithm is then described as:

    class Visitor:
        @_contextlib.contextmanager
        def Node(self, node, depth):
            yield node
            for subnode in node.subnodes:
                manager = subnode.__class__.__name__
                manager = getattr(self, manager, self.Node)
                with manager(node=subnode, depth=(depth + 1)):
                    pass

        def __getattr__(self, attr):
            return self.Node

        def __call__(self, node, depth):
            manager = node.__class__.__name__
            manager = getattr(self, manager, self.Node)
            return manager(node=node, depth=depth)

    def visit(visitor, node):
        with visitor(node=node, depth=0):
            pass

Note the way it's called: __call__ yields either the overridden context
manager or the default Node context manager (and the latter just goes over
subnodes). Then anyone can specify how the specific types are handled. Here's
the visitor which filters the concrete record in the database and prints its
opcodes. Note that I also have __setitem__ in this _concrete_ visitor (not on
the top level) which is used to access command line arguments (I could've
used just **kwargs here, or called setattr(concrete_visitor, cmd_arg), or
whatever).

    class InstructionVisitor(BaseVisitor):
        @contextlib.contextmanager
        def Database(self, node, depth):
            yield node
            for subnode in node.subnodes:
                if subnode.name == self["insn"]:
                    with self(node=subnode, depth=(depth + 1)):
                        pass

    class OpcodesVisitor(InstructionVisitor):
        @contextlib.contextmanager
        def Record(self, node, depth):
            for opcode in node.opcodes:
                print(opcode)
            yield node
Note that __getattr__ is kinda redundant: we always feed the default
self.Node in getattr calls. This, however, is more explicit. Just pointing
out that it can be reduced even more.
(In reply to Dmitry Selyutin from comment #52)
> Note that __getattr__ is kinda redundant: we always feed the default
> self.Node in getattr calls. This, however, is more explicit. Just pointing
> out that it can be reduced even more.

it looks really neat. the amount of code really should be this short:
anything beyond 80 lines, there's something off.

i'd really like to see "Visitor" separate from "Visiting", let me drop a copy
of the medium.com/sbcode.net code here

https://sbcode.net/python/visitor/#visitorvisitor_conceptpy

the design that you have, the "Visitor" and "Visiting" are still merged into
one: in the current code it is the *Visitor* that is performing (explicitly)
the walking, which it should not be doing.

    def Database(self, node, depth):
        yield node
        for subnode in node.subnodes:   <- this is merging "Visitor"
                                           with "Visiting"

    def Record(self, node, depth):
        for opcode in node.opcodes:     <- again, "Visitor" is merged
            print(opcode)                  with "Visiting"

that will get complex *real* fast, approximately... 5 to 6 nested-depths of
for-loops.

here, they have the concept "accept()", which instead we want the ability to
yield from a contextmanager (i like that - "accept" sounds kinda dumb)

    # pylint: disable=too-few-public-methods
    # pylint: disable=arguments-differ
    "The Visitor Pattern Concept"

    from abc import ABCMeta, abstractmethod

    class IVisitor(metaclass=ABCMeta):
        "An interface that custom Visitors should implement"

        @staticmethod
        @abstractmethod
        def visit(element):
            "Visitors visit Elements/Objects within the application"

    class IVisitable(metaclass=ABCMeta):
        """
        An interface the concrete objects should implement that allows
        the visitor to traverse a hierarchical structure of objects
        """

        @staticmethod
        @abstractmethod
        def accept(visitor):
            """
            The Visitor traverses and accesses each object through this
            method
            """

    class Element(IVisitable):
        "An Object that can be part of any hierarchy"

        def __init__(self, name, value, parent=None):
            self.name = name
            self.value = value
            self.elements = set()
            if parent:
                parent.elements.add(self)

        def accept(self, visitor):
            "required by the Visitor that will traverse"
            for element in self.elements:
                element.accept(visitor)
            visitor.visit(self)

    # The Client
    # Creating an example object hierarchy.
    Element_A = Element("A", 101)
    Element_B = Element("B", 305, Element_A)
    Element_C = Element("C", 185, Element_A)
    Element_D = Element("D", -30, Element_B)

    # Now Rather than changing the Element class to support custom
    # operations, we can utilise the accept method that was
    # implemented in the Element class because of the addition of
    # the IVisitable interface

    class PrintElementNamesVisitor(IVisitor):
        "Create a visitor that prints the Element names"

        @staticmethod
        def visit(element):
            print(element.name)

    # Using the PrintElementNamesVisitor to traverse the object hierarchy
    Element_A.accept(PrintElementNamesVisitor)

    class CalculateElementTotalsVisitor(IVisitor):
        "Create a visitor that totals the Element values"
        total_value = 0

        @classmethod
        def visit(cls, element):
            cls.total_value += element.value
            return cls.total_value

    # Using the CalculateElementTotalsVisitor to traverse the
    # object hierarchy
    TOTAL = CalculateElementTotalsVisitor()
    Element_A.accept(CalculateElementTotalsVisitor)
    print(TOTAL.total_value)
to make Instruction "conform" to the IVisitable pattern:

    class Instruction(str):
        def __new__(cls, string):
            svp64 = False
            if string.startswith("sv."):
                string = string[len("sv."):]
                svp64 = True
            self = super().__new__(cls, string)
            self.__svp64 = svp64
            return self

        @property
        def svp64(self):
            return self.__svp64

add this:

    def accept(self, visitor):
        "required by the Visitor that will traverse"
        for opcode in self.opcodes:
            opcode.accept(visitor)
        visitor.visit(self)

and if we want to extend it to include "filtering":

    def accept(self, visitor, accept_filter=lambda x: x):
        "required by the Visitor that will traverse"
        for opcode in self.opcodes:
            if accept_filter(opcode):
                opcode.accept(visitor)
        visitor.visit(self)

then, OpcodesVisitor becomes:

    class OpcodesPrintVisitor:
        def Record(self, opcode):
            print(opcode)

and that's all!!

for Records:

    class Record:
        .....
        def accept(self, visitor, accept_filter=lambda x: x):
            "required by the Visitor that will traverse"
            for operand in self.dynamic_operands:
                if accept_filter(operand):
                    operand.accept(visitor)
            for operand in self.static_operands:
                if accept_filter(operand):
                    operand.accept(visitor)
            visitor.visit(self)

at which point it becomes possible to do this:

    class Record:
        .....
        def accept(self, visitor, accept_filter=lambda x: x):
            "required by the Visitor that will traverse"
            for field in dataclasses.fields(self):
                subthing = getattr(self, field.name)
                if accept_filter(subthing):
                    subthing.accept(visitor)
            visitor.visit(self)
OK, so basically the idea is that visitor calls special magic method. I have no problems with that, that's just another option to reach the same goal.
and the Record filter for what is currently in OperandsVisitor:

    accept_filter = lambda operand: operand.name not in ("PO", "XO")
(In reply to Dmitry Selyutin from comment #55)
> OK, so basically the idea is that visitor calls special magic method.

there i was copying the "accept()" method from here:
https://sbcode.net/python/visitor/#visitorvisitor_conceptpy

i don't actually like the name of the function, but the *separation* of the
actual-walking, that's the important bit. i really like the contextmanager
and __call__() here, much better than "accept()".

> I have no problems with that, that's just another option to reach the same
> goal.

it's very subtle, it's part of the "Model View Controller" paradigm. what you
have done is {ModelViewController-all-in-one}
Another option. We visit everything always, but can now filter based on
another visitor:

    def visit(node, handler, matcher=Matcher()):
        node.visit(handler=handler, matcher=matcher, depth=0)

    class Database:
        def visit(self, handler, matcher, depth):
            if matcher(node=self, depth=depth):
                with handler(node=self, depth=depth):
                    for record in self:
                        record.visit(depth=(depth + 1),
                                     handler=handler, matcher=matcher)

Here's how instruction matching looks:

    class InstructionMatcher(Matcher):
        def Record(self, node, depth):
            return (node.name == self["insn"])

The trick is that we have two different visitors: one calls the check and
another calls the corresponding context manager. Both resort to `return True`
and `yield node` respectively.

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=84b7982235332e51ca157e8bbe500a3f1e51910e
(In reply to Dmitry Selyutin from comment #58)
> Another option. We visit everything always, but can now filter based on
> another visitor:

i initially went "ah ha!" but then realised that the idea is missing the
fundamental, and it's that the "walking" is still not separate from the
"action". until the *act of walking* is separated from the "action taken on
the walk" there will be serious issues later.

here is how it is separated in ast.py:
https://github.com/python/cpython/blob/2.7/Lib/ast.py#L203

there are three functions involved: walk, iter_child_nodes, and iter_fields.
by creating a Base Class from which all aspects of insndb derive, it will be
possible to replace "if isinstance(field, AST):" (at line 179 and others)
with "if isinstance(field, InsnDB_BaseClass)" and it will be possible to lift
that recursive-walk code from ast.py pretty much as-is.

> def visit(node, handler, matcher=Matcher()):
>     node.visit(handler=handler, matcher=matcher, depth=0)

(btw the depth argument needs to be removed. it is the Visitor instance that
must keep its *own* track of "depth")

> class InstructionMatcher(Matcher):
>     def Record(self, node, depth):
>         return (node.name == self["insn"])
>
> The trick is that we have two different visitors: one calls the check and
> another calls the corresponding context manager. Both resort to `return
> True` and `yield node` respectively.

unfortunately, whilst on the face of it i thought "ah ha! excellent", i
realised this just moves the problem of recursive-iteration from the Visitor
into the Matcher.

> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=84b7982235332e51ca157e8bbe500a3f1e51910e

yep, it's increasing code-size rather than decreasing it. the fundamental
code *really* should be no greater than around 100 lines; this is the "litmus
test".
Decoupled visitors from walking:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=7de83fc89028c55a36af2210b80c426e24e6b5b9

The record filtering is now implemented in terms of a concrete visitor.
That's the whole code:

    class Visitor:
        def __call__(self, node):
            method = node.__class__.__name__
            method = getattr(self, method, self.Node)
            return method(node=node)

        @_contextlib.contextmanager
        def Node(self, node):
            for subnode in node.subnodes:
                with self(subnode):
                    pass
            yield node

    class Record(Node):
        @property
        def subnodes(self):
            for (name, fields) in self.extras.items():
                yield Extra(name=name, **fields)

    class Database(Node):
        @property
        def subnodes(self):
            yield from self

Obviously there's more in Record and Database to be visited; this is
just a practical example. FWIW, here's how the filtering now looks:

    class RecordNameVisitor(Visitor):
        def __init__(self, name):
            self.__name = name
            self.__records = set()
            return super().__init__()

        @contextlib.contextmanager
        def Record(self, node):
            if node.name == self.__name:
                self.__records.add(node)
            yield node

        def __iter__(self):
            yield from self.__records

    match = RecordNameVisitor(name=args["insn"])
    with match(node=db):
        nodes = frozenset(match)
(In reply to Dmitry Selyutin from comment #60) > Decoupled visitors from walking: > > https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff; > h=7de83fc89028c55a36af2210b80c426e24e6b5b9 > > The record filtering is now implemented in terms of concrete visitor. That's > the whole code: > > > class Visitor: > def __call__(self, node): > method = node.__class__.__name__ > method = getattr(self, method, self.Node) > return method(node=node) > > @_contextlib.contextmanager > def Node(self, node): > for subnode in node.subnodes: > with self(subnode): > pass > yield node > > class Record(Node): > @property > def subnodes(self): > for (name, fields) in self.extras.items(): > yield Extra(name=name, **fields) > > class Database(Node): > @property > def subnodes(self): > yield from self > > > Obviously there's more in Record and Database to be visited, just a > practical example. that's a *massive* amount of work, adding for-loops on every single object on every single class. all of it unnecessary. here's the thing: they *all* can be done by introspection of the fields. you're already using dataclasses.fields(): https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/insndb/core.py;h=2876e787#l127 in ast.py simply replace "for field in node._fields" with "for field in dataclasses.fields(self)" https://github.com/python/cpython/blob/2.7/Lib/ast.py#L166 also in iter_child_nodes, "if isinstance(field, AST):" replace with "if isinstance(field, Node) https://github.com/python/cpython/blob/2.7/Lib/ast.py#L179 then make sure that *all* InsnDb classes inherit from Node. a possible exception is Instruction because it inherits from str. you're either going to need to bite the bullet on use of "isinstance", (please don't consider it "crap", it is pretty normal in this kind of code, and in python in general), or come up with an awful-hack of having a hidden identifier function/property of Node that is unique and is *not* present in str, list, tuple, bool, int, or any other base datatype *or class instance*, then using "hasattr()" to detect it. and that will violate the "Principle of Least Surprise", it is using a non-standard pattern for a *known* expected pattern (isinstance). it will become a maintenance and usage nightmare. (EDIT: removed a bit (you can see it in the history) i need to re-think as the RecordVisitorExample *might* be ok, i need to re-read it again.
https://github.com/python/cpython/blob/2.7/Lib/ast.py#L203

ah - i know how to make this do breadth-first(?) order:

    def walk(node):
        from collections import deque
        todo = deque([node])
        todonext = deque()
        while todo:
            node = todo.popleft()
            todonext.extend(iter_child_nodes(node))
            yield node
            if not todo:
                todo = todonext
                todonext = deque()

and that should do breadth-first node-walking. i am not entirely sure
how to do depth-first.. is it that you just move the "yield node" up?
i leave it with you to work out?
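(for the record: moving "yield node" around changes pre-order versus
post-order, not breadth-first versus depth-first. a minimal depth-first
sketch, assuming the same iter_child_nodes helper from ast.py, is just
recursion:)

    def walk_depth_first(node):
        # pre-order depth-first: yield the node, then recurse into
        # children; move the yield below the loop for post-order instead
        yield node
        for child in iter_child_nodes(node):
            yield from walk_depth_first(child)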
(In reply to Luke Kenneth Casson Leighton from comment #61)
> that's a *massive* amount of work, adding for-loops on every single
> object on every single class. all of it unnecessary.
> here's the thing: they *all* can be done by introspection of the
> fields.

You _again_ skip the rationale I posted. I'm worried this has become a
_fixed_ pattern. I already gave the rationale for why there's a
property; please finally _read_ it. TL;DR: this is still too low-level
to be used with dataclasses.fields. If you have budget for adapting
insndb for the higher level -- fine, I'm OK to do it. Meanwhile even
this task doesn't have the budget allocated.

> you're already using dataclasses.fields():

Yes I do, because I know the low-level details of insndb. The users of
visitors shouldn't.

> https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/insndb/
> core.py;h=2876e787#l127
>
> in ast.py, simply replace "for field in node._fields" with
> "for field in dataclasses.fields(self)"
>
> https://github.com/python/cpython/blob/2.7/Lib/ast.py#L166
>
> also, in iter_child_nodes, replace "if isinstance(field, AST):" with
> "if isinstance(field, Node)"
>
> https://github.com/python/cpython/blob/2.7/Lib/ast.py#L179
>
> then make sure that *all* InsnDb classes inherit from Node. a possible
> exception is Instruction, because it inherits from str.
>
> you're either going to need to bite the bullet on use of "isinstance"
> (please don't consider it "crap": it is pretty normal in this kind
> of code, and in python in general)

No it isn't for many scenarios, other than restricting the inputs.
Almost any scenario can be solved either by pattern matching or
polymorphism. Please don't break duck typing unless really unavoidable.

> then using "hasattr()" to detect it

No need for hasattr. Again: duck typing and polymorphism. You already
have a Node class. If all fields of insndb have this type, they are
traversable for free. Refer to the above.

> and that will violate the "Principle of Least Surprise": it is
> using a non-standard pattern for a *known* expected pattern
> (isinstance). it will become a maintenance and usage nightmare.

That's basically the same thing inverted. "Properly-named" methods are
no way better than isinstance in this regard.

> > FWIW, here's how the filtering now looks:
> >
> >     class RecordNameVisitor(Visitor):
> >         def __init__(self, name):
> >             self.__name = name
> >             self.__records = set()
> >             return super().__init__()
> >
> >         @contextlib.contextmanager
> >         def Record(self, node):
> >             if node.name == self.__name:
> >                 self.__records.add(node)
> >             yield node
> >
> >         def __iter__(self):
> >             yield from self.__records
>
> ok so this is storing all the records then replaying them?
> that's not necessary to do because the database can be
> considered static (it's not going to change).

Yes, one can walk the db, collect all relevant entries, then visit
those. That'd be better.

> also it can be replaced with a single-line lambda function for
> a fixed match:
>
>     matcher = lambda node: node.name == "XO"
>
> and for one that is more generic:
>
>     class RecordMatcher:  # no derivation from any other class
>         def __init__(self, name):
>             self.name = name
>         def __call__(self, node):
>             return self.name == node.name

There's no difference. You can affect the visitor's call either way.
TL;DR. I'm really extremely concerned that you keep avoiding _reading_
my comments. This is a severe issue; it complicates the development.

1. I'm leaving a standalone entry to __call__ where anyone can happily
   insert these isinstance calls.
2. I can make all (most?) fields inherit from Node, but beware that
   these details are way too low-level. Any sane caller won't likely
   use them for anything other than enumeration; the users are expected
   to use the high-level stuff.
(In reply to Dmitry Selyutin from comment #60)
>     @contextlib.contextmanager
>     def Record(self, node):
>         if node.name == self.__name:
>             self.__records.add(node)
>         ---> yield node <---- REMOVE THIS

"yield node" should be removed, and then the walking *which is now
entirely decoupled* uses walk() from ast.py, and it becomes:

    v = RecordVisitor("XO")
    for node in walk(db):
        v(node)

or, more to the point, those two lines "for node in walk(db): v(node)"
are moved behind a function which RecordVisitorExample inherits from,
just like in pyomo and in that IVisitor pattern:

    class IVisitor:
        @staticmethod
        def walk_fn(db, visitor):
            for node in walk(db):
                visitor(node)

then it is:

    v = RecordVisitor("XO")  # inherits from IVisitor
    v.walk_fn(db, v)

*no* Visitor should perform, be involved in, or be responsible for
child-node-walking *at all*, at *any* time.

i appreciate this is... complex. it's some of the most
ridiculously-short python code, but unless it is short it very much
gets out-of-hand very quickly.
(In reply to Luke Kenneth Casson Leighton from comment #65)
> "yield node" should be removed, and then the walking *which is now
> entirely decoupled* uses walk() from ast.py

nope, scratch that: worked it out
https://libre-soc.org/irclog/%23libre-soc.2023-06-08.log.html#t2023-06-08T12:39:16
I've added some new types, and also refactored the Dataclass type. From now on, there's no need to manually use the dataclasses.dataclass decorator: we always use frozen dataclasses, since the database is static, and allow checking for equality, since we indeed compare many of them.
I think the users of the API will have some difficulties differentiating
which stuff comes from where, especially with nested data types (e.g.
dataclasses or tuple-like or similar). Options are:

1. For dataclasses or dict-like, yield field names along with nodes.
   For tuples, yield the index. Something else for others?
2. Have the types decoupled, so that by type it's possible to determine
   the origin. Seems kinda dumb and overkill.
3. Yield the node parent along with the node itself.

Ideas and suggestions are welcome.
Option 4, which is, I think, the best one: have a full element path. That is, for example: db.record.ppc, db.record.fields[0], db.record.mdwn.operands[0]... and so on.
Meanwhile I've added functionality to walk over classes in a generic
way. Something intermediate between the classmethod decorator and a
usual method, with the exception that the newly added decorator can
accept either a class or an instance. Example:

    >>> print(tuple(Record.walk()))
    (<class 'openpower.insndb.core.String'>,
     <class 'openpower.insndb.core.Section'>,
     <class 'openpower.insndb.core.PPCRecord'>,
     <class 'openpower.insndb.core.Fields'>,
     <class 'openpower.insndb.core.MarkdownRecord'>,
     <class 'openpower.insndb.core.SVP64Record'>)
The overall usage looks kinda insane, so any ideas on improvements are welcome: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=61e53d70045e090620d6acccb92d344baaae0b37
(In reply to Dmitry Selyutin from comment #68)
> I think the users of the API will have some difficulties differentiating
> which stuff comes from where, especially with nested data types (e.g.
> dataclasses or tuple-like or similar).

well, we are the users, and if we understand it (and, importantly,
create enough examples), then it's fine. but also of course,
documentation comments etc. are also crucial. this is compiler /
language-translator technology, basically: data-transformation. it is
always very tricky.

> Options are:
> 1. For dataclasses or dict-like, yield field names along with nodes.
>    For tuples, yield the index. Something else for others?

(for tuples and lists, and anything else iterable, yield the index, yes)

> 2. Have the types decoupled, so that by type it's possible to determine
>    the origin. Seems kinda dumb and overkill.
> 3. Yield the node parent along with the node itself.

(3) i recall seeing in pyomo.

oh. i know a trick that can help: create the following:

* a "database printer to file-handle" Visitor
* a "database reader from file-handle" Visitor
* a "database *comparator*" Visitor

then a unit test can be written which:

* prints out the entire database to a file
* reads the entire database from the file
* compares the database contents against the original

through the decoupling to a file we *will* know:

* firstly, if the API looks sane,
* secondly, if everything works, and
* thirdly, we will have some absolutely fantastic (and complete)
  examples demonstrating how it all works.

then run it on:

* an instance of the Database class
* the Database class itself.

the latter might need a different comparator-Visitor. (a sketch of the
round-trip test is below.)
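(to make that concrete, a self-contained toy sketch of the round-trip
test; the classes here are illustrative stand-ins, not the real insndb
Database:)

    import contextlib
    import dataclasses
    import io
    import unittest

    # toy stand-ins: the point is the print / re-read / compare shape,
    # not the actual database format

    @dataclasses.dataclass(eq=True, frozen=True)
    class Record:
        name: str

    @dataclasses.dataclass(eq=True, frozen=True)
    class Database:
        records: tuple

    class PrinterVisitor:
        def __init__(self, stream):
            self.stream = stream

        @contextlib.contextmanager
        def __call__(self, node):
            if isinstance(node, Record):
                print(node.name, file=self.stream)
            yield node

    def walk(db):
        # trivially-flat walker for the toy database
        yield db
        yield from db.records

    def parse(stream):
        names = stream.read().split()
        return Database(records=tuple(Record(name) for name in names))

    class RoundTripTest(unittest.TestCase):
        def test_round_trip(self):
            db = Database(records=(Record("tdi"), Record("twi")))
            stream = io.StringIO()
            printer = PrinterVisitor(stream)
            for node in walk(db):
                with printer(node):
                    pass
            db2 = parse(io.StringIO(stream.getvalue()))
            # frozen dataclasses with eq=True compare by value
            self.assertEqual(db, db2)

    if __name__ == "__main__":
        unittest.main()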
(In reply to Dmitry Selyutin from comment #70)
> Meanwhile I've added functionality to walk over classes in a generic
> way.
>
> >>> print(tuple(Record.walk()))
> (<class 'openpower.insndb.core.String'>, <class

NICE! (correction: frickin awesome)

is the hierarchy preserved? Record.walk() isn't going recursively down
the other classes, is it? hang on let me check...

    @_functools.total_ordering
    class Record(Dataclass):
        name: String
        section: Section
        ppc: PPCRecord
        fields: Fields
        mdwn: MarkdownRecord
        svp64: SVP64Record = None

ha! cool!
ahhh... this may have been a misunderstanding. current:

    class ListVisitor(Visitor):
        @contextlib.contextmanager
        def __call__(self, node):
            if isinstance(node, Record):
                print(node.name)
            yield node

the version below is much better and more readable, and it is a really
beneficial part of the API from yesterday(?):

    class ListVisitor(Visitor):
        @contextlib.contextmanager
        def Record(self, node):
            print(node.name)
            yield node

was there a technical reason to make this change? when i was referring
to the use of isinstance, i was referring to the (now decoupled)
database-walking-function (now called Dataclass.walk), not the visitors.
ah. just spotted this:

    class Dataclass(metaclass=DataclassMeta):
        @walkmethod
        def walk(clsself, match=None):
            if match is None:
                match = lambda subnode: True
            def field_type(field):
                return field.type
            def field_value(field):
                return getattr(clsself, field.name)
            field = (field_type if isinstance(clsself, type)
                     else field_value)  # <-- this check

ahhhmmm.... ahmhahm.... i'm not sure that this is needed. if you are
calling the Visitor on a class (not an instance), you *know* that the
visitor-function is going to receive types, not instances. so there
will be no "mixing"...

... but if we _did_ want to mix (have for example *both* visiting of
the database-types *and* visiting of the database-instances), then one
way to handle that would be.... [going back to the old Visitor API]

    class ListVisitor(Visitor):
        @contextlib.contextmanager
        def Record(self, node):
            print(node.name)
            yield node

        @contextlib.contextmanager
        def RecordType(self, node):    # <-- the new bit
            print(node.type)
            yield node

it's.... six of one and half a dozen of the other: one option would be
just to call two separate Visitors:

* the first one to print out a c header file,
* the second one to print out the c source file.

but i kinda like the idea of the Visitor being (optionally) able to
handle both (and the same Visitor instance being able to output both
the header and the source file).
(In reply to Dmitry Selyutin from comment #69)
> Option 4, which is, I think, the best one: have a full element path.
> That is, for example: db.record.ppc, db.record.fields[0],
> db.record.mdwn.operands[0]... and so on.

mmmm.... this kind of thing is what the Visitor should do (itself), by
having all of the functions:

    class BiggerVisitor(BaseVisitor):
        def __init__(self):
            self.stack_of_stuff = []

        @contextlib.contextmanager
        def Record(self, node, depth):
            self.stack_of_stuff.append(node.name)
            print(".".join(self.stack_of_stuff))
            yield node
            self.stack_of_stuff.pop(-1)  # i think

        @contextlib.contextmanager
        def Opcode(self, node, depth):
            self.stack_of_stuff.append(node.name)
            print(".".join(self.stack_of_stuff))
            yield node
            self.stack_of_stuff.pop(-1)  # i think

urrr... those are both the same. urrr.... ahhhh... i think i know why
you replaced def Record (etc) with def __call__, because the above is
just this (the two do the same thing):

    @contextlib.contextmanager
    def __call__(self, node):
        self.stack_of_stuff.append(node.name)
        print(".".join(self.stack_of_stuff))
        yield node
        self.stack_of_stuff.pop(-1)  # i think

hmmmm hmmm hmmm.... this is solved in python 2.7 ast.py with a
getattr(), isn't it?
https://github.com/python/cpython/blob/2.7/Lib/ast.py#L237

    def visit(self, node):
        """Visit a node."""
        method = 'visit_' + node.__class__.__name__
        visitor = getattr(self, method, self.generic_visit)
        return visitor(node)

the equivalent here (for us) would be:

    visitor = getattr(self, method, self.__call__)

ahhhh that's it. that's the solution. demonstrating back in the
original (from yesterday?):

    @_dataclasses.dataclass(eq=True, frozen=True)
    class Dataclass:
        def subnodes(self, match=None):
            if match is None:
                match = lambda subnode: True
            def subnode(field):
                return getattr(self, field.name, field.__call__)  # <--
            yield from filter(match,
                              map(subnode, _dataclasses.fields(self)))

that will cope with both... is that right?

    class Dataclass(metaclass=DataclassMeta):
        @walkmethod
        def walk(clsself, match=None):
            if match is None:
                match = lambda subnode: True
            def field_type(field):
                return field.type
            def field_value(field):
                return getattr(clsself, field.name, field.__call__)  # <--
            field = (field_type if isinstance(clsself, type)
                     else field_value)
            yield from filter(match,
                              map(field, _dataclasses.fields(clsself)))
I'm too tired today to check the comments in detail, I'll address all these later, OK (maybe tomorrow)? I've just finished a sketch for paths, but it's so ugly that it doesn't deserve to be put in master, cf. the paths branch. There's a `pysvp64db tree` command which shows what I want. Some entities are not of Node type yet, so the output is far from perfect, but it demonstrates the idea. Everything else later, cannot even think straight.
Caveat: don't treat anything you see in the paths branch seriously; this got so messy and fricking ugly that I'm thinking of some other approach. I'll check your ideas with stacks too; at first glance this should work, but only if the Node class has some field property ("name" in your example, but that doesn't fit, since some classes have such a field too). I'd like to avoid properties and reuse field.name from dataclasses, but I have no idea how to do that in a graceful way. The code in the paths branch should really be put out of its misery.
(In reply to Dmitry Selyutin from comment #77)
> I'm too tired today to check the comments in detail, I'll address all
> these later, OK (maybe tomorrow)?

i wasn't expecting you to be up!! yes, definitely not tonight.

> I've just finished a sketch for paths, but it's so ugly that it
> doesn't deserve to be put in master, cf. the paths branch.

:)

> There's a `pysvp64db tree` command which shows what I want.

likewise will look tomorrow :) but when you are awake i think you'll
find that what you want is likely achievable with the "stack_of_stuff"
from comment #76.

> Some entities are not of Node type yet, so the output is far from
> perfect, but it demonstrates the idea.

fantastic.

> Everything else later, cannot even think straight.

take it easy, ok? i know the lure of "LCD screen" substituting for
"sunlight" very well :)
(In reply to Luke Kenneth Casson Leighton from comment #79)
> likewise will look tomorrow :) but when you are awake i think
> you'll find that what you want is likely achievable with the
> "stack_of_stuff" from comment #76.

ok yes it was easy to check - took a look: definitely "no" :) it is
perfectly reasonable to have the Visitor itself be aware of the "order"
of things, and to do "stuff" such as push the node onto a stack that is
*local* to the Visitor. to reproduce what you did, i believe it is:

    def __call__(self, node):
        print("/".join(self.stack_of_stuff), node)
        self.stack_of_stuff.append(node.name)
        yield node
        self.stack_of_stuff.pop(-1)  # i think

as simple as that.

* no need for pathlib when "/".join() will do (complete overkill)
* the names are accumulated as the __call__ goes down the depth

basically, if there is a temptation to add any parameters (like the
depth parameter), it should be resisted :) the *visitor* function
becomes responsible for *using* the fact that it is being called. the
only good additional parameter, as jacob pointed out, would be the node
index:

    def __call__(self, node, node_idx):
    def Record(self, node, node_idx):
    ...
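(a self-contained toy of that, showing the stack staying *local* to the
Visitor; the Node class and visit() function here are illustrative, not
the insndb API:)

    import contextlib

    # toy node: just a name and children
    class Node:
        def __init__(self, name, *children):
            self.name = name
            self.children = children

    def visit(visitor, node):
        # the walker knows nothing about paths or depth
        with visitor(node):
            for child in node.children:
                visit(visitor, child)

    class PathVisitor:
        def __init__(self):
            self.stack = []   # local to the visitor

        @contextlib.contextmanager
        def __call__(self, node):
            self.stack.append(node.name)
            print("/".join(self.stack))
            yield node
            self.stack.pop()

    visit(PathVisitor(), Node("db", Node("record", Node("ppc"))))
    # prints:
    #   db
    #   db/record
    #   db/record/ppc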
    class String(Node, str):
        pass

    class PPCRecord(Dataclass):
    -    comment2: str = ''
    +    comment2: String = String()

i really don't think this is a good route to go down (converting _all_
types to inherit from Node): indirectly it implies ultimately that
SelectableInt and FieldSelectableInt must derive from Node, and that is
really not a good idea. (plus, bool cannot be inherited from, and we
need it.)

it's... if the basic types work (str, int, bool) and do the same job,
then creating a class to do the same job... yyeah :)
Long story short. I share your opinion on almost everything, but I had
to at least probe this route to understand its pros and cons.

The only exception is the generic __call__. This actually comes from
your isinstance hints (about another section?), but I realized that
relying on the class name is too fragile. isinstance refers to an
object (yes, types are objects too in Python); a name just refers to
some string. Whilst it looks simpler, it's indeed less intuitive than
explicit checks. If somebody wants this functionality (per-class
context managers), they can do it directly in concrete visitors:

    class WeirdVisitor(Visitor):
        def __call__(self, node):
            method = getattr(self, node.__class__.__name__, self.Node)
            return method(node)

        @contextlib.contextmanager
        def Node(self, node):
            raise Exception(666)

        @contextlib.contextmanager
        def Record(self, node):
            raise Exception(42)

TL;DR: I'm removing the paths branch. However, there's still one issue
with the approach you suggest:

    def __call__(self, node):
        print("/".join(self.stack_of_stuff), node)
        self.stack_of_stuff.append(node.name)
        yield node
        self.stack_of_stuff.pop(-1)  # i think

If we're going to drop the Node-inherited classes (Path, String and
similar stuff), there's no way we could call node.name (or, again, some
other generic field, because "name" is occupied). `str` or `bool` don't
have such an interface.

As for bool inheritance, that's not a problem: whilst bool isn't
inheritable, int is.
My point is, inheriting all stuff from Node gives this:

1. No need to check walk method availability via
   isinstance/try+except/hasattr. We can _assume_ everything "walkable"
   is Node-derived.
2. We can therefore extend Node-based stuff without problems. The first
   extension might be Node.path (or whatever name we invent for it).
3. Some things can be stubs or generic (cf. Nodes.walk, Dataclass.walk;
   perhaps I'll add some stuff for Tuple-like, because we have several
   tuple-based things which should also be walkable).
4. It's just logical that everything which can be used in the
   walk/visit API corresponds to some interface.
Ah, and one note about a single __call__. Even though it's an unlikely scenario, two classes might share the same name (e.g. one is at the global level and the other is class-nested). What would the callbacks be then? So, all in all, I think that, as much as I dislike isinstance checks, they're much better than checking class names.
(In reply to Dmitry Selyutin from comment #82)
> Long story short. I share your opinion on almost everything, but I had
> to at least probe this route to understand its pros and cons.
> The only exception is the generic __call__. This actually comes from
> your isinstance hints (about another section?),

this is where the misunderstanding is: i was talking at the time about
ast.py iter_child_nodes' use of isinstance:

    def iter_child_nodes(node):
        for name, field in iter_fields(node):
            if isinstance(field, AST):
                yield field
            elif isinstance(field, list):
                for item in field:
                    if isinstance(item, AST):
                        yield item

* you solved the first use there by doing contextmanager.
* you solved the second use by doing Node.subnodes (now Nodes.walk)
* the third instance could again use another contextmanager

whereas i was talking about the use in the *visitor*, which in ast.py
is this: https://github.com/python/cpython/blob/2.7/Lib/ast.py#L240

    def visit(self, node):
        method = 'visit_' + node.__class__.__name__
        visitor = getattr(self, method, self.generic_visit)
        return visitor(node)

which for us would be:

    def visit(self, node):
        method = 'visit_' + node.__class__.__name__
        visitor = getattr(self, method, self.__call__)
        return visitor(node)

> but I realized that relying on class name is too fragile.

it's not "fragile": it's the best aspect of the entire visitor pattern
when applied to python. plus, dropping it breaks the "Principle of
Least Surprise": *everyone* who has ever used Visitors in python
expects to have function-names in the Visitor named after the "data
type of the database object being walked". you are proposing to break
that expectation, in effect. i don't mind self.__call__ being the
default fallback, though.

> isinstance refers to an object (yes, types are objects too in Python);
> a name just refers to some string. Whilst it looks simpler, it's
> indeed less intuitive than explicit checks. If somebody wants this
> functionality (per-class context managers), they can do it directly in
> concrete visitors:
>
>     class WeirdVisitor(Visitor):
>         def __call__(self, node):
>             method = getattr(self, node.__class__.__name__, self.Node)
>             return method(node)

the *very* first thing that i will do on creating *any* visitors is
precisely and exactly that, copying that exactly and *only* using
that! and in all documentation i will strongly recommend using that
and not the Visitor!

> TL;DR: I'm removing the paths branch. However, there's still one issue
> with the approach you suggest:
>
>     def __call__(self, node):
>         print("/".join(self.stack_of_stuff), node)
>         self.stack_of_stuff.append(node.name)
>         yield node
>         self.stack_of_stuff.pop(-1)  # i think
>
> If we're going to drop the Node-inherited classes (Path, String and
> similar stuff), there's no way we could call node.name (or, again,
> some other generic field, because "name" is occupied). `str` or `bool`
> don't have such an interface.

then the *Visitor* should do
"if isinstance(node, str): self.stack.append(node)". again, like the
"depth" parameter, the walker-function should not be compromised by
what the *Visitor* does.

> As for bool inheritance, that's not a problem: whilst bool isn't
> inheritable, int is.

you see how going down that path is a danger? using bool is clear about
the intent.

additionally, i would really like this code to be useable elsewhere in
the project, and even a separate git repo created for it (and put onto
pypi).
in particular, at some point i would like to replace the existing
nmigen visitor-pattern (which is the usual "faulty" design) with this
one.

(In reply to Dmitry Selyutin from comment #84)
> Ah, and one note about a single __call__. Even though it's an unlikely
> scenario, two classes might share the same name (e.g. one is at the
> global level and the other is class-nested).

that would be completely confusing and seriously inadvisable from a
project maintainability perspective! as an outside possibility, that
will immediately get picked up by the user when they find that they
can't call a Visitor on that nested-class: they'll soon change the
name.

with Visitors being such a huge integral part of a good AST-walking /
Language-translator / Compiler API, "users" (us at the moment, but
other people in future if we do drop this onto pypi) *will* keep a
clean global namespace.

think about HTML/DOM: imagine the total chaos if HTML's DOM (Document
Object Model) had an object-type that was locally different from the
global type namespace, somewhere deep inside the structure. if
DOM.Window was different from DOM.Document.Window, for example. or if c
had a different meaning for "struct" just because it was inside a
"switch" statement?

part of a clean AST / language / compiler is verrrry much about picking
a clean global namespace - *definitely* don't do nested classes in
insndb!! again (just on nested-classes): "Principle of Least Surprise".

(i aim to propose using insndb for work inside the ISA Working Group,
for walking the specification document, which is now in latex)
I just realized. No need to inherit everything in code from Node, unless required. We can just yield a proxy object, which wraps the original one but is inherited from Node.
About methods named as classes. If you can propose something that relies on _objects_, not _names_, I'm fine with that. If not -- that's a strong no. No, this does not comply with the principle of least surprise; it breaks it, in fact, even more than isinstance does. The method generation should at least be hidden and not be explicit. I'll think about how to handle it, but this should not be based on names from the caller's perspective.
(In reply to Dmitry Selyutin from comment #86)
> I just realized. No need to inherit everything in code from Node,
> unless required. We can just yield a proxy object, which wraps the
> original one but is inherited from Node.

eeehmmm... ehmehmehm... i'd love to see that in action, i can sort-of
subconsciously / intuitively see it would work but i don't know how :)

(In reply to Dmitry Selyutin from comment #87)
> About methods named as classes. If you can propose something that
> relies on _objects_, not _names_, I'm fine with that.

think it through: if we publish this as a generic pypi package (and
assume that the users are intelligent), will they freak out at needing
to inherit everything from "class Node" in their project? (remember i
am considering replacing nmigen's Visitor-pattern with this code)

> If not -- that's a strong no. No, this does not comply with the
> principle of least surprise; it breaks it, in fact, even more than
> isinstance does. The method generation should at least be hidden and
> not be explicit. I'll think about how to handle it, but this should
> not be based on names from the caller's perspective.

yyeah, i think we start to see why pyomo has a list of "builtin types",
and also why they have "register this class type with the walker
system" functions:
https://github.com/Pyomo/pyomo/blob/main/pyomo/common/numeric_types.py#L70

native_types is used back in the visitor code (in their equivalent of
the tree-walk function) to decide what to yield to the Visitor:
https://github.com/Pyomo/pyomo/blob/main/pyomo/core/expr/visitor.py#L1013

    elif type(child) in native_types:
        return False, child

actually... "type(child) in native_types" might be the right thing to
do here:

    class Dataclass(metaclass=DataclassMeta):
        @walkmethod
        def walk(clsself, match=None):
            ...
            field = (field_type if isinstance(clsself, type)
                     else field_value)
            yield from filter(match,
                              map(field, _dataclasses.fields(clsself)))

becomes:

            field = (field_value if type(child) in native_types else
                     (field_type if isinstance(clsself, type)
                      else field_value))
            yield from filter(match,
                              map(field, _dataclasses.fields(clsself)))

would that work? i'm running on instinct here.
(In reply to Dmitry Selyutin from comment #87)
> About methods named as classes. If you can propose something that
> relies on _objects_, not _names_, I'm fine with that.

okaaay, i grasp this now. some clues: i wondered if it was possible to
just assume that dataclasses are iterable, because if they are then
Node.subnodes may not even be necessary!

https://stackoverflow.com/questions/74393947/make-python-dataclass-iterable
https://github.com/ericvsmith/dataclasses/issues/21

drat, that won't work (unless first doing "getattr(node, 'astuple')"),
plus __iter__ is unlikely to work on objects.

* bool is a singleton type so could be special-cased
  https://stackoverflow.com/a/2172204

      field = field_value if isinstance(field, bool) else ...

* list/tuple/str/int likewise... no, it really is looking like this is
  the right thing to do here:

      field = field_value if \
          type(field) in [bool, list, tuple, str, int,
                          float, complex, bytes] else ...

wait... no... not list/tuple, and not field_value:

      field = field if \
          type(field) in [bool, str, int, float, complex, bytes] else ...

and for list/tuple to be walked separately... ahhh, *that's* why python
ast.py did "elif isinstance(field, list):"
https://github.com/python/cpython/blob/2.7/Lib/ast.py#L181

so the order of "field = ...." should be something like:

* bool first (as a singleton type): yields... *field*, not field_value
* type(field) in [str, int, float, complex, bytes]: also yield field
* if the class-type is a dataclass, use the dataclasses.fields trick
  (field_type + field_value)
* otherwise just iterate on it.

something like this:

    class Dataclass(metaclass=DataclassMeta):
        @walkmethod
        # https://www.google.com/search?q=python+are+lambda+functions+hashable
        # lambda is a singleton, so there is no danger
        def walk(clsself, match=lambda subnode: True):
            def field_type(field):
                return field.type
            def field_value(field):
                return getattr(clsself, field.name)
            if type(clsself) in [bool, str, int, float, complex, bytes]:
                yield clsself
            # https://www.programiz.com/python-programming/methods/built-in/issubclass
            # this will work because "class is considered a subclass of itself"
            elif issubclass(clsself, Dataclass):
                # i prefer the "getattr(field, ...., field.__call__)" thing be used
                field = (field_type if isinstance(clsself, type)
                         else field_value)
                yield from filter(match,
                                  map(field, _dataclasses.fields(clsself)))
            else:
                yield from clsself  # assumes the object is iterable somehow

something like that? combined with the proxy-idea i *think* that code
becomes fully-independent of the class types... it's almost there :)
okok i have an idea. first, add visitor and match to the visit fn:

    def visit(visitor, node, match=None):
        if not match(node):
            return
        with visitor(node=node):
            if hasattr(node, "walk"):
                for subnode in node.walk(visitor, match):
                    visit(visitor=visitor, node=subnode, match=match)

(a missing match would have been spotted soon enough; the key is adding
visitor as an arg to Node.walk() and to def visit())

now add it to walk:

    class Dataclass(metaclass=DataclassMeta):
        @walkmethod
        def walk(clsself, visitor, match=None):

and now, ta-daaa! this can be done:

            if not match(clsself):
                return
            # not the clever bit yet...
            yield clsself
            yield from filter(match,
                              visitor.get_any_child_nodes(clsself))

ta-daaa! and the rest moves into Visitor:

    class Visitor:
        ....
        def get_any_child_nodes(self, node):
            if type(node) in [bool, str, float, complex, bytes]:
                yield node
            elif ...

and in this way anyone may override that, do whatever the hell they
like, we don't care :) ahh, this is good: it moves the existence of
Node out to where it belongs, in the Visitor, which is what knows about
the format of the Database. as long as nobody gives any Database class
a method called "get_any_child_nodes", it is fine :)

but we also start to see why python's htmllib used "do_" as a prefix,
and why ast.py has "visit_....." as a prefix. hmmm, that's probably a
good practice with *all* Visitors: otherwise, on a search, you wonder
"what the heck, def Record and class Record, moo??"
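(to make the shape concrete, a runnable miniature of that split, with
purely illustrative names; the real walkmethod/match machinery is
omitted:)

    import contextlib
    import dataclasses

    # the *Visitor* owns get_any_child_nodes; the walker just recurses

    ATOMS = (bool, str, int, float, complex, bytes)

    class Visitor:
        @contextlib.contextmanager
        def __call__(self, node):
            yield node

        def get_any_child_nodes(self, node):
            if isinstance(node, ATOMS):
                return                    # atoms have no children
            if dataclasses.is_dataclass(node):
                for field in dataclasses.fields(node):
                    yield getattr(node, field.name)
            elif isinstance(node, (list, tuple)):
                yield from node

    def visit(visitor, node):
        with visitor(node):
            for subnode in visitor.get_any_child_nodes(node):
                visit(visitor, subnode)

    # demo
    @dataclasses.dataclass(frozen=True)
    class Pair:
        left: int
        right: str

    class Printer(Visitor):
        @contextlib.contextmanager
        def __call__(self, node):
            print(repr(node))
            yield node

    visit(Printer(), Pair(left=1, right="x"))
    # prints: Pair(left=1, right='x'), then 1, then 'x'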
Luke, you again contradict your own words. Please *re-read* what you wrote before. Passing *visitors* is not part of walking.
That's what I suggest instead of all these "let's exclude special
types", "let's invent a magic method which has the class name", etc.

$ cat interface.py

    class Visitor:
        @contextlib.contextmanager
        def __call__(self, node):
            (visitorcls, nodecls) = map(type, (self, node))
            mgr = visitor.registry.get(visitorcls, {}).get(nodecls, None)
            if mgr is not None:
                with mgr(self, node=node) as ctx:
                    yield ctx
            else:
                yield node

    def visitor(visitorcls, nodecls):
        def decorator(fn):
            mgr = contextlib.contextmanager(fn)
            visitor.registry[visitorcls][nodecls] = mgr
            return Visitor.__call__
        return decorator

    visitor.registry = collections.defaultdict(dict)

$ cat implementation.py

    class StubVisitor(Visitor):
        pass

    @visitor(StubVisitor, str)
    def coco(self, node):
        print("BY GOD IT'S A STRING!!!", node)
        yield node

    @visitor(StubVisitor, int)
    def jamboo(self, node):
        print("DONNERWETTER WE HAVE AN INTEGER!", node)
        yield node

    stub = StubVisitor()

    with stub("hi Luke"):
        pass

    with stub(42):
        pass
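(for reference, if the registry wiring above is right, running
implementation.py should print something like:)

    BY GOD IT'S A STRING!!! hi Luke
    DONNERWETTER WE HAVE AN INTEGER! 42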
The only disadvantage is that such handlers cannot be defined inline in the class, and the visitor class must be passed along with the node class to be hooked. The decorator cannot deduce the wrapped class from the method itself, since by the time the decorator is called these methods are not bound yet. However, this is damn simple, straightforward, and never ever relies on class names; it relies on objects. And any type is an object in Python.
Oh, and by the way, if some weirdo ever wants to override the way the default __call__ behaves, they can override __call__ in the derived class. Exactly like those visit and generic_visit in AST.
Oh, and yes, if visitor class is passed into the decorator, there's no need for "registry" to be global: it can be per-visitor-class.
    class VisitorMeta(type):
        def __init__(cls, name, bases, ns):
            cls.__registry = {}
            return super().__init__(cls)

        def __contains__(self, nodecls):
            return self.__registry.__contains__(nodecls)

        def __getitem__(self, nodecls):
            return self.__registry.__getitem__(nodecls)

        def __setitem__(self, nodecls, call):
            return self.__registry.__setitem__(nodecls, call)

        def __iter__(self):
            yield from self.__registry.items()

    class Visitor(metaclass=VisitorMeta):
        @contextlib.contextmanager
        def __call__(self, node):
            (visitorcls, nodecls) = map(type, (self, node))
            if nodecls in visitorcls:
                with visitorcls[nodecls](self, node=node) as ctx:
                    yield ctx
            else:
                yield node

    class visitormethod:
        def __init__(self, visitorcls, nodecls):
            self.__visitorcls = visitorcls
            self.__nodecls = nodecls
            return super().__init__()

        def __call__(self, call):
            self.__visitorcls[self.__nodecls] = \
                contextlib.contextmanager(call)
            return Visitor.__call__
The patch above allows per-class registration, makes any visitor class object behave like a dict (e.g. StubVisitor[str] will give the hook), enables iteration, etc.
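(a short usage sketch against the code above; the handler name on_str
is arbitrary, and the hook itself is hypothetical example code, not
part of insndb:)

    class StubVisitor(Visitor):
        pass

    # register a per-type hook on StubVisitor
    @visitormethod(StubVisitor, str)
    def on_str(self, node):
        print("string:", node)
        yield node

    print(str in StubVisitor)   # True: VisitorMeta.__contains__
    with StubVisitor()("hello"):
        pass                    # prints "string: hello"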
(In reply to Dmitry Selyutin from comment #91)
> Luke, you again contradict your own words. Please *re-read* what you
> wrote before. Passing *visitors* is not part of walking.

that was me trying to say "when the visitor function was itself
performing the walk", rather than it being the separate function it is
now (not very well, but we got there). a "well-defined collaboration
with default behaviour" between Visitor and walker-function (def
visit()) is perfectly fine.... but... ha! don't need it! love this:

    @visitor(StubVisitor, str)
    def coco(self, node):
        print("BY GOD IT'S A STRING!!!", node)
        yield node

oo does that.... oo it separates the thing from the thing, oooo.
does this work?

    class StubVisitor(Visitor):
        def __init__(self):
            self.stuff = "something"

    @visitor(StubVisitor, str)
    def coco(self, node):
        print(self.stuff, "BY GOD IT'S A STRING!!!", node)
        yield node

can this be made to work without needing the 1st argument?

    @visitor(str)

registry as a global variable will break things; it needs to be part of
Visitor itself?

    class Visitor:
        registry = defaultdict()

    def visitor(cls, nodecls):
        registry(....)

but for inheritance, sigh, the registry dict should be separate?
ohhhh, you made it double-nested, a dict-of-dicts, niiice.

more later, need food.
I'll come up with the patch to inline methods later. Note that the
methods will need to have different names; the following won't work:

    class StubVisitor(Visitor):
        @visitormethod(str)
        def __call__(self, node):
            print("string", node)
            yield node

        @visitormethod(int)
        def __call__(self, node):
            print("integer", node)
            yield node

...but the following will:

    class StubVisitor(Visitor):
        @visitormethod(str)
        def coco(self, node):
            print("string", node)
            yield node

        @visitormethod(int)
        def jamboo(self, node):
            print("integer", node)
            yield node

The names can be whatever you like, they just need to be distinct.
(In reply to Dmitry Selyutin from comment #94)
> Oh, and by the way, if some weirdo ever wants to override the way the
> default __call__ behaves, they can override __call__ in the derived
> class. Exactly like those visit and generic_visit in AST.

as long as the entire codebase is not full of "isinstance" checks, or,
worse, "if isinstance elif isinstance elif isinstance", i don't mind.

ohh, i get it. the decorator and especially the defaultdict *do the
same job* as the isinstance(s). holy cow, that's powerful.

btw try this:

    mgr = visitor.registry[visitorcls].get(nodecls, None)
    ...
    visitor.registry = collections.defaultdict(dict)

(indexing the outer defaultdict rather than using .get() on it, so the
inner dict is auto-created)
(In reply to Dmitry Selyutin from comment #99)
> I'll come up with the patch to inline methods later. Note that the
> methods will need to have different names; the following won't work:
>
>     class StubVisitor(Visitor):
>         @visitormethod(str)
>         def __call__(self, node): ...
>         @visitormethod(int)
>         def __call__(self, node): ...

assert it is not in the dict already (overwrite not permitted).

> The names can be whatever you like, they just need to be distinct.

yeah, totally got it. love it.
from https://realpython.com/python-metaclasses/:

    class VisitorMeta(type):
        def __new__(cls, name, bases, dct):
            x = super().__new__(cls, name, bases, dct)
            x.registry = {}   # one fresh registry per class
            return x

    class Visitor(metaclass=VisitorMeta):
        pass

    class Visitor2(Visitor):
        pass

    Visitor.registry[5] = 2
    Visitor2.registry[9] = 7

    print(Visitor.registry)    # {5: 2}
    print(Visitor2.registry)   # {9: 7}

ta-daaaaa :)
(In reply to Dmitry Selyutin from comment #96)
>     class VisitorMeta(type):
>         def __init__(cls, name, bases, ns):
>             cls.__registry = {}
>             return super().__init__(cls)

this is placing the registry in *VisitorMeta*. it needed to be the
other way round:

    x = super().__new__(cls, name, bases, dct)
    x.registry = {}
    return x

(remember also to pass *all* the arguments to type(), otherwise it will
(a) have no base classes (b) have no functions... i mean *maybe* this
is good? doesn't seem like it, given that there are other functions in
VisitorMeta)

oh wait... you meta'd __init__ - that python-metaclasses article meta'd
__new__.
(In reply to Luke Kenneth Casson Leighton from comment #103)
> this is placing the registry in *VisitorMeta*.
> it needed to be the other way round:

No, it is not. The relationship between __new__ and __init__ in metaclasses is the same as between __new__ and __init__ in classes. In fact, metaclasses are classes too. Just use `print(cls)`.
Pushed the holy-cow implementation.
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=8a43ffad8d938f83cb4ac6c9963a404e0d8ab38b
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=61cc08f2d11fce0989db4fb4d502e14efc56d80a

Took more code than you'd like, but most of it is just convenience wrappers (e.g. some methods in VisitorMeta) and checks (e.g. checking `callable` or `isinstance(nodecls, type)`). This is just so that users don't shoot themselves in the foot and at the same time can use some cool tricks (listing all the hooks a visitor has, checking whether there's a hook for some type, etc.).
The proxy object I mentioned before is a bad option: it'll break our visitor checks (because a proxy object is a new type). A new "path" property is a bad option too: it needs awful hacks with frozen dataclasses, which don't allow setting attributes after instantiation, and, what's worse, it doesn't work with builtins (e.g. setting var.field fails if var is of str type). Not an option either. I'm still looking for options on how to yield fields which have names, without refactoring the walking process. But I'd say I don't have the inspiration yet.
"overriding visitor method:" -> "overwriting existing visitor method"
I had to introduce some stuff like the Tuple type (like the Dataclass class, it's just used to avoid duplicating the walk() method code). I've added the `pysvp64db tree` command. We're almost there. It doesn't handle all fields yet, but it handles many of them. The depth tracking is indeed easy; but I'm not sure how to handle paths. If only we had something in each node to be accessed...
OK, take 2: I committed a simplified version of the walkers which yield paths too. But, this time, the path of the parent is not passed to the children's walk algorithm. Each component just yields the paths to its child nodes along with the nodes themselves, and passes these to the visitors. Luke, please take a look; I have no idea how to do it better, but this at least seems cleaner than the nonsense I posted yesterday. You can try it with the `pysvp64db tree` command.
If this commit still looks like nonsense, we can revert it. But I think this is at least acceptable.
Meanwhile I've made tree more beautiful (output truncated a bit to keep
the comment simple; it can be reproduced via `pysvp64db tree`):

    /0
    /0/name
        'tdi'
    /0/section
    /0/section/csv
        PosixPath('major.csv')
    /0/section/bitsel
        [0:5]
    /0/section/suffix
        None
    /0/section/mode
        <Mode.INTEGER: 1>
    /0/section/opcode
        None
    /0/section/priority
        <Priority.NORMAL: 0>
    /0/ppc
    /0/ppc/0
    /0/ppc/0/opcode
    /0/ppc/0/opcode/value
        00000000000000000000000000000010
    /0/ppc/0/opcode/mask
        00000000000000000000000000111111
    /0/ppc/0/comment
        'tdi'
    /0/ppc/0/flags
    /0/ppc/0/flags/0
        '32b'
    ...
    /0/fields
        {'BF': (6, 7, 8),
         'D': (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31),
         'FRS': (6, 7, 8, 9, 10), 'FRT': (6, 7, 8, 9, 10), 'L': (10,),
         'RA': (11, 12, 13, 14, 15), 'RS': (6, 7, 8, 9, 10),
         'SI': (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31),
         'TO': (6, 7, 8, 9, 10),
         'UI': (16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31)}
    /0/mdwn
    /0/mdwn/pcode
    /0/mdwn/operands
        {'TO': (<class 'openpower.insndb.core.DynamicOperand'>, {'name': 'TO'}),
         'RA': (<class 'openpower.insndb.core.GPROperand'>, {'name': 'RA'}),
         'SI': (<class 'openpower.insndb.core.SignedOperand'>, {'name': 'SI'})}
    /0/svp64
    /0/svp64/name
        'tdi'
    /0/svp64/ptype
        1P
    /0/svp64/etype
        <SVEType.EXTRA2: 1>
    /0/svp64/msrc
        <SVMaskSrc.NO: 0>
    /0/svp64/in1
        <In1Sel.RA: 1>
    /0/svp64/in2
        <In2Sel.NONE: 0>
    /0/svp64/in3
        <In3Sel.NONE: 0>
    /0/svp64/out
        <OutSel.NONE: 0>
    /0/svp64/out2
        <OutSel.NONE: 0>
    /0/svp64/cr_in
        <CRInSel.NONE: 0>
    /0/svp64/cr_in2
        <CRIn2Sel.NONE: 0>
    /0/svp64/cr_out
        <CROutSel.NONE: 0>
    /0/svp64/extra
        {0: [], 1: [], 2: [], 3: []}
    /0/svp64/conditions
        ''
    /0/svp64/mode
        <SVMode.NORMAL: 1>
(In reply to Dmitry Selyutin from comment #109)
> OK, take 2: I committed a simplified version of the walkers which yield
> paths too. But, this time, the path of the parent is not passed to the
> children's walk algorithm. Each component just yields the paths to its
> child nodes along with the nodes themselves, and passes these to the
> visitors.

yep: no on paths. the visitor instance should be tracking the path, not
the walker. the only thing the walker might also pass is the index (as
an integer, not a string): enumerate(subnodes).

this was ok: https://bugs.libre-soc.org/show_bug.cgi?id=1094#c82

sidenote: the treewalker does not need depth, because
self.depth == len(self.paths).
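(a tiny sketch of that sidenote, with illustrative names: the visitor
derives depth from its own path stack, so the walker passes nothing:)

    import contextlib

    class TreeVisitor:
        def __init__(self):
            self.paths = []   # the visitor's own record of where it is

        @property
        def depth(self):
            # no depth parameter from the walker: derive it from the stack
            return len(self.paths)

        @contextlib.contextmanager
        def __call__(self, node):
            print("  " * self.depth, node)   # indent by current depth
            self.paths.append(node)
            yield node
            self.paths.pop()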
(In reply to Dmitry Selyutin from comment #111)
> Meanwhile I've made tree more beautiful (output truncated a bit to keep
> the comment simple; it can be reproduced via `pysvp64db tree`):

awesome.

> /0
> /0/name
>     'tdi'

so little code, so many possibilities...
(In reply to Luke Kenneth Casson Leighton from comment #112)
> this was ok: https://bugs.libre-soc.org/show_bug.cgi?id=1094#c82

This was OK interface-wise, but our nodes lack names. If you have an idea how to augment them with a name -- I'm totally fine with updating it. Until then, I see no better choice. Details here:
https://bugs.libre-soc.org/show_bug.cgi?id=1094#c106
Switched operands and fields to the Dict class (like dict, but "frozen"
and with a default walk method). Example output:

    /6/fields
    /6/fields/RA
        (11, 12, 13, 14, 15)
    /6/fields/RB
        (16, 17, 18, 19, 20)
    /6/fields/RC
        (21, 22, 23, 24, 25)
    /6/fields/Rc
        (31,)
    /6/fields/RT
        (6, 7, 8, 9, 10)
    /6/fields/XO
        (26, 27, 28, 29, 30)
    /6/mdwn/operands
    /6/mdwn/operands/RT/class
        'GPROperand'
    /6/mdwn/operands/RA/class
        'GPROperand'
    /6/mdwn/operands/RB/class
        'GPROperand'
    /6/mdwn/operands/RC/class
        'GPROperand'
    /6/mdwn/operands/Rc/class
        'StaticOperand'
    /6/mdwn/operands/Rc/value
        0
whoops...

    $ python3 decoder/isa/test_caller_svp64_fp.py >& /tmp/f

    ======================================================================
    ERROR: test_sv_fpload (__main__.DecoderTestCase)
    >>> lst = ["sv.lfsx *2, 0, *0"
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "decoder/isa/test_caller_svp64_fp.py", line 23, in test_sv_fpload
        lst = list(lst)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/asm.py", line 61, in __iter__
        yield from self.trans
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/asm.py", line 114, in translate
        yield from self.translate_one(insn)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/asm.py", line 106, in translate_one
        arguments=fields, specifiers=opmodes)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 3520, in assemble
        insn = super().assemble(record=record, arguments=arguments)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 1900, in assemble
        arguments=arguments, operands=cls.dynamic_operands(record=record))
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 930, in __new__
        raise ValueError("operands count mismatch")
    ValueError: operands count mismatch
(In reply to Luke Kenneth Casson Leighton from comment #116)
> whoops...
>
> $ python3 decoder/isa/test_caller_svp64_fp.py >& /tmp/f

    commit 963fa4fcadf76740083d9fce3ffe986a81f247fe
    Author: Dmitry Selyutin <ghostmansd@gmail.com>
    Date:   Sat Jun 10 23:08:15 2023 +0300

        insndb/core: switch Fields to Dict class

> raise ValueError("operands count mismatch")
> ValueError: operands count mismatch
Luke, thank you for your investigation! I'm sorry for missing this issue, felt myself kinda invincible. :-) I've fixed it; this is again the same error I always make: using isinstance instead of issubclass.
(In reply to Dmitry Selyutin from comment #118)
> Luke, thank you for your investigation! I'm sorry for missing this
> issue, felt myself kinda invincible. :-)

:)

> I've fixed it; this is again the same error I always make: using
> isinstance instead of issubclass.

still some work to do:

    $ python3 decoder/isa/test_caller_svp64_fp.py >& /tmp/f

    ======================================================================
    ERROR: test_sv_fpmadds (__main__.DecoderTestCase)
    >>> lst = ["sv.fmadds *6, *2, *4, 8"
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "decoder/isa/test_caller_svp64_fp.py", line 161, in test_sv_fpmadds
        lst = list(lst)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/asm.py", line 61, in __iter__
        yield from self.trans
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/asm.py", line 114, in translate
        yield from self.translate_one(insn)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/asm.py", line 107, in translate_one
        yield from insn.disassemble(record=record, style=Style.LEGACY)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 3556, in disassemble
        specifiers = sorted(rm.specifiers(record=record))
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 3486, in __getattr__
        return getattr(self.rm, key)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 3422, in rm
        search = ((mode << 1) | self.record.Rc)
      File "/usr/local/lib/python3.7/dist-packages/cached_property.py", line 36, in __get__
        value = obj.__dict__[self.func.__name__] = self.func(obj)
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 1125, in Rc
        return self["Rc"].value
      File "/home/lkcl/src/libresoc/openpower-isa/src/openpower/insndb/core.py", line 1119, in __getitem__
        return cls(record=self, **kwargs)
    TypeError: type object argument after ** must be a mapping, not tuple
ok, i put the "paths" branch at the former master, reverted some
things, then re-patched them back in by hand. nothing is lost; pytest
works at:

    commit 37e33e6c5ee1c4e3e97704b5b76bf13c3665470c (HEAD -> master, origin/master)

paths branch:
https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=refs/heads/paths
ok, i have some (small) changes that got mixed up with the broken-ness:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=37e33e6c5
    reduce down one more line in walkmethod.__get__

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=bcab1f6
    lambda is a singleton (hashable), therefore it can safely be set as
    the default option to a function parameter
    https://bugs.libre-soc.org/show_bug.cgi?id=1094#c89

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=67a3df94
    reduce the number of lines slightly by using
    yield from filter(match, ...)

with this being so low-level / critical now, and with things having (i
think) stabilised, let's move to the usual
"branch-and-test-and-review-and-rebase"?
oh, one thing: path needs renaming to nodepath, otherwise it gets utterly confused with "import path". likewise, all python keywords and builtins need avoiding ("type=..." and "id=..." especially). i like short as well, but short-and-conflicting is extremely bad for all sorts of reasons.
(In reply to Luke Kenneth Casson Leighton from comment #122)
> oh, one thing: path needs renaming to nodepath, otherwise it gets
> utterly confused with "import path".
>
> likewise, all python keywords and builtins need avoiding ("type=..."
> and "id=..." especially)
>
> i like short as well, but short-and-conflicting is extremely bad for
> all sorts of reasons.

We already had this debate. No, it doesn't. Path is short and sweet, and Python developers use all kinds of keyword-like names; it's up to the caller to distinguish our path from somebody else's. As I said, upon using practically any word there's a chance of a collision with Python builtins or modules or whatever. EWONTFIX.
(In reply to Dmitry Selyutin from comment #123)
> We already had this debate. No, it doesn't. Path is short and sweet,
> and Python developers use all kinds of keyword-like names,

dmitry: sorry to have to do this: as the *project leader* responsible
for ensuring that the code developed is maintainable, and having over
20 years of experience in writing and maintaining large software
projects, i am setting a *project policy*: the use of python keywords
and python module names as local variable-names, class names or local
modules is prohibited.

i will update the developer guide to reflect this over the next couple
of days, as i have an international guest staying for a week and need
to leave for the airport in 5 hours.
Ok, if we're choosing this path, then you need to specify the exact way to check that we don't have a collision. Then, whenever we update the Python version, we need to check whether the entire code base matches the criteria you established. Then, obviously, you have to update all code (obviously including yours) according to the mentioned high standards. As project maintainer, you absolutely have the will. The way you dictate your will restricts others' willingness to contribute to the project. I hope that 20 years of expertise can address this aspect too.
Updated the task accordingly.
Fixed the error in paths branch; will wait for CI to see whether something else is broken too.
Tests are completed, same failures as in master. https://salsa.debian.org/Kazan-team/mirrors/openpower-isa/-/jobs/4299105
(In reply to Dmitry Selyutin from comment #128)
> Tests are completed, same failures as in master.
> https://salsa.debian.org/Kazan-team/mirrors/openpower-isa/-/jobs/4299105

thank you dmitry. confirmed: ran locally here, all good. the branch can rebase fine. so sorry about my extremely-limited communication.
Please check the task changes; they are pretty self-explanatory. It is impossible to develop efficiently when I constantly have a spoke in the wheel. Each and every technical solution is met with an extremely negative attitude, dislike, distaste. You basically force me to implement any recent code according to your personal preferences. Arguments and rationale behind my choices are ignored or met with the "I'm maintainer" argument. If this is the path you choose to follow, you can end up maintaining the code with only one developer. Not a single line of code until there's a formal document with a valid technical rationale explaining why and how the choice is made. Yes, this document should be discussed and accepted before it is enforced. If the rationale is "I'm the maintainer with 20 years of experience", then, please, don't even bother writing this document. If you treat the people around the project as a team, this should be discussed and accepted by the team members. I'm really fed up with this nonsense. So much energy down the drain.
https://libre-soc.org/irclog/%23libre-soc.2023-06-12.log.html#t2023-06-12T12:07:42

Two ideas:

1. Paths or indices can be nodes as well, and be visitable. This allows us to solve the issue of how and what to pass to a visitor method.
2. We should support loose matches for visitor handlers, likely traversing the MRO. The visitor below handles any Dict in a generic way but has special logic for Operands:

class MagicVisitor(Visitor):
    @visitmethod(Operands)
    def Operands(self, node):
        print("Oh and this is Operands node!", node)
        yield node

    @visitmethod(Dict)
    def Dict(self, node):
        print(node)
        yield node

And basically the default __call__ becomes @visitmethod(object)!
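A minimal sketch of what that loose-match dispatch could look like (visitmethod and the registry layout here are illustrative, not committed code):

# illustrative only: loose-match dispatch over the MRO.
# visitmethod tags a handler with the types it accepts; __call__
# walks type(node).__mro__ so a handler for a base class (e.g. dict)
# also catches subclasses (e.g. a custom Dict).
def visitmethod(*types):
    def decorate(method):
        method.visit_types = types
        return method
    return decorate

class Visitor:
    def __call__(self, node):
        for typeid in type(node).__mro__:
            for name in dir(self):
                method = getattr(self, name)
                if typeid in getattr(method, "visit_types", ()):
                    return method(node)
        raise TypeError(f"no handler for {type(node)}")

With a handler registered via @visitmethod(object) as the fallback, any type is eventually caught, since object is the last entry in every MRO.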
I liked the idea so much that I decided to play more with it. I think I finally found a way to handle everything gracefully. Perhaps eventually I'll provide a standalone module, maybe even on PyPI. As this is a demo, and as I remember these debates about underscores and other extremely important stuff, I took the liberty of writing this code as I like. Since our project does not welcome an "as I like" approach, I posted it to an anonymous repository at repo.or.cz. Keep in mind that's just a concept; it's unlikely this will satisfy personal preferences unless modified.

https://repo.or.cz/pydispatch.git

Example:

from pydispatch import visitor
from pydispatch import walker

class SpecialDict(dict):
    pass

class CustomWalker(walker.Walker):
    @walker.hook(dict)
    def dict(self, instance):
        for (key, value) in instance.items():
            yield (value, key)
            yield from self((value, key))

class CustomVisitor(visitor.Visitor):
    @visitor.hook(object)
    def object(self, instance):
        print("enter")
        yield instance
        print("leave")

visitor = CustomVisitor()
walker = CustomWalker()
for item in walker(SpecialDict({"a": 1, "b": 2})):
    with visitor(item) as node:
        print("hook", repr(item))
The output:

enter
hook (1, 'a')
leave
enter
hook 1
leave
enter
hook 'a'
leave
enter
hook (2, 'b')
leave
enter
hook 2
leave
enter
hook 'b'
leave
Again, this is a concept. I'm simply tired of all this discussion and needed some sort of distraction. This concept demonstrates several aspects:

1. The key entity is the Dispatcher class. It's used to wrap methods so that the call is dispatched based on the argument type.
2. Two classes, Visitor and Walker, inherit from Dispatcher (with some metaclass magic involved). The Walker describes the way the objects are traversed; the Visitor just enables the context managers.
3. The code supports loose type matches. In the example above, SpecialDict inherits dict, but is caught loosely by the CustomWalker handler for dict.
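A condensed sketch of the Dispatcher idea in item 1 (a paraphrase, not the pydispatch code itself; hook and the table layout are simplified):

# simplified paraphrase: a decorator records the handled types, and a
# metaclass collects the decorated methods into a type->method table
def hook(*types):
    def decorate(method):
        method.types = types
        return method
    return decorate

class DispatcherMeta(type):
    def __new__(metacls, name, bases, namespace):
        cls = super().__new__(metacls, name, bases, namespace)
        table = {}
        # walk the MRO in reverse so subclass hooks override base hooks
        for base in reversed(cls.__mro__):
            for attr in vars(base).values():
                for typeid in getattr(attr, "types", ()):
                    table[typeid] = attr
        cls.table = table
        return cls

class Dispatcher(metaclass=DispatcherMeta):
    def __call__(self, instance):
        # loose match: the first MRO entry with a registered hook wins
        for typeid in type(instance).__mro__:
            if typeid in self.table:
                return self.table[typeid](self, instance)
        raise TypeError(f"no hook for {type(instance)}")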
What I like here:

1. There's no need to update classes with magic methods. If the walker's handler is registered, the type will be caught.
2. It's loosely typed; inheritance is taken into account.
3. This has no problems with builtins and their "walkability".
4. The use is dead simple.

What I don't like:

1. I don't know if it's possible to make super() work with it (that is, call the superclass handler). Not sure if we need it, though: common things can be moved into usual methods or functions, and these can be called from hooks.
2. Perhaps this is more complex than it should've been. I tried to make it generic, though, and somewhat flexible.
(In reply to Dmitry Selyutin from comment #130)
> You basically force me to implement any recent code
> according to your personal preferences.

dmitry: they really aren't. i am the one who has to think through the implications: "when these few lines of code are integrated into the remaining 200,000+ lines of code that i am responsible for, how will that go? what problems will arise?" which is *very* different from the (small, focussed) task that you are doing, which at the moment is something that you need to be enjoying (and *not* worrying about the bigger picture, which is my responsibility).

now, with the insndb, before it was integrated deeply into the rest of the code there was no problem: it was self-contained. but now it is becoming an absolutely critical part of the entire codebase - through the Walker API - and i now have to think of the ramifications.

so i have a potential solution / proposition: if *i* make any minor corrections (adding comments, renaming of variables), which are superficial but clearly project-management-related, you don't object to them. if we *do* have some disagreement - if there is something that you genuinely don't like - *please speak up*, and we move it to a branch and discuss further. i already put comments into insndb.py before the restructuring (over the past few months) and you didn't object, so i know you're fine with me doing that.

this way, you are satisfied by being able to design something you love doing, and i can also make sure that what you are doing fits with the wider project for which *i* am responsible (not you). how does that sound?
I already gave my opinion. If you have certain criteria on naming and the use of particular constructs, and want to permit or ban some techniques — this should be documented and, if you respect the team, at least be brought to discussion (the form "I'm responsible and I have 20 years of expertise" certainly lacks the discussion bits). As I said: the problem is not in the particular decisions per se, the problem resides in the lack of technical rationale behind them. I don't ask to document each and every aspect of using Python or any other programming language, that'd be rather impossible. But those cases where we had disagreement must be documented. Stating that you are responsible explains nothing; it lacks the rationale and concrete criteria.

The current communication format just turns development into quite a stressful process. I don't even know how to write the code anymore, because you keep interfering and, especially, make confusing statements which, in the form they are made, contradict each other. I'm already afraid to make any further step, because you have pushed the situation to the degree where I simply think that any code I publish will be rejected.

Many projects start with a coding style and certain guidelines. I think it's good for us to have those. Not that I think now is the right time. But you said in many similar situations that you don't have time to address these issues yet, and I begin to worry whether you're going to address them at all.
Again: I'm OK with the strategy, but I need concrete criteria and rationale. And I'd much rather we discussed all these questions.
(In reply to Dmitry Selyutin from comment #137)
> make confusing statements which, in the
> form they are made, contradict each other.

yes. i have a combination of "writer's block", a form of dyslexia, *and* Asperger's. it means i can *recognise* something when i see (or hear) it but can in no way properly *express* it. often it takes over three weeks to clearly answer just one "simple" question to which anyone else would find it perfectly reasonable to expect an answer, right now, now, now. overall, going back 30 years, it has been terribly frustrating for me, and very difficult for everyone else i have ever worked with or for.

but also there is this: it isn't "me being difficult", the entire project is difficult - a level so far above standard computer science that only last week Mitch Alsup on comp.arch said "it is a pity that you have to study an entire lifetime before understanding enough to design a processor, and by then you are already at retirement age". i am barely keeping up holding all the knowledge in my head (hence the "writer's block"), and to expect every member of the team to do the same, in order to be able to make properly-informed project management and development decisions, is completely unreasonable.

but forget all of that: it's my responsibility to deal with. my primary goal is making sure that you have things that you can enjoy doing, that are also self-contained tasks. there's actually one that's high-priority but should be very short (bear in mind that notes for an "internet-blog" writeup of the algorithm are part of the task):

https://bugs.libre-soc.org/show_bug.cgi?id=1085

and another one:

https://bugs.libre-soc.org/show_bug.cgi?id=1044

and a third one that would be useful as a standard vector demo: daxpy

https://bugs.libre-soc.org/show_bug.cgi?id=953

all of those should only be between 8-12 lines of SVP64 assembler. there are now three of us (me, jacob and konstantinos/markos) who can help with questions, if you're interested?
(In reply to Luke Kenneth Casson Leighton from comment #139)
> overall, going back 30 years, it has been terribly
> frustrating for me, and very difficult for everyone else i have ever
> worked with or for.

I've seen many people, including, but not limited to, people who develop, and believe me, you're far from being an unbearable and difficult person. My particular issue is that I'm somewhat technology-centric; I can be annoyed with some technical solutions rather than with the persons who stand behind them.

> but also there is this: it isn't "me being difficult", the entire
> project is difficult - a level so far above standard computer science
> that only last week Mitch Alsup on comp.arch said "it is a pity that
> you have to study an entire lifetime before understanding enough to
> design a processor, and by then you are already at retirement age".

Couldn't agree more. This statement can be applied to many areas, though.

> but forget all of that: it's my responsibility to deal with.

It might be your responsibility, but you don't have to suffer and deal with it *alone*. Despite the fact that sometimes I express things in a quite irritating and annoying manner, like all of us, I want to do the best and want to help. I hope that the very fact that I raised this whole discussion already clearly demonstrates how much I care about the project and the team. And "the team" I mention obviously includes you.

> my primary goal is making sure that you have things that you can
> enjoy doing, that are also self-contained tasks.
> there are now three of us (me, jacob and konstantinos/markos) who can
> help with questions, if you're interested?

I like this task, actually, and would like to complete it. I have problems with some technical aspects, but the task is nice and lovely, despite how many debates it involves. Also, I wouldn't say I'm a suitable person for math-related algorithms (you do remember that I'm a philologist, right?). Most of all I like architecture-related tasks (architecture in a broad sense, like designing APIs and the relationships between data structures). Perhaps the whole problem is that, since I like designing architectures, I treat some moments in a somewhat personal way, and our definitions of a good architecture differ, so this naturally leads to debates. That's why I suggested establishing some common understanding in the form of documentation (you haven't addressed this yet, if I'm not mistaken).
I had to change the name of the pydispatch project, since this name is already taken on PyPI. I renamed the project to mdis and published it to PyPI. That's the second thing I've published to PyPI, so this might be done wrong.

https://repo.or.cz/mdis.git

Again, to clarify: this is a self-contained implementation, and for our needs it perhaps mostly serves as an illustration. I don't insist we should use this module; I can duplicate its functionality, or make some excerpts of it. I like the approach, though, and I think I'll try maintaining and developing this. You might look at it as "what Dmitry thinks is fine". I haven't yet addressed some of Jacob's comments and suggestions, though.
dmitry, hi, my guest is going to be here until at least the weekend, i can only be brief - much briefer than i want to be.

1) i have enough money from the OPF grant.
2) API is great. my suggestion: replace "path" with "nodeid".
3) separating as an independent repo: my suggestion, make Node an optional argument.

def visit(...., basetreeclass=Node):
    if isinstance(node, basetreeclass):
        ...

4) prior to separation: my suggestion, create a file named visitor.py in insndb/

all i can manage at the moment, will be free next week.
Meanwhile I've updated the module, incorporating suggestions from Jacob as well. Changes:

1. From now on we keep the original names for the functions and no longer bother constructing new ones on the fly. This actually proved to be difficult, because I had to reconsider how and where to keep the established hooks.
2. There is no longer an explicit separation between call hooks and typeid hooks corresponding to different decoration stages. Since both need class information, we construct the call hook dynamically, and have only one class: Hook. This type migrated into the dispatcher module.
3. If there's a conflict in hooks, that is, a user attempted to hook the same type several times, the exception will show all conflicting functions.
4. Dropped the "wrapper" argument. As Jacob suggested, if somebody needs context managers, they should add these explicitly.
5. The representation for hooks is greatly simplified; this affects the help text too. The example for a CustomWalker installing a hook for dict (irrelevant details snipped):

class CustomWalker(mdis.walker.Walker)
 |  Data and other attributes defined here:
 |  dispatch_dict = <dict>
 |
 |  Data and other attributes inherited from mdis.walker.Walker:
 |  dispatch_mapping = <dict>
 |  dispatch_object = <object>
 |  dispatch_sequence = <tuple, list, set, frozenset>

Note that the base Walker already has a method hooked at dict (dispatch_mapping); the subclass's dispatch_dict will be called instead, due to MRO.

Again, until we decide on how to store the module, I published it to PyPI and repo.or.cz (the reply from Luke considered an outdated state of affairs). I tend to think we might have a standalone repository here, but at the same time have this module on PyPI. On the other hand, the users would probably like something more HW-agnostic; but please not Github (I still have an account there, but would like something more liberal and open).

https://pypi.org/project/mdis
https://repo.or.cz/mdis.git
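A rough sketch of the conflict check described in item 3 (the exception type and message format here are guesses, not the exact mdis code):

# guesswork illustration: refuse to register two hooks for one type,
# naming every conflicting function in the error message
def register(table, typeid, hook):
    existing = table.get(typeid)
    if existing is not None:
        names = ", ".join(fn.__qualname__ for fn in (existing, hook))
        raise ValueError(f"conflicting hooks for {typeid!r}: {names}")
    table[typeid] = hook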
(In reply to Dmitry Selyutin from comment #143)
> Again, until we decide on how to store the module, I published it to PyPI
> and repo.or.cz (the reply from Luke considered an outdated state of
> affairs).

the entire project has to be self-sufficient. i've created mdis on gitolite3.
Luke, it seems there was something strange with the commits: the "initial implementation" commit was repeated twice. That's clearly wrong. I've done a force push for everything; now everything is fine.
I've updated the project description to reflect our git and our bug tracker, and also incorporated some minor updates (introduced a base class for visitors which has its object hook implemented in terms of context managers). A new version is released. I'm wondering what'd be the right way to use this module in our project. Should I simply update setup.py install_requires? If so, should I just use mdis (since we're unique on PyPI), or do I need to follow the way it's done with cached_property?
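For the record, the two variants being asked about would look roughly like this (a sketch: the project name, the pin "mdis" and the git URL are placeholders, not decisions):

# sketch only: plain PyPI dependency vs. explicit-git dependency
from setuptools import setup

setup(
    name="openpower-isa",
    install_requires=[
        "mdis",  # variant 1: plain PyPI name
        # variant 2: explicit git source, cached_property-style
        # (PEP 508 direct reference; URL is a placeholder):
        # "mdis@git+https://git.libre-soc.org/git/mdis.git",
    ],
)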
The roadmap is:

1. Update setup.py and install the module. Perhaps we should install it with devscripts?
2. Deprecate Node, walk() and similar stuff.
3. Switch db.py to the new approach.

Once it's done, should we consider this task closed, and switch to the task for implementing binutils on top of the visitor API?
I forgot to mention one step in the roadmap: we miss support for dataclasses. I'll add it too. I think this will be a bit trickier to make generic, though, since dataclasses do not inherit from some special class; ours, however, do.
(In reply to Dmitry Selyutin from comment #148)
> I forgot to mention one step in the roadmap: we miss support for
> dataclasses. I'll add it too. I think this will be a bit trickier to make
> generic, though, since dataclasses do not inherit from some special class;
> ours, however, do.

it'll need dynamic detection: i'm fairly certain you can do better than isinstance(dataclass), probably by registering with @thing just like @register(object) - i suspect @register(dataclass) would work just as well.

oh btw one thing: i do strongly suggest removing all "import x as _x", it is non-standard practice across the entire python community. i recommend using the standard python practice, established for over 20 years, of using "__all__" instead. aside from anything else it reduces the number of words and symbols, and reduces complexity.
(In reply to Dmitry Selyutin from comment #148)
> I forgot to mention one step in the roadmap: we miss support for
> dataclasses. I'll add it too. I think this will be a bit trickier to make
> generic, though, since dataclasses do not inherit from some special class;
> ours, however, do.

just use:

https://docs.python.org/3.7/library/dataclasses.html#dataclasses.is_dataclass

and then call dataclasses.fields, since that avoids needing to raise a TypeError, which is kinda slow iirc.

also you can check for plain_data instances (feel free to add a plain_data.is_plain_data() function based off plain_data.fields(), since creating exceptions is kinda slow):

https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/plain_data.py;h=7bde6ba2f27ff3786beab9c036019337e36bbfe7;hb=4bf2f20bddc057df1597d14e0b990c0b9bdeb10e#l298

nmutil should probably be an optional import:

try:
    # renamed to not conflict with dataclasses
    from nmutil.plain_data import (is_plain_data as _is_plain_data,
                                   fields as _plain_data_fields)
except ImportError:
    def _is_plain_data(v):
        return False
    # _plain_data_fields not needed since _is_plain_data always returns False
(In reply to Luke Kenneth Casson Leighton from comment #149)
> (In reply to Dmitry Selyutin from comment #148)
> > I forgot to mention one step in the roadmap: we miss support for
> > dataclasses. I'll add it too. I think this will be a bit trickier to
> > make generic, though, since dataclasses do not inherit from some
> > special class; ours, however, do.
>
> it'll need dynamic detection: i'm fairly certain you can do better than
> isinstance(dataclass), probably by registering with @thing just like
> @register(object) - i suspect @register(dataclass) would work just as well.

there is no dataclass class
Sorry, typing from my phone. I know that we can use is_dataclass in Hook(object). What I was trying to express is that we cannot do Hook(dataclass), because there's no dataclass class. However, there is a Dataclass class in our code, so we can just hook this one: all our dataclasses inherit it.

Jacob, sorry, I'm not sure why you brought up nmutil. I assumed we don't need nmutil in the decoupled mdis module. I imagined something like this (pay attention to the file name):

$ cat insndb/db.py
class Walker(mdis.walker.Walker):
    @mdis.dispatcher.Hook(core.Dataclass)
    def dispatch_dataclass(self, instance):
        for field in dataclasses.fields(instance):
            yield (field.name, getattr(instance, field.name))

Same for similar cases, if that's what you mean by bringing in nmutil. As for underscores — OK, I'll remove underscores.
Ah, I think I got what you mean by nmutil. No, there's no place for nmutil in generic code, and never will be. This is too specific. It's like bringing special support for PyQt into generic code. Hard no. If some code using nmutil needs this — it can do it in an nmutil-related module, perhaps by providing its own base walker or visitor with its own hook twists.
Hm. Luke, I re-read your comment, and I think I now understand what you mean. You likely mean that we can detect dataclasses dynamically... Yes, I think this can be arranged, perhaps just via dataclasses.is_dataclass. I'm a bit concerned that this kinda contradicts the other code — passing classes and iterating over the MRO. But I'll check whether these two options can coexist in a graceful way. Perhaps we can fork on isinstance(thing, type) vs callable(thing). I'll check it; I like that it allows some flexibility. What I mean is:

@dispatcher.Hook(dataclasses.is_dataclass)
def dispatch_dataclass(...):
    whatever()

If the Hook is capable of handling callables, that'd work too. Just not dataclasses.dataclass, because that one just wraps the class, and I'd like the code to be somewhat generic and not tied to special constructs (except for builtins perhaps; those are fine). I'll investigate the ideas with callables. I think it's doable.
Now that I think about it, any hook can be expressed via a callable; it's just a bit more verbose:

@mdis.dispatcher.Hook(lambda typeid: issubclass(typeid, object))
def dispatch_mapping(self, instance):
    pass

I think we can keep the short form, but internally we can implement everything as callables. Holy cow, that's really powerful!
(In reply to Dmitry Selyutin from comment #155)
> I think we can keep the short form, but internally we can implement
> everything as callables. Holy cow, that's really powerful!

Nah, implementing everything as callables loses one benefit: with types we can have hash lookup, with callables we're limited to linear search. It's not that it's critical (who's crazy enough to establish thousands of hooks?), but it hurts the beauty of the code. I'm currently checking something more graceful.
Funny. Classes are callables. The thing turns out to be simpler.
(In reply to Dmitry Selyutin from comment #156)
> (In reply to Dmitry Selyutin from comment #155)
> > I think we can keep the short form, but internally we can implement
> > everything as callables. Holy cow, that's really powerful!
>
> Nah, implementing everything as callables loses one benefit: with types we
> can have hash lookup, with callables we're limited to linear search. It's
> not that it's critical (who's crazy enough to establish thousands of
> hooks?), but it hurts the beauty of the code. I'm currently checking
> something more graceful.

if you're worried about speed, you could cache lookups -- something like:

class Visitor:
    def __init__(self):
        self.__cache = {}

    def run_hook(self, v):
        t = v if isinstance(v, type) else type(v)
        f = self.__cache.get(t)
        if f is None:
            for name, check in self.hooks():
                if check(t):
                    self.__cache[t] = f = getattr(self, name)
                    break
        if f is None:
            raise TypeError("no hooks for type", t)
        return f(v)
(In reply to Dmitry Selyutin from comment #157)
> Funny. Classes are callables. The thing turns out to be simpler.

except calling classes doesn't do what we want here, which is to check if the passed-in value is an instance of the class, instead of attempting to construct a new instance of that class.
import dataclasses

import mdis.dispatcher
import mdis.walker

class CustomWalker(mdis.walker.Walker):
    @mdis.dispatcher.Hook(dataclasses.is_dataclass)
    def dispatch_dataclass(self, instance):
        for field in dataclasses.fields(instance):
            key = field.name
            value = getattr(instance, key)
            yield (key, value)
            yield from self((key, value))

@dataclasses.dataclass
class Thing:
    a: int
    b: str

walker = CustomWalker()
thing = Thing(a=1, b="cocojamboo")
for item in walker(thing):
    print(repr(item))

('a', 1)
'a'
1
('b', 'cocojamboo')
'b'
'cocojamboo'
(In reply to Jacob Lifshay from comment #159)
> (In reply to Dmitry Selyutin from comment #157)
> > Funny. Classes are callables. The thing turns out to be simpler.
>
> except calling classes doesn't do what we want here, which is to check if
> the passed-in value is an instance of the class, instead of attempting to
> construct a new instance of that class.

Jacob, of course I know that. My point is that I can just check callable() upon Hook instantiation and raise a ValueError if it's not callable.
(In reply to Dmitry Selyutin from comment #160)
> @mdis.dispatcher.Hook(dataclasses.is_dataclass)
> def dispatch_dataclass(self, instance):
>     for field in dataclasses.fields(instance):
>         key = field.name
>         value = getattr(instance, key)
>         yield (key, value)
>         yield from self((key, value))

After thinking about it for a while, I'm keeping this at the base Walker level. dataclasses is a standard module; hell, let's not force users to invent that again.
(In reply to Dmitry Selyutin from comment #161)
> (In reply to Jacob Lifshay from comment #159)
> > except calling classes doesn't do what we want here, which is to check if
> > the passed-in value is an instance of the class, instead of attempting to
> > construct a new instance of that class.
>
> Jacob, of course I know that.

yeah, people occasionally forget.

> My point is that I can just check callable()
> upon Hook instantiation and raise a ValueError if it's not callable.

that works, though it may be better to do:

def hook(cond):
    if isinstance(cond, type):
        def cond(ty, __class=cond):
            return issubclass(ty, __class)  # edit: fix issubclass name
    if not callable(cond):
        raise TypeError
    ...
OK, here's yet another set of updates (therefore the PyPI version is bumped):

1. A hook can be injected as a callable which receives the type. First we try to look up the typeid as is, then we patiently iterate over the registered entries which look like callables.
2. We now can walk over dataclasses, yay! Implemented in terms of @dispatcher.Hook(dataclasses.is_dataclass) in walker.Walker.
3. Underscores were replaced with __all__; this adds +5 points to charisma.
4. The calls to dispatch are cached via functools.lru_cache.
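Item 4 presumably amounts to something along these lines (a sketch: find_hook and the hooks table are invented for illustration, not the actual mdis internals):

# illustration only: cache the (dispatcher class, type) -> hook lookup,
# so the linear scan over callable predicates runs once per type
import functools

@functools.lru_cache(maxsize=None)
def find_hook(dispatcher_cls, typeid):
    for (match, hook) in dispatcher_cls.hooks:
        if isinstance(match, type):
            if issubclass(typeid, match):   # exact or MRO match
                return hook
        elif match(typeid):                 # callable predicate
            return hook
    return None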
Next steps are:

1. Update setup.py with mdis. Again, should we use the same form as with cached_property, with explicit git, just to be safe? I'm thinking of explicit git because it also allows getting the latest changes.
2. Drop Node, Dataclass, the classmethod walk() and similar stuff. The new approach seems to cover everything we currently need without relying on inheritance.
3. Switch db.py to the mdis module. Perhaps I'll also add a few commands to pysvp64db (this is nice for debugging AND providing useful information simultaneously).
Oh, by the way, I just recalled that tuples et al. simply yield stuff from the collection upon walking by default:

class Walker(dispatcher.Dispatcher, metaclass=WalkerMeta):
    @dispatcher.Hook(tuple, list, set, frozenset)
    def dispatch_sequence(self, instance):
        for item in instance:
            yield item
            yield from self(item)

dicts and dataclasses, on the other hand, yield the key too. Should the dispatcher look like this instead?

@dispatcher.Hook(tuple, list, set, frozenset)
def dispatch_sequence(self, instance):
    for (index, item) in enumerate(instance):
        yield (index, item)
        yield from self(item)

Also, whilst we're at it, we could perhaps support a default namedtuple hook (yielding keys too, I think, as sketched below). I'd like to hear your opinions, folks, on these two.
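A possible shape for that namedtuple hook (a sketch only: namedtuples have no dedicated base class, so the _fields check below is an assumed detection strategy, not settled design; it relies on Hook accepting callables):

# sketch: detect namedtuples structurally via the _fields attribute,
# then yield (field-name, value) pairs the same way dicts do
def is_namedtuple(typeid):
    return (isinstance(typeid, type) and issubclass(typeid, tuple)
            and hasattr(typeid, "_fields"))

class NamedTupleWalker(walker.Walker):
    @dispatcher.Hook(is_namedtuple)
    def dispatch_namedtuple(self, instance):
        for (key, value) in zip(instance._fields, instance):
            yield (key, value)
            yield from self((key, value))

(One open question with this: a namedtuple is also a tuple, so the dispatch order between this hook and the plain dispatch_sequence hook would need to be defined.)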
(In reply to Dmitry Selyutin from comment #166)
> dicts and dataclasses,

dataclasses/namedtuples should just visit the field values and not tuples of name/value combinations (except for path tracking)

imho dicts can recursively walk key/value pairs or just values -- at the choice of the visitor/walker?
(In reply to Jacob Lifshay from comment #167)
> dataclasses/namedtuples should just visit the field values and not tuples of
> name/value combinations (except for path tracking)
>
> imho dicts can recursively walk key/value pairs or just values -- at the
> choice of the visitor/walker?

Yielding the name along with the value is exactly path tracking. Plus, we iterate over collections, and just the values do not really represent a dataclass, in the same way that just the keys or just the values don't represent a dict. This is not the same with set/list/similar: you don't really need an index to construct those collections. That said, I've been thinking of yielding the index exactly for path tracking (though this is of no use with sets).
(In reply to Dmitry Selyutin from comment #168)
> Yielding the name along with the value is exactly path tracking. Plus, we
> iterate over collections, and just the values do not really represent a
> dataclass, in the same way that just the keys or just the values don't
> represent a dict.

i guess my point is that for dicts you'd sometimes want to recursively walk inside the keys too, not just inside the values. this is unlike dataclasses, namedtuples, standard python classes, etc.
(In reply to Jacob Lifshay from comment #169)
> i guess my point is that for dicts you'd sometimes want to recursively walk
> inside the keys too, not just inside the values. this is unlike dataclasses,
> namedtuples, standard python classes, etc.

I'd not say so. From the perspective of visiting the stuff, for example printing the hierarchy, these are not too different to me. If we don't yield key names, the only other option is to yield something else, e.g. an index, which can be mapped back. And inventing this backward mapping is, well, quite tedious to handle in all dataclasses. So I don't think that we should drop this path information, which we get for free. My point is that we should do it for tuples too, because the index is actually a path here.
(In reply to Dmitry Selyutin from comment #170)

i'm imagining that visiting {("a", 1): 3.4, 2: {5, 6}} would do the following visitor calls, more or less:

with visitor.visit_dict(item={("a", 1): 3.4, 2: {5, 6}}, path=[]):
    with visitor.visit_dict_entry(item=dict_entry(("a", 1), 3.4), path=[("a", 1)]):
        with visitor.visit_tuple(item=("a", 1), path=[("a", 1), 0]):
            with visitor.visit_str(item="a", path=[("a", 1), 0, 0]):
                pass
            with visitor.visit_int(item=1, path=[("a", 1), 0, 1]):
                pass
        with visitor.visit_float(item=3.4, path=[("a", 1), 1]):
            pass
    with visitor.visit_dict_entry(item=dict_entry(2, {5, 6}), path=[2]):
        with visitor.visit_int(item=2, path=[2, 0]):
            pass
        with visitor.visit_set(item={5, 6}, path=[2, 1]):
            with visitor.visit_int(item=5, path=[2, 1, 5]):
                pass
            with visitor.visit_int(item=6, path=[2, 1, 6]):
                pass
@dataclass()
class A:
    a: int
    b: "list[int]"

visiting {3: A(5, [12, 3])} would do:

with visitor.visit_dict(item={3: A(5, [12, 3])}, path=[]):
    with visitor.visit_dict_entry(item=dict_entry(3, A(5, [12, 3])), path=[3]):
        with visitor.visit_int(item=3, path=[3, 0]):
            pass
        with visitor.visit_A(item=A(5, [12, 3]), path=[3, 1]):  # edit: fix typo
            with visitor.visit_int(item=5, path=[3, 1, "a"]):
                pass
            with visitor.visit_list(item=[12, 3], path=[3, 1, "b"]):
                with visitor.visit_int(item=12, path=[3, 1, "b", 0]):
                    pass
                with visitor.visit_int(item=3, path=[3, 1, "b", 1]):
                    pass
Jacob, I'm having difficulties reading and understanding this massive cascade of with statements. But I suspect that you misinterpret how the API is used, because the caller doesn't need to explicitly mark what's visited; the hooks do it. Could you please check this README and adapt the code so that I could better understand the idea?

https://git.libre-soc.org/?p=mdis.git;a=blob;f=README.md;h=aa540581aa598fcb34d91aaff4234c2213e3413e;hb=HEAD
(In reply to Jacob Lifshay from comment #172)
> with visitor.visit_int(item=3, path=[3, 0]):
>     pass
> with visitor.visit_A(item=A(5, [12, 3]), path=[3, 1]):  # edit: fix typo
>     with visitor.visit_int(item=5, path=[3, 1, "a"]):
(In reply to Dmitry Selyutin from comment #173)
> Jacob, I'm having difficulties reading and understanding this massive
> cascade of with statements.

i was listing the different visitor functions that would be automatically called (with assumed @hook decorations) by the walker when walking the given python value, so you can see what i think should be recursively visited and what should only go in the path. basically, an execution trace.

> But I suspect that you misinterpret how the API is used,
> because the caller doesn't need to explicitly mark what's visited; the
> hooks do it. Could you please check this README and adapt the code so
> that I could better understand the idea?

i'm not saying that's how a user will call it (it would be too complex of an API if that were how users had to call it), but which functions, with which argument values, will be called by the walker automatically.

> https://git.libre-soc.org/?p=mdis.git;a=blob;f=README.md;
> h=aa540581aa598fcb34d91aaff4234c2213e3413e;hb=HEAD
If the method just prints the value with the relevant indent, what'd be the output?
(In reply to Dmitry Selyutin from comment #166)
> dicts and dataclasses, on the other hand, yield the key too. Should the
> dispatcher look like this instead?
>
> @dispatcher.Hook(tuple, list, set, frozenset)
> def dispatch_sequence(self, instance):
>     for (index, item) in enumerate(instance):
>         yield (index, item)
>         yield from self(item)

like it. corresponding dispatch.py change:

 84     def __call__(self, instance, *args):
 85         for typeid in instance.__class__.__mro__:
 86             hook = self.__class__.dispatch(typeid=typeid)
 87             if hook is not None:
 88                 break
 89         if hook is None:
 90             hook = self.__class__.dispatch()
 91         return hook(dispatcher=self, instance=instance, *args)

which will need this instead:

@dispatcher.Hook(tuple, list, set, frozenset)
def dispatch_sequence(self, instance):
    for (index, item) in enumerate(instance):
        yield (item, index)   # <-- swapped: item first
        yield from self(item)

> Also, whilst we're at it, we could perhaps support a default namedtuple hook

awesome!
and (in walker.py):

 22     @dispatcher.Hook(dict)
 23     def dispatch_mapping(self, instance):
 24         for (key, value) in instance.items():
 25             yield (value, key)   # <-- value (instance) comes first
 26             yield from self((key, value))

the value comes first so that it is picked up as instance by dispatch.__call__:

 84     def __call__(self, instance, *args):
(In reply to Dmitry Selyutin from comment #176)
> If the method just prints the value with the relevant indent, what'd be the
> output?

ok, i'll have it print before the contextmanager yield. i'll include the path too, so you can see all the relevant inputs.

(In reply to Jacob Lifshay from comment #171)
> i'm imagining that visiting {("a", 1): 3.4, 2: {5, 6}} would do the
> following visitor calls, more or less:

{("a", 1): 3.4, 2: {5, 6}} path=[]
dict_entry(("a", 1), 3.4) path=[("a", 1)]
("a", 1) path=[("a", 1), 0]
"a" path=[("a", 1), 0, 0]
1 path=[("a", 1), 0, 1]
3.4 path=[("a", 1), 1]
dict_entry(2, {5, 6}) path=[2]
2 path=[2, 0]
{5, 6} path=[2, 1]
5 path=[2, 1, 5]
6 path=[2, 1, 6]

(In reply to Jacob Lifshay from comment #172)
> @dataclass()
> class A:
>     a: int
>     b: "list[int]"
>
> visiting {3: A(5, [12, 3])} would do:

{3: A(5, [12, 3])} path=[]
dict_entry(3, A(5, [12, 3])) path=[3]
3 path=[3, 0]
A(5, [12, 3]) path=[3, 1]
5 path=[3, 1, "a"]
[12, 3] path=[3, 1, "b"]
12 path=[3, 1, "b", 0]
3 path=[3, 1, "b", 1]
(In reply to Luke Kenneth Casson Leighton from comment #178)
> and (in walker.py):
>
>  22     @dispatcher.Hook(dict)
>  23     def dispatch_mapping(self, instance):
>  24         for (key, value) in instance.items():
>  25             yield (value, key)   # <-- value (instance) comes first
>  26             yield from self((key, value))
>
> the value comes first so that it is picked up as instance by
> dispatch.__call__:
>
>  84     def __call__(self, instance, *args):

from comment #177 (!!) if the above is not done (passing instance 1st, and *args which can accept a non-argument - rather than a mandatory argument defaulting to e.g. None) you have to do this awful/broken-ness:

 83 class Dispatcher(metaclass=DispatcherMeta):
 84     def __call__(self, instance):
            if isinstance(instance, tuple):
                # extract the index
                (instance, index) = instance
            else:
                index = ...  # errr, some value indicating "non-iterable leaf node"
 85         for typeid in instance.__class__.__mro__:
 86             hook = self.__class__.dispatch(typeid=typeid)
 87             if hook is not None:

which doesn't work because there is no way to discern an *actual* tuple from the case where a tuple of (instance, index) is being passed. whatever is passed to Dispatcher *has* to be orthogonal, but also *has* to be able to discern "iterable" from "non-iterable". None as a path *does not do this*, jacob:

{3: A(5, [12, 3])} path=[]

no, that is not okay (path=[]). ints do not have and do not need paths.

there are two solutions i can think of:

1) instance, *args - and for non-iterables *args would be empty
2) always yield a (new) "Instance" object, which contains a
   mandatory "instance" and an **OPTIONAL** index (which you
   call "path", jacob)

i greatly prefer (1) because (2) forces walkers to always have "node.instance"
I like option 1, because it's short and sweet, and users can just declare the method as `method(self, instance, *_)`. Also, yielding some object which wraps the original instance looks strange, because the call gets not what was actually passed, but something else.
(In reply to Luke Kenneth Casson Leighton from comment #180)
> {3: A(5, [12, 3])} path=[]
>
> no, that is not okay (path=[]). ints do not have and do not need paths.

you misunderstood this: path is the list of all indexes/keys/field-names needed to get from the root object to the current value. here, we're visiting the root object (not an int, it's a dict), so obviously no indexes/keys need to be accessed to get from the root object to the root object, hence why path=[].

and, yes, ints *need* paths, so you can tell which int you're referring to. the path is not some intrinsic property of the int, but instead tells you how to get to the int from the root object. this is important for things like e.g. SelectableInt(123, 45) (or csv row dicts), where it will visit both the 123 and the 45, and the path is *essential* to not confuse bit-width with value.

> there are two solutions i can think of:
>
> 1) instance, *args - and for non-iterables *args would be empty
> 2) always yield a (new) "Instance" object, which contains a
>    mandatory "instance" and an **OPTIONAL** index (which you
>    call "path", jacob)

no, path is the full list of indexes/keys/field-names needed to get from the root object to the current value. it is important that nothing is skipped in the path, so you don't get 2 different values with the same path and can't easily distinguish which one does what, hence why it includes dict/set keys when visiting some entry in the dict/set and field names when visiting a python class.

note that the same list can have items pushed/popped from the end -- treating it as a stack, so visiting stuff doesn't need to allocate a new list every function call. a sketch of that is below.
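A minimal sketch of the push/pop idea, under stated assumptions: walk() and the children() helper are invented for illustration, and only dicts, lists and tuples are treated as containers here:

# sketch: one list reused as a stack for the path, instead of
# allocating a new path object on every recursive call
def children(node):
    if isinstance(node, dict):
        return node.items()          # (key, value) pairs
    if isinstance(node, (list, tuple)):
        return enumerate(node)       # (index, item) pairs
    return ()                        # leaf node: nothing to recurse into

def walk(node, visit, path=None):
    if path is None:
        path = []
    visit(node, path)
    for (key, child) in children(node):
        path.append(key)             # push
        walk(child, visit, path)
        path.pop()                   # pop: the same list is reused

walk({3: [12, 3]}, lambda node, path: print(node, path))
# prints:
# {3: [12, 3]} []
# [12, 3] [3]
# 12 [3, 0]
# 3 [3, 1]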
The last comment saves me some effort, since I actually wanted to ask quite similar questions. But Jacob was first. :-) I had difficulty understanding how we could reconstruct the object path if indices and keys were not part of the yields.
Ah, yes, to clarify: yielding paths applies to the walker only. But the visitor should have *args, **kwargs too in its call.
Let's consider the example below:

import contextlib

import mdis.dispatcher
import mdis.visitor
import mdis.walker

class Walker(mdis.walker.Walker):
    pass

class ContextVisitor(mdis.visitor.ContextVisitor):
    @mdis.dispatcher.Hook(object)
    @contextlib.contextmanager
    def dispatch_object(self, instance, *, path):
        print(path)
        print(f"    {instance}")
        yield instance

walker = Walker()
visitor = ContextVisitor()
items = (
    {("a", 1): 3.4, 2: {5, 6}},
)
for (item, path) in walker(items):
    with visitor(item, path=path):
        pass

With the recent update (not yet pushed), here's what I get:

[0]
    {('a', 1): 3.4, 2: {5, 6}}
[0, ('a', 1)]
    3.4
[0, 2]
    {5, 6}
[0, 2, 5]
    5
[0, 2, 6]
    6

Jacob, that's less than you posted, because I didn't handle the dict_items specifically. That seemed excessive, since dict_items is not actually accessible via items[0][0]. And, to be honest, yielding the element itself in set/frozenset doesn't look valid either; None perhaps? But None in the middle is strange too.
^^^ Updated the code and printouts
I think sets and frozensets paths are simply hash(item). That is:

@dispatcher.Hook(set, frozenset)
def dispatch_unordered_sequence(self, instance, path=[]):
    for item in instance:
        subpath = (path + [hash(item)])
        yield (item, subpath)
        yield from self(item, path=subpath)
Oh, by the way, just so this is not missed: combining paths like this gives us the depth for free.
With a minor tweak in the visitor, which accepts an additional argument:

class Visitor(mdis.visitor.ContextVisitor):
    @mdis.dispatcher.Hook(object)
    @contextlib.contextmanager
    def dispatch_object(self, instance, *, path=[]):
        indent = (" " * (len(path) * 4))
        path = "".join(map(lambda part: f"[{part}]", path))
        print(indent, path, "=>", instance)
        yield instance

items = (
    {("a", 1): 3.4, 2: {5, 6}},
)
walker = Walker()
visitor = Visitor()
for (item, path) in walker(items):
    with visitor(item, path=path):
        pass

The output is:

     [0] => {('a', 1): 3.4, 2: {5, 6}}
         [0][('a', 1)] => 3.4
         [0][2] => {5, 6}
             [0][2][5] => 5
             [0][2][6] => 6

Notice that path is just an additional argument to the visitor's __call__. Anything can be passed here; it's up to the caller. Paths are obtained from the walker, not the visitor.
(In reply to Dmitry Selyutin from comment #187)
> I think sets and frozensets paths are simply hash(item). That is:

I strongly disagree, because hash(item) can't be used to select the proper item -- both because it's a pain to access a set entry via hash and because the hash may not be unique -- you can have a set {a, b} with len == 2 and hash(a) == hash(b).
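for example, in CPython:

# -1 and -2 are distinct set members but share one hash, because
# hash(-1) is remapped to -2 (-1 is the C-level error sentinel)
a, b = -1, -2
print(hash(a), hash(b))   # -2 -2
print(len({a, b}))        # 2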
(In reply to Dmitry Selyutin from comment #185)
> With the recent update (not yet pushed), here's what I get:
>
> [0]
>     {('a', 1): 3.4, 2: {5, 6}}
> [0, ('a', 1)]
>     3.4

that's fine when the walker never recurses into dict keys.

however, when you want the walker to recurse into keys, then you do need the additional index to tell if you're visiting the key or the value. this treats a dict as a collection of key-value pairs rather than just a collection of values.
And, with a dataclass added...

@dataclasses.dataclass()
class A:
    a: int
    b: "list[int]"

items = (
    {("a", 1): 3.4, 2: {5, 6}},
    {3: A(5, [12, 3])},
)
for (item, path) in walker(items):
    with visitor(item, path=path):
        pass

...that's what we get:

[0] => {('a', 1): 3.4, 2: {5, 6}}
[0][('a', 1)] => 3.4
[0][2] => {5, 6}
[0][2][5] => 5
[0][2][6] => 6
[1] => {3: A(a=5, b=[12, 3])}
[1][3] => A(a=5, b=[12, 3])
[1][3]['a'] => 5
[1][3]['b'] => [12, 3]
[1][3]['b'][0] => 12
[1][3]['b'][1] => 3

That said, I still think that we must somehow designate that in the case of a dataclass we access by attribute, not by index. With that in mind, perhaps the path should be somewhat more flexible than it is, and have different types: AttributePath, DictItemPath, SetPath and IndexPath. This way the repr can be handled properly.
(In reply to Jacob Lifshay from comment #190)
> I strongly disagree, because hash(item) can't be used to select the proper
> item -- both because it's a pain to access a set entry via hash and because
> the hash may not be unique -- you can have a set {a, b} with len == 2 and
> hash(a) == hash(b).

My point is that the item itself cannot be used either. You simply don't have syntax where a set can be accessed by an index. cf. above about different reprs for different path types, so that we can handle .attr vs [index] vs dict_item(a, b) vs ??? (no idea how we can express sets).
Moar tweaks, with in-place constructed path classes:

[0] => {('a', 1): 3.4, 2: {5, 6}}
[0][('a', 1)] => 3.4
[0][2] => {5, 6}
[0][2]{5} => 5
[0][2]{6} => 6
[1] => {3: A(a=5, b=[12, 3])}
[1][3] => A(a=5, b=[12, 3])
[1][3].a => 5
[1][3].b => [12, 3]
[1][3].b[0] => 12
[1][3].b[1] => 3
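Reading between the lines, the "in-place constructed path classes" are presumably something like this (a guess borrowing the names suggested above, not the committed code):

# guesswork: one tiny class per access kind, differing only in repr
class IndexPath:
    def __init__(self, index):
        self.index = index
    def __repr__(self):
        return f"[{self.index!r}]"

class AttributePath:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return f".{self.name}"

class SetPath:
    def __init__(self, item):
        self.item = item
    def __repr__(self):
        return f"{{{self.item!r}}}"

path = (IndexPath(1), IndexPath(3), AttributePath("b"), IndexPath(0))
print("".join(map(repr, path)))   # [1][3].b[0]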
(In reply to Jacob Lifshay from comment #191)
> that's fine when the walker never recurses into dict keys.
>
> however, when you want the walker to recurse into keys, then you do need the
> additional index to tell if you're visiting the key or the value. this
> treats a dict as a collection of key-value pairs rather than just a
> collection of values.

Ah, yes, that's what I missed. Right, keys should be handled too.
How about {key} and {value} notation? I'm not sure how to express it better in the representation.
(In reply to Dmitry Selyutin from comment #196)
> How about {key} and {value} notation? I'm not sure how to express it better
> in the representation.

i just treated it as dict entries being a tuple-like thing, like what's returned by dict.items
(In reply to Dmitry Selyutin from comment #193)
> My point is that the item itself cannot be used either. You simply don't
> have syntax where a set can be accessed by an index. cf. above about
> different reprs for different path types, so that we can handle .attr vs
> [index] vs dict_item(a, b) vs ??? (no idea how we can express sets).

well, for the root object [[{1, (5, 6)}]], we could use:

item=6
path: [0][0];(5, 6)[1]

where a set entry is just ;<key>, plus parentheses as needed.
(In reply to Dmitry Selyutin from comment #181)
> I like option 1, because it's short and sweet, and users can just declare
> the method as `method(self, instance, *_)`.

indeed. or, it can be detected: functools can find out how many arguments there are, and whether there are *args and **kwargs. we use it already somewhere, in soc or ah no nmigen, for detecting if one of the function parameters is "name". on detecting the number of arguments the *caller* can omit the missing parameter.

> Also, yielding some object which
> wraps the original instance looks strange, because the call gets not what
> was actually passed, but something else.

yes, counter-intuitive.

(In reply to Dmitry Selyutin from comment #189)
> With a minor tweak in the visitor, which accepts an additional argument:
>
> class Visitor(mdis.visitor.ContextVisitor):
>     @mdis.dispatcher.Hook(object)
>     @contextlib.contextmanager
>     def dispatch_object(self, instance, *, path=[]):
                                                  ^^
(edit: drat, different fonts. arrows point to the [])

this is a serious error: it is the shared-mutable-default trap. it MUST be a tuple, NOT a list. (basically anything hashable is fine, but unhashable is catastrophic. this is a very common mistake - see the demonstration below.)

i really do not like path as a mandatory argument for all visitors (particularly leaf-nodes and especially the primitives int bool etc). if it is optional (see above, including *_) i have no problem with it. and if using the name "instance" i recommend correspondingly using "instanceid".

(In reply to Jacob Lifshay from comment #182)
> (In reply to Luke Kenneth Casson Leighton from comment #180)
> > {3: A(5, [12, 3])} path=[]
> >
> > no, that is not okay (path=[]). ints do not have and do not need paths.
>
> you misunderstood this: path is the list of all indexes/keys/field-names
> needed to get from the root object to the current value,

ahh ok. good explanation.

> and, yes, ints *need* paths, so you can tell which int you're referring to,

no, you really don't. primitive types (bool int float complex str bytes) definitely do not need a path...

> like e.g. SelectableInt(123, 45)

... but types *derived* from int might: this depends on the implementor, if there are any additional properties. i.e. SelectableInt is no longer a leaf-node type, it is a tree-type due to itself having two leaf-nodes: the value and the bit-width.
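the trap in question, in two lines (the function name f is arbitrary):

# the default list is created once, at function-definition time, and
# is then shared by every call that relies on the default
def f(path=[]):
    path.append(1)
    return path

print(f())   # [1]
print(f())   # [1, 1] - the same list, mutated across calls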
(In reply to Luke Kenneth Casson Leighton from comment #199)
> indeed. or, it can be detected: functools can find out how many
> arguments there are, and whether there are *args and **kwargs.
> we use it already somewhere, in soc or ah no nmigen, for detecting
> if one of the function parameters is "name".

No. This is _incorrect_. You shouldn't do this, because it's completely unreliable and can break at any time. The best bet would perhaps be inspect. This is just like people used to detect "unbound methods", then came Guido and dropped those once and for all. There is no justification for such tricks other than hacking, or avoiding traversing a complex class hierarchy manually (which can bring even more problems than inspect). Resorting to such constructs, other than for self-aware code or truly dynamic construction of some meta object, is almost certainly a sign of bad architecture.

> this is a serious error: it is the shared-mutable-default trap.
> it MUST be a tuple, NOT a list. (basically anything hashable
> is fine, but unhashable is catastrophic. this is a very common
> mistake)

Please don't be so dramatic. You condemn an artificial example, where you didn't even check the code. If you check it, you'll see that the default value is never used, nor is it ever write-accessed. But this is not the code to be committed. The walkers, however, use tuples.

> i really do not like path as a mandatory argument for all visitors
> (particularly leaf-nodes and especially the primitives int bool etc).
> if it is optional (see above, including *_) i have no problem with it.

Luke. You don't read the code, _again_. Which part of it says it's mandatory? Which part binds it to any visitor other than the visitor from this example? I specifically and explicitly said that walkers yield it, and showed it in the example itself. It's up to users whether they use this information in visitors or not.

> and if using the name "instance" i recommend correspondingly using
> "instanceid".

Is it reserved, or do you want to have it parallel with typeid? I named it instance because it's exactly an instance of some object, and because we use type(instance) for a double dispatch. I'd have used type for typeid, but that one is a built-in. And, frankly, our typeid is even no longer a typeid, because sometimes it's just a callable (like dataclasses.is_dataclass). I haven't come up with a good name since I implemented this; ideas are appreciated. :-)

> ahh ok. good explanation.
>
> > and, yes, ints *need* paths, so you can tell which int you're referring to,
>
> no, you really don't. primitive types (bool int float complex str bytes)
> definitely do not need a path...

They do need paths if you want to differentiate which int comes from where. Jacob gave the nested-collections example for a reason.

> > like e.g. SelectableInt(123, 45)
>
> ... but types *derived* from int might: this depends on the implementor,
> if there are any additional properties.
>
> i.e. SelectableInt is no longer a leaf-node type, it is a tree-type
> due to itself having two leaf-nodes: the value and the bit-width.

This actually depends on how you treat it. In some cases you might opt to treat it as a sequence of bits as well. This might be handy sometimes. But I don't really believe this should be part of visiting. This is not the same with fields, though. These are entirely different beasts.
And especially with the fields which mark insn fields: they are also integers in some sense, but they should be walked with the exact bits in mind.
s/part of visiting/part of walking/
https://www.fluentpython.com/extra/function-introspection/ "Example 6 shows the values of __defaults__, __code__.co_varnames, and __code__.co_argcount for the clip function listed in Example 5."
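Concretely, on a toy function those attributes look like this:

# the attributes named in the linked article, on a trivial function
def clip(text, max_len=80):
    pass

print(clip.__defaults__)          # (80,)
print(clip.__code__.co_varnames)  # ('text', 'max_len')
print(clip.__code__.co_argcount)  # 2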
briefly:

* sorry, i am only able to pay brief attention, so i saw the [] and assumed it had been committed.
* __code__ and co_* etc. have been part of python since around 2.0. they're not going away, not with 25+ years of code in the world that would break.
* forcing visitors of leaf nodes to have to ignore a parameter that is meaningless i do not think is a good idea, and i am looking for solutions.
* another option: the decorator-of-visitors stores in the registry whether the path option is wanted or not.
* alternative option: leaf-nodes on basic types never receive a path argument. they're just not necessary, so why have it?
* alternative option: there are *two* functions: one for leaf-nodes (mostly primitive types or any non-iterable class) that never need a path, and one for those that are iterable.
* refinement: auto-identification of leaf-nodes vs tree-nodes: if the tree-iteration yields something of zero length, then it's pretty obvious that it's a leaf-node, and the alternative function dedicated to leaf-nodes may be called.
* walking down individual bits is unnecessary, and if someone ever wanted it they should sub-class int and provide a bit-enumerator (__iter__). (nmigen does actually genuinely do this, so it is not hypothetical)
(In reply to Luke Kenneth Casson Leighton from comment #203)
> * forcing visitors of leaf nodes to have to ignore a parameter that
>   is meaningless i do not think is a good idea, and i am looking for
>   solutions.
> * another option: the decorator-of-visitors stores in the registry
>   whether the path option is wanted or not.
> * alternative option: leaf-nodes on basic types never receive a
>   path argument. they're just not necessary, so why have it?
> * alternative option: there are *two* functions: one for leaf-nodes
>   (mostly primitive types or any non-iterable class) that never need a
>   path, and one for those that are iterable.
> * refinement: auto-identification of leaf-nodes vs tree-nodes:
>   if the tree-iteration yields something of zero length, then it's
>   pretty obvious that it's a leaf-node, and the alternative function
>   dedicated to leaf-nodes may be called.

I think you have a misunderstanding of how path and leaf nodes (like int, float, bool, str) interact: critically, for each function call, path is the instructions for how to get from the root node to the current node; path doesn't include any indexes or anything describing the insides/children of the current node. Therefore, being a leaf node with no insides/children does *not* imply that the path is ignored, since the path was never describing the insides/children anyway, and the insides/children are the only part we know to be empty due to being a leaf node.

a good example of why we do *not* want to ignore the path even on leaf nodes is when visiting the CSV dict for lhau (in major.csv):

{'opcode': '43', 'unit': 'LDST', <snip>
 'BR': '0', 'sgn ext': '1', 'upd': '1',
 'rsrv': '0', '32b': '0', 'sgn': '0', <snip>}

the nodes and paths that would be visited:

<snip>
'0' path=['BR']
'1' path=['sgn ext']
'1' path=['upd']
'0' path=['rsrv']
'0' path=['32b']
'0' path=['sgn']
<snip>

as you can see, the strings '0' and '1' get visited multiple times each; the only way the visitor can tell each '1' apart is by looking at path, which is where the critical information of *what the '1' means* is (does it mean this instruction sign-extends? or that it updates?). the only other slightly reasonable method of getting that information is relying on dict ordering, which is obviously not what we want to do.
(In reply to Luke Kenneth Casson Leighton from comment #203)
> briefly:
>
> * sorry, i am only able to pay brief attention, so i saw the [] and
>   assumed it had been committed.

If you intend to enter the discussion, and especially to use wordings like "catastrophic", then please DO read the code and the discussion around it.

> * __code__ and co_* etc. have been part of python since around 2.0.
>   they're not going away, not with 25+ years of code in the world
>   that would break.

Having something for 25 years does not automatically mean it will be there forever. Python developers are a notorious example of breaking users' code. They broke strings. They broke iterators. They broke a lot of code which thought distinguishing unbound methods was a clever idea. You are not considering the existing experience; just stubbornly repeating that they won't break it won't affect their choices at all. Again: you should not rely on this unless there are no other options. Even if you do — prefer the inspect module.

> * forcing visitors of leaf nodes to have to ignore a parameter that
>   is meaningless i do not think is a good idea, and i am looking for
>   solutions.

You don't understand what path is. Please re-read the explanations. Leaf nodes _have_ paths; the root node _doesn't_, or, well, has it empty. Path means "how do I get here upon walking". But you also don't understand how the code in the example works, which is even worse. You don't read my comments, so I have to repeat them several times.

1. Walkers walk over some object recursively and yield everything they find there. Basically they walk over a node, which yields subnodes, and do it recursively.
2. When the walker has something to yield from the object, it also yields where that node was found. It's as if it yields a string which shows paths like "left.right.left.right". Since our nodes can have an arbitrary number of subnodes, and can be walked differently, the paths are different too.
3. Unless you want to use this information in a visitor — you don't need it. I intentionally used a custom hook for object in my example because I _wanted_ to pass the path, to print the hierarchy. And if you look at how I call the visitor, you'll see that I pass the argument explicitly. But nothing forces anyone to do it. It's the walker which emits this information, not the visitor. And visitors are for installing handlers on how to visit stuff.

> * walking down individual bits is unnecessary, and if someone
>   ever wanted it they should sub-class int and provide a bit-enumerator
>   (__iter__). (nmigen does actually genuinely do this, so it is not
>   hypothetical)

You don't even read which example I bring up. What's the point of this discussion then? Again: I'm talking about the situation when the bits are scattered, like in instruction fields.

Luke, if you enter the discussion, especially with quite strong wordings, please take care to read and analyze what others have already said. I'm a patient person and can repeat things several times, but I'm really concerned this has become a repeating pattern.
Jacob, about paths et al.: shouldn't such an approach yield the parent too, like os.walk() does with dirpath vs dirnames/filenames?
(In reply to Dmitry Selyutin from comment #206)
> Jacob, about paths et al.: shouldn't such an approach yield the parent too,
> like os.walk() does with dirpath vs dirnames/filenames?

yeah, that would be useful too!
OK, I found a way to handle paths elegantly and keep the original identifiers as well. The trick is to yield both the original key/index/element and something which designates how to format it. https://git.libre-soc.org/?p=mdis.git;a=commitdiff;h=c025a35e2d0f1fc3701d266aea52a0cb00abd8b7
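A minimal sketch of that trick (hypothetical Index/Hash wrapper classes; the real commit linked above differs in detail): the walker yields each node together with a path whose parts remember both the original key/index and how to render it.

    class Index:
        # hypothetical: remembers a sequence index / dict value lookup,
        # renders as [part]
        def __init__(self, part):
            self.part = part
        def __str__(self):
            return f"[{self.part!r}]"

    class Hash:
        # hypothetical: remembers a dict/set key, renders as {part}
        def __init__(self, part):
            self.part = part
        def __str__(self):
            return f"{{{self.part!r}}}"

    def walk(node, path=()):
        # yield (node, path); each path part keeps the original identifier
        # (sets and dataclasses omitted for brevity)
        yield (node, path)
        if isinstance(node, (list, tuple)):
            for (index, item) in enumerate(node):
                yield from walk(item, path + (Index(index),))
        elif isinstance(node, dict):
            for (key, value) in node.items():
                yield from walk(key, path + (Hash(key),))
                yield from walk(value, path + (Index(key),))

Joining str(part) for every part of a path then reproduces strings like [0]{('a', 1)}[1].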
Folks, based on the example below (A is a dataclass, cut for brevity):

    items = (
        {("a", 1): 3.4, 2: {5, 6}},
        {3: A(5, [12, 3])},
    )
    for (item, path) in walker(items):
        with visitor(item, path=path):
            pass

Here are the results:

    [0] => {('a', 1): 3.4, 2: {5, 6}}
    [0]{('a', 1)} => ('a', 1)
    [0]{('a', 1)}[0] => a
    [0]{('a', 1)}[1] => 1
    [0][('a', 1)] => 3.4
    [0]{2} => 2
    [0][2] => {5, 6}
    [0][2]{5} => 5
    [0][2]{6} => 6
    [1] => {3: A(a=5, b=[12, 3])}
    [1]{3} => 3
    [1][3] => A(a=5, b=[12, 3])
    [1][3].a => 5
    [1][3].b => [12, 3]
    [1][3].b[0] => 12
    [1][3].b[1] => 3

Does it meet your expectations? As you can see, I chose {} for hashes, and what gets passed is the actual object that was used for the lookup. Currently these things cannot be accessed directly. I think PartId, if modified to be used with the parent, could not only format, but also access the parent in a correct way (via some lambda or whatever). Do we need such functionality? Perhaps we might yield something which is correctly formatted and whose __call__ gives access to the entity. Thoughts and opinions? I think it's somewhat counter-intuitive; as we discussed, I'd like to keep the original value. However, perhaps this is fine too:

    for item in walker(items):
        print(item.path)
        print(item.parent)
        with visitor(item.node, path=item.path):
            pass

...instead of this:

    for (node, parent, path) in walker(items):
        print(path)
        print(parent)
        with visitor(node, path=path):
            pass
(In reply to Dmitry Selyutin from comment #209)
> Does it meet your expectations? As you can see, I chose {} for hashes, and
> what gets passed is the actual object that was used for the lookup.

lgtm, though imho Hash is a misnomer; maybe name it GetKey?

> for (node, parent, path) in walker(items):

imho either this or the version returning objects with node, parent, and path fields is good. I do think we should pick whichever option is faster, since the walker is likely hot code. If we can guarantee callers won't keep the returned object around, the fastest would probably be to return the same object every time and just update its node etc. fields.
(In reply to Jacob Lifshay from comment #210)
> imho either this or the version returning objects with node, parent, and
> path fields is good. I do think we should pick whichever option is faster,
> since the walker is likely hot code.

I think yielding a tuple would be better then. Plus I kinda dislike the wrapping here, though I cannot clearly formulate why (even to myself).

Also, on paths: if we yield the parent (which I assume is the original instance), perhaps the path should point to this parent, and not to the root? We currently keep the whole path.
(In reply to Dmitry Selyutin from comment #211)
> (In reply to Jacob Lifshay from comment #210)
> > imho either this or the version returning objects with node, parent, and
> > path fields is good. I do think we should pick whichever option is
> > faster, since the walker is likely hot code.
>
> I think yielding a tuple would be better then. Plus I kinda dislike the
> wrapping here, though I cannot clearly formulate why (even to myself).
>
> Also, on paths: if we yield the parent (which I assume is the original
> instance), perhaps the path should point to this parent, and not to the
> root? We currently keep the whole path.

you still need to know which child of the parent you are...

also, if you use a list for path, and just push and pop indexes from the end rather than making a new object every time, then you don't need to constantly allocate as much, which is a bit faster.

also, i think the library should provide a function that has the top-level for loop running both walker and visitor
That's what I mean:

    nodes = (
        {("a", 1): 3.4, 2: {5, 6}},
        {3: A(5, [12, 3])},
    )
    for (node, parent, pathid, path) in walker(nodes):
        print(parent, pathid(path), "=>", node)

    ({('a', 1): 3.4, 2: {5, 6}}, {3: A(a=5, b=[12, 3])}) [0] => {('a', 1): 3.4, 2: {5, 6}}
    {('a', 1): 3.4, 2: {5, 6}} {('a', 1)} => ('a', 1)
    ('a', 1) [0] => a
    ('a', 1) [1] => 1
    {('a', 1): 3.4, 2: {5, 6}} [('a', 1)] => 3.4
    {('a', 1): 3.4, 2: {5, 6}} {2} => 2
    {('a', 1): 3.4, 2: {5, 6}} [2] => {5, 6}
    {5, 6} {5} => 5
    {5, 6} {6} => 6
    ({('a', 1): 3.4, 2: {5, 6}}, {3: A(a=5, b=[12, 3])}) [1] => {3: A(a=5, b=[12, 3])}
    {3: A(a=5, b=[12, 3])} {3} => 3
    {3: A(a=5, b=[12, 3])} [3] => A(a=5, b=[12, 3])
    A(a=5, b=[12, 3]) .a => 5
    A(a=5, b=[12, 3]) .b => [12, 3]
    [12, 3] [0] => 12
    [12, 3] [1] => 3
(In reply to Jacob Lifshay from comment #212)
> also, if you use a list for path, and just push and pop indexes from the
> end rather than making a new object every time, then you don't need to
> constantly allocate as much, which is a bit faster.

e.g.:

    class Walker(...):
        def __init__(self):
            self.path = []
            self.path_kind = []

        @Hook(list, tuple)
        def walk_seq(self, value):
            for i, v in enumerate(value):
                self.path.append(i)
                self.path_kind.append(Index)
                yield (v, self.path, self.path_kind, value)
                yield from self(v)
                self.path.pop()
                self.path_kind.pop()

        @Hook(dict)
        def walk_dict(self, value):
            for k, v in value.items():
                self.path.append(k)
                self.path_kind.append(GetKey)
                yield (k, self.path, self.path_kind, value)
                yield from self(k)
                self.path_kind[-1] = Index
                yield (v, self.path, self.path_kind, value)
                yield from self(v)
                self.path.pop()
                self.path_kind.pop()

        ...
(In reply to Dmitry Selyutin from comment #213)
> That's what I mean:

ah, so just the last element of the path
(In reply to Jacob Lifshay from comment #212)
> (In reply to Dmitry Selyutin from comment #211)
> > Also, on paths: if we yield the parent (which I assume is the original
> > instance), perhaps the path should point to this parent, and not to the
> > root? We currently keep the whole path.
>
> you still need to know which child of the parent you are...

cf. below (#213, I think?). I think if we have both the parent and the path to this parent, we could reconstruct the whole path... Or not?

> also, if you use a list for path, and just push and pop indexes from the
> end rather than making a new object every time, then you don't need to
> constantly allocate as much, which is a bit faster.

I just don't like APIs which modify builtins in-place. This can sometimes strike back.

> also, i think the library should provide a function that has the top-level
> for loop running both walker and visitor

I'm intentionally leaving this to the caller because:
1. It's up to the caller what to pass from walker to visitor.
2. It's up to the caller what else to pass.
3. It's up to the caller which kind of visitor to choose.
4. It's up to the caller which action to take after the hook was called or yielded back to the caller.

The best form I could think of is:

    def visit(root, walker, visitor, wrapper=lambda rv: rv, *args, **kwargs):
        if isinstance(visitor, ContextVisitor):
            for (node, parent, path) in walker(root):
                with visitor(node, parent=parent, path=path,
                             *args, **kwargs) as rv:
                    wrapper(rv)
        else:
            for (node, parent, path) in walker(root):
                wrapper(visitor(node, parent=parent, path=path,
                                *args, **kwargs))

This seems complex and clumsy to use. Perhaps you have other ideas?
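One possible simplification, purely a sketch: assuming non-context visitors return a plain value, Python 3.7+ (for contextlib.nullcontext), and a ContextVisitor whose __call__ returns a context manager, both visitor kinds can share a single loop:

    import contextlib

    def visit(root, walker, visitor, wrapper=lambda rv: rv, *args, **kwargs):
        # single code path: wrap plain return values in a no-op context
        # manager, so context-manager visitors and plain visitors behave
        # identically from the driver's point of view
        for (node, parent, path) in walker(root):
            rv = visitor(node, parent=parent, path=path, *args, **kwargs)
            if not hasattr(rv, "__enter__"):
                rv = contextlib.nullcontext(rv)
            with rv as value:
                wrapper(value)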
The code which yields only the last path component looks much better. Given that, can we ensure that users are capable of reconstructing the whole path if needed?

    for (node, parent, path, kind) in walker(root):
        magic(...)

How can the magic call reconstruct the whole tree, the way it could when we kept the whole paths?
am on a train. late. brief.

(In reply to Dmitry Selyutin from comment #208)
> OK, I found a way to handle paths elegantly and keep the original
> identifiers as well. The trick is to yield both the original
> key/index/element and something which designates how to format it.

ahh i love that idea, just be careful the "formatter" does not expand out to "and we need another feature, and another".

> https://git.libre-soc.org/?p=mdis.git;a=commitdiff;
> h=c025a35e2d0f1fc3701d266aea52a0cb00abd8b7

         @dispatcher.Hook(dataclasses.is_dataclass)
    -    def dispatch_dataclass(self, instance):
    +    def dispatch_dataclass(self, instance, path=[]):    <- path=()
             for field in dataclasses.fields(instance):
                 key = field.name
                 value = getattr(instance, key)
    -            yield (key, value)
    -            yield from self((key, value))
    +            part = (PartId.Attribute, key)
    +            parts = (path + (part,))
    +            yield (value, parts)
    +            yield from self(value, path=parts)
    +
    +    @dispatcher.Hook(dict)
    +    def dispatch_mapping(self, instance, path=[]):    <- path=()
    +        for (key, value) in instance.items():
Ah yes, the full path sequence is as good as dropped, provided someone can suggest how to solve #217 in an elegant fashion. As for the formatter: yes, I also have a feeling this will be extended; perhaps the enum values should be replaced with separate classes.
(In reply to Luke Kenneth Casson Leighton from comment #218)
> ahh i love that idea, just be careful the "formatter" does not
> expand out to "and we need another feature, and another".

Updated the code with separate classes. These can be extended as we wish. The walking process stays the same:

    for (node, parent, path, pathcls) in walker(nodes):
        print(parent, pathcls(path), "=>", node)

The main idea is that the path appears as it was; if the users need to format it, or need anything else not covered by the "pure" object we walked, they can use pathcls.

> + def dispatch_dataclass(self, instance, path=[]):    <- path=()

These are already dropped: we yield only the last path component. However, I'm still not sure whether we're able to build the full path on our own if we have all the parents for each node. I'd really like the path to be simple and relative to the parent (not to the top-level object). Ideas?
(In reply to Dmitry Selyutin from comment #220)
> Updated the code with separate classes. These can be extended as we wish.

I forgot to post the commit, here it is:
https://git.libre-soc.org/?p=mdis.git;a=commitdiff;h=f4c50f8ab5bcd4ead21759ea49840f549ac442d2
(In reply to Dmitry Selyutin from comment #219)
> Ah yes, the full path sequence is as good as dropped, provided someone can
> suggest how to solve #217 in an elegant fashion.

not a problem.

> As for the formatter: yes, I also have a feeling this will be extended;

yeahno, then it is a bad idea. us "dictating" the format is a nonstarter.

> perhaps the enum values should be replaced with separate classes.

as markos said, this is taking too long.

path is not a necessary argument to pass, given that the walker itself is supposed to perform the tracking.

can we please move off of "hypothetical" development and move on to "practical use" and see how it works out, then revisit if it turns out to be necessary based on actual real-world usage?

(In reply to Dmitry Selyutin from comment #217)
> The code which yields only the last path component looks much better.

yes. we went over that last week. it should be instanceid.

> Given that, can we ensure that users are capable of reconstructing the
> whole path if needed?
>
>     for (node, parent, path, kind) in walker(root):
>         magic(...)
>
> How can the magic call reconstruct the whole tree, the way it could when we
> kept the whole paths?

not the API's problem, that is the walker's problem. we went over this last week already (the walker performs stack push and pop). this is standard practice for walker/visitor APIs.
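For completeness, one caller-side way to rebuild full paths from (node, parent, last-component, kind) tuples; this full_paths helper is hypothetical, not part of mdis, and assumes every node is yielded after its parent:

    def full_paths(walker, root):
        # Hypothetical helper: accumulate full paths from last components.
        # Caveat: keyed by id(), so shared/interned leaf objects (e.g. the
        # same '0' string appearing in several places) may overwrite each
        # other's entries; harmless here, since leaves are never parents.
        paths = {id(root): ()}
        for (node, parent, path, kind) in walker(root):
            full = paths[id(parent)] + ((kind, path),)
            paths[id(node)] = full
            yield (node, full)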
(In reply to Luke Kenneth Casson Leighton from comment #222)
> yeahno, then it is a bad idea. us "dictating" the format is a nonstarter.

No, I mean not dictating the formatter, but other functionality. Anyway, that turned out to be simpler (cf. my previous message).

> > perhaps the enum values should be replaced with separate classes.

Already done.

> as markos said, this is taking too long.
>
> can we please move off of "hypothetical" development and move on
> to "practical use" and see how it works out, then revisit if it
> turns out to be necessary based on actual real-world usage?

Well, handling paths is real-world usage. We have the `pysvp64db tree` command, which needs exactly that. Yes, this _is_ taking so long, because I want to cover the cases I have in mind, and also because [recall the communication we had so far]. Don't think I don't want to finish this task; the sooner it's finished, the sooner I can start with binutils.

> > The code which yields only the last path component looks much better.
>
> yes. we went over that last week. it should be instanceid.

You're mixing two different, unrelated things: the object itself, and the way we reached it. Anyway, as for naming, I think "node" is a perfect name. Are you fine with it?

> > How can the magic call reconstruct the whole tree, the way it could when
> > we kept the whole paths?
>
> not the API's problem, that is the walker's problem.

This is not about who yields the path; it is about what's yielded. You're probably speaking of something else. My question is whether the path should be full (back to the root upon which we started walking) or relative (to the parent). Anyway, I'm mostly fine with how it looks. I'll return to the paths issue when I handle the `pysvp64db tree` command.
(In reply to Dmitry Selyutin from comment #223)
> Already done.

awesome.

> Well, handling paths is real-world usage. We have the `pysvp64db tree`
> command, which needs exactly that. Yes, this _is_ taking so long, because I
> want to cover the cases I have in mind,

can you do it as a sub-class or mix-in class? not part of the primary API but an "example that someone might commonly use"? and if it becomes common enough it gets moved *into* mdis?

> You're mixing two different, unrelated things.

almost certainly. i'm not able to keep up fully at the moment, sorry.

> The object itself, and the way we reached it. Anyway, as for naming, I
> think "node" is a perfect name. Are you fine with it?

yes.

> My question is whether the path should be full (back to the root upon which
> we started walking) or relative (to the parent).

relative.

> Anyway, I'm mostly fine with how it looks. I'll return to the paths issue
> when I handle the `pysvp64db tree` command.

subclass or mixin.
Wow. Just wow. Implementation of "pysvp64db operands". The main issue with operands is that their instantiation is delayed: the record itself just contains the Operands tuple, which holds an operand class to be instantiated plus the necessary arguments. Each operand then gets the record instance and kinda "binds" it to itself. With the old API, we had to cheat: we established hooks on Record instead, then traversed the operands manually. With the new API, it's simple:

    class OperandsVisitor(InstructionVisitor):
        def __init__(self):
            self.__record = None
            return super().__init__()

        @mdis.dispatcher.Hook(Record)
        @contextlib.contextmanager
        def dispatch_record(self, node):
            self.__record = node
            yield node

        @mdis.dispatcher.Hook(Operands)
        @contextlib.contextmanager
        def dispatch_operands(self, node):
            for (cls, kwargs) in node:
                operand = cls(record=self.__record, **kwargs)
                print(operand.name, ", ".join(map(str, operand.span)))
            yield node

The trick here is the MRO: Operands is simply declared as "class Operands(tuple)", and the first hook to match is picked. Previously we had to create classes like Dict, Tuple and similar; now we just match on the MRO. And, since we know Operands will eventually be visited from Record (actually Record.MarkdownRecord.Operands), we simply capture the record which was visited before and use it for instantiation. No custom Tuple classes, no cheat hooks on Records.
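A toy sketch of the MRO-based matching described above (the find_hook helper is hypothetical, not the actual mdis dispatcher): the registry is scanned in method-resolution order, so a hook registered on Operands shadows a generic tuple hook.

    def find_hook(hooks, node):
        # hooks: {type: handler}; walk the node's MRO so that the most
        # derived registered class wins (Operands before tuple).
        for cls in type(node).__mro__:
            if cls in hooks:
                return hooks[cls]
        raise LookupError(f"no hook for {type(node).__name__}")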
Here's what the visiting looks like:

    walker = Walker()
    for (node, *_) in walker(root):
        with visitor(node):
            pass

As you can see, the visitors don't need anything but the node. I had to temporarily deprecate TreeVisitor, though; it's the only one which needs paths.
(In reply to Dmitry Selyutin from comment #147)
> Roadmap is:
> 1. Update setup.py and install the module. Perhaps we should install it
>    with devscripts?
> 2. Deprecate Node, walk() and similar stuff.
> 3. Switch db.py to new approach.

1. Introduced the dependency in hdl-dev-repos instead:
https://git.libre-soc.org/?p=dev-env-setup.git;a=commitdiff;h=31539e2fdbce1ac5f16792440e6b1acf8a5f1a32
2. Node, walk, Dict et al. are deprecated:
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=89958b87cedb6dda485e00049f82703ce19f739a
3. db.py is switched to the new approach (ibid.). I had to drop TreeVisitor for a while, though.

Although this task is formally completed, considering the steps mentioned above, I'd like to practice a bit with other pysvp64db commands. I can handle that in the scope of this task. I'm going to add a few commands, because pysvp64db seems to be quite a useful tool. After this we can switch to other tasks which use visitors (I suggest codegen again, but this time with the visitor API). Any objections?
(In reply to Dmitry Selyutin from comment #227)
> 1. Introduced the dependency in hdl-dev-repos instead:
> https://git.libre-soc.org/?p=dev-env-setup.git;a=commitdiff;
> h=31539e2fdbce1ac5f16792440e6b1acf8a5f1a32

perfect. beyond that, if people don't run the scripts they are on their own.

> 2. Node, walk, Dict et al. are deprecated:
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=89958b87cedb6dda485e00049f82703ce19f739a

looks waay better.

> Although this task is formally completed, considering the steps mentioned
> above, I'd like to practice a bit with other pysvp64db commands. I can
> handle that in the scope of this task. I'm going to add a few commands,
> because pysvp64db seems to be quite a useful tool.

er yes? :) i will mention it on OPF discuss.

> After this we can switch to other tasks which use visitors (I suggest
> codegen again, but this time with the visitor API). Any objections?

go for it, just remember there are numerous budgets kicking around, do a quick review:
https://bugs.libre-soc.org/buglist.cgi?email1=ghostman&emailassigned_to1=1&emailtype1=substring&list_id=4672&query_format=advanced&resolution=---
drat, not so useful, must have forgotten to set you as assignee for some of the visitor ones.
I've refactored the extras command. Now it's purely class-based and demonstrates some powerful tricks with visitors. The output is perhaps not that pretty, but now we don't cheat and, more importantly, we demonstrate how to use visitors.

    $ pysvp64db extras sv.ldu
    in1 RA0
    in2 CONST_DS
    in3 NONE
    out RT
    cr_in NONE
    cr_in2 NONE
    cr_out NONE
    in1 RA0
    in2 NONE
    in3 NONE
    out RT
    out RA
    cr_in NONE
    cr_in2 NONE
    cr_out NONE
    EXTRA0 d:RT
    EXTRA1 d:RA
    EXTRA1 s:RA

Here's the relevant commit:
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=bf208c7a6548cf6ae6ce33977a2fb8e3a463114a
The implementation is somewhat less obvious than when we cheat, but is interesting for demonstration purposes.
The next commit also introduces the selectors command; unlike extras, this isn't specific to svp64 instructions. Since extras are quite useless without selectors, the extras visitor simply inherits the selectors visitor!
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=348016ff0f96d051e74d78e3dfd97d5f5606ceea
(In reply to Dmitry Selyutin from comment #229)
> Here's the relevant commit:
> https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;
> h=bf208c7a6548cf6ae6ce33977a2fb8e3a463114a
> The implementation is somewhat less obvious than when we cheat, but is
> interesting for demonstration purposes.

yeah, it's really short. suggestion: initialise the index to 0, then increment *after* the yield. setting it to -1 then incrementing before the yield is counter-intuitive.
(In reply to Luke Kenneth Casson Leighton from comment #231)
> yeah, it's really short. suggestion: initialise the index to 0, then
> increment *after* the yield. setting it to -1 then incrementing before
> the yield is counter-intuitive.

I chose -1 as an alternative to accessing (index - 1). But both approaches are somewhat annoying. The thing is that _all_ code in the parent is called _before_ the child's code. Here's how it looks, simplified:

    for (node, *_) in walk(nodes):  # recursively
        node.__enter__()
        node.code()
        node.__exit__()

If we iterate over some child node, we've already called __enter__ and __exit__ for its parent. Frankly, I'm not sure how to circumvent this. I like the semantics you assumed, though. This would be kinda similar to `for` + `with` combined.
(In reply to Dmitry Selyutin from comment #232)
> Frankly, I'm not sure how to circumvent this.

Hm. I think if we only enter one level at a time, we should resolve it:

    def visit(root, visitor, walker):
        with visitor(root):
            for node in walker(root):
                return visit(node, visitor, walker)

The prerequisite is that the walker doesn't go deeper than 1 level.

UPD: now considers the root node.
Yes, this should do the trick. Ignore the `return visit`; it should be just `visit`.
Yes, this works! I'm waiting for CI to make sure nothing is broken, but with the relevant changes in mdis, I can increment the index after the yield. The code therefore becomes:

    def traverse(root, visitor, walker):
        with visitor(root):
            for (node, *_) in walker(root):
                traverse(root=node, visitor=visitor, walker=walker)

    traverse(root=root, visitor=visitor, walker=Walker())
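To illustrate the nesting this produces, here is a self-contained toy run; PrintVisitor and shallow_walker are hypothetical stand-ins for the real mdis classes (the walker yields direct children only, and traverse() does the recursion):

    import contextlib

    class PrintVisitor:
        # toy visitor: __call__ returns a context manager, so a parent's
        # enter/exit brackets all of its children's visits
        @contextlib.contextmanager
        def __call__(self, node):
            print("enter", node)
            yield node
            print("exit", node)

    def shallow_walker(node):
        # yields direct children only; recursion is traverse()'s job
        if isinstance(node, (list, tuple)):
            for child in node:
                yield (child,)

    def traverse(root, visitor, walker):
        with visitor(root):
            for (node, *_) in walker(root):
                traverse(root=node, visitor=visitor, walker=walker)

    traverse(root=[1, [2, 3]], visitor=PrintVisitor(), walker=shallow_walker)
    # enter [1, [2, 3]]
    # enter 1
    # exit 1
    # enter [2, 3]
    # enter 2
    # exit 2
    # enter 3
    # exit 3
    # exit [2, 3]
    # exit [1, [2, 3]]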
OK, same 5 failures as before:
https://salsa.debian.org/Kazan-team/mirrors/openpower-isa/-/jobs/4353802

I've merged the updated pysvp64db and also updated the mdis code. Several new commands were added; they can be checked with `pysvp64db --help`. I suppose the approach is viable and extensible. I therefore suggest marking this task as completed and moving to something more practical (read: "code generation"). Any objections?
Based on the discussion in IRC, I'm marking this as resolved. Strangely, it has never been marked as "in progress". :-)
After discussion, I've decided to reimplement the arguments detection. From now on, we always expect at least 2 positional arguments: a dispatcher instance and a node. However, the caller can pass any additional arguments, be they positional or keyword. The detection is done via inspect. Also, the concrete hook now wraps the original function; therefore ConcreteHook.__call__ refers to the original method we wrapped. Example for Walker.dispatch_unordered_sequence (irrelevant details snipped):

    class ConcreteHook(Hook)
     |  ConcreteHook(*typeids)
     |
     |  Methods defined here:
     |
     |  __call__ = dispatch_unordered_sequence(self, node)

https://git.libre-soc.org/?p=mdis.git;a=commitdiff;h=02ef49949991a739662a663ebe64ef00215c4e4d
https://git.libre-soc.org/?p=mdis.git;a=commitdiff;h=0364578c804080d24f4e8b656c58c83f9d0abb51
https://git.libre-soc.org/?p=mdis.git;a=commitdiff;h=e132cae998a7dce910efe4c2f160e7cc3c93e5ee
https://git.libre-soc.org/?p=mdis.git;a=commitdiff;h=f91c0f7cfeb121688b26191f7a1cadcfaa1ceea2

Is it sufficient to mark this task as completed, or do we have other ideas?
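A minimal sketch of how such inspect-based detection can work; the accepts_extra helper is hypothetical, not mdis's actual code, and assumes the hook is an unbound function whose first two parameters are (self, node):

    import inspect

    def accepts_extra(hook):
        # True if the hook can take arguments beyond (self, node): either
        # extra named parameters, or *args/**kwargs catch-alls.
        params = list(inspect.signature(hook).parameters.values())
        if len(params) > 2:
            return True
        return any(p.kind in (p.VAR_POSITIONAL, p.VAR_KEYWORD)
                   for p in params)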
What'd be the correct task for "reimplement the code generation based on visitors so that it's usable for both binutils and anything else"?
(In reply to Dmitry Selyutin from comment #239)
> What'd be the correct task for "reimplement the code generation based on
> visitors so that it's usable for both binutils and anything else"?

that's a massive task that has to go on an entirely new grant. bug #1211 and bug #1212.