a *lot* of verilog code is being manually converted to nmigen, and it's getting boring. time to write a tool that helps. the simplest one, initially, is just a straight string/pattern-matcher. a more sophisticated one (later) can use python-ply, once a BNF lex/yacc grammar is found for it [python-ply can auto-convert c-style lex/yacc BNF into a stub python module]. it needs to be a language translator, one of the requirements of which is to preserve as much of the original sv structure as possible (code comments, code order etc.) https://git.libre-soc.org/?p=sv2nmigen.git;a=shortlog
what do you think of writing a tool in Rust that uses yosys to convert the verilog to ilang and print it out as json, which we can then parse with the serde library, print out as python, and finally run through a python code formatter? If we do it that way, we can have yosys, serde, and the python formatter do all the hard parts
been up investigating this overnight (sigh.. :) ) if the system verilog syntax wasn't monstrously large, writing a parser from scratch would be a reasonable proposition.
if yosys was up to the task, i agree it would be a good idea to use it. i considered ilang as an intermediary, however yosys has a habit of destroying if-elif-elif constructs and replacing them with casez statements, splitting out individual variables into their own parallelised state machine, and much other weirdness that would make it virtually impossible to recognise the output.
not only that: this is what happens with a file from e.g. the axi_rab rtl:
    yosys> read_verilog fsm_expand.sv
    1. Executing Verilog-2005 frontend.
    Parsing Verilog input from `fsm_expand.sv' to AST representation.
    Lexer warning: The SystemVerilog keyword `logic' (at fsm_expand.sv:21) is not recognized unless read_verilog is called with -sv!
    fsm_expand.sv:21: ERROR: syntax error, unexpected TOK_ID, expecting ',' or '=' or ')'
it's using *Cadence* system verilog syntax, for which even iverilog has some capabilities missing, due to Cadence violating the verilog standard and iverilog conforming to it. yosys doesn't stand a chance: it is missing far too much of system verilog.
as for python-ply, i've used it a couple of times: dabeaz wrote an example (yply.py) that is capable of actually understanding yacc-formatted files, searching for their BNF strings, and outputting an actual python-ply program that python-ply can understand. it's a two-step process that is in no way fully automated, however i have an 8 *THOUSAND* line LALR parser - that i didn't write - that's come *directly* out of icarus verilog source code (parse.y pushed through yply.py). generated in under a second, needed review, found some bugs in parse.y, fixed, moved on.
currently i am munging the icarus verilog lex file into python (that was this morning's successful task), so that's phase 1.
for the phase after that: i have had a lot of success in the past using lib2to3's AST code, as it was designed to include whitespace (where the standard python AST library does not), and has some additional nice features including pattern-matching node-visitors that are extremely comprehensive and also extremely well-documented. what's particularly important about lib2to3's AST code is that it has a dead-accurate pretty-printer. after all, it *was* written to do python2-to-python3 and vice-versa code-conversion.
phase 2 is to replace all of the print statements in the auto-generated code with python lib2to3 AST statements.
phase 3 - which can be done as-and-when - will be to create some node-visitors that *MODIFY* the python AST, searching for the kinds of patterns that are expected in verilog but are silly to keep in nmigen. one example is the practice by the eth-zurich team of following a convention of varname_n and varname_q: assigning the initial input to varname_n, over-riding it later, and then in a sync block assigning varname_q <= varname_n - something like that, at least. lib2to3's pattern-matching node-visitor/walker is a good match for reworking the AST to be much more along the lines of nmigen conventions.
the alternative is to write a simple line-by-line code-converter doing basic pattern-matching. i've done that before when converting massive amounts of java to python. two weeks to convert 20,000 lines of code, i was very very bored by the end :)
i'm going to give this maybe... 2-3 days, max, to see if it's a viable approach (python-ply plus lib2to3's AST+pretty-printer). if it's not making very *very* rapid progress, i'll re-evaluate the "string-matcher" version as a way to remove *most* of the drudge work.
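A rough sketch of the kind of pattern a phase-3 visitor would look for, under loud assumptions: it works on plain (target, source) name pairs rather than lib2to3 nodes, and `find_nq_registers`, `NQ_TARGET` and the tuple format are hypothetical names invented for illustration, not part of sv2nmigen.

```python
import re

# hypothetical representation: each sync-block assignment "a <= b"
# is a (target, source) pair of signal names
NQ_TARGET = re.compile(r"^(?P<base>\w+)_q$")


def find_nq_registers(sync_assigns):
    """Return the base names for which the sync block does base_q <= base_n."""
    found = []
    for target, source in sync_assigns:
        m = NQ_TARGET.match(target)
        if m and source == m.group("base") + "_n":
            found.append(m.group("base"))
    return found


sync_block = [
    ("state_q", "state_n"),   # follows the _n/_q convention
    ("count_q", "count_n"),   # follows the _n/_q convention
    ("valid_o", "valid_n"),   # target is not a _q register
    ("addr_q", "in_addr_i"),  # source is not the matching _n
]
print(find_nq_registers(sync_block))  # ['state', 'count']
```

Each base name found this way is a candidate for collapsing into a single signal that is assigned directly in the sync domain, which is the rework the comment above has in mind.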
from experience though, i know that such line-by-line string-matchers are "WORN" - write once, read never :) irony is, normally this would be considered a major, major software project in its own right. it's a good candidate for a NLnet milestone, which is why i'd like to take it a bit more seriously than just a "string-matcher"
(In reply to Luke Kenneth Casson Leighton from comment #2)
> yosys> read_verilog fsm_expand.sv
> 1. Executing Verilog-2005 frontend.
> Parsing Verilog input from `fsm_expand.sv' to AST representation.
> Lexer warning: The SystemVerilog keyword `logic' (at fsm_expand.sv:21) is
> not recognized unless read_verilog is called with -sv!
> fsm_expand.sv:21: ERROR: syntax error, unexpected TOK_ID, expecting ',' or
> '=' or ')'
did you try `read_verilog -sv fsm_expand.sv`?
hm good point, let's see...
    yosys> read_verilog -sv fsm_expand.sv
    1. Executing Verilog-2005 frontend.
    Parsing SystemVerilog input from `fsm_expand.sv' to AST representation.
    fsm_expand.sv:59: ERROR: syntax error, unexpected TOK_TYPEDEF
line 59:
    typedef enum logic {IDLE, WAIT} state_t;
so that would be information lost (iverilog supports the typedef keyword)
    yosys> help read_verilog
    -sv
        enable support for SystemVerilog features. (only a small subset of SystemVerilog is supported)
    yosys> read_verilog -sv load_unit.sv
    1. Executing Verilog-2005 frontend.
    Parsing SystemVerilog input from `load_unit.sv' to AST representation.
    load_unit.sv:22: ERROR: syntax error, unexpected TOK_ID, expecting ',' or '=' or ')'
(there's a type there):
    input lsu_ctrl_t lsu_ctrl_i,
if i put the import back in (which iverilog barfs on) yosys still barfs:
https://github.com/steveicarus/iverilog/issues/102
    import ariane_pkg::*; // <<----
    module load_unit (
this is a known issue in icarus verilog... and i can *guarantee* it will be easier to fix that in the python-ply BNF than it will be to try to patch the (c-based) iverilog source code, first.
so...
* yosys isn't up to the job, and it would be months, possibly years, until any requested feature is added to support *Cadence* undocumented systemverilog features
* iverilog likewise would take months to add the same... and the developers would need to go in a different direction anyway
* yosys would destroy valuable information, performing hardware-suitable topological translations
* we need a language *translator* where yosys is a language *compiler*. a language translator's focus is to preserve as much of the original language's features as possible (such as code comments, structure, order of the original code and so on)
* extracting the BNF syntax from iverilog is done already (automated)
* modifying the BNF syntax will be a heck of a lot easier without the primary purpose of either yosys or iverilog being in the way
any other ideas? (will update comment 1 to clarify the requirement to preserve as much of the original code as possible)
we might be able to use slang: https://github.com/MikePopoloski/slang there is a godbolt-style website for trying it out: http://sv-lang.com If we have to write the systemverilog parser ourselves, I think it would be less work to just manually translate the systemverilog code if we have less than 10kloc or so of code to translate.
https://github.com/MikePopoloski/slang/blob/master/scripts/grammar.txt
https://github.com/MikePopoloski/slang/blob/master/scripts/syntax_gen.py
iinteresting! good find! i like the approach: split out the BNF into straight text files and write a syntax/grammar-generator that spews out code-fragments. it reminds me of the approach i took with the direct python-webkit bindings. mike has however jumped straight to c++. that would be the cut-off point for adaptation (extraction of grammar.txt, syntax_gen.py etc) unless slang can cope / be a basis in c++ for some form of intermediary translation... removing features of Cadence systemverilog...
tried out sv-lang.com, i appear to have crashed it, whoops ;) typed in "import ariane_pkg::*;" and it reaaally didn't like it :)
(In reply to Jacob Lifshay from comment #6)
> If we have to write the systemverilog parser ourselves, I think it would be
> less work to just manually translate the systemverilog code if we have less
> than 10kloc or so of code to translate.
i'm starting to get RSI again, so "less typing" is a high priority. ariane's 16,000 lines, axi_rab is 6,000 - both include some valuable worked examples of axi4. it easily takes me... a day to do 300-400 lines of verilog / sv manual translation... we still have the jon dawson IEEE754 code to do (FCVT from 32-64 and 64-32)... as a subproject, just for those alone it's easily justifiable on the time it would save.
https://git.libre-riscv.org/?p=sv2nmigen.git;a=summary
that's where it's at, so far. the code's an absolute dog's dinner-looking mess, however, incredibly, it actually walked one of axi_rab.sv's files. i've had to comment out much of the lexer for a first iteration, just to get it up and running, rapidly. some of that will have consequences, such as disabling the lexer's ability to detect types and imports, which can be reintroduced incrementally. also the timestamp recognition isn't working yet, plus the number formats need some regexes / conversion (c code from the lexer replaced with python that does the same job) etc. etc.
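As an illustration of what "regexes / conversion" for number formats could look like, here is a minimal sketch that handles only sized literals such as 8'hFF. The names `SIZED`, `BASES` and `parse_sized_literal` are hypothetical, the (width, value) return format is an assumption, and x/z digits, signed literals and unsized numbers are deliberately left unsupported. It is not a port of the icarus lexer code.

```python
import re

# sized verilog literal: <width>'<base><digits>, e.g. 8'hFF, 1'b0, 4'd10
SIZED = re.compile(r"^(\d+)\s*'\s*([bodhBODH])\s*([0-9a-fA-F_]+)$")
BASES = {"b": 2, "o": 8, "d": 10, "h": 16}


def parse_sized_literal(text):
    """Return (width, value) for a sized literal, or None if unsupported."""
    m = SIZED.match(text.strip())
    if m is None:
        return None
    width, base, digits = m.groups()
    try:
        # verilog allows "_" as a digit separator; strip it before converting
        value = int(digits.replace("_", ""), BASES[base.lower()])
    except ValueError:
        # digits that are not valid in the given base, e.g. 4'b12
        return None
    return int(width), value


print(parse_sized_literal("1'b0"))   # (1, 0)
print(parse_sized_literal("8'hFF"))  # (8, 255)
print(parse_sized_literal("1'bx"))   # None (x/z not handled in this sketch)
```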
    grep "def p_" parse_sv.py | wc
    1095 2190 31280
*shocked*!! that's one mmmmaaaaasive number of parser states! luckily it is not necessary to do all of them. UDP can be entirely skipped, for example. also, many of them will be incredibly simple: "return one of the things that came through from a previous state".
i've done a couple of the states, to see what they look like. this is "lpvalue '=' expression ';'":
    expr = Node(syms.expr_stmt, [p[1], Leaf(token.EQUAL, p[2]), p[3] ])
    p[0] = expr
p[1] needed doing (the lpvalue), p[3] likewise, and the repr comes out like this:
    Node(expr_stmt, [Leaf(1, 'port1_accept_SN'), Leaf(22, '='), Leaf(2, "1:'b0")])
however, instead of doing repr, the lib2to3 "Node" class *already* has the capability to print out the python code:
    port1_accept_SN=1:'b0
which, apart from the spaces, and that i haven't completed the number-system, is exactly what's needed. so the AST gets recursively constructed, from the leaf-nodes up, and the end-result is python code!
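To make the repr-versus-str point concrete without depending on lib2to3 itself, here is a small stand-in. The `Leaf` and `Node` classes below are hypothetical simplifications written for this example (they are not lib2to3's classes), and the `prefix` handling only mirrors, as far as I understand it, how lib2to3 carries whitespace on each leaf. The token numbers 1, 2 and 22 are taken from the repr shown above.

```python
class Leaf:
    """Stand-in terminal: token type, text, and the whitespace before it."""

    def __init__(self, type, value, prefix=""):
        self.type, self.value, self.prefix = type, value, prefix

    def __str__(self):
        return self.prefix + self.value

    def __repr__(self):
        return "Leaf(%r, %r)" % (self.type, self.value)


class Node:
    """Stand-in non-terminal: printing it just concatenates its children."""

    def __init__(self, type, children):
        self.type, self.children = type, children

    def __str__(self):
        return "".join(str(child) for child in self.children)

    def __repr__(self):
        return "Node(%s, %r)" % (self.type, self.children)


NAME, NUMBER, EQUAL = 1, 2, 22  # token numbers as seen in the repr above


def p_assign(p):
    """lpvalue '=' expression: build the tree from already-built children."""
    p[0] = Node("expr_stmt", [p[1], Leaf(EQUAL, p[2], prefix=" "), p[3]])


# p mimics what a ply rule function receives: p[1..3] are the matched symbols
p = [None, Leaf(NAME, "port1_accept_SN"), "=", Leaf(NUMBER, "1:'b0", prefix=" ")]
p_assign(p)
print(repr(p[0]))
print(str(p[0]))  # port1_accept_SN = 1:'b0
```

Giving each leaf a prefix is one way the missing spaces could be supplied; whether sv2nmigen does it this way is not stated in the thread.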
> Hendrik Boom hendrik@topoi.pooq.com via lists.libre-riscv.org
> 5:44 PM (1 minute ago)
> > *shocked*!! that's one mmmmaaaaasive number of parser states!
> It makes me suspect that either the language isn't well-designed,
> or that the grammar formalism isn't a good match for the language.
to be fair, i used an auto-conversion tool (dabeaz yply.py) which, instead of keeping groups of ORed BNF syntax-rules together (which would appear to keep numbers down), split them out as individual functions. this does make the code less costly to write (no need to test the length of the list of tokens), however it kiiinda gives the false impression that the *syntax* is faulty.
    udp_initial_expr_opt
        : '=' expression { $$ = $2; }
        | { $$ = 0; }
        ;
becomes:
    def p_udp_initial_expr_opt_1(p):
        '''udp_initial_expr_opt : '=' expression '''
        print('udp_initial_expr_opt_1', list(p))
        p[0] = p[2]
    def p_udp_initial_expr_opt_2(p):
        '''udp_initial_expr_opt : '''
        print('udp_initial_expr_opt_2', list(p))
        # { $$ = 0; }
yes that was me working out that "{ $$ = $2; }" can be global/search/replaced with "p[0] = p[2]". that's about 20% of the 1,000 rules done, right there.
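The global search/replace mentioned above can be sketched as a regular expression over the yacc action text. This is a minimal illustration only: `SIMPLE_ACTION` and `convert_action` are hypothetical names, and anything other than a bare "$$ = $N;" action is reported as None so that it still gets reviewed by hand.

```python
import re

# a yacc action consisting solely of "$$ = $N;" inside braces
SIMPLE_ACTION = re.compile(r"\{\s*\$\$\s*=\s*\$(\d+)\s*;\s*\}")


def convert_action(action):
    """Return the ply equivalent of a trivial action, or None otherwise."""
    m = SIMPLE_ACTION.fullmatch(action.strip())
    if m is None:
        return None
    return "p[0] = p[%s]" % m.group(1)


print(convert_action("{ $$ = $2; }"))  # p[0] = p[2]
print(convert_action("{ $$ = 0; }"))   # None: not a plain pass-through
```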
    module fsm #(
        parameter AXI_M_ADDR_WIDTH = 40,
        parameter AXI_S_ADDR_WIDTH = 32,
        parameter AXI_ID_WIDTH = 8,
        parameter AXI_USER_WIDTH = 6
    )
-->
    class fsm:
        def __init__(AXI_M_ADDR_WIDTH=40, AXI_S_ADDR_WIDTH=32, AXI_ID_WIDTH=8, AXI_USER_WIDTH=6):
woo!
parameters sort-of done:
    input logic prefetch_i,
    input logic [AXI_S_ADDR_WIDTH-1:0] in_addr_i,
    input logic [AXI_ID_WIDTH-1:0] in_id_i,
    input logic [7:0] in_len_i,
-->
    class fsm:
        def __init__(AXI_M_ADDR_WIDTH=40, AXI_S_ADDR_WIDTH=32, AXI_ID_WIDTH=8, AXI_USER_WIDTH=6):
            self.prefetch_i = Signal() # input
            self.out_addr_i = Signal(AXI_M_ADDR_WIDTH) # input
            self.in_id_i = Signal(AXI_ID_WIDTH) # input
            self.in_len_i = Signal(8) # input
i am however wondering if the use of python AST is interfering with the pace at which this code could be written, or whether it could turn out to be useful. it's actually really hard to tell if information would be lost by choosing to drop down to ad-hoc data structures and plain text-strings.
i am starting to get used to examining the c++ code (from icarus verilog parse.y) and using it not just for guidance but as *actual code* that works, after some form of regular pattern-match substitution from c++ to python. it is extremely weird to involve *three* simultaneous languages... python, verilog, and c++ ....
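For comparison, this is roughly what the "plain text-strings" alternative looks like for the port declarations above: a line-by-line regex sketch, not the parser-based approach sv2nmigen actually takes. `PORT`, `width_from_msb` and `port_to_signal` are hypothetical names, and only `input`/`output` ports of type `logic` with an optional `[msb:0]` range are assumed.

```python
import re

# e.g. "input logic [AXI_ID_WIDTH-1:0] in_id_i,"  (the range is optional)
PORT = re.compile(
    r"^\s*(?P<dir>input|output)\s+logic\b\s*"
    r"(?:\[(?P<msb>[^:\]]+):0\])?\s*"
    r"(?P<name>\w+)\s*,?\s*$"
)


def width_from_msb(msb):
    """Turn an msb expression into a width: 'W-1' -> 'W', '7' -> '8'."""
    msb = msb.strip()
    if msb.endswith("-1"):
        return msb[:-2].strip()
    if msb.isdigit():
        return str(int(msb) + 1)
    return "(%s)+1" % msb  # anything else: leave the arithmetic visible


def port_to_signal(line):
    """Return the Signal assignment for one port line, or None if no match."""
    m = PORT.match(line)
    if m is None:
        return None
    width = width_from_msb(m.group("msb")) if m.group("msb") else ""
    return "self.%s = Signal(%s) # %s" % (m.group("name"), width, m.group("dir"))


for line in ["input logic prefetch_i,",
             "input logic [AXI_S_ADDR_WIDTH-1:0] in_addr_i,",
             "input logic [7:0] in_len_i,"]:
    print(port_to_signal(line))
```

A converter like this loses everything the regex does not capture (comments, custom types such as lsu_ctrl_t, multi-line declarations), which is the trade-off being weighed in the comment above.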
I recently came across pyverilog, which will parse verilog (not sure how robust its sv capabilities are). https://github.com/PyHDI/Pyverilog Why are we converting verilog into nMigen?
Pyverilog spits out a nice and tasty python AST too.
see: http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2020-January/003261.html
(In reply to Yehowshua from comment #14)
> I recently came across pyverilog which will parse verilog(not sure how
> robust its sv capabilities are).
>
> https://github.com/PyHDI/Pyverilog
interesting. i started with python-ply because i know it is extremely good (i do a *lot* of language translation), so what they have done is already taken care of. as i know lex syntax from my time in university, i added the required sv support (which was primarily the ability to use types in module interface declarations) very quickly.
> Why are we converting verilog into nMigen?
because the code being targeted for conversion (the ariane project) requires significant modification, and, more than that, absolutely nobody in the software libre world can use it, because it is specifically and critically dependent on a *proprietary* verilog toolchain.
I have tried pyverilog. It seems to work for the Xilinx dialect, but it seems to fail for the SystemVerilog dialect that ariane uses.
(In reply to Tobias Platen from comment #18)
> I have tried pyverilog. It seems to work for the Xilinx dialect, but it
> seems to fail for the SystemVerilog dialect that ariane uses.
yes. it is a mentor graphics augmented nonstandard SV that allows structs in the module parameters. i had to modify the parser to get it to work. actually what they did is incredibly sensible: otherwise module declarations can have hundreds of parameters, which is extremely tedious and error-prone.
Tobias: we need to do FP Exception Flags and rounding. however it's sufficiently complex, after looking at various implementations, that i think it's probably best if we use sv2nmigen on Hardfloat-1.zip
http://www.jhauser.us/arithmetic/HardFloat.html
can you take a look at HardFloat_rawFN.v and add support for "parameters"? i found that i had to modify HardFloat_rawFN.v as follows:
    module recFNToRawFN # (par
note the extra space in between recFNToRawFN and # and (
also i had to remove the "includes" (because i believe they're pre-processed) and i am not sure about support for "`define".
can you take a look at that, and we'll assign a new bugreport under here plus some budget for it? this will be a lot more reliable than trying to write an exception/rounder from scratch.
actually the top priority is mulRecFN, however it needs some of the other macros / functions to work. see mulRecFN.v at the link in comment #20. no, there is no online git repo: there is only a .zip file.
another couple of pieces of code which need "parameterisation": https://ascslab.org/research/briscv/index.html L1cache.v and Lxcache.v - the cache-coherence protocol used there looks particularly good, and it would be nice to be able to see this code in nmigen