Some testing was done with Verilator's compile flags to check whether any performance is left on the table without modifying Verilator's source code.

These flags were added to "CFLAGS", in addition to the "-O3" that was already applied:

`-mcpu=native -mtune=native`

These flags were added as Verilator flags as well:

`--threads-dpi all --x-initial fast`

Tests were run with `THREADS = 3` (other values showed worse results, thanks Andrey) and produced the following results:

- `verilator_trace` branch: 83m17s to boot linux-microwatt-5.7 (to the login prompt)
- `optimizations` branch: 83m53s to boot linux-microwatt-5.7 (to the login prompt)

Both tests were done on an Acer Swift 5 with an Intel i7 1260P (performance cpupower setting).

Conclusion: there is no measurable benefit to using any of these flags; the `optimizations` branch was in fact 36 seconds (under 1%) slower. Reading through Verilator's manpage, there seem to be no other flags that can affect performance. Any further speedup would require modifying Verilator's source code.

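To put the two boot times side by side, the short Python sketch below converts them to seconds and prints the difference. The helper `to_seconds` and the variable names are made up for illustration and are not part of any test script used here. Each time comes from a single run, so run-to-run variance is unknown and the percentage should be read as indicative only.

```python
def to_seconds(minutes, seconds):
    """Convert a wall-clock time given as minutes and seconds to seconds."""
    return minutes * 60 + seconds


# Boot times to the login prompt, as reported above (one run each).
baseline = to_seconds(83, 17)   # verilator_trace branch
candidate = to_seconds(83, 53)  # optimizations branch

delta = candidate - baseline          # positive means the candidate is slower
percent = 100.0 * delta / baseline    # relative change versus the baseline

print(f"verilator_trace: {baseline} s")
print(f"optimizations:   {candidate} s")
print(f"difference:      {delta:+d} s ({percent:+.2f}%)")
```

A difference this small is best treated as "no change" unless repeated runs show it consistently.
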
Relevant IRC chat logs:

- Verilator is using 31% of resources for thread synchronization, this leaves a lot of performance on the table
  https://libre-soc.org/irclog/%23libre-soc.2023-09-06.log.html#t2023-09-06T20:50:31
- Test results showing no performance benefit
  https://libre-soc.org/irclog/%23libre-soc.2023-09-09.log.html#t2023-09-09T16:18:34
- Andrey recommending `THREADS = 3`
  https://libre-soc.org/irclog/%23libre-soc.2023-08-17.log.html#t2023-08-17T11:01:54