Bug 1143 - Optimization of verilator for scalability
Summary: Optimization of verilator for scalability
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks:
 
Reported: 2023-08-23 11:58 BST by Konstantinos Margaritis (markos)
Modified: 2023-09-10 16:57 BST
CC List: 1 user

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Konstantinos Margaritis (markos) 2023-08-23 11:58:01 BST
Verilator's performance currently does not scale with the number of threads because of its internal work-queue model, which relies heavily on mutexes to lock the queue. As a result, simulations cannot take advantage of CPUs with many cores. Initial profiling shows that most of the CPU time is spent in the internal queue and its synchronization:

 41.71%  microwatt-verilator  [.] VlMTaskVertex::waitUntilUpstreamDone
 32.97%  microwatt-verilator  [.] VlWorkerThread::dequeWork
  8.72%  microwatt-verilator  [.] VlMTaskVertex::signalUpstreamDone

In other words, about 83% of the CPU time (41.71% + 32.97% + 8.72% = 83.40%) is spent on synchronization between threads rather than on simulation. This is a huge waste of CPU time, and one that should be fixable.
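For reference, the locking pattern described above looks roughly like the following generic C++ sketch: a work queue in which every push and pop takes the same `std::mutex`, and idle workers sleep on a `std::condition_variable`. This is not Verilator's code; `LockedWorkQueue` and `locked_sum` are hypothetical names used only to illustrate why many workers end up contending for a single lock.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical mutex-protected work queue (not Verilator's implementation):
// every push and pop acquires the same lock, so all threads serialize on it.
class LockedWorkQueue {
public:
    void push(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push_back(std::move(task));
        }
        m_cv.notify_one();  // wake one sleeping worker
    }

    // Blocks until a task is available. Returns false once the queue has
    // been closed and fully drained.
    bool pop(std::function<void()>& out) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_tasks.empty() || m_closed; });
        if (m_tasks.empty()) return false;
        out = std::move(m_tasks.front());
        m_tasks.pop_front();
        return true;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_closed = true;
        }
        m_cv.notify_all();  // release every waiting worker
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<std::function<void()>> m_tasks;
    bool m_closed = false;
};

// Hypothetical demo: `workers` threads drain n tasks, task i adds i to a sum.
long long locked_sum(int workers, int n) {
    assert(workers > 0 && n >= 0);
    LockedWorkQueue queue;
    std::atomic<long long> sum{0};
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&queue] {
            std::function<void()> task;
            while (queue.pop(task)) task();
        });
    }
    for (int i = 1; i <= n; ++i) queue.push([&sum, i] { sum += i; });
    queue.close();
    for (auto& t : pool) t.join();
    return sum.load();
}
```

For example, `locked_sum(4, 1000)` should return 500500. The result is correct, but every single task hand-off goes through `m_mutex`, which is the kind of serialization the profile points to.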

I believe that replacing the internal queue with a lock-free, thread-safe queue will improve performance by at least an order of magnitude. I have done this before in very demanding real-time applications, and it improved performance many-fold.
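To make the proposal more concrete, here is a minimal sketch of one kind of lock-free queue: a bounded single-producer/single-consumer ring buffer in C++17, built on `std::atomic` with acquire/release ordering. It assumes each queue has exactly one producer (the thread handing out work) and one consumer (the worker). That is an assumption for illustration, not a description of how Verilator's `VlWorkerThread` is structured; a queue with several producers would need a multi-producer design instead. `SpscQueue`, `try_push`, `try_pop` and `spsc_sum` are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <thread>

// Hypothetical single-producer/single-consumer lock-free ring buffer.
// head and tail are ever-increasing counters; masking maps them to slots.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only. Returns false if the queue is full.
    bool try_push(const T& value) {
        const std::size_t head = m_head.load(std::memory_order_relaxed);
        const std::size_t tail = m_tail.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;  // full
        m_buf[head & (Capacity - 1)] = value;
        m_head.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only. Returns std::nullopt if the queue is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = m_tail.load(std::memory_order_relaxed);
        const std::size_t head = m_head.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // empty
        T value = m_buf[tail & (Capacity - 1)];
        m_tail.store(tail + 1, std::memory_order_release);  // free the slot
        return value;
    }

private:
    std::array<T, Capacity> m_buf{};
    // Separate cache lines so producer and consumer do not false-share.
    alignas(64) std::atomic<std::size_t> m_head{0};  // next slot to write
    alignas(64) std::atomic<std::size_t> m_tail{0};  // next slot to read
};

// Hypothetical demo: one producer pushes 1..n, one consumer sums them.
// Neither side takes a lock; both yield instead when full or empty.
long long spsc_sum(int n) {
    assert(n >= 0);
    SpscQueue<int, 1024> queue;
    long long sum = 0;
    std::thread consumer([&] {
        for (int received = 0; received < n;) {
            if (auto v = queue.try_pop()) {
                sum += *v;
                ++received;
            } else {
                std::this_thread::yield();
            }
        }
    });
    for (int i = 1; i <= n; ++i)
        while (!queue.try_push(i)) std::this_thread::yield();
    consumer.join();  // join makes `sum` safe to read here
    return sum;
}
```

For example, `spsc_sum(1000)` should return 500500. Whether yielding instead of sleeping on a condition variable actually pays off in a simulator depends on how long workers typically wait, which would have to be measured.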

I also plan to submit this work upstream so that the Verilator project as a whole benefits.
I estimate that a budget of 7,000 to 10,000 EUR would suffice for this work. Needless to say, the changes will be tested thoroughly before submission.