Bug 70 - evaluate Bus Architectures
Summary: evaluate Bus Architectures
Status: DEFERRED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks:
 
Reported: 2019-04-21 15:08 BST by Luke Kenneth Casson Leighton
Modified: 2020-07-01 07:01 BST (History)
CC: 2 users

See Also:
NLnet milestone: NLnet.2019.02
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for completion of task (excludes budget allocated to subtasks): 750
parent task for budget allocation:
child tasks for budget allocation:


Description Luke Kenneth Casson Leighton 2019-04-21 15:08:10 BST
* Wishbone
* AXI4
* TileLink
* L1.5 CCX (OpenPiton)
Comment 1 Luke Kenneth Casson Leighton 2019-04-21 15:13:58 BST
* https://github.com/peteut/migen-axi
* https://github.com/Nic30/hwtLib/tree/master/hwtLib/amba - would require a hwtLib nmigen back-end (or use the verilog back-end)
Comment 2 Jacob Lifshay 2019-04-21 19:29:59 BST
OmniXtend (basically TileLink over ethernet)
see discussion at http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-December/000278.html
Comment 3 Luke Kenneth Casson Leighton 2019-04-21 21:28:43 BST
https://github.com/pulp-platform/axi_rab

also contains a software-managed iommu
Comment 4 Luke Kenneth Casson Leighton 2019-04-21 21:51:23 BST
(In reply to Jacob Lifshay from comment #2)
> OmniXtend (basically TileLink over ethernet)
> see discussion at
> http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-December/000278.html

we need to track down implementations, documentation, linux kernel drivers
and so on, and find out whether there is a stable and active community
surrounding OmniXtend.

actually... that's needed for everything we evaluate.
Comment 5 Jacob Lifshay 2020-01-08 10:22:07 GMT
Inspired by PCIe errors due to my graphics card not being plugged in all the way, I looked at the latest version of OmniXtend (v1.0.3-draft) and noticed they fixed some of the things that bugged me about previous versions:

OmniXtend now works over standard ethernet switches (rather than needing special programmable switches) -- they added the standard ethernet headers back into the spec, using a custom ethernet protocol number (EtherType).
This also allows a SoC's ethernet port to be shared between TCP/IP and OmniXtend, though sharing the same port may not be the best idea, as it might expose internal memory traffic on the network (the fastest way to leak sensitive information, other than someone posting a picture of their pile of tax papers on facebook).
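
As a sketch, the framing is then just a normal Ethernet II header in front of the TileLink payload (the EtherType value below is a placeholder, not the number assigned in the spec):

import struct

OMNIXTEND_ETHERTYPE = 0xAAAA  # placeholder -- check the v1.0.3 spec for the assigned value

def build_frame(dst_mac: bytes, src_mac: bytes, tilelink_payload: bytes) -> bytes:
    # Standard Ethernet II header: 6-byte dst MAC, 6-byte src MAC,
    # 2-byte EtherType.  Because this header is present, unmodified
    # ethernet switches can forward OmniXtend alongside TCP/IP traffic.
    assert len(dst_mac) == 6 and len(src_mac) == 6
    return dst_mac + src_mac + struct.pack("!H", OMNIXTEND_ETHERTYPE) + tilelink_payload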

They also added flow control and retransmission.

Apparently, the spec also moved to being on ChipsAlliance's GitHub organization:
https://github.com/chipsalliance/omnixtend
Comment 6 Luke Kenneth Casson Leighton 2020-01-09 02:11:36 GMT
when multiple reference implementations are available, it will save us a huge amount of time and help us to ensure interoperability.

until then, unfortunately, the cost is i feel too high.  it's a brilliant idea, not to be ruled out entirely: we may even need to span across multiple FPGAs and ethernet is one of the easiest ways to do that.
Comment 7 Jacob Lifshay 2020-01-09 13:03:26 GMT
(In reply to Luke Kenneth Casson Leighton from comment #6)
> when multiple reference implementations are available, it will save us a huge
> amount of time and help us to ensure interoperability.

Ok, sounds good.

> 
> until then, unfortunately, the cost is i feel too high.  it's a brilliant
> idea, not to be ruled out entirely: we may even need to span across multiple
> FPGAs and ethernet is one of the easiest ways to do that.

From reading the spec, TileLink seems to be a relatively simple state machine (20-30 states) along with a 64-bit Add/CompareEq/Min/Max/MinU/MaxU/And/Or/Xor ALU for handling AMOs (atomic memory operations). I would be surprised if TileLink needed more than 2-3k gates.

If all the upstream interfaces handled AMOs themselves, the ALU wouldn't be needed; however, I think it's a good idea to have the ALU even if we don't end up using TileLink at all.

The TileLink state machine:
https://github.com/chipsalliance/omnixtend/blob/master/OmniXtend-1.0.3/spec/StateTransitionTables-1.8.0.pdf
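
For a rough sense of scale, here is a minimal nmigen sketch of that AMO ALU -- datapath only, no state machine. The op encoding is invented for illustration and does not match TileLink's actual 'param' field encodings:

from nmigen import Elaboratable, Module, Mux, Signal

class AMOALU(Elaboratable):
    # op encoding made up for this sketch, NOT TileLink's real encoding
    ADD, CMPEQ, MIN, MAX, MINU, MAXU, AND, OR, XOR = range(9)

    def __init__(self, width=64):
        self.op = Signal(4)
        self.a = Signal(width)   # current value read from memory
        self.b = Signal(width)   # operand carried by the AMO request
        self.o = Signal(width)   # result to write back

    def elaborate(self, platform):
        m = Module()
        lt_s = Signal()  # a < b, signed comparison
        lt_u = Signal()  # a < b, unsigned comparison
        m.d.comb += [
            lt_s.eq(self.a.as_signed() < self.b.as_signed()),
            lt_u.eq(self.a < self.b),
        ]
        with m.Switch(self.op):
            with m.Case(self.ADD):   m.d.comb += self.o.eq(self.a + self.b)
            with m.Case(self.CMPEQ): m.d.comb += self.o.eq(self.a == self.b)
            with m.Case(self.MIN):   m.d.comb += self.o.eq(Mux(lt_s, self.a, self.b))
            with m.Case(self.MAX):   m.d.comb += self.o.eq(Mux(lt_s, self.b, self.a))
            with m.Case(self.MINU):  m.d.comb += self.o.eq(Mux(lt_u, self.a, self.b))
            with m.Case(self.MAXU):  m.d.comb += self.o.eq(Mux(lt_u, self.b, self.a))
            with m.Case(self.AND):   m.d.comb += self.o.eq(self.a & self.b)
            with m.Case(self.OR):    m.d.comb += self.o.eq(self.a | self.b)
            with m.Case(self.XOR):   m.d.comb += self.o.eq(self.a ^ self.b)
        return m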

To implement OmniXtend, you'll need receive and retransmit buffers; each needs to hold at least one ethernet frame, but making them larger will increase maximum throughput. The retransmit buffer can be single-ported, but the receive buffer should have separate read and write ports.

So, for a 1Gbps link with a 30us round-trip time, you would need about 4kB for each buffer to fully saturate the link. That is about the same buffer size needed for a CPU-controlled ethernet interface anyway, so it doesn't seem too expensive, especially considering that the logic would only need to run at less than 20MHz with a 64-bit datapath.
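
That 4kB figure is just the bandwidth-delay product:

link_rate = 1e9                     # bits/s (1 Gbps)
rtt = 30e-6                         # seconds (30 us round trip)
bits_in_flight = link_rate * rtt    # 30,000 bits outstanding on the wire
print(bits_in_flight / 8)           # 3750.0 bytes -> round up to ~4 kB per buffer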
Comment 8 Jacob Lifshay 2020-01-09 13:21:14 GMT
(In reply to Jacob Lifshay from comment #7)
> From reading the spec, tilelink seems to be a relatively simple (20-30
> states) state machine along with a 64-bit
> Add/CompareEq/Min/Max/MinU/MaxU/And/Or/Xor ALU for handling AMOs. I would be
> surprised if TileLink needed more than 2-3k gates.

Turns out CompareEq is not supported -- I had forgotten that RISC-V doesn't have a compare-exchange operation.
Comment 9 Jacob Lifshay 2020-05-11 22:19:17 BST
one interesting thing to investigate: can OmniXtend run over WireGuard? From my initial research, ChaCha20 (one of the ciphers WireGuard uses) is implemented as a bunch of binary adds, bitwise xors, and rotates, which seem quite easy to implement in hardware, assuming we only provide timing-attack resistance and *NOT* power-attack resistance. The idea is that it would be resistant to attack over the network and would use a well-tested protocol where we can use Linux's network stack for testing purposes.
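
For reference, this is the ChaCha20 quarter-round from RFC 8439; every step is a 32-bit add, xor, or fixed rotate, and fixed rotates are free in hardware (just wiring). There are no data-dependent branches or memory accesses, which is where the timing-attack resistance comes from:

MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32

def quarter_round(a, b, c, d):
    # one ChaCha20 quarter-round (RFC 8439): add, xor, rotate only
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d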
Comment 10 Jacob Lifshay 2020-05-11 22:20:50 BST
if we implemented OmniXtend over WireGuard, we would only need to implement data packets in HW, relying on Linux or some microcontroller to handle connection keepalive, setup, teardown, etc.
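
The hardware fast path's framing job would then be small; a sketch of the transport data message, with the field layout as I remember it from the WireGuard paper (worth double-checking against the spec):

import struct

MESSAGE_TYPE_DATA = 4  # WireGuard transport data message type

def build_data_message(receiver_index: int, counter: int,
                       encrypted_payload: bytes) -> bytes:
    # 1-byte type, 3 reserved zero bytes, 4-byte receiver index (LE),
    # 8-byte nonce counter (LE), then the ChaCha20-Poly1305 ciphertext.
    # Handshake, rekeying and keepalives stay in Linux / a microcontroller.
    return struct.pack("<B3xIQ", MESSAGE_TYPE_DATA, receiver_index,
                       counter) + encrypted_payload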
Comment 11 Jacob Lifshay 2020-05-20 07:47:33 BST
discovered that there are already quite a few protocols that are much more widely used than omnixtend that support cache coherent memory access over a network: google "rdma cache coherent"
Comment 12 Luke Kenneth Casson Leighton 2020-05-20 12:35:07 BST
(In reply to Jacob Lifshay from comment #11)
> discovered that there are already quite a few protocols that are much more
> widely used than omnixtend that support cache coherent memory access over a
> network: google "rdma cache coherent"

adding "wishbone" to that search turns up OpenSPARC T1:
https://www.oracle.com/technetwork/systems/opensparc/opensparc-internals-book-1500271.pdf
Comment 13 Jacob Lifshay 2020-05-20 18:49:48 BST
(In reply to Luke Kenneth Casson Leighton from comment #12)
> (In reply to Jacob Lifshay from comment #11)
> > discovered that there are already quite a few protocols that are much more
> > widely used than omnixtend that support cache coherent memory access over a
> > network: google "rdma cache coherent"
> 
> adding "wishbone" to that search turns up OpenSPARC T1:
> https://www.oracle.com/technetwork/systems/opensparc/opensparc-internals-book-1500271.pdf

neat! note that wishbone is not designed to run over ethernet or similar links, unlike most other RDMA protocols.
Comment 14 Jacob Lifshay 2020-07-01 06:58:40 BST
Why was this closed? As far as I know, we didn't decide whether we were going to implement OmniXtend (or a similar cache-coherency protocol over ethernet) for the 28nm SoC. Additionally, we didn't decide what cache-coherent protocol to use for inter-core communication, since wishbone is not sufficient by itself.

Deferring until after the 180nm SoC.
Comment 15 Jacob Lifshay 2020-07-01 07:01:35 BST
There is the additional concern that we shouldn't use a protocol between cores that exposes speculative operations, in order to avoid Spectre-style information leaks that can't be fixed in software without disabling all but one core.