Bug 70 - evaluate Bus Architectures
Summary: evaluate Bus Architectures
Status: DEFERRED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: PC Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks:
 
Reported: 2019-04-21 15:08 BST by Luke Kenneth Casson Leighton
Modified: 2020-07-01 07:01 BST (History)
CC: 2 users

See Also:
NLnet milestone: NLnet.2019.02
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for completion of task (excludes budget allocated to subtasks): 750
parent task for budget allocation:
child tasks for budget allocation:


Description Luke Kenneth Casson Leighton 2019-04-21 15:08:10 BST
* Wishbone
* AXI4
* TileLink
* L1.5 CCX (OpenPiton)
Comment 1 Luke Kenneth Casson Leighton 2019-04-21 15:13:58 BST
* https://github.com/peteut/migen-axi
* https://github.com/Nic30/hwtLib/tree/master/hwtLib/amba - would require a hwtLib nmigen back-end (or use the verilog back-end)
Comment 2 Jacob Lifshay 2019-04-21 19:29:59 BST
OmniXtend (basically TileLink over ethernet)
see discussion at http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-December/000278.html
Comment 3 Luke Kenneth Casson Leighton 2019-04-21 21:28:43 BST
https://github.com/pulp-platform/axi_rab

also contains a software-managed iommu
Comment 4 Luke Kenneth Casson Leighton 2019-04-21 21:51:23 BST
(In reply to Jacob Lifshay from comment #2)
> OmniXtend (basically TileLink over ethernet)
> see discussion at
> http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-December/000278.html

we need to track down implementations, documentation, linux kernel drivers
and so on, and find out whether there is a stable and active community
surrounding OmniXtend.

actually... that's needed for everything we evaluate.
Comment 5 Jacob Lifshay 2020-01-08 10:22:07 GMT
Inspired by PCIe errors due to my graphics card not being plugged in all the way, I looked at the latest version of OmniXtend (v1.0.3-draft) and noticed they fixed some of the things that bugged me about previous versions:

OmniXtend now works over standard ethernet switches (rather than needing special programmable switches) -- they added the standard ethernet headers back into the spec, using a custom ethernet protocol number (EtherType).
This also allows a SoC's ethernet port to be shared between TCP/IP and OmniXtend, though sharing the same port may not be the best idea, as it might expose internal memory traffic on the network (the fastest way to leak sensitive information, other than someone posting a picture of their pile of tax papers on facebook).
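
As a sketch, the framing is then just a normal Ethernet II header in front of the TileLink payload (the EtherType value below is a placeholder, not the number assigned in the spec):

import struct

OMNIXTEND_ETHERTYPE = 0xAAAA  # placeholder -- check the v1.0.3 spec for the assigned value

def build_frame(dst_mac: bytes, src_mac: bytes, tilelink_payload: bytes) -> bytes:
    # Standard Ethernet II header: 6-byte dst MAC, 6-byte src MAC,
    # 2-byte EtherType.  Because this header is present, unmodified
    # ethernet switches can forward OmniXtend alongside TCP/IP traffic.
    assert len(dst_mac) == 6 and len(src_mac) == 6
    return dst_mac + src_mac + struct.pack("!H", OMNIXTEND_ETHERTYPE) + tilelink_payload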

They also added flow control and retransmission.

Apparently, the spec also moved to being on ChipsAlliance's GitHub organization:
https://github.com/chipsalliance/omnixtend
Comment 6 Luke Kenneth Casson Leighton 2020-01-09 02:11:36 GMT
when multiple reference implementations are available, it will save us a huge amount of time and help us to ensure interoperability.

until then, unfortunately, the cost is i feel too high.  it's a brilliant idea, not to be ruled out entirely: we may even need to span across multiple FPGAs and ethernet is one of the easiest ways to do that.
Comment 7 Jacob Lifshay 2020-01-09 13:03:26 GMT
(In reply to Luke Kenneth Casson Leighton from comment #6)
> when multiple reference implementations are available, it will save us a huge
> amount of time and help us to ensure interoperability.

Ok, sounds good.

> 
> until then, unfortunately, the cost is i feel too high.  it's a brilliant
> idea, not to be ruled out entirely: we may even need to span across multiple
> FPGAs and ethernet is one of the easiest ways to do that.

From reading the spec, TileLink seems to be a relatively simple state machine (20-30 states) along with a 64-bit Add/CompareEq/Min/Max/MinU/MaxU/And/Or/Xor ALU for handling AMOs (atomic memory operations). I would be surprised if TileLink needed more than 2-3k gates.

If all the upstream interfaces handled AMOs themselves, the ALU wouldn't be needed; however, I think it's a good idea to have the ALU even if we don't end up using TileLink at all.

The TileLink state machine:
https://github.com/chipsalliance/omnixtend/blob/master/OmniXtend-1.0.3/spec/StateTransitionTables-1.8.0.pdf
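
For a rough sense of scale, here is a minimal nmigen sketch of that AMO ALU -- datapath only, no state machine. The op encoding is invented for illustration and does not match TileLink's actual 'param' field encodings:

from nmigen import Elaboratable, Module, Mux, Signal

class AMOALU(Elaboratable):
    # op encoding made up for this sketch, NOT TileLink's real encoding
    ADD, CMPEQ, MIN, MAX, MINU, MAXU, AND, OR, XOR = range(9)

    def __init__(self, width=64):
        self.op = Signal(4)
        self.a = Signal(width)   # current value read from memory
        self.b = Signal(width)   # operand carried by the AMO request
        self.o = Signal(width)   # result to write back

    def elaborate(self, platform):
        m = Module()
        lt_s = Signal()  # a < b, signed comparison
        lt_u = Signal()  # a < b, unsigned comparison
        m.d.comb += [
            lt_s.eq(self.a.as_signed() < self.b.as_signed()),
            lt_u.eq(self.a < self.b),
        ]
        with m.Switch(self.op):
            with m.Case(self.ADD):   m.d.comb += self.o.eq(self.a + self.b)
            with m.Case(self.CMPEQ): m.d.comb += self.o.eq(self.a == self.b)
            with m.Case(self.MIN):   m.d.comb += self.o.eq(Mux(lt_s, self.a, self.b))
            with m.Case(self.MAX):   m.d.comb += self.o.eq(Mux(lt_s, self.b, self.a))
            with m.Case(self.MINU):  m.d.comb += self.o.eq(Mux(lt_u, self.a, self.b))
            with m.Case(self.MAXU):  m.d.comb += self.o.eq(Mux(lt_u, self.b, self.a))
            with m.Case(self.AND):   m.d.comb += self.o.eq(self.a & self.b)
            with m.Case(self.OR):    m.d.comb += self.o.eq(self.a | self.b)
            with m.Case(self.XOR):   m.d.comb += self.o.eq(self.a ^ self.b)
        return m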

To implement OmniXtend, you'll need receive and retransmit buffers; each needs to hold at least one ethernet frame, but making them larger will increase maximum throughput. The retransmit buffer can be single-ported, but the receive buffer should have separate read and write ports.

So, for a 1Gbps link with a 30us round-trip time, you would need about 4kB for each buffer to fully saturate the link. That is about the same buffer size needed for a CPU-controlled ethernet interface anyway, so it doesn't seem too expensive, especially considering that the logic would only need to run at less than 20MHz with a 64-bit datapath.
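
That 4kB figure is just the bandwidth-delay product:

link_rate = 1e9                     # bits/s (1 Gbps)
rtt = 30e-6                         # seconds (30 us round trip)
bits_in_flight = link_rate * rtt    # 30,000 bits outstanding on the wire
print(bits_in_flight / 8)           # 3750.0 bytes -> round up to ~4 kB per buffer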
Comment 8 Jacob Lifshay 2020-01-09 13:21:14 GMT
(In reply to Jacob Lifshay from comment #7)
> From reading the spec, tilelink seems to be a relatively simple (20-30
> states) state machine along with a 64-bit
> Add/CompareEq/Min/Max/MinU/MaxU/And/Or/Xor ALU for handling AMOs. I would be
> surprised if TileLink needed more than 2-3k gates.

Turns out CompareEq is not supported -- I had forgotten that RISC-V doesn't have a compare-exchange operation.
Comment 9 Jacob Lifshay 2020-05-11 22:19:17 BST
one interesting thing to investigate: can OmniXtend run over WireGuard? From my initial research, ChaCha20 (one of the ciphers WireGuard uses) is implemented as a bunch of binary adds, bitwise xors, and rotates, which seem quite easy to implement in hardware, assuming we only provide timing-attack resistance and *NOT* power-attack resistance. The idea is that it would be resistant to attack over the network and would use a well-tested protocol where we can use Linux's network stack for testing purposes.
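
For reference, this is the ChaCha20 quarter-round from RFC 8439; every step is a 32-bit add, xor, or fixed rotate, and fixed rotates are free in hardware (just wiring). There are no data-dependent branches or memory accesses, which is where the timing-attack resistance comes from:

MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32

def quarter_round(a, b, c, d):
    # one ChaCha20 quarter-round (RFC 8439): add, xor, rotate only
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d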
Comment 10 Jacob Lifshay 2020-05-11 22:20:50 BST
if we implemented OmniXtend over WireGuard, we would only need to implement data packets in HW, relying on Linux or some microcontroller to handle connection keepalive, setup, teardown, etc.
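
The hardware fast path's framing job would then be small; a sketch of the transport data message, with the field layout as I remember it from the WireGuard paper (worth double-checking against the spec):

import struct

MESSAGE_TYPE_DATA = 4  # WireGuard transport data message type

def build_data_message(receiver_index: int, counter: int,
                       encrypted_payload: bytes) -> bytes:
    # 1-byte type, 3 reserved zero bytes, 4-byte receiver index (LE),
    # 8-byte nonce counter (LE), then the ChaCha20-Poly1305 ciphertext.
    # Handshake, rekeying and keepalives stay in Linux / a microcontroller.
    return struct.pack("<B3xIQ", MESSAGE_TYPE_DATA, receiver_index,
                       counter) + encrypted_payload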
Comment 11 Jacob Lifshay 2020-05-20 07:47:33 BST
discovered that there are already quite a few protocols that are much more widely used than omnixtend that support cache coherent memory access over a network: google "rdma cache coherent"
Comment 12 Luke Kenneth Casson Leighton 2020-05-20 12:35:07 BST
(In reply to Jacob Lifshay from comment #11)
> discovered that there are already quite a few protocols that are much more
> widely used than omnixtend that support cache coherent memory access over a
> network: google "rdma cache coherent"

adding "wishbone" to that search turns up OpenSPARC T1:
https://www.oracle.com/technetwork/systems/opensparc/opensparc-internals-book-1500271.pdf
Comment 13 Jacob Lifshay 2020-05-20 18:49:48 BST
(In reply to Luke Kenneth Casson Leighton from comment #12)
> (In reply to Jacob Lifshay from comment #11)
> > discovered that there are already quite a few protocols that are much more
> > widely used than omnixtend that support cache coherent memory access over a
> > network: google "rdma cache coherent"
> 
> adding "wishbone" to that search turns up OpenSPARC T1:
> https://www.oracle.com/technetwork/systems/opensparc/opensparc-internals-book-1500271.pdf

neat! note that wishbone is not designed to run over ethernet or similar links, unlike most other RDMA protocols.
Comment 14 Jacob Lifshay 2020-07-01 06:58:40 BST
Why was this closed? As far as I know, we didn't decide whether we were going to implement OmniXtend (or a similar cache-coherency protocol over ethernet) for the 28nm SoC. Additionally, we didn't decide what cache-coherent protocol to use for inter-core communication, since wishbone is not sufficient by itself.

Deferring until after the 180nm SoC.
Comment 15 Jacob Lifshay 2020-07-01 07:01:35 BST
There is the additional concern that we shouldn't use a protocol between cores that exposes speculative operations, in order to avoid Spectre-style information leaks that can't be fixed in software without disabling all but one core.