* Wishbone
* AXI4
* TileLink
* L1.5 CCX (OpenPiton)
* Banana Bus
* https://github.com/peteut/migen-axi
* https://github.com/Nic30/hwtLib/tree/master/hwtLib/amba - would require a hwtLib nmigen back-end (or use the verilog back-end and wrap the generated module as a black box; see the sketch below)
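A minimal sketch of the "verilog back-end" route, assuming the Verilog that hwtLib generates is wrapped as a black box in nmigen via Instance. The module name, port list and file name here are hypothetical placeholders, not hwtLib's actual output.

```python
# hedged sketch: wrap a Verilog module (e.g. generated by hwtLib's verilog
# back-end) as a black box inside an nmigen design.
# module / port / file names below are hypothetical placeholders.
from nmigen import Elaboratable, Module, Signal, Instance, ClockSignal, ResetSignal

class AxiBridgeBlackBox(Elaboratable):
    def __init__(self):
        # a tiny subset of an AXI write-address channel, just for illustration
        self.awvalid = Signal()
        self.awready = Signal()
        self.awaddr  = Signal(32)

    def elaborate(self, platform):
        m = Module()
        m.submodules.bridge = Instance(
            "hwt_axi_bridge",            # hypothetical module name
            i_clk=ClockSignal(),
            i_rst=ResetSignal(),
            i_awvalid=self.awvalid,
            o_awready=self.awready,
            i_awaddr=self.awaddr,
        )
        # the generated .v source still has to be handed to the toolchain,
        # e.g. platform.add_file("hwt_axi_bridge.v", open("hwt_axi_bridge.v"))
        return m
```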
OmniXtend (basically TileLink over ethernet); see discussion at http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-December/000278.html
https://github.com/pulp-platform/axi_rab also contains a software-managed IOMMU
(In reply to Jacob Lifshay from comment #2)
> OmniXtend (basically TileLink over ethernet)
> see discussion at
> http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2018-December/000278.html

we need to track down implementations, documentation, linux kernel drivers and so on, and find out whether there is a stable and active community surrounding OmniXtend. actually... that's needed for everything we evaluate.
Inspired by PCIe errors due to my graphics card not being plugged in all the way, I looked at the latest version of OmniXtend (v1.0.3-draft) and noticed they fixed some of the things that bugged me about previous versions:

OmniXtend now works over standard ethernet switches (rather than needing special programmable switches) -- they added the standard ethernet headers back into the spec, using a custom ethernet protocol number. This also allows a SoC's ethernet port to be shared between TCP/IP and OmniXtend, though using the same ethernet port may not be the best idea as it might expose internal memory traffic on the network (the fastest way to leak sensitive information, other than someone posting a picture of their pile of tax papers on facebook).

They also added flow control and retransmission.

Apparently, the spec also moved to being on ChipsAlliance's GitHub organization:
https://github.com/chipsalliance/omnixtend
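To make the shared-port point concrete, here is a toy sketch of demultiplexing frames purely on the EtherType field of the standard 14-byte ethernet header. The OmniXtend EtherType value below is a placeholder, not the number from the spec.

```python
# toy sketch: split TCP/IP and OmniXtend traffic arriving on one port using
# only the EtherType field of the standard ethernet header.
# ETHERTYPE_OMNIXTEND is a placeholder value, NOT taken from the spec.
import struct

ETHERTYPE_IPV4      = 0x0800
ETHERTYPE_IPV6      = 0x86DD
ETHERTYPE_OMNIXTEND = 0xAAAA   # placeholder -- consult the OmniXtend spec

def classify(frame: bytes) -> str:
    # destination MAC (6), source MAC (6), EtherType (2), network byte order
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype == ETHERTYPE_OMNIXTEND:
        return "omnixtend"   # coherent memory traffic -> on-chip coherence engine
    return "tcp/ip"          # everything else -> normal NIC / kernel stack
```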
when multiple reference implementations are available it will save us a huge amount of time and help us to ensure interoperability. until then, unfortunately, the cost is i feel too high. it's a brilliant idea, not to be ruled out entirely: we may even need to span across multiple FPGAs, and ethernet is one of the easiest ways to do that.
(In reply to Luke Kenneth Casson Leighton from comment #6)
> when multiple reference implementations are available it will save us a huge
> amount of time and help us to ensure interoperability.

Ok, sounds good.

> until then, unfortunately, the cost is i feel too high. it's a brilliant
> idea, not to be ruled out entirely: we may even need to span across multiple
> FPGAs and ethernet is one of the easiest ways to do that.

From reading the spec, tilelink seems to be a relatively simple (20-30 states) state machine along with a 64-bit Add/CompareEq/Min/Max/MinU/MaxU/And/Or/Xor ALU for handling AMOs. I would be surprised if TileLink needed more than 2-3k gates. If all the upstream interfaces handled AMOs themselves, the ALU wouldn't be needed; however, I think it's a good idea to have the ALU even if we don't end up using TileLink at all.

The tilelink state machine:
https://github.com/chipsalliance/omnixtend/blob/master/OmniXtend-1.0.3/spec/StateTransitionTables-1.8.0.pdf

To implement OmniXtend, you'll need receive and retransmit buffers. Both buffers need to be at least 1 ethernet frame, but making them larger will increase maximum throughput. The retransmit buffer can be single-ported, but the receive buffer should have a separate read and write port. So, for a 1Gbps link with 30us round-trip-time, you would need about 4kB for each buffer to fully saturate the link. That is about the same size of buffer needed to implement a CPU-controlled ethernet interface anyway, so it doesn't seem too expensive, especially considering that it would only need to run at less than 20MHz if it has a 64-bit datapath.
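A quick back-of-the-envelope check of those numbers, using only the assumptions stated above (1Gbps link, 30us round trip, 64-bit datapath):

```python
# bandwidth-delay product: how much data is "in flight" and therefore must
# fit in the retransmit (and receive) buffer to keep the link saturated.
link_bps    = 1_000_000_000   # 1 Gbps
rtt_seconds = 30e-6           # 30 us round-trip time
bdp_bytes   = link_bps * rtt_seconds / 8
print(f"buffer needed per direction: {bdp_bytes:.0f} bytes")   # ~3750, round up to 4kB

# clock needed to stream 1 Gbps through a 64-bit datapath
datapath_bits = 64
clock_hz      = link_bps / datapath_bits
print(f"datapath clock: {clock_hz / 1e6:.1f} MHz")             # ~15.6 MHz, under 20 MHz
```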
(In reply to Jacob Lifshay from comment #7)
> From reading the spec, tilelink seems to be a relatively simple (20-30
> states) state machine along with a 64-bit
> Add/CompareEq/Min/Max/MinU/MaxU/And/Or/Xor ALU for handling AMOs. I would be
> surprised if TileLink needed more than 2-3k gates.

Turns out CompareEq is not supported -- I had forgotten that RISC-V doesn't have a compare-exchange operation.
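Purely as an illustration of how small that AMO ALU is (with CompareEq dropped, per the correction above), here is an nmigen sketch. The opcode encoding is made up for this example and is not TileLink's actual arithmetic/logical encoding.

```python
# hedged sketch of the 64-bit AMO ALU: Add/Min/Max/MinU/MaxU/And/Or/Xor.
# opcode values are arbitrary, chosen only for this illustration.
from nmigen import Elaboratable, Module, Signal, Mux, signed

class AmoALU(Elaboratable):
    def __init__(self, width=64):
        self.op = Signal(3)       # 8 operations -> 3-bit opcode (arbitrary encoding)
        self.a  = Signal(width)   # current value in memory
        self.b  = Signal(width)   # operand carried by the AMO request
        self.o  = Signal(width)   # new value to write back

    def elaborate(self, platform):
        m = Module()
        # signed views of the operands, for MIN/MAX
        a_s = Signal(signed(len(self.a)))
        b_s = Signal(signed(len(self.b)))
        m.d.comb += [a_s.eq(self.a), b_s.eq(self.b)]
        with m.Switch(self.op):
            with m.Case(0):  # ADD
                m.d.comb += self.o.eq(self.a + self.b)
            with m.Case(1):  # AND
                m.d.comb += self.o.eq(self.a & self.b)
            with m.Case(2):  # OR
                m.d.comb += self.o.eq(self.a | self.b)
            with m.Case(3):  # XOR
                m.d.comb += self.o.eq(self.a ^ self.b)
            with m.Case(4):  # MIN (signed)
                m.d.comb += self.o.eq(Mux(a_s < b_s, self.a, self.b))
            with m.Case(5):  # MAX (signed)
                m.d.comb += self.o.eq(Mux(a_s < b_s, self.b, self.a))
            with m.Case(6):  # MINU (unsigned)
                m.d.comb += self.o.eq(Mux(self.a < self.b, self.a, self.b))
            with m.Case(7):  # MAXU (unsigned)
                m.d.comb += self.o.eq(Mux(self.a < self.b, self.b, self.a))
        return m
```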
one interesting thing to investigate: can omnixtend run over wireguard? From my initial research, ChaCha20 (one of the ciphers used) is built entirely from binary adds, bitwise xors, and rotates, which seem quite easy to implement in hardware assuming we only provide timing-attack resistance and *NOT* power-attack resistance (see the quarter-round sketch below). The idea is that it would be resistant to attack over the network and would use a well-tested protocol where we can use Linux's network stack for testing purposes.
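For reference, that add/xor/rotate structure really is the whole core of ChaCha20: a quarter-round is four add/xor/rotate triples. This is the standard quarter-round from RFC 8439, shown as plain Python for clarity rather than as HDL.

```python
# the ChaCha20 quarter-round (RFC 8439): nothing but 32-bit adds, xors and
# fixed-distance rotates, i.e. constant-time by construction -- which is why
# a timing-attack-resistant hardware implementation is straightforward.
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK32

def quarter_round(a, b, c, d):
    a = (a + b) & MASK32; d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32; d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32; b = rotl32(b ^ c, 7)
    return a, b, c, d
```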
if we implemented omnixtend over wireguard, we would only need to implement data packets in HW, relying on linux or some microcontroller to handle connection keepalive, setup, teardown, etc.
discovered that there are already quite a few protocols that are much more widely used than omnixtend that support cache coherent memory access over a network: google "rdma cache coherent"
(In reply to Jacob Lifshay from comment #11)
> discovered that there are already quite a few protocols that are much more
> widely used than omnixtend that support cache coherent memory access over a
> network: google "rdma cache coherent"

adding "wishbone" to that and opensparc T1 comes up
https://www.oracle.com/technetwork/systems/opensparc/opensparc-internals-book-1500271.pdf
(In reply to Luke Kenneth Casson Leighton from comment #12)
> (In reply to Jacob Lifshay from comment #11)
> > discovered that there are already quite a few protocols that are much more
> > widely used than omnixtend that support cache coherent memory access over a
> > network: google "rdma cache coherent"
>
> adding "wishbone" to that and opensparc T1 comes up
> https://www.oracle.com/technetwork/systems/opensparc/opensparc-internals-book-1500271.pdf

neat! note that wishbone is not designed to run over ethernet or similar, unlike most other rdma protocols.
Why was this closed? As far as I know, we didn't decide whether we were going to implement OmniXtend (or similar cache-coherency protocols over ethernet) for the 28nm SoC. Additionally, we didn't decide which cache-coherence protocol to use for inter-core communication, since wishbone is not sufficient by itself.

Deferring till after the 180nm SoC.
There is the additional concern that we shouldn't use a protocol between cores that exposes speculative operations, in order to avoid spectre-style information leaks that can't be fixed in software without disabling all but one core.
https://github.com/SpinalHDL/SaxonSoc

Banana Bus - appears to be extremely well-designed and suitable for out-of-order processors.
(In reply to Luke Kenneth Casson Leighton from comment #16)
> https://github.com/SpinalHDL/SaxonSoc
>
> Banana Bus - appears to be extremely well-designed and suitable
> for out-of-order processors.

Will we translate this line by line into nmigen as we are doing with microwatt? Is this for Oct 2020 or 2021 or 2022?
(In reply to Cole Poirier from comment #17)
> Will we translate this line by line into nmigen as we are doing with
> microwatt? Is this for Oct 2020 or 2021 or 2022?

don't know yet. it will depend on how far we get.
(In reply to Luke Kenneth Casson Leighton from comment #18)
> (In reply to Cole Poirier from comment #17)
> > Will we translate this line by line into nmigen as we are doing with
> > microwatt? Is this for Oct 2020 or 2021 or 2022?
>
> don't know yet. it will depend on how far we get.

Whenever we do end up getting to this, will we handle it like microwatt, converting the HDL line by line?