Bug 762 - Peripheral Pin Muxing Development
Summary: Peripheral Pin Muxing Development
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Andrey Miroshnikov
URL:
Depends on:
Blocks: 50 846
  Show dependency treegraph
 
Reported: 2022-01-13 00:52 GMT by Andrey Miroshnikov
Modified: 2022-06-20 10:35 BST (History)
3 users (show)

See Also:
NLnet milestone: NLnet.2019.02.012
total budget (EUR) for completion of task and all subtasks: 1600
budget (EUR) for this task, excluding subtasks' budget: 1600
parent task for budget allocation: 50
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
[andrey] amount = 1500 submitted = 2022-06-10 paid = 2022-06-14 [lkcl] amount = 100 submitted = 2022-06-16


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrey Miroshnikov 2022-01-13 00:52:50 GMT
As per comment #83 , I'm creating a separate bug for the pinmux development.

Current task has been to extend the simple GPIO block found here:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/bus/simple_gpio.py;hb=HEAD

And my result so far:
https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/simple_gpio.py;h=251995dcf74a9f77ae8ebe6eda72ddc9adbd4dea;hb=074aced79c300b2397d2f4504c41e2ddfe9ba7ed

My rough documentation on the GPIO block can be found here:
https://libre-soc.org/docs/pinmux/temp_pinmux_info/

Please have a look at the GPIO section of the page, and let me know if anything needs to be changed/reworked.

As for budget for this task, I have no idea.
Comment 1 Andrey Miroshnikov 2022-01-13 01:28:30 GMT
Bug for the documentation task: bug #764
Comment 2 Andrey Miroshnikov 2022-01-15 22:45:10 GMT
After conversation with Luke on IRC yesterday (https://libre-soc.org/irclog/%23libre-soc.2022-01-14.log.html), came up with the following improvements:

- Get rid of multiple addresses for CSR, output, and input access. Instead reading input or writing output will happen when reading/writing the GPIO configuration. -- IMPLEMENTED
- Condense the configuration word from 16-bits down to 8-bits: 
bank2 bank1 bank0 InOutBit | PD PU IE OE -- IMPLEMENTED
- Fit multiple GPIO configurations in a single WB transaction (depends on the width of the WB bus) -- NOT DONE YET
- Add a multiplexer which switches over GPIO configuration control from WB bus to other peripherals when bank_select is changed. -- NOT DONE YET

This is the current code:
https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/simple_gpio.py;hb=714f30b9eb0f49f68faedb111a5bdc22b7e7b610

The sim is working, however there are no asserts yet, and I'm working some test pattern function to exercise more of the block (instead of setting everything to out etc.).

Also will update the documentation describing the GPIO block (still same link as in comment #0).
Comment 3 Luke Kenneth Casson Leighton 2022-01-15 22:59:49 GMT
 107        comb += wb_rd_data.eq((gpio_oe_list[gpio_addr] << OESHIFT)
 108                           + (gpio_ie_list[gpio_addr] << IESHIFT)
 109                          + (puen_list[gpio_addr] << PUSHIFT)
 110                          + (pden_list[gpio_addr] << PDSHIFT)
 111                          + (gpio_i_list[gpio_addr] << IOSHIFT)
 112                          + (bank_sel[gpio_addr] << BANKSHIFT))

this is what Record is for.  one line.
Comment 4 Andrey Miroshnikov 2022-01-16 11:29:36 GMT
(In reply to Luke Kenneth Casson Leighton from comment #3)
>  107        comb += wb_rd_data.eq((gpio_oe_list[gpio_addr] << OESHIFT)
>  108                           + (gpio_ie_list[gpio_addr] << IESHIFT)
>  109                          + (puen_list[gpio_addr] << PUSHIFT)
>  110                          + (pden_list[gpio_addr] << PDSHIFT)
>  111                          + (gpio_i_list[gpio_addr] << IOSHIFT)
>  112                          + (bank_sel[gpio_addr] << BANKSHIFT))
> 
> this is what Record is for.  one line.

Sure. I looked at Robert's tutorial for Records (https://github.com/RobertBaruch/nmigen-tutorial/blob/master/6_combining.md), and created a Layout and Record:

class CSRLayout(Layout):
    def __init__(self):
        super().__init__([
            ("oe", unsigned(1)),
            ("ie", unsigned(1)),
            ("puen", unsigned(1)),
            ("pden", unsigned(1)),
            ("io", unsigned(1)),
            ("bank_sel", unsigned(NUMBANKBITS))
        ])

class CSRBus(Record):
    def __init__(self):
        super().__init__(CSRLayout)


In the SimpleGPIO module's __init__() method, I added:
self.csrbus = CSRBus()

During the object creation however, I'm getting a TypeError:
Traceback (most recent call last):
  File "simple_gpio.py", line 276, in <module>
    test_gpio()
  File "simple_gpio.py", line 258, in test_gpio
    dut = SimpleGPIO()
  File "simple_gpio.py", line 68, in __init__
    self.csrbus = CSRBus()
  File "simple_gpio.py", line 49, in __init__
    super().__init__(CSRLayout)
  File "/home/rohdo/src/nmigen/nmigen/hdl/rec.py", line 127, in __init__
    self.layout = Layout.cast(layout, src_loc_at=1 + src_loc_at)
  File "/home/rohdo/src/nmigen/nmigen/hdl/rec.py", line 25, in cast
    return Layout(obj, src_loc_at=1 + src_loc_at)
  File "/home/rohdo/src/nmigen/nmigen/hdl/rec.py", line 29, in __init__
    for field in fields:
TypeError: 'type' object is not iterable

Can you clarify what I'm missing here?

My current commit:
https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/simple_gpio.py;hb=946fbf1fd881418a4061d9dab14dbe0110760f83#l36
Comment 5 Luke Kenneth Casson Leighton 2022-01-16 12:16:15 GMT
(In reply to Andrey Miroshnikov from comment #4)

> Sure. I looked at Robert's tutorial for Records
> (https://github.com/RobertBaruch/nmigen-tutorial/blob/master/6_combining.md),
> and created a Layout and Record:


csr_layout = [("oe", 1),
...
         ])
 
class CSRBus(Record):
     def __init__(self):
         super().__init__(layout=csr_layout)
 

there are dozens of examples of uses of Record in the libre-soc code.
Comment 6 Andrey Miroshnikov 2022-01-20 20:56:03 GMT
Yesterday I updated the GPIO module and input/output work now.
https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/simple_gpio.py;h=aae041eb3e223bc34e74644072cad1c668270ed9;hb=08668544c65f4892867ce4bf73ed8b754c2409a3

Currently I'm looking at fitting in multiple GPIO configs in one WB data word.
This will be done by having a parametrisable for-loop to attach some of the gpios to particular bytes of the WB data bus.

Also, now that I have a better understanding of layouts, I'll make a new layout that encapsulates all parameters for one GPIO. Currently I have several signal arrays that represent GPIO configs (n-bit array for oe, ie, o, i, etc.).

The new layout would keep all the gpio config under one umbrella so-to-speak, and would allow me to do away with the current "csrbus" layout.

gpio_layout = (("oe", 1),
               ("ie", 1),
               ("puen", 1),
               ("pden", 1),
               ("o", 1),
               ("bank_sel", NUMBANKBITS),
               ("i", 1),
              )

This layout can then be created for each GPIO.
self.gpios = Array([Record(gpio_layout) for _ in range(n_gpio)])

The gpios configuration can then be generated using a for-loop and address (var names temporary, will change to something more appropriate):
for cur_byte in range(WORDSIZE):
    gpios[bus.adr*WORDSIZE+cur_byte].eq(wb_data[cur_byte*8:(8+cur_byte*8)])

This wouldn't quite work as the gpio layout has an additional "i" signal, so I'll probably have to keep this one separate. Will need some more thinking on that.

Edit: Forgot to add subdivision to wb_data
Comment 7 Luke Kenneth Casson Leighton 2022-01-20 21:25:18 GMT
(In reply to Andrey Miroshnikov from comment #6)

> This wouldn't quite work as the gpio layout has an additional "i" signal, so
> I'll probably have to keep this one separate. Will need some more thinking
> on that.

yyeah, although the format of the data as far as its memory-mapping
is concerned the csrbus layout looks perfect.

however you will need *another* layout which is the actual
wires-as-connected-directly-to-the-IOPad-Cell.  that one would
not have bank_sel, but would be more like:

         padlayout = (("oe", 1),
                      ("i", 1),
                      ("o", 1),
                      ("puen", 1),
                      ("pden", 1),
                    )

which aside from the pullup/down resistors should be looking remarkably
familiar.

will we want the exact same types as in jtag.tap.IOTypes? i don't
know.  maybe.  will we want puen/pden to be optional? probably,
but not right away, i suggest just keeping it straightforward and
simple, add options later
Comment 8 Andrey Miroshnikov 2022-01-20 22:53:03 GMT
(In reply to Luke Kenneth Casson Leighton from comment #7)
> (In reply to Andrey Miroshnikov from comment #6)
> 
> > This wouldn't quite work as the gpio layout has an additional "i" signal, so
> > I'll probably have to keep this one separate. Will need some more thinking
> > on that.
> 
> yyeah, although the format of the data as far as its memory-mapping
> is concerned the csrbus layout looks perfect.
At the moment csrbus only connects the first byte of the data word. Is this how you expect it to be? Should I extend csrbus to store the full word (several gpio configs)?

> 
> however you will need *another* layout which is the actual
> wires-as-connected-directly-to-the-IOPad-Cell.  that one would
> not have bank_sel, but would be more like:
> 
>          padlayout = (("oe", 1),
>                       ("i", 1),
>                       ("o", 1),
>                       ("puen", 1),
>                       ("pden", 1),
>                     )
What about ie? IOPad assume the GPIO is input if oe is low?

> will we want the exact same types as in jtag.tap.IOTypes? i don't
> know.  maybe.  will we want puen/pden to be optional? probably,
What do you mean by having the same types as IOTypes?
Here's what I saw in debug/jtag.py:
iotypes = {'-': IOType.In,
           '+': IOType.Out,
           '>': IOType.TriOut,
           '*': IOType.InTriOut,
          }
 
scanlens = {IOType.In: 1,
            IOType.Out: 1,
            IOType.TriOut: 2,
            IOType.InTriOut: 3,
           }

Do you mean the layout definition needs to have the '-'/'+'/etc. strings for each signal? Like ("oe", 1, '+')

> but not right away, i suggest just keeping it straightforward and
> simple, add options later
Sure, I'll focus on what you mentioned already, plenty to do as is hahaha
Comment 9 Luke Kenneth Casson Leighton 2022-01-21 00:32:26 GMT
(In reply to Andrey Miroshnikov from comment #8)

> At the moment csrbus only connects the first byte of the data word. Is this
> how you expect it to be?

well, there are two schools of thought on that one.

1) each instance is one and only one IOpad.  you therefore have to
instantiate multiple GPIO instances

2) each instance deals with N IOpads, and has N as an argument, specifying
   the length of the array.

> Should I extend csrbus to store the full word
> (several gpio configs)?

if you go for option (2) then yes, of course.

option (1) would be very wasteful of IO memory address space because each
GPIO aka IOpad would need to be allocated to an entire Wishbone-wide line.
or some form of complicated Arbiter/Mapper would be needed
[which is actualy part of nmigen-soc but let's leave that alone for now,
because it's completely undocumented and almost impossible to understand
except by its original authors]

> > 
> > however you will need *another* layout which is the actual
> > wires-as-connected-directly-to-the-IOPad-Cell.  that one would
> > not have bank_sel, but would be more like:
> > 
> >          padlayout = (("oe", 1),
> >                       ("i", 1),
> >                       ("o", 1),
> >                       ("puen", 1),
> >                       ("pden", 1),
> >                     )
> What about ie? IOPad assume the GPIO is input if oe is low?

Chips4Makers IOpads, which is what we are targetting, do not have
an "input enable" flag on the public API.

however this is not a "deterrent" or an actual "problem": we will
need a "mapping" system anyway, if you recall.  but, for now, let's
*provide* an ie on the csrbus but *ignore* it as far as connecting
to padlayout is concerned.

sort that out later. get up and running first and increment later
 
> What do you mean by having the same types as IOTypes?

same [abstract / conceptual] object.  what Signals does a c4mjtag
Record of type "IOType.InTriOut" have?

> Do you mean the layout definition needs to have the '-'/'+'/etc. strings for
> each signal? Like ("oe", 1, '+')

you can work out the answer easily to that for yourself by examining
the source code.

how can the layout definition, Layout, understand that?  where in the
source code is there any evidence of the existence of an import
"from c4mjag import..." in nmigen?  that would create a recursive
import, wouldn't it?

> > but not right away, i suggest just keeping it straightforward and
> > simple, add options later
> Sure, I'll focus on what you mentioned already, plenty to do as is hahaha

:)
Comment 10 Andrey Miroshnikov 2022-01-24 12:28:40 GMT
(In reply to Luke Kenneth Casson Leighton from comment #9)
> if you go for option (2) then yes, of course.
Went for that option.

> option (1) would be very wasteful of IO memory address space because each
Noted, just thought this through while making the new diagrams.

> Chips4Makers IOpads, which is what we are targetting, do not have
> an "input enable" flag on the public API.
> 
> however this is not a "deterrent" or an actual "problem": we will
> need a "mapping" system anyway, if you recall.  but, for now, let's
> *provide* an ie on the csrbus but *ignore* it as far as connecting
> to padlayout is concerned.
Sure, makes sense.

> > What do you mean by having the same types as IOTypes?
> you can work out the answer easily to that for yourself by examining
> the source code.
Sure, not important at the moment so I'll look into it later.


Now, onto some important updates:
From discussion on bug #764, you mentioned wanting a diagram to try to understand the GPIO/JTAG topology.
Please see the proposal section 4.5.1 on the pinmux page (https://libre-soc.org/docs/pinmux/)

If this makes sense, I'll carry on with adding the relevant controls to the GPIO block.
Comment 11 Luke Kenneth Casson Leighton 2022-01-24 13:46:59 GMT
briefly, i am very busy, a lot going on

https://libre-soc.org/pinmux/muxjtag.png

at the top is a circle where the bank sel has to be "intercepted"
in EXACTLY the same way that I,O,OE are "intercepted" with C4M JTAG
Tap.add_io() function.

this will need a sub-class of C4M JTAG TAP class, actually probably
better just to add it directly https://git.libre-soc.org/?p=c4m-jtag.git;a=summary

it will mean adding the Record instead of IOConn, adding one named IOBank
or something like that.

at the same time we might as well add the Pull-up and Pull-down settings
to it.

or, another way: *extend* IOConn with extra parameters to Tap.add_io()
which add the extra parameters (bank_sel_width, pullup/down) to
IOConn.

i think that's probably a better way to do it rather than have a separate
shift register.
Comment 12 Luke Kenneth Casson Leighton 2022-01-24 16:38:06 GMT
https://libre-soc.org/pinmux/muxjtag2.png

nope, this one.

* JTAG can be removed and it would be as if nothing is changed
* JTAG can be added and if not active it would be as if it was not there
* JTAG *can* be enabled and can *FULLY* control *ALL* aspects of the
  IOPad *AND* the Muxing.
* IOConn Record will (optionally) have BankSel and PU and PD added
  for this to work
* Wishbone working on the ASIC is *NOT* required for this to work

importantly, (removing JTAG from the diagram), you can see how GPIO0
plays *TWO* roles:

1) sets banksel, pullup and pulldown
2) connects wishbone-read-writeable GPIO registers I/O/OE to Bank 0 *ONLY*

the important thing here is that if any aspect of the wishbone bus fails
at least the GPIO Muxing as well as the pullup/pulldown can be tested
over JTAG.
Comment 13 Luke Kenneth Casson Leighton 2022-01-24 17:41:40 GMT
(In reply to Luke Kenneth Casson Leighton from comment #12)


> * IOConn Record will (optionally) have BankSel and PU and PD added
>   for this to work

done.

https://git.libre-soc.org/?p=c4m-jtag.git;a=summary

TAP.add_io() now has three extra parameters:

* banksel
* pullup
* pulldown

every time you now call jtag.add_io() with banksel=3, pullup=True, pulldown=True
it will add a total of **FIVE** extra bits into the shiftregister.  three for
banksel, one for pullup, and one for pulldown.

it should be obvious that those need to be wired to the GPIO Config
as "pass-thru" stuff (core-side, pad-side) in *EXACTLY* the same way
that core/pad o/oe are wired up (because banksel, pullup and pulldown
are all outputs that go into the pad).
Comment 14 Andrey Miroshnikov 2022-03-12 13:22:56 GMT
Currently trying to figure out a connectivity problem for the 1-bit gpio pinmux block (no jtag yet).
The problem is that the top-level WB bus signals are not appearing in gtkwave trace list, and driving these signals in the sim does not affect the GPIO block internal signals.

https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/pinmux.py;h=fda23c6a7fef6b7b78f2d0b8d8edf63dfca79f0b;hb=9c35433c10860bfe85147f8f1baccd2fe3a5c915
Lines 36-41 show a top-level creation of the WB bus record.
Lines 70-80 show the connection of the top-level WB bus to the GPIO block WB bus.
I checked the direction, should be right. In the test code, I'm just setting the address and data signals (lines 203-204), however gtkwave does not show any signal transitions.
Checking yosys show top, I saw that the top-level WB signals were being connected to the GPIO block.

Have I missed something nmigen related? I haven't connected multiple blocks like this before, so I probably missed some boiler plate code hahaha

The __iter__ definition has the top-level signals included for export.
Comment 15 Luke Kenneth Casson Leighton 2022-03-12 13:27:55 GMT
(In reply to Andrey Miroshnikov from comment #14)
> Currently trying to figure out a connectivity problem for the 1-bit gpio
> pinmux block (no jtag yet).
> The problem is that the top-level WB bus signals are not appearing in
> gtkwave trace list,

yes, there are a number of possible reasons for this:
1) a bug in gtkwave, which you can compensate for by adding
   a top-level signal then assigning the whatever-bus signals
   to it
2) you've forgotten to bring it out via ports=[list,of,signals]
   in write_ilang/write_verilog
3) they're genuinely not connected and yosys has "optimised them out"

2 and 3 are sort-of related, in that if you don't explicitly tell
write_ilang()/write_verilog() what the signals are, yosys goes
"pffh you clearly didn't want that, let me just delete it for you"
Comment 16 Luke Kenneth Casson Leighton 2022-04-21 12:26:59 BST
wotcha andrey, as we discussed yesterday:

 112         with m.If(bus.cyc & bus.stb):
 113             comb += wb_ack.eq(1) # always ack

that needs to be a sync

 112         with m.If(bus.cyc & bus.stb):
 113             sync += wb_ack.eq(1) # always ack, always delayed

then what you can do here is:


 125                 for i in range(0, self.wordsize):
 126                     multi_cat.append(rd_multi[i])
 ...->               with m.If(wb_ack):
 127                     comb += wb_rd_data.eq(Cat(multi_cat))


but that would get the data a clock-cycle late, so how about:

        # One address used to configure CSR, set output, read input
        with m.If(bus.cyc & bus.stb):
            sync += wb_ack.eq(1) # always ack but on next cycle
            # Concatenate the GPIO configs that are on the same "row" or
            # address but do NOT send...
            multi_cat = []
            for i in range(0, self.wordsize):
                multi_cat.append(rd_multi[i])
            sync += wb_rd_data_reg.eq(Cat(multi_cat))
            with m.If(bus.we): # write
                ...

            # ok NOW send it! (but on the ack'd cycle)
            with m.ElIf(wb_ack): # read (and acked)
                comb += wb_rd_data.eq(wb_rd_data_reg)

btw this will look dreadful in the graphviz (and be costly):

            for byte in range(0, self.wordsize):
                sync += multi[byte].eq(wb_wr_data[byte*8:8+byte*8])

suggest doing a Cat() trick as well on that.
Comment 17 Luke Kenneth Casson Leighton 2022-05-10 15:35:08 BST
ok i did a quick video for you on how to view byte-level access,
and it points out that because the GPIO is exactly 8 bits, those
bytes map precisely and exactly to those of the wishbone bytes.

https://youtu.be/Pf6gmDQnw_4

the issue is that you don't yet fully understand wishbone "sel"
(select) lines, so haven't taken that into consideration in the
code.  the video explains "sel" lines, and explains that they
are *NOT* byte-enable-lines: the user of the wishbone bus can
in fact set a data width of 64 and a sel width of only 2, which
would be a *WORD* granularity.  sel=0b01 would be read/write to
the first 32-bits, sel=0b11 would be read/write to the TOP 32-bits
but, as i explain in the video, about 2/3-way through, this would
be Bad(tm) because each GPIO data structure is only 8-bit.

so the only wishbone configs that we should support are:

* dat_r/w = 8-bit, sel=1-bit
* dat_r/w = 16-bit, sel=2-bit
* dat_r/w = 32-bit, sel=4-bit
* dat_r/w = 64-bit, sel=8-bit

notice there that

    8<<len(sel) == len(dat_r)

therefore you can do an assert with that as a condition.
Comment 18 Luke Kenneth Casson Leighton 2022-05-10 16:30:15 BST
here is the pseudocode for reading the GPIO
data structs:

    read_bytes = []
    for i in range(len(sel)):
        GPIO_num = adr*len(sel)+i
        if sel[i]:
            read_bytes.append(GPIO[GPIO_num])
        else:
            read_bytes.append(Const(0, 8))
    if not wen:
        dat_r.eq(Cat(read_bytes))

and for writing, slightly different style:

    if wen:
        write_bytes = []
        for i in range(len(sel)):
            GPIO_num = adr*len(sel)+i
            write_byte = dat_w.bit_select(i*8, 8)
            if sel[i]:
                GPIO[GPIO_num].eq(write_byte)

that's it. that's all there is to it.
Comment 19 Jacob Lifshay 2022-05-10 18:17:05 BST
(In reply to Luke Kenneth Casson Leighton from comment #17)
> so the only wishbone configs that we should support are:
> 
> * dat_r/w = 8-bit, sel=1-bit
> * dat_r/w = 16-bit, sel=2-bit
> * dat_r/w = 32-bit, sel=4-bit
> * dat_r/w = 64-bit, sel=8-bit
> 
> notice there that
> 
>     8<<len(sel) == len(dat_r)
> 
> therefore you can do an assert with that as a condition.

I think you meant 8*len(sel) == len(dat_r), otherwise len(sel)==4 would give len(dat_r) == 128, and len(sel) == 8 would give len(dat_r) == 2048
Comment 20 Luke Kenneth Casson Leighton 2022-05-10 18:58:55 BST
(In reply to Jacob Lifshay from comment #19)

> I think you meant 8*len(sel) == len(dat_r), otherwise len(sel)==4 would give
> len(dat_r) == 128, and len(sel) == 8 would give len(dat_r) == 2048

ahh... yes!
Comment 21 Luke Kenneth Casson Leighton 2022-05-12 10:17:14 BST
andrey i've updated the gpio.py example to show how easy it is
to use Array().

https://gitlab.com/nmigen/nmigen/-/blob/master/examples/basic/gpio.py
Comment 22 Luke Kenneth Casson Leighton 2022-05-22 00:31:39 BST
https://git.libre-soc.org/?p=pinmux.git;a=commitdiff;h=56bf88cdb4ef409ec2d7bfb7621797ddce50857c

this looks good:

+                    GPIO_num = Signal(16) # fixed for now
+                    comb += GPIO_num.eq(bus.adr*len(bus.sel)+i)
+                    with m.If(bus.sel[i]):
+                        sync += rd_multi[i].oe.eq(gpio_ports[GPIO_num].oe)

one of these is either unnecessary or the wrong way round:
you don't want to be reading things that should be written
or writing things that should be read.

+                        with m.If (gpio_ports[GPIO_num].oe):
+                            sync += rd_multi[i].io.eq(gpio_ports[GPIO_num].o)
+                        with m.Else():
+                            sync += rd_multi[i].io.eq(gpio_ports[GPIO_num].i)
+                        sync += rd_multi[i].bank.eq(gpio_ports[GPIO_num].bank)

all of these DESTROY the registers.  the whole, entire purpose of sel
is to **not** overwrite or touch that which has a "sel" bit of zero.

+                    with m.Else():
+                        sync += rd_multi[i].oe.eq(0)
+                        sync += rd_multi[i].ie.eq(0)
+                        sync += rd_multi[i].puen.eq(0)
+                        sync += rd_multi[i].pden.eq(0)
+                        sync += rd_multi[i].io.eq(0)
+                        sync += rd_multi[i].bank.eq(0)

also i noticed that you've got this:

 103                 sync += Cat(*wr_multi).eq(wb_wr_data)

followed later by this:

 141                         sync += gpio_ports[GPIO_num].oe.eq(wr_multi[i].oe)

so what will happen there is:

* firstly, on one clock cycle (cyc & stb & we), the incoming wishbone
  write data will be captured into wr_multi. this will make that data
  available ONLY ON THE NEXT CYCLE

* secondly, on the **NEXT** cycle, "bus.we" will oh wait, whoops, it's
  now the next cycle, which has a totally new set of data, it could
  be a read at this point, and the data is hosed.

actually now that i look at it the exact same thing is happening with
rd_multi.

on one clock:

 113                         sync += rd_multi[i].oe.eq(gpio_ports[GPIO_num].oe)

and on the **NEXT** clock (when bus.we is completely invalid and/or unrelated):

 153                 sync += wb_rd_data.eq(Cat(*rd_multi))

basically you probably want this:

 113                         comb += rd_multi[i].oe.eq(gpio_ports[GPIO_num].oe)
 114 etc. etc.


also, you've made a classic error of using an Array unnecessarily:

  72         self.wr_multicsr = Array(temp)
  90         wr_multi = self.wr_multicsr
 103                 sync += Cat(*wr_multi).eq(wb_wr_data)

and using it with a *static* index:

 137                 for i in range(len(bus.sel)):

Array() is only necessary on a *DYNAMIC* index, like GPIO_num.

lose "Array()" on both rd_multi and wr_multi.

  62         self.rd_multicsr = rd_multi = []
  63         for i in range(self.wordsize):
  64             name = "rd_word%d" % i # please don't use format
  65             rd_multi.append(Record(name=name, layout=csrbus_layout))
Comment 23 Andrey Miroshnikov 2022-05-22 14:31:00 BST
(In reply to Luke Kenneth Casson Leighton from comment #22)
> one of these is either unnecessary or the wrong way round:
> you don't want to be reading things that should be written
> or writing things that should be read.
> 
> +                        with m.If (gpio_ports[GPIO_num].oe):
> +                            sync +=
> rd_multi[i].io.eq(gpio_ports[GPIO_num].o)
> +                        with m.Else():
> +                            sync +=
> rd_multi[i].io.eq(gpio_ports[GPIO_num].i)

It's a good time to clarify this.
The GPIO config byte sent via the WB bus has the following format:
b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0
  bank[2:0]  | io | pd | pu | ie | oe

The "io" signal either has the input or output value.
When the user wants to read, it will contain the current output value when "oe" is high, otherwise the input value.

(If this is not the behaviour that we want, let me know what we should do instead).

The GPIO port signals:
b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0
  bank[2:0]  | pd | pu | o  | oe | i

The "ie" signal is not used at the GPIO port (since we only have an i/o/oe type of IOpad), and I left it unused for now. Should "ie" be brought out to the GPIO port?

> 
> all of these DESTROY the registers.  the whole, entire purpose of sel
> is to **not** overwrite or touch that which has a "sel" bit of zero.
> 

"rd_multi" is only an intermediate signal and is not a part of GPIO port registers.
However I will remove those lines if they are unnecessary.
 
> also i noticed that you've got this:
> 
>  103                 sync += Cat(*wr_multi).eq(wb_wr_data)
> 
> followed later by this:
> 
>  141                         sync +=
> gpio_ports[GPIO_num].oe.eq(wr_multi[i].oe)
> 
> so what will happen there is:
> 
> * firstly, on one clock cycle (cyc & stb & we), the incoming wishbone
>   write data will be captured into wr_multi. this will make that data
>   available ONLY ON THE NEXT CYCLE
> 
> * secondly, on the **NEXT** cycle, "bus.we" will oh wait, whoops, it's
>   now the next cycle, which has a totally new set of data, it could
>   be a read at this point, and the data is hosed.
> 
> actually now that i look at it the exact same thing is happening with
> rd_multi.
I assumed that as long as WB ack is not asserted, I can delay the transaction by one cycle. Is this a wrong assumption? Do you expect this block to support pipelined transactions and thus the operation has be complete in one cycle?


> basically you probably want this:
> 
>  113                         comb +=
> rd_multi[i].oe.eq(gpio_ports[GPIO_num].oe)
>  114 etc. etc.

Ok, will do.

> 
> 
> also, you've made a classic error of using an Array unnecessarily:
> ...
> and using it with a *static* index:
> ...
> Array() is only necessary on a *DYNAMIC* index, like GPIO_num.
 
Ok, now it's starting to make more sense haha!

> lose "Array()" on both rd_multi and wr_multi.
> 
>   62         self.rd_multicsr = rd_multi = []
>   63         for i in range(self.wordsize):
>   64             name = "rd_word%d" % i # please don't use format
>   65             rd_multi.append(Record(name=name, layout=csrbus_layout))

What's wrong with format()?
Is it wasteful and does it reduce clarity?
Comment 24 Luke Kenneth Casson Leighton 2022-05-22 15:05:20 BST
(In reply to Andrey Miroshnikov from comment #23)
> (In reply to Luke Kenneth Casson Leighton from comment #22)
> > one of these is either unnecessary or the wrong way round:
> > you don't want to be reading things that should be written
> > or writing things that should be read.
> > 
> > +                        with m.If (gpio_ports[GPIO_num].oe):
> > +                            sync +=
> > rd_multi[i].io.eq(gpio_ports[GPIO_num].o)
> > +                        with m.Else():
> > +                            sync +=
> > rd_multi[i].io.eq(gpio_ports[GPIO_num].i)
> 
> It's a good time to clarify this.
> The GPIO config byte sent via the WB bus has the following format:
> b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0
>   bank[2:0]  | io | pd | pu | ie | oe
> 
> The "io" signal either has the input or output value.
> When the user wants to read, it will contain the current output value when
> "oe" is high, otherwise the input value.

no it's good.  with the PU/PD you can actually wiggle OE rather than IO
and get that SCL/SDA behaviour that Cesar was talking about (the
"on" or "high-impededance" thing rather than "on" and "off" which you
would get by wiggling io)

> 
> (If this is not the behaviour that we want, let me know what we should do
> instead).
> 
> The GPIO port signals:
> b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0
>   bank[2:0]  | pd | pu | o  | oe | i
> 
> The "ie" signal is not used at the GPIO port (since we only have an i/o/oe
> type of IOpad), and I left it unused for now. Should "ie" be brought out to
> the GPIO port?

no need, let's do that later when there's an actual IOpad to match it.

> > 
> > all of these DESTROY the registers.  the whole, entire purpose of sel
> > is to **not** overwrite or touch that which has a "sel" bit of zero.
> > 
> 
> "rd_multi" is only an intermediate signal and is not a part of GPIO port
> registers.
> However I will remove those lines if they are unnecessary.

nono, don't do that, but if they are intermediary then move them actually
*into* the elaborate() function and remove "self.".  they're not intended
to be accessed externally, so why are they made public members?  just
move the initialisation wr_multi = blahblah actually directly into elaborate().

basically you can use wr_multi and rd_multi as a means to "re-format"
the data (match the data structure with what needs to go in/out of the
GPIO registers), and you can do that combinatorially.

but you *must* ensure that the GPIO_index (and associated
"sel") triggers only the storage into the GPIO_reg (for write - we=HI)
or copying-out (for read).
  
> I assumed that as long as WB ack is not asserted, I can delay the
> transaction by one cycle. Is this a wrong assumption?

yes.  what you've done requires two cycles, not one. the we needs to
be held for *two* cycles, which is not going to happen.  therefore you
*will* end up with data corruption, guaranteed.

> Do you expect this
> block to support pipelined transactions and thus the operation has be
> complete in one cycle?

yes. no.  complete on the *next* cycle.  the mistake you made earlier was
to make the acknowledgement combinatorial (which is defined as "complete in
one cycle i.e. complete in *this* cycle).  WB4 pipeline has to have the
acknowledgement be on the *NEXT* cycle, and if you get that strictly correct
it's "compatible" with WB classic.

> > Array() is only necessary on a *DYNAMIC* index, like GPIO_num.
>  
> Ok, now it's starting to make more sense haha!

i did exactly the same thing, early on. wrote code that had a
static for-loop but allocated an Array for each element. total
waste of time.  basically, Array() is a compact way to not have
to do this:

   with m.Switch(index):
      for i in range(1<<max(index)):
          with m.Case(i):
             sync += element[i].eq(something)

you can replace all that bullshit with a *one-line*:

   sync += element[index].eq(something)

but if you are instead doing this, you *do not* need the Array()

   for i in range(fixed_compiletime_constant_number):
        sync += element[i].eq(something)

> What's wrong with format()?
> Is it wasteful and does it reduce clarity?

it's nowhere near as compact.  printf formatting has been around
literally for decades, it's extremely well-known, isn't going away,
and is blindingly-obvious

  (first letter always has a clear meaning: (d)ecimal, he(x)adecimal,
    (f)loat (s)tring i mean it's not hard)

even has a wikipedia page https://en.wikipedia.org/wiki/Printf_format_string
Comment 25 Andrey Miroshnikov 2022-05-23 23:20:34 BST
(In reply to Luke Kenneth Casson Leighton from comment #24)
> no it's good.  with the PU/PD you can actually wiggle OE rather than IO
> and get that SCL/SDA behaviour that Cesar was talking about (the
> "on" or "high-impededance" thing rather than "on" and "off" which you
> would get by wiggling io)
Brain a bit fried, not sure what you mean there (might be a good point to mention tomorrow).

> no need, let's do that later when there's an actual IOpad to match it.
Sure.

> nono, don't do that, but if they are intermediary then move them actually
> *into* the elaborate() function and remove "self.".  they're not intended
> to be accessed externally, so why are they made public members?  just
> move the initialisation wr_multi = blahblah actually directly into
> elaborate().
Yes, very good idea. Completely forgot about the whole "private"/"public" thing. Will probably take some time until OOP with HDL becomes natural hahaha!

> basically you can use wr_multi and rd_multi as a means to "re-format"
> the data (match the data structure with what needs to go in/out of the
> GPIO registers), and you can do that combinatorially.
Yes! Thanks for clarifying this. I changed the code to connect wr_multi and rd_multi combinatorially to the respective WB data buses.

> but you *must* ensure that the GPIO_index (and associated
> "sel") triggers only the storage into the GPIO_reg (for write - we=HI)
> or copying-out (for read).
Yes. The GPIO_reg data is shifted to rd_multi, and ack raised  on the same cycle (as the data is valid).
For wr_multi, I have 1 clk delay to ensure ack is raised *after* GPIO_reg is updated.

> yes.  what you've done requires two cycles, not one. the we needs to
> be held for *two* cycles, which is not going to happen.
You're right, I missed that ack statement.

> even has a wikipedia page https://en.wikipedia.org/wiki/Printf_format_string
Thanks, I'll be using from now on in python (to the chargrin of most devs XD)

Took a bit of thinking, but got the driver code updated to support sel.
https://git.libre-soc.org/?p=pinmux.git;a=commitdiff;h=2e4f5b6bdedbff041b4dbfaedcc468d544fecfcf
Also needed to extend the wb_write and wb_read to support different sel values (not just 32-bit or 8-bit).
https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=919ad8e3c110f5af0779d7e1d329736c245b56ae

Unless I've missed anything else, now I can go back to the 1-pin pimux case.
Comment 26 Luke Kenneth Casson Leighton 2022-05-23 23:41:25 BST
these need to go.

 118                     with m.Else():
 119                         sync += rd_multi[i].oe.eq(0)
 120                         sync += rd_multi[i].ie.eq(0)
 121                         sync += rd_multi[i].puen.eq(0)
 122                         sync += rd_multi[i].pden.eq(0)
 123                         sync += rd_multi[i].io.eq(0)
 124                         sync += rd_multi[i].bank.eq(0)

replace with:

  93         comb += wb_rd_data.eq(Cat(*rd_multi))
  94         for i in range(len(bus.sel)):
                  sync += rd_multi[i].eq(0)


 130         # Delayed from the start of transaction by 1 clk cycle
 131         with m.If(new_transaction):
 132             # Update the GPIO configs with sent parameters
 133             with m.If(bus.we):

NO.  again, you are creating a 2-clock situation where bus.we will
be ***INVALID*** at the time new_transaction is raised.

remove new_transaction entirely.  delete ONLY the lines marked DELETE

 126         with m.Else():
 127  DELETE           sync += new_transaction.eq(0)
 128             sync += wb_ack.eq(0)
 129 
 130         # Delayed from the start of transaction by 1 clk cycle
 131  DELETE       with m.If(new_transaction):
 132             # Update the GPIO configs with sent parameters
 133  DELETE           with m.If(bus.we):


the entirety of the wishbone transaction is
***ONLY VALID AT THE CYCLE IT OCCURS***

look again at the WB4 specification PDF.  how on earth can the specification
be complied with if new_transaction is raised to indicate "please only read
wb.we on the **NEXT** cycle"??

you even say, at line 130, "delayed by 1 clk cycle" which means you are
reading the value of bus.we from the **NEXT** wishbone transaction **NOT**
the value of bus.we of the **PREVIOUS** cycle where it was valid.

bus.stb, bus.we, bus.data, bus.adr, these come as a *SPECIFIC* package
(transaction) on THAT cycle and THAT cycle alone.

look again at the timing diagrams in the WB4 spec.

yes it has taken me at least one year to fully understand this, you are
not alone in getting it wrong.  it is an absolute pig.
Comment 27 Andrey Miroshnikov 2022-05-24 15:56:38 BST
(In reply to Luke Kenneth Casson Leighton from comment #26)
> replace with:
> 
>   93         comb += wb_rd_data.eq(Cat(*rd_multi))
>   94         for i in range(len(bus.sel)):
>                   sync += rd_multi[i].eq(0)
Yes! Thanks for the simplification.

> NO.  again, you are creating a 2-clock situation where bus.we will
> be ***INVALID*** at the time new_transaction is raised.
> ...
> the entirety of the wishbone transaction is
> ***ONLY VALID AT THE CYCLE IT OCCURS***
> 
> look again at the WB4 specification PDF.  how on earth can the specification
> be complied with if new_transaction is raised to indicate "please only read
> wb.we on the **NEXT** cycle"??
> ...
> yes it has taken me at least one year to fully understand this, you are
> not alone in getting it wrong.  it is an absolute pig.
Yes, after you explanation and re-reading the spec, it makes sense.
Thanks for drilling it in!

Given that the intermediate signals are now combinatorial, the reading/writing of GPIO reg's can be done in the cycle where the slave detects the master's transaction. No "new_transaction" signals anymore.

https://git.libre-soc.org/?p=pinmux.git;a=commitdiff;h=70a9e00ebc3873399d36cfd45a555121dc9ce8ef
Comment 28 Luke Kenneth Casson Leighton 2022-05-24 16:26:50 BST
(In reply to Andrey Miroshnikov from comment #27)

> Yes, after you explanation and re-reading the spec, it makes sense.
> Thanks for drilling it in!

12 months for _me_ to get it...

> https://git.libre-soc.org/?p=pinmux.git;a=commitdiff;
> h=70a9e00ebc3873399d36cfd45a555121dc9ce8ef

awwesome, that's more like it.
Comment 29 Andrey Miroshnikov 2022-06-06 23:05:27 BST
I've been looking at the functions from pinfunctions.py (sdram1, i2c, etc.).
https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/pinfunctions.py;h=f1f9558a68005ecbd5dbf2f40f1465d07cd2146e;hb=HEAD#l112

Why does the code in microtest.py (or ls180.py) call these functions with more input arguments than is specified?
https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/microtest.py;h=ab957b6ae8dc6c596ecf820c6ec3c30d80b449ea;hb=HEAD#l59

For example i2c expects "suffix" and "bank"
def i2s(suffix, bank):
but microtest.py(l#63) does: 
ps.i2c("0", ('A', 1), 2)

Is that a Python-related syntax where you can hide arg parameters?


Also, after running the pinmux_generator.py to make the microtest example, I looked at the output files.
"pinmux.txt" and "microtest.mdwn" show the mux connectivity and are understandable.
However the "litex_pinpads.json" is missing information needed to assign the functions to particular banks.
For the pinspec file from which to generate the dict, am I meant to use the JSON file?
Comment 30 Luke Kenneth Casson Leighton 2022-06-09 00:51:50 BST
ok what's happened there is i had to rush-job one of
the outputs for you, adding muxing into one file
so you could do something, but only that one example.
minitest or something, can't remember now.
all others are broken and the code is horrendously
complex so i have been avoiding fixing it basically.
i will sort it tomorrow.
Comment 31 Luke Kenneth Casson Leighton 2022-06-09 00:56:46 BST
(In reply to Andrey Miroshnikov from comment #29)
> I've been looking at the functions from pinfunctions.py (sdram1, i2c, etc.).
> https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/pinfunctions.py;
> h=f1f9558a68005ecbd5dbf2f40f1465d07cd2146e;hb=HEAD#l112
> 
> Why does the code in microtest.py (or ls180.py) call these functions with
> more input arguments than is specified?
> https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/microtest.py;
> h=ab957b6ae8dc6c596ecf820c6ec3c30d80b449ea;hb=HEAD#l59

they are called indirectly by a Class Factory which uses the earlier arguments
to decide where to put the peripheral.

the pinfunctions.py definitions declare *what* pins and the pad names
exist, it does not in any way say *where* they must go.

pinfunctions.py is pretty muxh identical in purpose to nmigen_boards
Resources suxh as UARTResource()