Bug 763 - Work on I/O Core Pad JTAG Tests
Summary: Work on I/O Core Pad JTAG Tests
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks: 50
  Show dependency treegraph
 
Reported: 2022-01-13 01:20 GMT by Andrey Miroshnikov
Modified: 2022-02-17 21:14 GMT (History)
1 user (show)

See Also:
NLnet milestone: NLnet.2019.02.012
total budget (EUR) for completion of task and all subtasks: 1500
budget (EUR) for this task, excluding subtasks' budget: 1500
parent task for budget allocation: 50
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
[andrey] amount = 1500 submitted = 2022-02-10 paid = 2022-02-17


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Andrey Miroshnikov 2022-01-13 01:20:44 GMT
Separate bug for the work I did on the core/pad resource JTAG test code.
Was initially documented in bug #50 .

The work involved:
- Parse a dummy resource list (gpio, i2c, uart)
- Instantiate peripherals from the list
- Write basic unit tests for the peripherals
- Connect the peripheral pins to the JTAG BS chain
- Write unit tests to check the JTAG functionality (SAMPLE and EXTEST)

Latest code:
https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/testing_stage1.py;h=e6f135388a28d887ddf9664ebc33817373282d35;hb=074aced79c300b2397d2f4504c41e2ddfe9ba7ed

Currently this temporary wiki page is used for documenting the tests:
https://libre-soc.org/docs/pinmux/temp_pinmux_info/
(I'll be adding the information on the pinmux block later this week)

One main issue with a unit test is remaining: 
The TDO data after an EXTEST (test case 1 and 3) does not match up with the ios dictionary of signals (the signals corresponding to the BS chain shift register).

As I'm using the ios dictionary signals for the asserts (and comparing against the expected test vector), the test is passing, but I don't know if the JTAG is not being driven correctly, or something is affecting the shift register (or clears it).
Comment 1 Andrey Miroshnikov 2022-01-30 17:17:32 GMT
Luke, before closing the bug, I wanted to find the cause of the TDO issue.
To make the issue easier to find, I focused on checking the uart rx/tx signals.

Pad input rx cleared. UART rx/tx internal loopback disabled:

1. EX_TEST, BS data is 0x80000 (set uart rx core i)
2. EX_TEST, BS data is 0x40000 (set uart tx pad o)
3. EX_TEST, BS data is 0xC0000 (set uart rx core i and tx pad o)

In all cases, the rx core input and tx pad output waveforms are correct.
TDO however is all zeroes. When tx pad output is high (cases 2 and 3), one of the TDO bits MUST be set to coincide with tx pad output.

The same set of test cases was repeated with the rx/tx loopback connection (on the core side):

4. EX_TEST, BS data is 0x80000 (set uart rx core i)
5. EX_TEST, BS data is 0x40000 (set uart tx pad o)
6. EX_TEST, BS data is 0xC0000 (set uart rx core i and tx pad o)

The waveforms show the tx core output being set when rx core input is set which is correct.
However TDO is again all zeroes.


TDO does however, show the correct data when rx pad i is set, with the BS_SAMPLE mode:
7. BS_SAMPLE, BS data is 0x00000

This test case should show the rx pad/core i and tx core/pad o set (because loopback is working, and jtag is passthrough), and this is what is observed.
TDO is correct, 1100 0000 0000 0000 0000. Upper two bits indicate rx pad in and tx core out.

The ios dict only contains read-only signals, so perhaps they are not shifted out into TDO during an external test (EX_TEST)?

Checking yosys diagram, I can see that io_bd Boundary Scan register controls signals coming out of JTAG (core inputs and pad outputs) and it is shifted out to TDO.
Checking c4m-jtag/c4m/nmigen/jtag/tap.py, I haven't been able to find how tdo would connect to io_bd (but I did see connection to sr, the main shift register).
Comment 2 Luke Kenneth Casson Leighton 2022-01-30 17:48:56 GMT
https://git.libre-soc.org/?p=c4m-jtag.git;a=blob;f=c4m/nmigen/jtag/tap.py;h=e0cf747150c03c45a7be16682b31422103ab75bc;hb=eb28aea69c7a785085364c2fb3c2cbdbe86f4890#l625

io_bd - and in fact io_sr itself directly - is set to directly be
sent out via TDOs, one bit at a time (line 673)

io_sr is only shifted "in" from tdi in "shift" mode (line 620)

io_sr only takes data *from* pads and core in *capture* mode (line 592)

so you cannot just set a few signals and "hope for the best", you have
to actually make sure that the correct "Modes" (shift, update, capture)
are set.

the JTAG example in soc that i referred you to does this correctly.
it took me something mad like 2 days to understand it.
Comment 3 Andrey Miroshnikov 2022-01-31 11:38:22 GMT
Based on your reply, I looked at yesterday's waveforms and saw the Capture state happens before the Shifting state (when the BS data or JTAG configuration is shifted in), but as I was about to start running the code again, I got an assertion error from c4m-jtag tap.py.

https://git.libre-soc.org/?p=c4m-jtag.git;a=blob;f=c4m/nmigen/jtag/tap.py;h=e0cf747150c03c45a7be16682b31422103ab75bc;hb=eb28aea69c7a785085364c2fb3c2cbdbe86f4890#l592
(assertion on #l616)

The assertion fails because the length of iol does not match the sum of all subsignals in _ios:
Traceback (most recent call last):
  File "testing_stage1.py", line 1005, in <module>
    test_jtag()
  File "testing_stage1.py", line 976, in test_jtag
    dut = setup_blinker(build_blinker=False)
  File "testing_stage1.py", line 930, in setup_blinker
    vl = rtlil.convert(top, ports=top.ports())
  File "/home/rohdo/src/nmigen/nmigen/back/rtlil.py", line 1054, in convert
    fragment = ir.Fragment.get(elaboratable, platform).prepare(**kwargs)
  File "/home/rohdo/src/nmigen/nmigen/hdl/ir.py", line 37, in get
    obj = obj.elaborate(platform)
  File "/home/rohdo/src/nmigen/nmigen/hdl/dsl.py", line 538, in elaborate
    fragment.add_subfragment(Fragment.get(self._named_submodules[name], platform), name)
  File "/home/rohdo/src/nmigen/nmigen/hdl/ir.py", line 37, in get
    obj = obj.elaborate(platform)
  File "/home/rohdo/src/soc/pinmux/src/spec/jtag.py", line 185, in elaborate
    m = super().elaborate(platform)
  File "/home/rohdo/src/c4m-jtag/c4m/nmigen/jtag/tap.py", line 434, in elaborate
    bd2io=io_bd2io, bd2core=io_bd2core,
  File "/home/rohdo/src/c4m-jtag/c4m/nmigen/jtag/tap.py", line 621, in _elaborate_ios
    assert idx == length, "Internal error, length mismatch"
AssertionError: Internal error, length mismatch

Adding prints to tap.py show me that self._ios is a list of records, each record containing a core/pad pair:
[(rec uart_0__rx (rec uart_0__rx__core i) (rec uart_0__rx__pad i)),
... more signals ...
(rec i2c_0__scl__oe (rec i2c_0__scl__oe__core o) (rec i2c_0__scl__oe__pad o))]
So, the total number of subsignals in _ios is 40.

iol is a list that *only* contains peripheral core outputs and pad inputs (i.e. signals that JTAG cannot control, but can only sample):
[(sig uart_0__rx__pad__i), (sig uart_0__tx__core__o), 
... more signals ...
(sig i2c_0__scl__i__pad__i), (sig i2c_0__scl__o__core__o), (sig i2c_0__scl__oe__core__o)]
No pullups/downs or banksel are requested on any of the pins, so that part of the code in tap.py is not being used. Total number of subsignals in iol is 20.

I was wondering whether the chain is meant to be 40 bits long (to account for the core/pad pair), but I don't understand it enough to make that conclusion.

The way the code shows it (and the way you described it) indicates that theres a (not including intermediate states like idle, run, etc.):
1. shifting stage (which I've seen in the waveforms)
2. update stage where signals controlled by JTAG BS are set by the shift register
3. capture stage where JTAG inputs are copied into the main shift register

So that implies there's no need for a 40-bit chain, as you can access the inputs and outputs by selecting different modes.
Comment 4 Luke Kenneth Casson Leighton 2022-01-31 11:59:00 GMT
(In reply to Andrey Miroshnikov from comment #3)

> I was wondering whether the chain is meant to be 40 bits long (to account
> for the core/pad pair), but I don't understand it enough to make that
> conclusion.

look again at the loop.  line 634.

https://git.libre-soc.org/?p=c4m-jtag.git;a=blob;f=c4m/nmigen/jtag/tap.py;h=e0cf747150c03c45a7be16682b31422103ab75bc;hb=eb28aea69c7a785085364c2fb3c2cbdbe86f4890#l634

if you do not understand it, insert some print statements into that code.

again, a reminder: stop thinking that you do not have the right to do
anything you want to, to find out what is going on.

you are still thinking of other people's code as "a black box that cannot
be touched and it must work, as-is".

likewise you can perfectly well put print statements into TAP.add_io()
as well.

if you are expecting only 20 signals and there are double that amount
it is probably because you have called a loop around TAP.add_io() twice.
Comment 5 Andrey Miroshnikov 2022-02-09 21:00:00 GMT
(In reply to Luke Kenneth Casson Leighton from comment #4)
> if you do not understand it, insert some print statements into that code.

Yep, that was indeed helpful.

In addition I had to bite the bullet and study the TAP FSM. Now it makes a lot more sense. Who would've thought? XD

> 
> again, a reminder: stop thinking that you do not have the right to do
> anything you want to, to find out what is going on.

Sure. Now I've even found a discrepancy that doesn't really makes sense in tap.py
The TDI data for sr is shifted in *MSB first*. See:
https://git.libre-soc.org/?p=c4m-jtag.git;a=blob;f=c4m/nmigen/jtag/tap.py;hb=dfa012c0a884274f2ec5866ce0ac49e271a0bde5#l624

Now, there's another interesting discrepancy. The Instruction Register (ir) is loaded from TDI *LSB first*. See:
https://git.libre-soc.org/?p=c4m-jtag.git;a=blob;f=c4m/nmigen/jtag/tap.py;hb=dfa012c0a884274f2ec5866ce0ac49e271a0bde5#l150

This is important because in tms_data_getset(), which is used in jtag_read_write_reg(), the TDI data is sent *LSB first*!!!
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/debug/test/test_jtag_tap.py;hb=67ee79592f304b48e424a92e09a9fbd3a7fff751#l29

This means tms_data_get_set() works for setting the instruction (SAMPLE, EXTEST, BYPASS, etc.), but is NOT correct for setting the data.

Both ID and IR blocks shift in word LSB first,
while IO shifts in word MSB first.
Is this expected behaviour?

If the issue is with tap.py, ID and IR could be changed to use MSB first shifting.
Alternatively, what I propose is
- add a function tms_instr_set() which will load data *LSB first* for use with the setting the TAP instruction.
- modify tms_data_get_set() for-loop to start from the MSB and count down:
    for i in range(d_len-1, -1, -1)
This change also ensures the TDO data is correct (it is shifted out MSB first as well)

I made those changes locally, but if you give me access to soc repo, I could pull them in.


As for the issues I had with the tests. The above issue was only affecting me now, once I got TDO to output the boundary scan registers.

To read the resulting boundary scan chain values after an EXTEST, the TAP FSM has to be moved to the Capture state (as you've explained in comment #2, but I didn't understand significance of until I studied the FSM). I will cover this in more detail on the wiki page.

Importantly for this bug, all 4 test cases are working and producing results that are expected.

To remind of the cases:
1. Pad inputs and core outputs low, EXTEST, send 2^N-1 (0xFFFFF) test vector.
2. Pad inputs and core outputs low, SAMPLE, send 0xFFFFF test vector.
3. Pad inputs and core outputs high, EXTEST, send 0x00000 test vector.
4. Pad inputs and core outputs high, SAMPLE, send 0x00000 test vector.

Results:
1. Pad outputs and oe are high, core inputs are high.
2. All signals low.
3. All pad inputs and core oe signals high.
4. Pad outputs and core inputs high (as well as pad inputs and core outputs).

Cases 1 and 2 check control from JTAG only, cases 3 and 4 test a system where signals are controlled externally (by pads or peripherals).

The core inputs values are determined by sampling the core outputs, since they are connected via an XOR gate internally.
The reason core output oe's are high in case 3 is because the core oe's are only controlled by the oe test vector inputs, while all core outputs are an XOR of the core input and a test vector (core_i ^ test)

Can this bug now be closed and RFP sent?
Comment 6 Luke Kenneth Casson Leighton 2022-02-09 23:02:12 GMT
briefly, read it again

 150         with m.Elif(self._shift):
 151             m.d.posjtag += shift_ir.eq(Cat(shift_ir[1:], self._tdi))

on the next clock of posjtag:

this is taking TDI into the top bit (MSB) and dropping the
first bit (LSB). in software:

shift_ir = shift_ir >> 1 # shift down 1 bit
shift_ir |= (tdi << len(shift_ir-1) # put TDI into MSB
Comment 7 Luke Kenneth Casson Leighton 2022-02-10 05:22:12 GMT
Andrey, each Shift Register created and used in
one area for one task has absolutely nothing to
do with any other shift register.

the SR used for identity (IDBlock) has nothing
to do with the IO shift register.

the WB SR has nothing to do with theIO SR or
the ID SR.

what each piece of HDL does with each SR is entirely
up to that piece of HDL.

a module could decide to insert bits in an order related
to phases of the moon and the spot trading price of matured
smoked Edam if they wanted.

therefore the bitreversal should i feel be done by the
caller, i.e. call a explicit reverse() on the data in and
out of the test function.

if you really want to do reversal in the test function,
(which is fine btw) then
the only "safe" way to do it is to add extra named parameters
at the end of the paramlist, defaulting to values which do
NOT change the current behaviour.

so e.g. reverse=False

this is standard practice in well-maintained SQL databases and
in any code providing optional args (java, c++, python) but python
in particular has to watch out because it is interpreted.

welcome to your first experience of maintaining backwards compatibility
in software.
Comment 8 Luke Kenneth Casson Leighton 2022-02-10 05:25:40 GMT
  29 def tms_data_getset(dut, tms, d_len, d_in=0):

, reverse=False

  30     res = 0
  31     yield dut.bus.tms.eq(tms)
  32     for i in range(d_len):

r = range(dlen)
if reverse:
   r = reversed(r)

or similar
Comment 9 Andrey Miroshnikov 2022-02-10 12:40:41 GMT
(In reply to Luke Kenneth Casson Leighton from comment #6)
> this is taking TDI into the top bit (MSB) and dropping the
> first bit (LSB). in software:

This is what I tried to describe (I probably didn't make too much sense did I?).

I understood the behaviour, however my question is why are the IDBypassBlock and IRBlock shifting new data from the top of their internal shift registers, while the IOBlock shifts new data from the bottom?
IRBlock:       tdo <- ir[0] ir[1] ... ir[-1] <- tdi
IDBypassBlock: tdo <- sr[0] sr[1] ... sr[-1] <- tdi
IOBlock:       tdi -> sr[0] sr[1] ... sr[-1] -> tdo

The answer is probably, this is legacy/complicated/etc. so I won't press it further.

Actually, just checked the old VHDL files TAP was based of, looks like they have done the same arrangement:
https://git.libre-soc.org/?p=c4m-jtag.git;a=blob;f=c4m/vhdl/jtag/c4m_jtag_idblock.vhdl;h=a8c4ac995da3a787e3a75d10e94bf47018defa81;hb=fe7f74ad4ece0471a90bd4181f15ff678a844b07#l50



(In reply to Luke Kenneth Casson Leighton from comment #7)
> Andrey, each Shift Register created and used in
> one area for one task has absolutely nothing to
> do with any other shift register.

I know, I should've been more explicit in my wording. The names such as "io_sr" clearly implied that the signals were completely separate.

> a module could decide to insert bits in an order related
> to phases of the moon and the spot trading price of matured
> smoked Edam if they wanted.

Pretty much.

> 
> therefore the bitreversal should i feel be done by the
> caller, i.e. call a explicit reverse() on the data in and
> out of the test function.
> 
> if you really want to do reversal in the test function,
> (which is fine btw) then
> the only "safe" way to do it is to add extra named parameters
> at the end of the paramlist, defaulting to values which do
> NOT change the current behaviour.
> 
> so e.g. reverse=False

Sure, sounds like a plan. I can add a "reverse" optional arg to tms_data_getset()

The proposed change to test_jtag_tap.py:
1. Add the function:
def reverse_bit_order(word, wordlen):
    rev_word = 0
    for bit in range(wordlen):
        rev_word += ((word >> bit) & 0x1) << (wordlen - 1 - i)
    return rev_word

2. Modify input arg and add if statement:
def tms_data_getset(dut, tms, d_len, d_in=0, reverse=False):
    if reverse:
        # Reverse the data word to transmit MSB-first
        d_in = reverse_bit_order(d_in, d_len)

3. Add "reverse" argument to jtag_read_write_reg()

> this is standard practice in well-maintained SQL databases and
> in any code providing optional args (java, c++, python) but python
> in particular has to watch out because it is interpreted.
> 
> welcome to your first experience of maintaining backwards compatibility
> in software.

Hahahahaha :)
Comment 10 Luke Kenneth Casson Leighton 2022-02-10 13:00:37 GMT
(In reply to Andrey Miroshnikov from comment #9)
> (In reply to Luke Kenneth Casson Leighton from comment #6)
> > this is taking TDI into the top bit (MSB) and dropping the
> > first bit (LSB). in software:
> 
> This is what I tried to describe (I probably didn't make too much sense did
> I?).

3am for me not much makes sense

> I understood the behaviour, however my question is why are the IDBypassBlock
> and IRBlock shifting new data from the top of their internal shift
> registers, while the IOBlock shifts new data from the bottom?

Officially Not Our Problem (tm).  one to file in the "that's interesting, i'll
make a mental note in case it's relevant some point down the line" bucket.

> (In reply to Luke Kenneth Casson Leighton from comment #7)
>
> The proposed change to test_jtag_tap.py:
> 1. Add the function:
> def reverse_bit_order(word, wordlen):
>     rev_word = 0
>     for bit in range(wordlen):
>         rev_word += ((word >> bit) & 0x1) << (wordlen - 1 - i)
>     return rev_word
> 
> 2. Modify input arg and add if statement:
> def tms_data_getset(dut, tms, d_len, d_in=0, reverse=False):
>     if reverse:
>         # Reverse the data word to transmit MSB-first
>         d_in = reverse_bit_order(d_in, d_len)
> 
> 3. Add "reverse" argument to jtag_read_write_reg()

looks great.

you have commit rights to soc (have all along), you could check
with "ssh gitolite3@git.libre-soc.org"
Comment 11 Andrey Miroshnikov 2022-02-10 15:59:16 GMT
(In reply to Luke Kenneth Casson Leighton from comment #10)
> looks great.

https://git.libre-soc.org/?p=soc.git;a=commitdiff;h=d8bf21d37504d89537c397d32c84e772db5d3d37

Had to use a slightly different appoach, as just changing the input data doesn't also adjust the output.
Instead on the reverse flag, the for loop is iterated over backwards. This makes both the input and output data MSB-first.

Tests are now working.
https://git.libre-soc.org/?p=pinmux.git;a=blob;f=src/spec/testing_stage1.py;h=144fb8510dee99a7b177708a19a3416644564a94;hb=2285f3ad73ea14f79a52344a4c22c91feb754d55

> you have commit rights to soc (have all along), you could check
> with "ssh gitolite3@git.libre-soc.org"

Ah, didn't realise, thanks!
Saved in my "helpful commands" textfile.

Since there are no more blockers for this bug, can it be closed complete?
Comment 12 Andrey Miroshnikov 2022-02-10 17:44:28 GMT
Task finished, sending RFP.