377 – possible bug in Simulator Mem ld/st function

Status:	RESOLVED FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code
Version:	unspecified
Hardware:	PC Mac OS

Importance:	--- enhancement
Assignee:	Luke Kenneth Casson Leighton

Reported:	2020-06-12 15:58 BST by Luke Kenneth Casson Leighton
Modified:	2020-06-22 11:51 BST
CC List:	2 users

NLnet milestone:	---
total budget (EUR) for completion of task and all subtasks:	0
budget (EUR) for this task, excluding subtasks' budget:	0

Description Luke Kenneth Casson Leighton 2020-06-12 15:58:35 BST

michael, hi,

it looks like there might be a bug in caller.py Mem ld/st.  by accident
i asked a byte-store (stb) to write to address 0x9.  the memory-dump
(fu/compunits/test/test_compunit.py) dumps out the internal dictionary
(sim.mem.mem.items()) however it uses sim.mem.ld to initialise the
nmigen Memory object.

* bytes per word is 8
* address 9 divided by 8 is 1
* remainder is also 1.

does this mean that the LD/ST is being word-order realigned?
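
The address arithmetic in the list above can be checked with a short sketch. The helper name `split_address` is hypothetical and is not taken from caller.py; only the 8-bytes-per-word figure comes from the description.

```python
BYTES_PER_WORD = 8  # from the description: 8 bytes per 64-bit word


def split_address(addr, bytes_per_word=BYTES_PER_WORD):
    """Return (word index, byte offset within that word).

    Hypothetical helper for illustration, not part of caller.py.
    """
    return divmod(addr, bytes_per_word)


word_index, remainder = split_address(0x9)
print(word_index, remainder)  # 1 1
```

This matches the "memaddr 0x1/1" shown in the log output: word 1, byte offset 1.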


Writing 0xee to ST 0x9 memaddr 0x1/1
width,rem,shift,mask 1 1 0x30 0xff
mem @ 0x1: 0xabeeef0187654321
None
carry already done? 0b0
get_cu_outputs 2 0
after got outputs, rd_rel, wr_rel, wrmask:  0b0 0b0 0b0
busy 1
busy 1
busy 1
busy 0
check cu outputs stb 3, 1(2) {}
check extra output 'stb 3, 1(2)' 0 0
sim mem dump
         0 5432123412345678
         1 abeeef0187654321
         4 1828384822324252
nmigen mem dump
         0 5432123412345678
         1 abcdef018765ee21
         2 0000000000000000
         3 0000000000000000
         4 1828384822324252
         5 0000000000000000

Comment 1 Luke Kenneth Casson Leighton 2020-06-12 16:27:23 BST

problem "solved":

diff --git a/src/soc/decoder/isa/caller.py b/src/soc/decoder/isa/caller.py
index f76afda..7dfcec1 100644
--- a/src/soc/decoder/isa/caller.py
+++ b/src/soc/decoder/isa/caller.py
@@ -51,6 +51,7 @@ class Mem:
     def _get_shifter_mask(self, wid, remainder):
         shifter = ((self.bytes_per_word - wid) - remainder) * \
             8  # bits per byte
+        shifter = remainder * 8
         mask = (1 << (wid * 8)) - 1
         print ("width,rem,shift,mask", wid, remainder, hex(shifter), hex(mask))
         return shifter, mask
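
The effect of the one-line change can be checked against the two memory dumps in the description. This is a minimal standalone sketch, not the caller.py code: `shifter_old`, `shifter_new` and `store` are hypothetical names, and the prior contents of word 1 are an assumption inferred from the bytes that the two dumps have in common.

```python
BYTES_PER_WORD = 8


def shifter_old(wid, remainder):
    # original formula: offset counted from the top of the 64-bit word
    return ((BYTES_PER_WORD - wid) - remainder) * 8


def shifter_new(wid, remainder):
    # patched formula: offset counted from the bottom (little-endian)
    return remainder * 8


def store(word, value, wid, shift):
    # replace wid bytes of word, starting at bit position shift
    mask = (1 << (wid * 8)) - 1
    return (word & ~(mask << shift)) | ((value & mask) << shift)


# assumed contents of word 1 before the stb, inferred from the dumps
before = 0xabcdef0187654321

old = store(before, 0xee, 1, shifter_old(1, 1))
new = store(before, 0xee, 1, shifter_new(1, 1))
print(hex(shifter_old(1, 1)), hex(old))  # 0x30 0xabeeef0187654321
print(hex(shifter_new(1, 1)), hex(new))  # 0x8 0xabcdef018765ee21
```

The old shift reproduces the "sim mem dump" value and the new shift reproduces the "nmigen mem dump" value for word 1.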

is this something that we need to put under the control of a BE/LE MSR flag?

Comment 2 Luke Kenneth Casson Leighton 2020-06-13 16:53:16 BST

hmmm i took a look at microwatt, and i'm likewise not seeing any evidence of
offset-reversal based on an 8-byte (64-bit) boundary.

i did however discover that qemu 8-byte memory-read returns data in big-endian
order, and had to read it in single-bytes then reconstruct a little-endian
debug/display value.
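
The byte-at-a-time reconstruction can be sketched in plain Python. The byte values below are hypothetical example data, not an actual qemu read, and nothing here calls a qemu API.

```python
# hypothetical bytes at ascending addresses (lowest address first)
raw = bytes([0x21, 0x43, 0x65, 0x87, 0x01, 0xef, 0xcd, 0xab])

# interpreting the 8 bytes as one big-endian value puts the byte
# at the lowest address in the most significant position
as_big = int.from_bytes(raw, "big")

# reading single bytes and rebuilding a little-endian value puts
# the byte at the lowest address in the least significant position
value = 0
for offset, byte in enumerate(raw):
    value |= byte << (8 * offset)

print(hex(as_big))  # 0x2143658701efcdab
print(hex(value))   # 0xabcdef0187654321
```

The loop is equivalent to `int.from_bytes(raw, "little")`; it is written out to mirror the single-byte reads described above.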

honestly i have no real idea what i am doing here and could really use some
help and discussion.

Comment 3 Luke Kenneth Casson Leighton 2020-06-14 00:52:09 BST

i think i got it.

nmigen memory write port, if you specify a granularity argument to cut the SRAM into bytes, writes those bytes in *big* endian order.

however if you read the same interface with a single read-enable line, the answer comes back in *little* endian order.

*face-palm*

i do however much prefer thinking in LE terms when it comes to memory layouts and byte addressing.

Comment 4 Luke Kenneth Casson Leighton 2020-06-14 16:02:11 BST

i tracked this down by adding memory dump/alteration to qemu and making
a comparison of memory in qemu and memory in the simulator.

it was a number of separate things:

* the hardware was not performing big-endian byte reversal
* the simulator was storing 8 bytes in a dictionary on 8-byte
  address-aligned boundary where the data order of each 8-byte
  group was byte-reversed (big-endian)
* to correct this, the shift-mask function subtracted the offset
  from the *other* end (the 8-byte boundary of the underlying
  simulated memory - 64 bit blocks)

what i did was:

* reverse the order of 8-byte groups being stored to be in
  little-endian format in the simulator
* turned the shift round so that it is exactly the
  LSBs of the address (bits 0 to 2) times 8, where previously
  it was (8-datalen-AddrLSBs) times 8
* added a function which byte-reverses (big-endians) the load/store
  *data* - not the entire simulator-stored 64-bit-granularity memory
* added a byte-reverse function into the hardware.
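
The first two items above (little-endian storage of each 8-byte group, with the shift taken directly from address bits 0 to 2) can be sketched as a small memory model. `LEMem` is a hypothetical stand-in written for illustration; it is not the Mem class from caller.py and it ignores accesses that cross an 8-byte boundary.

```python
class LEMem:
    """Hypothetical little-endian word-granular memory (illustration only)."""

    def __init__(self, bytes_per_word=8):
        self.bytes_per_word = bytes_per_word
        self.mem = {}  # word index -> 64-bit integer

    def _split(self, addr, wid):
        index, remainder = divmod(addr, self.bytes_per_word)
        # simplification: no support for accesses spanning two words
        assert remainder + wid <= self.bytes_per_word
        shift = remainder * 8          # address LSBs, in bits
        mask = (1 << (wid * 8)) - 1
        return index, shift, mask

    def st(self, addr, value, wid):
        index, shift, mask = self._split(addr, wid)
        word = self.mem.get(index, 0)
        self.mem[index] = (word & ~(mask << shift)) | ((value & mask) << shift)

    def ld(self, addr, wid):
        index, shift, mask = self._split(addr, wid)
        return (self.mem.get(index, 0) >> shift) & mask


mem = LEMem()
mem.st(0x8, 0xabcdef0187654321, 8)  # assumed initial contents of word 1
mem.st(0x9, 0xee, 1)                # the stb from the description
print(hex(mem.mem[1]))              # 0xabcdef018765ee21
print(hex(mem.ld(0x9, 1)))          # 0xee
```

With this layout the byte at address 0x9 lands in bits 8 to 15 of word 1, which agrees with the nmigen dump in the description.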

that byte-reverse function *should* now be possible to call on-demand
for LD/ST byte-reversal opcodes.  i will give that a shot and see if
it works.
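
A byte-reverse of the load/store data alone (rather than of the stored 64-bit words) might look like the sketch below. The name `byte_reverse` and its signature are assumptions for illustration and are not the function added to the simulator or the hardware.

```python
def byte_reverse(value, wid):
    """Reverse the byte order of a wid-byte value (hypothetical helper).

    Raises OverflowError if value does not fit in wid bytes.
    """
    return int.from_bytes(value.to_bytes(wid, "little"), "big")


print(hex(byte_reverse(0x1234, 2)))      # 0x3412
print(hex(byte_reverse(0x12345678, 4)))  # 0x78563412
print(hex(byte_reverse(0xee, 1)))        # 0xee
```

Because the reversal only depends on the data width, the same helper could in principle be applied on demand to the data of a byte-reversed load or store (the Power ISA lwbrx/stwbrx family), leaving the memory layout itself untouched.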

Comment 5 Luke Kenneth Casson Leighton 2020-06-22 11:51:00 BST

happy with this one.