Bug 377 - possible bug in Simulator Mem ld/st function
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: PC Mac OS
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks:
 
Reported: 2020-06-12 15:58 BST by Luke Kenneth Casson Leighton
Modified: 2020-06-22 11:51 BST

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Luke Kenneth Casson Leighton 2020-06-12 15:58:35 BST
michael, hi,

it looks like there might be a bug in caller.py Mem ld/st.  by accident
i asked a byte-store (stb) to write to address 0x9.  the memory-dump in
fu/compunits/test/test_compunit.py prints the simulator's internal
dictionary directly (sim.mem.mem.items()), however the nmigen Memory
object is initialised through sim.mem.ld, and the two dumps below put
the stored byte (0xee) in different positions within word 1.

* bytes per word is 8
* address 9 divided by 8 is 1
* remainder is also 1.

does this mean that the LD/ST is being word-order realigned?
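
a minimal sketch of the arithmetic above, for reference.  the helper names
(split_address, lane_shifts) are hypothetical and not taken from caller.py;
the "from the MSB end" formula is the one quoted in the diff in Comment 1.
it only shows that the same byte offset gives two different shifts depending
on which end of the 8-byte word it is counted from:

```python
# hypothetical standalone sketch; names are illustrative, not from caller.py
BYTES_PER_WORD = 8

def split_address(addr):
    """Return (word index, byte offset within that word)."""
    return divmod(addr, BYTES_PER_WORD)

def lane_shifts(width, remainder):
    """Bit shift of a `width`-byte access, counted from each end of the word."""
    from_lsb = remainder * 8
    from_msb = (BYTES_PER_WORD - width - remainder) * 8
    return from_lsb, from_msb

word_index, remainder = split_address(0x9)
from_lsb, from_msb = lane_shifts(1, remainder)
print(word_index, remainder, hex(from_lsb), hex(from_msb))  # 1 1 0x8 0x30
```

the 0x30 value matches the "width,rem,shift,mask" line in the log below.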


Writing 0xee to ST 0x9 memaddr 0x1/1
width,rem,shift,mask 1 1 0x30 0xff
mem @ 0x1: 0xabeeef0187654321
None
carry already done? 0b0
get_cu_outputs 2 0
after got outputs, rd_rel, wr_rel, wrmask:  0b0 0b0 0b0
busy 1
busy 1
busy 1
busy 0
check cu outputs stb 3, 1(2) {}
check extra output 'stb 3, 1(2)' 0 0
sim mem dump
         0 5432123412345678
         1 abeeef0187654321
         4 1828384822324252
nmigen mem dump
         0 5432123412345678
         1 abcdef018765ee21
         2 0000000000000000
         3 0000000000000000
         4 1828384822324252
         5 0000000000000000
Comment 1 Luke Kenneth Casson Leighton 2020-06-12 16:27:23 BST
problem "solved":

diff --git a/src/soc/decoder/isa/caller.py b/src/soc/decoder/isa/caller.py
index f76afda..7dfcec1 100644
--- a/src/soc/decoder/isa/caller.py
+++ b/src/soc/decoder/isa/caller.py
@@ -51,6 +51,7 @@ class Mem:
     def _get_shifter_mask(self, wid, remainder):
         shifter = ((self.bytes_per_word - wid) - remainder) * \
             8  # bits per byte
+        shifter = remainder * 8
         mask = (1 << (wid * 8)) - 1
         print ("width,rem,shift,mask", wid, remainder, hex(shifter), hex(mask))
         return shifter, mask

is this something that we need to put under the control of a BE/LE MSR flag?
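
as a sanity check, a standalone sketch (not the caller.py code itself) that
applies both shift formulas to the same byte-store.  the starting word
0xabcdef0187654321 is an assumption, inferred from the two dumps in the
description; the function names and the from_msb_end flag are hypothetical:

```python
BYTES_PER_WORD = 8

def shifter_mask(width, remainder, from_msb_end):
    """Shift and mask for a `width`-byte access at byte offset `remainder`."""
    if from_msb_end:
        # formula before the patch: offset counted from the top of the word
        shifter = (BYTES_PER_WORD - width - remainder) * 8
    else:
        # formula after the patch: offset counted from the LSB
        shifter = remainder * 8
    mask = (1 << (width * 8)) - 1
    return shifter, mask

def store(word, value, width, remainder, from_msb_end):
    """Read-modify-write `value` into a 64-bit `word`."""
    shifter, mask = shifter_mask(width, remainder, from_msb_end)
    return (word & ~(mask << shifter)) | ((value & mask) << shifter)

before = 0xabcdef0187654321  # assumed initial contents of word 1
old = store(before, 0xee, 1, 1, from_msb_end=True)
new = store(before, 0xee, 1, 1, from_msb_end=False)
print(hex(old))  # 0xabeeef0187654321 (matches the sim mem dump)
print(hex(new))  # 0xabcdef018765ee21 (matches the nmigen mem dump)
```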
Comment 2 Luke Kenneth Casson Leighton 2020-06-13 16:53:16 BST
hmmm i took a look at microwatt, and i'm likewise not seeing any evidence of
offset-reversal based on an 8-byte (64-bit) boundary.

i did however discover that a qemu 8-byte memory-read returns data in
big-endian order, so i had to read it as single bytes and then reconstruct
a little-endian debug/display value.
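
a rough sketch of that kind of reconstruction, in plain python.  the
read_byte callback is a hypothetical stand-in for whatever single-byte read
the debugger interface provides, and the memory contents are made up for
illustration:

```python
def read_u64_le(read_byte, addr):
    """Assemble a little-endian 64-bit value from eight single-byte reads."""
    raw = bytes(read_byte(addr + i) for i in range(8))
    return int.from_bytes(raw, "little")

# stand-in memory image: lowest address first (illustrative data only)
memory = bytes.fromhex("2143658701efcdab")
value = read_u64_le(lambda a: memory[a], 0)
print(hex(value))                            # 0xabcdef0187654321
print(hex(int.from_bytes(memory, "big")))    # 0x2143658701efcdab
```

the second print shows what the same eight bytes look like if they are
interpreted as one big-endian value instead.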

honestly i have no real idea what i am doing here and could really use some
help and discussion.
Comment 3 Luke Kenneth Casson Leighton 2020-06-14 00:52:09 BST
i think i got it.

nmigen memory write port, if you specify a granularity argument to cut the SRAM into bytes, writes those bytes in *big* endian order.

however if you read the same interface with a single read-enable line, the answer comes back in *little* endian order.

*face-palm*

i do however much prefer thinking in LE terms when it comes to memory layouts and byte addressing.
Comment 4 Luke Kenneth Casson Leighton 2020-06-14 16:02:11 BST
i tracked this down by adding memory dump/alteration to qemu and making
a comparison of memory in qemu and memory in the simulator.

it was a number of separate things:

* the hardware was not performing big-endian byte reversal
* the simulator was storing 8 bytes in a dictionary on 8-byte
  address-aligned boundary where the data order of each 8-byte
  group was byte-reversed (big-endian)
* to compensate for this, the shift-mask function subtracted the offset
  from the *other* end (the 8-byte boundary of the underlying
  simulated memory - 64 bit blocks)

what i did was:

* reversed the order of 8-byte groups being stored so that they are in
  little-endian format in the simulator
* turned the shift round so that it is exactly the LSB bits of the
  address (bits 0 to 2), times 8, where previously it was
  (8-datalen-AddrLSBs), times 8
* added a function which byte-reverses (big-endians) the load/store
  *data* - not the entire simulator-stored 64-bit-granularity memory
* added a byte-reverse function into the hardware.

that byte-reverse function *should* now be possible to call on-demand
for LD/ST byte-reversal opcodes.  i will give that a shot and see if
it works.
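
for illustration only, a minimal sketch of the scheme described in this
comment: 64-bit words held little-endian in a dict, shift taken straight
from the address LSBs, and byte-reversal applied to the load/store *data*
only.  all names here (byte_reverse, store, load, the swap flag) are
hypothetical, this is not the caller.py implementation, and accesses that
cross an 8-byte boundary are deliberately not handled:

```python
BYTES_PER_WORD = 8

def byte_reverse(value, width):
    """Reverse the byte order of a `width`-byte value."""
    return int.from_bytes(value.to_bytes(width, "little"), "big")

def store(mem, addr, value, width, swap=False):
    """Store `width` bytes into a dict of little-endian 64-bit words."""
    index, remainder = divmod(addr, BYTES_PER_WORD)
    assert remainder + width <= BYTES_PER_WORD  # no word-crossing in this sketch
    mask = (1 << (width * 8)) - 1
    value &= mask
    if swap:
        value = byte_reverse(value, width)  # reverse the data, not the word
    shift = remainder * 8                   # address bits 0 to 2, times 8
    word = mem.get(index, 0)
    mem[index] = (word & ~(mask << shift)) | (value << shift)

def load(mem, addr, width, swap=False):
    """Load `width` bytes, optionally byte-reversing the result."""
    index, remainder = divmod(addr, BYTES_PER_WORD)
    assert remainder + width <= BYTES_PER_WORD
    mask = (1 << (width * 8)) - 1
    value = (mem.get(index, 0) >> (remainder * 8)) & mask
    return byte_reverse(value, width) if swap else value

mem = {}
store(mem, 0x8, 0x1234, 2)             # plain halfword store
store(mem, 0xa, 0x1234, 2, swap=True)  # byte-reversed halfword store
print(hex(mem[1]))                     # 0x34121234
print(hex(load(mem, 0xa, 2, swap=True)))  # 0x1234
```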
Comment 5 Luke Kenneth Casson Leighton 2020-06-22 11:51:00 BST
happy with this one.