michael, hi, it looks like there might be a bug in caller.py Mem ld/st. by accident i asked a byte-store (stb) to write to address 0x9. the memory-dump (fu/compunits/test/test_compunit.py) dumps out the internal dictionary (sim.mem.mem.items()), however it uses sim.mem.ld to initialise the nmigen Memory object.

* bytes per word is 8
* address 9 divided by 8 is 1
* remainder is also 1.

does this mean that the LD/ST is being word-order realigned?

Writing 0xee to ST 0x9 memaddr 0x1/1
width,rem,shift,mask 1 1 0x30 0xff
mem @ 0x1: 0xabeeef0187654321
None
carry already done? 0b0
get_cu_outputs 2 0
after got outputs, rd_rel, wr_rel, wrmask: 0b0 0b0 0b0
busy 1
busy 1
busy 1
busy 0
check cu outputs stb 3, 1(2) {}
check extra output 'stb 3, 1(2)' 0 0
sim mem dump
0 5432123412345678
1 abeeef0187654321
4 1828384822324252
nmigen mem dump
0 5432123412345678
1 abcdef018765ee21
2 0000000000000000
3 0000000000000000
4 1828384822324252
5 0000000000000000
problem "solved":

diff --git a/src/soc/decoder/isa/caller.py b/src/soc/decoder/isa/caller.py
index f76afda..7dfcec1 100644
--- a/src/soc/decoder/isa/caller.py
+++ b/src/soc/decoder/isa/caller.py
@@ -51,6 +51,7 @@ class Mem:
     def _get_shifter_mask(self, wid, remainder):
         shifter = ((self.bytes_per_word - wid) - remainder) * \
             8  # bits per byte
+        shifter = remainder * 8
         mask = (1 << (wid * 8)) - 1
         print ("width,rem,shift,mask", wid, remainder, hex(shifter), hex(mask))
         return shifter, mask

is this something that we need to put under the control of a BE/LE MSR flag?
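
to make the difference between the two shifter formulas concrete, here is a minimal standalone sketch (not the actual Mem class). the names shifter_mask_old, shifter_mask_new and store are hypothetical, and the starting word 0xabcdef0187654321 is an assumption inferred from the two memory dumps. it replays the stb of 0xee to address 0x9 under both formulas:

```python
BYTES_PER_WORD = 8  # "bytes per word is 8", as in the report


def shifter_mask_old(wid, remainder):
    # formula before the patch: offset counted from the *other* end of the word
    shifter = ((BYTES_PER_WORD - wid) - remainder) * 8
    mask = (1 << (wid * 8)) - 1
    return shifter, mask


def shifter_mask_new(wid, remainder):
    # formula added by the patch: offset is simply the low address bits
    shifter = remainder * 8
    mask = (1 << (wid * 8)) - 1
    return shifter, mask


def store(word, value, wid, remainder, shifter_mask):
    # hypothetical helper: merge `value` into an existing 64-bit word
    shifter, mask = shifter_mask(wid, remainder)
    return (word & ~(mask << shifter)) | ((value & mask) << shifter)


addr = 0x9
word_index, remainder = divmod(addr, BYTES_PER_WORD)  # (1, 1)
original = 0xabcdef0187654321  # assumed initial contents of word 1

old_result = store(original, 0xee, 1, remainder, shifter_mask_old)
new_result = store(original, 0xee, 1, remainder, shifter_mask_new)
print(hex(old_result))  # 0xabeeef0187654321 (matches the sim mem dump)
print(hex(new_result))  # 0xabcdef018765ee21 (matches the nmigen mem dump)
```

the old formula gives shift 0x30 for this store, which is the value printed in the log above, and the new one gives shift 0x8.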
hmmm i took a look at microwatt, and i'm likewise not seeing any evidence of offset-reversal based on an 8-byte (64-bit) boundary. i did however discover that qemu 8-byte memory-read returns data in big-endian order, and had to read it in single-bytes then reconstruct a little-endian debug/display value. honestly i have no real idea what i am doing here and could really use some help and discussion.
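
for reference, the single-byte reconstruction can be sketched like this. it is a rough illustration only: read_byte is a hypothetical stand-in for whatever single-byte read the debug interface provides (it is not a qemu API), and fake_mem is made-up test data.

```python
def le_word_from_bytes(read_byte, addr, nbytes=8):
    # read_byte(addr) is assumed to return one byte (0..255) at that address
    value = 0
    for i in range(nbytes):
        # the byte at the lowest address lands in the least-significant bits
        value |= (read_byte(addr + i) & 0xff) << (8 * i)
    return value


# made-up memory contents, lowest address first
fake_mem = bytes.fromhex("2143658701efcdab")
value = le_word_from_bytes(lambda a: fake_mem[a], 0)
print(hex(value))  # 0xabcdef0187654321
```

the loop is equivalent to int.from_bytes(data, "little") when the bytes are already in a buffer.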
i think i got it. nmigen memory write port, if you specify a granularity argument to cut the SRAM into bytes, writes those bytes in *big* endian order. however if you read the same interface with a single read-enable line, the answer comes back in *little* endian order. *face-palm* i am however much preferring thinking in LE terms when it comes to memory layouts and byte addressing.
i tracked this down by adding memory dump/alteration to qemu and making a comparison of memory in qemu and memory in the simulator. it was a number of separate things:

* the hardware was not performing big-endian byte reversal
* the simulator was storing 8 bytes in a dictionary on 8-byte address-aligned boundary where the data order of each 8-byte group was byte-reversed (big-endian)
* to correct this, the shift-mask function subtracted the offset from the *other* end (the 8-byte boundary of the underlying simulated memory - 64 bit blocks)

what i did was:

* reverse the order of 8-byte groups being stored to be in little-endian format in the simulator
* turned the shift round so that it is exactly the LSB bits of the address (bits 0 to 2) where previously it was (7-datalen-AddrLSBs)
* added a function which byte-reverses (big-endians) the load/store *data* - not the entire simulator-stored 64-bit-granularity memory
* added a byte-reverse function into the hardware.

that byte-reverse function *should* now be possible to call on-demand for LD/ST byte-reversal opcodes. i will give that a shot and see if it works.
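
the scheme described above (shift taken from address bits 0 to 2, byte-reversal applied to the load/store data only) can be sketched as follows. this is an illustration under assumptions, not the code that was committed: byte_reverse and store_le are hypothetical names, and stores that cross an 8-byte boundary are not handled.

```python
def byte_reverse(value, width):
    # reverse the byte order of the low `width` bytes of `value`
    mask = (1 << (width * 8)) - 1
    return int.from_bytes((value & mask).to_bytes(width, "little"), "big")


def store_le(word, value, width, addr, reverse=False):
    # shift comes straight from the low 3 address bits (little-endian layout)
    shift = (addr & 0b111) * 8
    mask = (1 << (width * 8)) - 1
    # only the data being stored is byte-reversed, never the whole 64-bit word
    data = byte_reverse(value, width) if reverse else value & mask
    return (word & ~(mask << shift)) | (data << shift)


plain = store_le(0, 0x1234, 2, 0x2)
swapped = store_le(0, 0x1234, 2, 0x2, reverse=True)
print(hex(plain))    # 0x12340000
print(hex(swapped))  # 0x34120000
```

keeping the reversal on the data path means the stored memory layout stays little-endian regardless of which opcode did the store.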
happy with this one.