272 – functions needed in POWER simulator which match 3.0B spec

Bug 272 - functions needed in POWER simulator which match 3.0B spec

Summary: functions needed in POWER simulator which match 3.0B spec

Status:	RESOLVED FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code (show other bugs)
Version:	unspecified
Hardware:	PC Linux

Importance:	--- enhancement
Assignee:	Michael Nolan

URL:

Depends on:	278 463
Blocks:	241
	Show dependency tree / graph

Reported:	2020-03-30 18:27 BST by Luke Kenneth Casson Leighton
Modified:	2020-12-06 13:06 GMT (History)
CC List:	2 users (show)

See Also:	269
NLnet milestone:	NLNet.2019.10.046.Standards
total budget (EUR) for completion of task and all subtasks:	250
budget (EUR) for this task, excluding subtasks' budget:	250
parent task for budget allocation:	241
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:	"mnolan"={amount=200, paid=2020-10-19} "lkcl"={amount=50, paid=2020-10-19}

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Luke Kenneth Casson Leighton 2020-03-30 18:27:41 BST

we need some functions, named EXTS, EXTZ64 and so on, which
are in the spec, and need to be implemented.  this is so that
the auto-generated code, which is taken directly *from* the spec
(pseudo-code fragments, converted to actual python code),
can call them.

michael can you take a look?

Comment 1 Michael Nolan 2020-03-30 20:04:59 BST

Would this be in power_pseudo.py? Or the POC simulator I wrote?

Comment 2 Luke Kenneth Casson Leighton 2020-03-30 20:24:31 BST

(In reply to Michael Nolan from comment #1)
> Would this be in power_pseudo.py? Or the POC simulator I wrote?

in the simulator, as a base "convenience" class of some kind.

basically, right now, code like this (which is extswsli)

    n  <- sh[5] || sh[0:4]
    r  <- ROTL64(EXTS64(RS[32:63]), n)
    m  <- MASK(0, 63-n)
    RA <- r & m

currently gets turned into this:

    n  = sh[5] + sh[0:5] # these are lists so they get concatenated
    r  = ROTL64(EXTS64(RS[32:64]), n)
    m  = MASK(0, 63-n)
    RA = r & m

however what is *actually* needed is:

    n  = sh[5] + sh[0:5] # these are lists so they get concatenated
    r  = self.helperfns.ROTL64(self.helperfns.EXTS64(RS[32:64]), n)
    m  = self.helperfns.MASK(0, 63-n)
    RA = r & m

and some other small changes.

remember, the code that's being executed will be *in* the simulator.
it's only generated *by* the compiler (power_pseudo.py).

welcome to compilers, they're a bit of a mind-bender :)

Comment 3 Luke Kenneth Casson Leighton 2020-03-31 13:57:39 BST

these are the functions, listed in 3.0B section 1.4.3 p6 - p7

ABS(x)     
BCD_TO_DPD(x)
CEIL(x)   
DOUBLE(x) 
DPD_TO_BCD(x)
EXTS(x)    
FLOOR(x)
GPR(x)  
MASK(x, y)
MEM(x, y)
ROTL64(x, y)
ROTL32(x, y)
SINGLE(x) 
SPR(x)

there's really not many of them.

Comment 4 Luke Kenneth Casson Leighton 2020-03-31 14:06:57 BST

found some of the helpers, look great michael.  i think i may just be able
to import them from helpers.py and, yes, assume they're globally imported.
should be fine.

btw some whitespace crept in again:

+
+        <
+        <
+    def assertHex(self, a, b):
+        msg = "{:x} != {:x}".format(a, b)
+        return self.assertEqual(a, b, msg)
+

Comment 5 Michael Nolan 2020-03-31 15:24:40 BST

(In reply to Luke Kenneth Casson Leighton from comment #3)
> these are the functions, listed in 3.0B section 1.4.3 p6 - p7
> there's really not many of them.

Just a note: SPR(x), GPR(x) and MEM(x, y) will need to be located outside helpers.py so they can access their respective memories

Comment 6 Luke Kenneth Casson Leighton 2020-03-31 15:31:00 BST

(In reply to Michael Nolan from comment #5)

> Just a note: SPR(x), GPR(x) and MEM(x, y) will need to be located outside
> helpers.py so they can access their respective memories

yes, see http://bugs.libre-riscv.org/show_bug.cgi?id=269#c3

Comment 7 Michael Nolan 2020-03-31 19:35:25 BST

(In reply to Luke Kenneth Casson Leighton from comment #3)
> these are the functions, listed in 3.0B section 1.4.3 p6 - p7
> 
> BCD_TO_DPD(x)
> DPD_TO_BCD(x)

The instructions that use these pseudocode functions (cbcdtd and cdtbcd) cause the qemu power9 machine to trap to an exception handler. Should I bother trying to implement them at the moment?

Comment 8 Michael Nolan 2020-03-31 19:47:34 BST

Status of each helper function in the document:

ABS(x)		- Not found in pseudocode
BCD_TO_DPD(x)	- Traps on QEMU
CEIL(x)		- Not found in pseudocode
DOUBLE(x)	- Float - needs softfloat library
DPD_TO_BCD(x)	- Traps on QEMU
EXTS(x)		- Implemented
FLOOR(x)	- Not found in pseudocode
GPR(x)		- Needs internal implementation
MASK(x, y)	- Implemented
MEM(x, y)	- Needs internal implementation
ROTL64(x, y)	- Implemented
ROTL32(x, y)	- Implemented
SINGLE(x)	- Float - needs softfloat library
SPR(x)		- Needs internal implementation

Comment 9 Luke Kenneth Casson Leighton 2020-03-31 21:06:50 BST

(In reply to Michael Nolan from comment #7)
> (In reply to Luke Kenneth Casson Leighton from comment #3)
> > these are the functions, listed in 3.0B section 1.4.3 p6 - p7
> > 
> > BCD_TO_DPD(x)
> > DPD_TO_BCD(x)
> 
> The instructions that use these pseudocode functions (cbcdtd and cdtbcd)
> cause the qemu power9 machine to trap to an exception handler. Should I
> bother trying to implement them at the moment?

... naah :)

(In reply to Michael Nolan from comment #8)
> Status of each helper function in the document:
> SINGLE(x)	- Float - needs softfloat library

let's leave this one for now.  i'd like to focus on integer initially.

i am planning to do a "pre-prep" phase (which gets variables set up,
reading in advance from the regfile) and "post-cleanup" phase
(which writes out anything that was modified)

what we will need, however, is an indexable overloaded integer class.
(err, basically, the features of Signal).

in other words, due to the fact that you can do this:

RS[32:64] = product[0:32]

or

RS[0:32] = (RA)[0:32] + (RB)[0:32]

it means basically an integer class with operator overloads on
__add__ *and* it means adding a __getitem__ (in order to be able
to do the slice), and probably changing it to be this:

RS[0:32].eq(RA[0:32] + RB[0:32])

later, we will need a class for SimpleV anyway (so as to be able to change
bit-widths)

do you want to do that?  it's basically a class:

class Int:
   def __init__(self, value, width):
       self.value = value
       self.width = width
   def __getitem__(self, idx):
       if isinstance(idx, slice):
           # TODO, convert slice to an &able mask,
           # make sure that the length is that of the slice
       else:
           return Int((self.int & (1<<idx) >> idx, 1)

it will also need an "append" function - again, just like Signal can
use Cat().

it *might* - although performance will be absolutely terrible - be that
we are forced to individually slice bits into lists.

one other alternative, michael, is just... use Signal.  it's got everything
needed.

ideas?

Comment 10 Michael Nolan 2020-03-31 23:02:03 BST

(In reply to Luke Kenneth Casson Leighton from comment #9)
> ... naah :)

> let's leave this one for now.  i'd like to focus on integer initially.

That's what I figured 


> later, we will need a class for SimpleV anyway (so as to be able to change
> bit-widths)
> 
> do you want to do that?  it's basically a class:

I can, sure.

> 
> one other alternative, michael, is just... use Signal.  it's got everything
> needed.

Doing it through signal would mean messing with yield and such wouldn't it? Unless implementing the class turns out to be very difficult, I'd kinda like to avoid that

Comment 11 Luke Kenneth Casson Leighton 2020-04-01 00:07:07 BST

(In reply to Michael Nolan from comment #10)

> Doing it through signal would mean messing with yield and such wouldn't it?
> Unless implementing the class turns out to be very difficult, I'd kinda like
> to avoid that

sigh well as the PowerDecoder class is based around Signal we are kinda stuck with yield and it is a pain.

i will attempt a pre and post process tomorrow which basically duplicates what you did to get src1 2 etc out of the opcode.

once i have the regnums as integers i believe the class Int as defined, which passes Int(value,len) around between ooerations rather than just value would do the trick

Comment 12 Luke Kenneth Casson Leighton 2020-04-01 16:35:42 BST

(In reply to Luke Kenneth Casson Leighton from comment #11)

> i will attempt a pre and post process tomorrow which basically duplicates
> what you did to get src1 2 etc out of the opcode.

done - it works (although the code's a dreadful mess)

if you'd like to do that Int class i can use it immediately.
or, if you want to give it a shot, knock yourself out.

        for i in range(32):
            self[i] = [0] * 64 # TODO: needs to be IntClass(value=0, len=64)

the class needs to be something like:

class IntClass
    def __init__(self, value, len):
        self.value = value
        self.len = len

then __add__ etc. all have to return an object which first asserts that
the len is the same for both operands.

the one thing, assign (now in soc/decoder/pseudo/parser.py) will, instead
of returning ast.Assign(name, expr) will have to return something like
this:

ast.Call(func=ast.Attribute(value=ast.Name(id='x'), attr='eq'),
                args=[Num(n=5)],
                keywords=[],
                starargs=None,
                kwargs=None))])

i got that from a little test program, parsing this:

x.eq(5)

so when we see "x <- 5"...

i'll add it to the parser, commented-out.

Comment 13 Michael Nolan 2020-04-01 19:40:43 BST

I've got something in decoder/selectable_int.py. It uses Power's bit ordering so it should be good for the compiler but something to be aware of if it's used anywhere else.

I'm not sure what to do about left shift and right shift. Should SelectableInt (0x10, bits=8) << 4 equal SelectableInt(0x100, bits=12) or SelectableInt(0x00, bits=8)

It's also unsigned only, I'm not sure how much that matters right now

Comment 14 Jacob Lifshay 2020-04-01 20:06:54 BST

(In reply to Michael Nolan from comment #13)
> I've got something in decoder/selectable_int.py. It uses Power's bit
> ordering so it should be good for the compiler but something to be aware of
> if it's used anywhere else.
> 
> I'm not sure what to do about left shift and right shift. Should
> SelectableInt (0x10, bits=8) << 4 equal SelectableInt(0x100, bits=12) or
> SelectableInt(0x00, bits=8)

I would assume shift left is still shift left. You can check which pseudo-code gets run by the slw instruction since it's what gcc produces for the << operator.

Comment 15 Luke Kenneth Casson Leighton 2020-04-01 22:55:06 BST

(In reply to Michael Nolan from comment #13)
> I've got something in decoder/selectable_int.py. It uses Power's bit
> ordering so it should be good for the compiler but something to be aware of
> if it's used anywhere else.

ahh superb.  ok.

> It's also unsigned only, I'm not sure how much that matters right now

as jacob pointed out, we can find out soon enough.

i do feel that we maay need to put carry-in and carry-out into the
Int class (somehow).  also, perhaps rewrite the pseudo-code fragments
to match those, then propose them back to the OpenPOWER Foundation.

just a thought.

Comment 16 Michael Nolan 2020-04-02 00:05:46 BST

(In reply to Luke Kenneth Casson Leighton from comment #15)

> i do feel that we maay need to put carry-in and carry-out into the
> Int class (somehow).  also, perhaps rewrite the pseudo-code fragments
> to match those, then propose them back to the OpenPOWER Foundation.

Carry-in is addressed in the pseudocode, for instance:
adde:

RT <- (RA) + (RB) + CA

However, yes we'll probably need to add carry-out handling to the SelectableInt class at some point

Comment 17 Luke Kenneth Casson Leighton 2020-04-02 00:44:54 BST

(In reply to Michael Nolan from comment #16)

> Carry-in is addressed in the pseudocode, for instance:
> adde:
> 
> RT <- (RA) + (RB) + CA

oh duh :)

> However, yes we'll probably need to add carry-out handling to the
> SelectableInt class at some point

carry out, yes.

however think ahead to PartitionedSignal.  it becomes more like the unit tests logic, with multiple incoming carry bits and multiple outgoing carry bits.

this is where it becomes both interesting and also hair-raising to implement SV.

not least: the regfile gets typecast to a union of arrays of byte, hword, word and dword.  the "register" RA etc then simply become a pointer to the 64 bit (8 byte) offset into the SRAM of the regfile memory.

but... let's leave that for later :)

Comment 18 Luke Kenneth Casson Leighton 2020-04-02 11:41:23 BST

alriiight, i added it in: just... used it, and it worked!
i did however have to modify EXTZ64 and EXTS64 to get them
to recognise SelectableInt, and, interestingly, was able to
"drop in" the fact that the sign-zero-extended int now has
a bit-width of 64.

Comment 19 Luke Kenneth Casson Leighton 2020-04-05 19:37:52 BST

hiya michael, saw the commit (reinstated addi) - couple of things:

* generally, a blank __init__.py is kiiinda the normal convention?
  (there's a few exceptions)

* i believe you may have created a circular import dependency loop

* multiple ISACaller base classes have been added.  this is a pain
  to workaround.

imports in python work hierarchically.  so, import x.y.z will
look for x, then x.y and finally x.y.z

because class ISA(fixedarith, fixedload) is in soc.decoder.isa
and because fixedarith is derived from ISACaller
and because ISACaller is in soc.decoder.isa.caller i think
that goes circular.

but the multiple ISACaller bases is slightly more problematic.

how about instead either:

* creating a class ISA which has fixedarith and fixedload (etc)
  as *member* variables, which, in the constructor, merge the
  instr dictionaries into one?

* removing ISACaller as a base then making it a base of ISA:
  class ISA(ISACaller, fixedarith, fixedload)

* changing pywriter.py to output dicts named fixedarith_instr (etc.)
  which can be merged / accessed separately

oh - i think you might have fixed it, just saw the latest push :)