Write basic SV implementations, run on the simulator.
x86 reference numbers for SSE:
Lauri, let me know what the minimum operations you need in the simulator
are. if it is just for example FP add, FP mul, FP sub, FP LOAD and FP STORE
i can likely get that done quite quickly.
(Also bear in mind, the basic SV implementation, due to the way that
Vector ISAs work, is almost 100% likely to be the optimised version: the
focus is more on the "basic" version informing us at a very early stage
what *instructions* are needed, more than "how optimised the actual
You can look at the asm as soon as we get the repo going, would that work?
Yes, the optimized version includes new instructions as the bug says. If it turns out no new instructions are useful, the tests and measurements to verify that cover the optimized bug.
(clarification: of the C baseline, I haven't yet started the SV version)
(In reply to Luke Kenneth Casson Leighton from comment #2)
> are. if it is just for example FP add, FP mul, FP sub, FP LOAD and FP STORE
> i can likely get that done quite quickly
the preliminary ops are now in the simulator, and should be functional as long as exceptions, CR1, and use of FPSCR are avoided for now.
if CR1 or other status bits are needed do let me know.
apply_window working well
discussion of imdct36
I have not received several mails in that thread. Possibly others missing too. Not in spam.
On the questions such as "for temporaries t0 t1 etc were
you planning to drop those into stack, with an offset 16, then
*reread* them back with an offset of 4?" or the last mail's five loop, I can't answer those until I have time to look at the thing properly.
(In reply to cand from comment #8)
> I have not received several mails in that thread. Possibly others missing
> too. Not in spam.
very weird, they're going out. i checked server logs.
(In reply to cand from comment #9)
> On the questions such as "for temporaries t0 t1 etc were
> you planning to drop those into stack, with an offset 16, then
> *reread* them back with an offset of 4?" or the last mail's five loop, I
> can't answer those until I have time to look at the thing properly.
well i spent several days staring at imdct36, i am confident now that no more needs to be added to get the "basic" version running. (please ignore the FFT work i am currently doing, that is for the optimised version)
* sv.add/mrr/m=r30 you can use for adding in *reverse* order, setting r30=0b010101
this will cover the for-loop for (i=17; i >= ... i-=2)
the first big processing loop, for i 0 to 1, VL can be set to 2
the second, VL can be set to *five*, but for the 2nd of each group of operations a predicate mask can be set to 0b01111.
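to make the reverse-order idea concrete, here is a minimal scalar sketch in C. it is not simulator code: the function name `reverse_masked_order` is made up for illustration, and it assumes that bit i of the mask enables element i and that reverse mode walks the elements from VL-1 down to 0. the actual sv.add/mrr semantics should be checked against the spec.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model (not the simulator's implementation):
 * walk element indices vl-1 down to 0 and keep only those whose
 * predicate bit is set. The visited indices are written to order[]
 * and the number of visited elements is returned. */
static int reverse_masked_order(unsigned vl, uint64_t mask, unsigned order[])
{
    assert(vl <= 64);               /* mask is modelled as 64 bits wide */
    int n = 0;
    for (unsigned i = vl; i-- > 0; ) {
        if ((mask >> i) & 1u)       /* assumption: bit i enables element i */
            order[n++] = i;
    }
    return n;
}
```

with vl=6 and mask 0x15 (0b010101) this visits elements 4, 2, 0 in that order, i.e. a descending loop with a stride of 2.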
my feeling is, it may be best to actually morph the c code a little, to make it clearer.
if you can add a small test Makefile plus main.c like for the mp3_0 test, reading and comparing data, i am happy to do some code-morphs that remove the very last block (out of the forloop 0..3) and increase the loop to 0..4 with conditional code.
> if you can add a small test Makefile plus main.c like for the mp3_0 test, reading and comparing data
That already exists?
On the mails, everything from Monday onwards was delivered correctly.
(In reply to cand from comment #11)
> > if you can add a small test Makefile plus main.c like for the mp3_0 test, reading and comparing data
> That already exists?
for the c code, no.
find . -name "Makefile"
media/Makefile does not contain any reference to apply_window_standalone.c
or to imdct36_standalone.c
lkcl@fizzy:~/src/libresoc/openpower-isa/media$ ls data/audio/mp3/mp3_0_data/
a.out buf3000 buf8000 samples1000 samples6000
apply_window_standalone.c buf4000 buf9000 samples2000 samples7000
buf0 buf5000 main.c samples3000 samples8000
buf1000 buf6000 out_samples0 samples4000 samples9000
buf2000 buf7000 samples0 samples5000 win0
lkcl@fizzy:~/src/libresoc/openpower-isa/media$ ls data/audio/mp3/mp3_1_data/
beforeout1 beforeout5 buf20 in14 main.c out3 win19
beforeout2 buf14 imdct36_standalone.c in3 out18 win12 win8
ah! found main.c in there. excellent ok.
i can work with that.
what i propose is, to copy imdct36_standalone.c and main.c into
and commit them
then adjust them to work relative to the media/Makefile directory
then, make a *second* copy which is "a little bit more vector-like"
They weren't intended for running the media tests, just as a sanity check the data matches on x86, etc. So I wouldn't bind them into the main makefile tests at least. If you want to store modified versions, maybe with their own makefile target, that's fine.
(In reply to cand from comment #14)
> They weren't intended for running the media tests, just as a sanity check
> the data matches on x86, etc.
yehyeh. except, the idea for incrementing the loop size from 4 to 5
and excluding the bits that aren't needed with a predicate 0b01111,
that idea needs checking.
i mean the section at the end
s0 = tmp[16];
s1 = MULH3(tmp[17], icos36h[4], 2);
all of that can go inside the loop with for j = 0; j < **5**...
but its partner part can't (icos36[8-j] and s2 + s3): that would over-run
the array, so it has to be predicated, hence 0b01111.
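a toy sketch of that check in C, under stated assumptions: this is not the imdct36 code. the function names, array sizes and the plain integer arithmetic (standing in for MULH3 and friends) are all invented for illustration. it only shows the shape of the morph: a 4-iteration loop plus a hand-written tail compared against a 5-iteration loop whose partner half is gated by a 0b01111 mask, with masked-out inputs read as zero.

```c
#include <assert.h>

#define N_IN 18   /* toy input length: 4 full groups of 4, plus 2 left over */
#define N_K   9   /* toy coefficient table length */

/* Original shape: four full iterations, then a hand-written tail block. */
static void loop4_plus_tail(const int in[N_IN], const int k[N_K],
                            int out_a[5], int out_b[4])
{
    int i = 0;
    for (int j = 0; j < 4; j++, i += 4) {
        int t0 = in[i],     t1 = in[i + 2];
        int t2 = in[i + 1], t3 = in[i + 3];
        out_a[j] = (t1 + t0) + (t3 + t2) * k[j];
        out_b[j] = (t1 - t0) + (t3 - t2) * k[8 - j];
    }
    /* tail: only in[16] and in[17] remain */
    out_a[4] = in[16] + in[17] * k[4];
}

/* Morphed shape: five iterations, partner half gated by a predicate mask. */
static void loop5_predicated(const int in[N_IN], const int k[N_K],
                             unsigned mask, int out_a[5], int out_b[4])
{
    /* element 4 must be masked out, otherwise in[18] and in[19] are read */
    assert(((mask >> 4) & 1u) == 0);
    int i = 0;
    for (int j = 0; j < 5; j++, i += 4) {
        int pred = (mask >> j) & 1u;
        int t0 = in[i];
        int t2 = in[i + 1];
        /* masked-out elements read as zero instead of over-running in[] */
        int t1 = pred ? in[i + 2] : 0;
        int t3 = pred ? in[i + 3] : 0;
        out_a[j] = (t1 + t0) + (t3 + t2) * k[j];
        if (pred)
            out_b[j] = (t1 - t0) + (t3 - t2) * k[8 - j];
    }
}

/* Returns 1 when both shapes give identical outputs for a fixed toy input. */
static int shapes_agree(void)
{
    int in[N_IN], k[N_K];
    int a1[5], b1[4], a2[5], b2[4];
    for (int n = 0; n < N_IN; n++) in[n] = 3 * n + 1;
    for (int n = 0; n < N_K; n++)  k[n] = n + 2;
    loop4_plus_tail(in, k, a1, b1);
    loop5_predicated(in, k, 0x0F /* 0b01111 */, a2, b2);
    for (int n = 0; n < 5; n++) if (a1[n] != a2[n]) return 0;
    for (int n = 0; n < 4; n++) if (b1[n] != b2[n]) return 0;
    return 1;
}
```

calling `shapes_agree()` from a small test main is enough to confirm the two shapes match on this toy data. the real check still has to be done against the recorded test data.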
> So I wouldn't bind them into the main makefile
> tests at least. If you want to store modified versions, maybe with their own
> makefile target, that's fine.
ok good call.
* pretend-predicate 0b01111 added
* loop count increased to 5
* MULH3 on icos36h works out to be exactly the same as
  with the hard-coded constants 16, 17, 9+4, 8-4 etc.
* hard-coded block now removed.
interestingly it wasn't as straightforward as i first imagined,
but retrospectively it's logical: s1 and s2 are initially created
from t1 and t0, the name-changes of the variables in the last
(now deleted) block confused me.
the predicate mask can be hard-coded to 0b01111 with an immediate,
which results in t1, s2 and t3 being set to zero on that last block.
i'll add some comments.