Create a suite of REMAP modes suitable for triangular, rhombus, and modulo patterns: xxxx xxxx 1234 xxx xxxx 4123 xx xxxx 3412 x xxxx 2341 --- * TODO: note the inner loop notification used by "svstep." in Vertical-First https://bugs.libre-soc.org/show_bug.cgi?id=1044#c16 * TODO: design spec up the three REMAP modes which involves a new svshape3 * TODO: write python "yield" versions (see dct fft and other REMAP modes) with mini demos / unit tests * TODO: confirm that the spec changes are ok (see comment #5) * TODO: (IN A BRANCH FOR REVIEW) make the spec changes * TODO: (in same branch): using the yielder(s) add actual REMAP unit tests * TODO: review branch * TODO: write PR/blog about bigmul (budget under another bug, TODO link) * TODO: commit and close Basic algorithm for rhombus: void mul( uint64_t *product, const uint64_t *a, const uint64_t *b, size_t a_sz, size_t b_sz ) { for(size_t bi = 0; bi < a_sz; bi++) { product[bi] = 0; } for(size_t ai = 0; ai < a_sz; ai++) { uint64_t carry = 0; for(size_t bi = 0; bi < a_sz; bi++) { uint128_t v = (uint128_t)a[ai] * (uint128_t)b[bi]; v += (uint128_t)carry; v += (uint128_t)product[ai + bi]; carry = v >> 64; product[ai + bi] = (uint64_t)v; } product[ai + b_sz] = carry; } }
(In reply to Jacob Lifshay from comment #0) > Create a REMAP mode > for(size_t ai = 0; ai < a_sz; ai++) { > uint64_t carry = 0; > for(size_t bi = 0; bi < a_sz; bi++) { > uint128_t v = (uint128_t)a[ai] * (uint128_t)b[bi]; the basics for the bigmulremap fn become: for i in range(asz): for j in range(bsz): yield i, j, i+j however additional modes then include options to do yield i, j, ((i+j) % bsz) plus "reversing" options (horiz, vert, h&v) and perhaps 2D mirroring as well. that's 5 bits of mode which is quite enough. i know that konstantinos may have something to add about the motion-estimate algorithm he did, i know there is a similar pattern but it is... i//2, j, i//2+j or something like that. this may be a bit much though. have to be very careful as there is not much space left in svshape.
space in SHAPE SPRs https://libre-soc.org/openpower/sv/remap/ |0:5 |6:11 | 12:17 | 18:20 | 21:23 |24:27 |28:29 |30:31| Mode | |----- |----- | ------- | ------- | ------ |------|------ |---- | ----- | |xdimsz|ydimsz| zdimsz | permute | invxyz |offset|skip |mode |Matrix | |xdimsz|ydimsz|SVGPR | 11/ |sk1/invxy|offset|elwidth|0b00 |Indexed| |xdimsz|mode | zdimsz | submode2| invxyz |offset|submode|0b01 |DCT/FFT| | rsvd |rsvd |xdimsz | rsvd | invxyz |offset|submode|0b10 |Red/Sum| | | | | | | | |0b11 |rsvd | => | rsvd |rsvd |xdimsz | 00 | invxyz |offset|submode|0b10 |Red/Sum| | rsvd |ydimsz|xdimsz | 01 | invxy 0 |offset|mirror |0b10 |Bigmul| or | rsvd |rsvd |xdimsz | rsvd | invxy 0 |offset|submode|0b10 |Red/Sum| | rsvd |ydimsz|xdimsz | rsvd | invxy 1 |offset|submode|0b10 |Bigmul | have to check z of invxyz is free
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/simplev.mdwn;h=33a02e6612065f290d840e15a596dfc2177de5e5;hb=f746c31efca189fb09aec2e661dba686616a846f#l320 unused (not set). i did tell you repeatedly to enable inversion of both x and y in parallel-reduce REMAP. z is free so can be used to shoe-horn bigmul in it. next step, the instruction format
svshape SVxd,SVyd,SVzd,SVRM,vf 0:5 6:10 11:15 16:20 21:24 25 26:31 name PO SVxd SVyd SVzd SVRM vf XO svshape Fields: SVxd - SV REMAP "xdim" (X-dimension) SVyd - SV REMAP "ydim" (Y-dimension, sometimes used for sub-mode selection) SVzd - SV REMAP "zdim" (Z-dimension) SVRM - SV REMAP Mode (0b00000 for Matrix, 0b00001 for FFT etc.) ok so SVyd is the de-facto selector for sub-modes: SVyd "bit 0" can be used for selecting bigmul/prefix, under SVRM=0b0111, whereupon bits 1-2 can be used for inversion and 2-3 for mirror... damn damn damn 5 bits are needed: one for long-mul, one for carrysave ah hang on i know: "/mrr" performs a full reverse, which in combination with inverting either x or y order *back* again (single bit) you get the full complement. i think. so: * bit 0 select bigmul or Prefx * bit 1 select carrysave or longmul * bit 2 select invert x (or y?) * bit 3 mirror x (5 45 345 2345 12345 012345 instead of 0 01 012 ...) * bit 4 mirror y (ditto)
we will need an svshape3. damn. https://lists.libre-soc.org/pipermail/libre-soc-dev/2023-September/005711.html
okaaay, there are some necessary steps here, but there are *two* branches needed which is a royal pain in the ass (openpower-isa, spec). what i'm proposing - bear in mind this is still PO22 - is to move bincrflut and binlut to the "- 10 00 |0" row, leaving their current locations free for svshape3 and 4. that way it is possible to do a svshape3, bringing in more space for REMAP options. which will be identical to svshape. | NN | | | | | - 10 00 |0 | rsvd | rsvd | | NN | | | | | - 11 00 |0 | rsvd | rsvd | | NN | RT | RA | RB | RC | nh 00 00 |1 | binlut | VA-Form | | NN | RT | RA | RB | /BFA/ | 0 01 00 |1 | bincrflut | VA-Form | | NN | | | | | 1 01 00 |1 | svindex | SVI-Form | | NN | RT | RA | RB | mode | L 10 00 |1 | bmask | BM2-Form | | NN | | | | | 0 11 00 |1 | svshape | SVM-Form | | NN | | | | | 1 11 00 |1 | svremap | SVRM-Form | git diffs: * https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=261c0c4a * https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ac538421