Bug 1155 - O(n^2) multiplication REMAP mode(s)
Summary: O(n^2) multiplication REMAP mode(s)
Status: CONFIRMED
Alias: None
Product: Libre-SOC's second ASIC
Classification: Unclassified
Component: source code (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Jacob Lifshay
URL: https://libre-soc.org/openpower/sv/re...
Depends on:
Blocks: 1044 1151 1157
  Show dependency treegraph
 
Reported: 2023-09-09 00:12 BST by Jacob Lifshay
Modified: 2023-10-13 14:29 BST (History)
4 users (show)

See Also:
NLnet milestone: NLnet.2021.02A.052.CryptoRouter
total budget (EUR) for completion of task and all subtasks: 1000
budget (EUR) for this task, excluding subtasks' budget: 1000
parent task for budget allocation: 773
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
lkcl=300 jacob=700


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jacob Lifshay 2023-09-09 00:12:35 BST
Create a suite of REMAP modes suitable for triangular, rhombus, and modulo
patterns:

xxxx  xxxx    1234
 xxx   xxxx   4123
  xx    xxxx  3412
   x     xxxx 2341

---

* TODO: note the inner loop notification used by "svstep." in Vertical-First
  https://bugs.libre-soc.org/show_bug.cgi?id=1044#c16
* TODO: design spec up the three REMAP modes which involves a new svshape3
* TODO: write python "yield" versions (see dct fft and other REMAP modes)
        with mini demos / unit tests
* TODO: confirm that the spec changes are ok (see comment #5)
* TODO: (IN A BRANCH FOR REVIEW) make the spec changes
* TODO: (in same branch): using the yielder(s) add actual REMAP unit tests
* TODO: review branch
* TODO: write PR/blog about bigmul (budget under another bug, TODO link)
* TODO: commit and close

Basic algorithm for rhombus:

void mul(
    uint64_t *product,
    const uint64_t *a,
    const uint64_t *b,
    size_t a_sz,
    size_t b_sz
) {
    for(size_t bi = 0; bi < a_sz; bi++) {
        product[bi] = 0;
    }
    for(size_t ai = 0; ai < a_sz; ai++) {
        uint64_t carry = 0;
        for(size_t bi = 0; bi < a_sz; bi++) {
            uint128_t v = (uint128_t)a[ai] * (uint128_t)b[bi];
            v += (uint128_t)carry;
            v += (uint128_t)product[ai + bi];
            carry = v >> 64;
            product[ai + bi] = (uint64_t)v;
        }
        product[ai + b_sz] = carry;
    }
}
Comment 1 Luke Kenneth Casson Leighton 2023-09-09 00:28:57 BST
(In reply to Jacob Lifshay from comment #0)
> Create a REMAP mode

>     for(size_t ai = 0; ai < a_sz; ai++) {
>         uint64_t carry = 0;
>         for(size_t bi = 0; bi < a_sz; bi++) {
>             uint128_t v = (uint128_t)a[ai] * (uint128_t)b[bi];

the basics for the bigmulremap fn become:
   for i in range(asz):
      for j in range(bsz):
          yield i, j, i+j

however additional modes then include options to
do yield i, j, ((i+j) % bsz) plus "reversing" options
(horiz, vert, h&v) and perhaps 2D mirroring as well.
that's 5 bits of mode which is quite enough.

i know that konstantinos may have something to
add about the motion-estimate algorithm he did, i
know there is a similar pattern but it is...
i//2, j, i//2+j or something like that. this may be
a bit much though.

have to be very careful as there is not much space
left in svshape.
Comment 2 Luke Kenneth Casson Leighton 2023-09-09 12:21:09 BST
space in SHAPE SPRs

https://libre-soc.org/openpower/sv/remap/


|0:5   |6:11  | 12:17   | 18:20   | 21:23   |24:27 |28:29  |30:31| Mode  |
|----- |----- | ------- | ------- | ------  |------|------ |---- | ----- |
|xdimsz|ydimsz| zdimsz  | permute | invxyz  |offset|skip   |mode |Matrix |
|xdimsz|ydimsz|SVGPR    | 11/     |sk1/invxy|offset|elwidth|0b00 |Indexed|
|xdimsz|mode  | zdimsz  | submode2| invxyz  |offset|submode|0b01 |DCT/FFT|
| rsvd |rsvd  |xdimsz   | rsvd    | invxyz  |offset|submode|0b10 |Red/Sum|
|      |      |         |         |         |      |       |0b11 |rsvd   |

=>

| rsvd |rsvd  |xdimsz   | 00      | invxyz  |offset|submode|0b10 |Red/Sum|
| rsvd |ydimsz|xdimsz   | 01      | invxy 0 |offset|mirror |0b10 |Bigmul|

or


| rsvd |rsvd  |xdimsz   | rsvd    | invxy 0 |offset|submode|0b10 |Red/Sum|
| rsvd |ydimsz|xdimsz   | rsvd    | invxy 1 |offset|submode|0b10 |Bigmul |

have to check z of invxyz is free
Comment 3 Luke Kenneth Casson Leighton 2023-09-09 12:27:56 BST
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/simplev.mdwn;h=33a02e6612065f290d840e15a596dfc2177de5e5;hb=f746c31efca189fb09aec2e661dba686616a846f#l320

unused (not set). i did tell you repeatedly to enable inversion of both
x and y in parallel-reduce REMAP.

z is free so can be used to shoe-horn bigmul in it.  next step, the
instruction format
Comment 4 Luke Kenneth Casson Leighton 2023-09-09 12:38:39 BST
svshape SVxd,SVyd,SVzd,SVRM,vf
0:5	6:10	11:15	16:20	21:24	25	26:31	name
PO	SVxd	SVyd	SVzd	SVRM	vf	XO	svshape

Fields:

SVxd - SV REMAP "xdim" (X-dimension)
SVyd - SV REMAP "ydim" (Y-dimension, sometimes used for sub-mode selection)
SVzd - SV REMAP "zdim" (Z-dimension)
SVRM - SV REMAP Mode (0b00000 for Matrix, 0b00001 for FFT etc.)

ok so SVyd is the de-facto selector for sub-modes: SVyd "bit 0" can
be used for selecting bigmul/prefix, under SVRM=0b0111, whereupon
bits 1-2 can be used for inversion and 2-3 for mirror...

damn damn damn 5 bits are needed: one for long-mul, one for carrysave

ah hang on i know: "/mrr" performs a full reverse, which in combination
with inverting either x or y order *back* again (single bit) you get
the full complement. i think.

so:

* bit 0 select bigmul or Prefx
* bit 1 select carrysave or longmul
* bit 2 select invert x (or y?)
* bit 3 mirror x (5 45 345 2345 12345 012345 instead of 0 01 012 ...)
* bit 4 mirror y (ditto)
Comment 5 Luke Kenneth Casson Leighton 2023-10-01 08:44:28 BST
we will need an svshape3. damn.
https://lists.libre-soc.org/pipermail/libre-soc-dev/2023-September/005711.html
Comment 6 Luke Kenneth Casson Leighton 2023-10-06 09:49:51 BST
okaaay, there are some necessary steps here, but there are *two* branches
needed which is a royal pain in the ass (openpower-isa, spec).

what i'm proposing - bear in mind this is still PO22 - is to move
bincrflut and binlut to the "- 10 00 |0" row, leaving their current
locations free for svshape3 and 4.

that way it is possible to do a svshape3, bringing in more space for
REMAP options. which will be identical to svshape.

| NN |    |     |      |         | -  10  00 |0 | rsvd     | rsvd  |
| NN |    |     |      |         | -  11  00 |0 | rsvd     | rsvd  |
| NN | RT | RA  | RB   | RC      | nh 00  00 |1 | binlut |  VA-Form      |
| NN | RT | RA  | RB   | /BFA/   | 0  01  00 |1 | bincrflut  | VA-Form     |
| NN |    |     |      |         | 1  01  00 |1 | svindex    | SVI-Form    |
| NN | RT | RA  | RB   | mode    | L  10  00 |1 | bmask    | BM2-Form     |
| NN |    |     |      |         | 0  11  00 |1 | svshape    | SVM-Form    |
| NN |    |     |      |         | 1  11  00 |1 | svremap   | SVRM-Form   |

git diffs:

* https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=261c0c4a
* https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ac538421