Bug 773 - High-Level Demos of Cryptographic and Other Relevant Algorithms
Summary: High-Level Demos of Cryptographic and Other Relevant Algorithms
Status: CONFIRMED
Alias: None
Product: Libre-SOC's second ASIC
Classification: Unclassified
Component: source code (show other bugs)
Version: unspecified
Hardware: Other Linux
: High enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on: 1151 1153
Blocks: 589
  Show dependency treegraph
 
Reported: 2022-02-15 06:43 GMT by Jacob Lifshay
Modified: 2024-01-01 20:05 GMT (History)
5 users (show)

See Also:
NLnet milestone: NLnet.2021.02A.052.CryptoRouter
total budget (EUR) for completion of task and all subtasks: 5500
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation: 589
child tasks for budget allocation: 1007 1151 1153 1155 1157
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jacob Lifshay 2022-02-15 06:43:39 GMT
Creation of demonstrations of cryptographic and other relevant
algorithms using our new and/or modified instructions to attempt
to demonstrate benefits of our additions and/or modifications.
Comment 1 Luke Kenneth Casson Leighton 2023-05-09 14:42:28 BST
https://hackage.haskell.org/package/ed25519-donna-0.1.1/src/upstream-c/curve25519-donna-64bit.h

the next one is ed25519, and it looks really straightforward

https://hackage.haskell.org/package/ed25519-donna-0.1.1/src/upstream-c/modm-donna-64bit.h

barret reduce a little more involved

header:

https://hackage.haskell.org/package/ed25519-donna-0.1.1/src/upstream-c/ed25519-donna-portable.h

relevant sections:

typedef struct uint128_t {
	uint64_t lo, hi;
} uint128_t;
#define mul64x64_128(out,a,b) out.lo = _umul128(a,b,&out.hi);
#define shr128_pair(out,hi,lo,shift) out = __shiftright128(lo, hi, shift);
#define shl128_pair(out,hi,lo,shift) out = __shiftleft128(lo, hi, shift);
#define shr128(out,in,shift) shr128_pair(out, in.hi, in.lo, shift)
#define shl128(out,in,shift) shl128_pair(out, in.hi, in.lo, shift)
#define add128(a,b) { uint64_t p = a.lo; a.lo += b.lo; a.hi += b.hi + (a.lo < p); }
#define add128_64(a,b) { uint64_t p = a.lo; a.lo += b; a.hi += (a.lo < p); }
#define lo128(a) (a.lo)
#define hi128(a) (a.hi)
Comment 2 Luke Kenneth Casson Leighton 2023-05-10 15:29:18 BST
https://www.bunniestudios.com/blog/?p=6140
Comment 3 Luke Kenneth Casson Leighton 2023-05-19 14:20:44 BST
curve25519_mul, edited on the muls to show the obvious patterns,
shifts are also similarly regular patterns, an INDEXED REMAP with
Triangular Numbers on a sv.mulld, this first muladd part should be
Horizontal-Firzt about 13 instructions, the 2nd part (shifting)
may need VFirst, but you MIGHT find that the new 3-in 2-out shift
instructions do the job, as they are sort-of a 128-in 128-out shift.

t[0]  =  r0 * s0
t[1]  =  r0 * s1 + r1 * s0;
t[2]  =  r0 * s2 + r1 * s1 + r2 * s0;
t[3]  =  r0 * s3 + r1 * s2 + r2 * s1 + r3 * s0;
t[4]  =  r0 * s4 + r1 * s3 + r2 * s2 + r3 * s1 + r4 * s0;

	r1 *= 19;
	r2 *= 19;
	r3 *= 19;
	r4 *= 19;

t[0] += r4 * s1 + r3 * s2 + r2 * s3 + r1 * s4;
t[1] += r4 * s2 + r3 * s3 + r2 * s4;
t[2] += r4 * s3 + r3 * s4;
t[3] += r4 * s4;

                     r0 = lo128(t[0]) & reduce_mask_51; shr128(c, t[0], 51);
add128_64(t[1], c)   r1 = lo128(t[1]) & reduce_mask_51; shr128(c, t[1], 51);
add128_64(t[2], c)   r2 = lo128(t[2]) & reduce_mask_51; shr128(c, t[2], 51);
add128_64(t[3], c)   r3 = lo128(t[3]) & reduce_mask_51; shr128(c, t[3], 51);
add128_64(t[4], c)   r4 = lo128(t[4]) & reduce_mask_51; shr128(c, t[4], 51);
r0 +=   c * 19; c = r0 >> 51; r0 = r0 & reduce_mask_51;
r1 +=   c;

=>

for i in range(5):
   t[i] = 0

for i in range(5):
   for j in range(i+1..0):
      t[i] += r[i] * s[i]

for i in range(1..4):
      r[i] *= 19

for i in range(4..0):
   for j in range(4..i+1):
      t[i] += r[i] * s[i]

# this is the one where i *think* it possible to do some sort
# of single-operation similar to dsld.

c = 0
for i in range(5):
    add128_64(t[i], c);
    r[i] = lo128(t[i]) & reduce_mask_51;
    shr128(c, t[i], 51);

r[0] +=   c * 19; c = r[0] >> 51; r[0] = r[0] & reduce_mask_51;
r[1] +=   c;
Comment 4 Luke Kenneth Casson Leighton 2023-09-05 05:00:34 BST
https://libre-soc.org/irclog/%23libre-soc.2023-09-03.log.html#t2023-09-03T19:50:26

konstantinos i'm assigning this to you as a reminder to create the
sub-bug for ed25519 and also the bignum long-multiply REMAP Schedule,
discussed briefly here:

http://lists.libre-soc.org/pipermail/libre-soc-dev/2023-September/005626.html

bug #776 is where you can get budget for documentation of
each of those (ed25519 similar page to chacha20's docpage,
REMAP doc-budget needs to go to "the SVP64 specification"), 

and bug #840 is where you can get budget for writing unit tests
for ed25519 and also the long-mul REMAPper.
Comment 5 Jacob Lifshay 2023-09-06 17:22:26 BST
markos, you were running into the issue that ed25519 needs too many registers, if you use all 64 bits of each register and use the bigint instructions you can probably squeeze into fewer registers, since the ed25519 code referenced in comment #0 uses only 51 bits (edit: corrected) out of each 64-bit register.
Comment 6 Luke Kenneth Casson Leighton 2023-09-06 18:17:36 BST
(In reply to Jacob Lifshay from comment #5)
> markos, you were running into the issue that ed25519 needs too many
> registers, if you use all 64 bits of each register and use the bigint
> instructions you can probably squeeze into fewer registers, since the
> ed25519 code referenced in comment #0 uses only 51 bits (edit: corrected)
> out of each 64-bit register.

child subtasks needed! jacob can you raise them so that markos is
not distracted?

dsrd should work extremely well to "unpack" the 51-bits, using
Vertical-First.  2 input regs are a 128-bit "FIFO" in essence
whilst the output is "the current constant plus the backend
remainder of the FIFO".

   bitsleft = 128
   r1, r2, j = input[0], input[1], 2
   loop i:
        output[i], r2 = dsrd(r1, r2, 51)
        bitsleft -= 51
        if bitsleft <= 64:
             r2 |= input[j]
             j += 1

something like that although it relies on Vertical-First mode
which the entire ed25519 algorithm would have to be based on.