Creation of demonstrations of cryptographic and other relevant algorithms using our new and/or modified instructions to attempt to demonstrate benefits of our additions and/or modifications.
https://hackage.haskell.org/package/ed25519-donna-0.1.1/src/upstream-c/curve25519-donna-64bit.h the next one is ed25519, and it looks really straightforward https://hackage.haskell.org/package/ed25519-donna-0.1.1/src/upstream-c/modm-donna-64bit.h barret reduce a little more involved header: https://hackage.haskell.org/package/ed25519-donna-0.1.1/src/upstream-c/ed25519-donna-portable.h relevant sections: typedef struct uint128_t { uint64_t lo, hi; } uint128_t; #define mul64x64_128(out,a,b) out.lo = _umul128(a,b,&out.hi); #define shr128_pair(out,hi,lo,shift) out = __shiftright128(lo, hi, shift); #define shl128_pair(out,hi,lo,shift) out = __shiftleft128(lo, hi, shift); #define shr128(out,in,shift) shr128_pair(out, in.hi, in.lo, shift) #define shl128(out,in,shift) shl128_pair(out, in.hi, in.lo, shift) #define add128(a,b) { uint64_t p = a.lo; a.lo += b.lo; a.hi += b.hi + (a.lo < p); } #define add128_64(a,b) { uint64_t p = a.lo; a.lo += b; a.hi += (a.lo < p); } #define lo128(a) (a.lo) #define hi128(a) (a.hi)
https://www.bunniestudios.com/blog/?p=6140
curve25519_mul, edited on the muls to show the obvious patterns, shifts are also similarly regular patterns, an INDEXED REMAP with Triangular Numbers on a sv.mulld, this first muladd part should be Horizontal-Firzt about 13 instructions, the 2nd part (shifting) may need VFirst, but you MIGHT find that the new 3-in 2-out shift instructions do the job, as they are sort-of a 128-in 128-out shift. t[0] = r0 * s0 t[1] = r0 * s1 + r1 * s0; t[2] = r0 * s2 + r1 * s1 + r2 * s0; t[3] = r0 * s3 + r1 * s2 + r2 * s1 + r3 * s0; t[4] = r0 * s4 + r1 * s3 + r2 * s2 + r3 * s1 + r4 * s0; r1 *= 19; r2 *= 19; r3 *= 19; r4 *= 19; t[0] += r4 * s1 + r3 * s2 + r2 * s3 + r1 * s4; t[1] += r4 * s2 + r3 * s3 + r2 * s4; t[2] += r4 * s3 + r3 * s4; t[3] += r4 * s4; r0 = lo128(t[0]) & reduce_mask_51; shr128(c, t[0], 51); add128_64(t[1], c) r1 = lo128(t[1]) & reduce_mask_51; shr128(c, t[1], 51); add128_64(t[2], c) r2 = lo128(t[2]) & reduce_mask_51; shr128(c, t[2], 51); add128_64(t[3], c) r3 = lo128(t[3]) & reduce_mask_51; shr128(c, t[3], 51); add128_64(t[4], c) r4 = lo128(t[4]) & reduce_mask_51; shr128(c, t[4], 51); r0 += c * 19; c = r0 >> 51; r0 = r0 & reduce_mask_51; r1 += c; => for i in range(5): t[i] = 0 for i in range(5): for j in range(i+1..0): t[i] += r[i] * s[i] for i in range(1..4): r[i] *= 19 for i in range(4..0): for j in range(4..i+1): t[i] += r[i] * s[i] # this is the one where i *think* it possible to do some sort # of single-operation similar to dsld. c = 0 for i in range(5): add128_64(t[i], c); r[i] = lo128(t[i]) & reduce_mask_51; shr128(c, t[i], 51); r[0] += c * 19; c = r[0] >> 51; r[0] = r[0] & reduce_mask_51; r[1] += c;
https://libre-soc.org/irclog/%23libre-soc.2023-09-03.log.html#t2023-09-03T19:50:26 konstantinos i'm assigning this to you as a reminder to create the sub-bug for ed25519 and also the bignum long-multiply REMAP Schedule, discussed briefly here: http://lists.libre-soc.org/pipermail/libre-soc-dev/2023-September/005626.html bug #776 is where you can get budget for documentation of each of those (ed25519 similar page to chacha20's docpage, REMAP doc-budget needs to go to "the SVP64 specification"), and bug #840 is where you can get budget for writing unit tests for ed25519 and also the long-mul REMAPper.
markos, you were running into the issue that ed25519 needs too many registers, if you use all 64 bits of each register and use the bigint instructions you can probably squeeze into fewer registers, since the ed25519 code referenced in comment #0 uses only 51 bits (edit: corrected) out of each 64-bit register.
(In reply to Jacob Lifshay from comment #5) > markos, you were running into the issue that ed25519 needs too many > registers, if you use all 64 bits of each register and use the bigint > instructions you can probably squeeze into fewer registers, since the > ed25519 code referenced in comment #0 uses only 51 bits (edit: corrected) > out of each 64-bit register. child subtasks needed! jacob can you raise them so that markos is not distracted? dsrd should work extremely well to "unpack" the 51-bits, using Vertical-First. 2 input regs are a 128-bit "FIFO" in essence whilst the output is "the current constant plus the backend remainder of the FIFO". bitsleft = 128 r1, r2, j = input[0], input[1], 2 loop i: output[i], r2 = dsrd(r1, r2, 51) bitsleft -= 51 if bitsleft <= 64: r2 |= input[j] j += 1 something like that although it relies on Vertical-First mode which the entire ed25519 algorithm would have to be based on.