Bug 776 - Documentation of designs, code, processes, and other relevant things as needed
Summary: Documentation of designs, code, processes, and other relevant things as needed
Status: CONFIRMED
Alias: None
Product: Libre-SOC's second ASIC
Classification: Unclassified
Component: source code
Version: unspecified
Hardware: Other Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on: 809
Blocks: 589
Reported: 2022-02-15 06:49 GMT by Jacob Lifshay
Modified: 2023-12-16 23:48 GMT
CC List: 6 users

See Also:
NLnet milestone: NLnet.2021.02A.052.CryptoRouter
total budget (EUR) for completion of task and all subtasks: 8000
budget (EUR) for this task, excluding subtasks' budget: 4600
parent task for budget allocation: 589
child tasks for budget allocation: 968 1006 1158 1166
The table of payments (in EUR) for this task; TOML format:


Description Jacob Lifshay 2022-02-15 06:49:56 GMT
To make our project more understandable and useful, we need documentation of designs, code, processes, and other relevant things.
ISA Standard creation and submission is covered by bug #952.
Comment 1 Luke Kenneth Casson Leighton 2023-09-05 04:53:08 BST
konstantinos i am assigning this bugreport to you as a reminder to
add an ed25519 sub-bug, and also to discuss who is going to
add/document a "bigint long-multiply REMAP Schedule" that i
need to sketch an outline for, as well. as jacob has done
a Prefix-Sum REMAP a few months back he can guide on doing it.
Comment 2 Jacob Lifshay 2023-09-05 05:00:50 BST
(In reply to Luke Kenneth Casson Leighton from comment #1)
> konstantinos i am assigning this bugreport to you as a reminder to
> add an ed25519 sub-bug, and also to discuss who is going to
> add/document a "bigint long-multiply REMAP Schedule" that i
> need to sketch an outline for, as well. as jacob has done
> a Prefix-Sum REMAP a few months back he can guide on doing it.

unfortunately, because a long-multiply needs 2 kinds of insns (carrying-wide-madd and carrying-add), you can't easily do that as a REMAP schedule. Additionally, it is substantially faster to use Karatsuba multiplication once you get inputs more than a few hundred bits wide (and other more complex algorithms for wider multiplies).
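To show why two instruction kinds are involved, here is a minimal Python sketch of schoolbook long multiplication on 64-bit limbs. The helpers `mul_add_carry` and `add_carry` are hypothetical software models of a carrying wide multiply-add and a carrying add. They are not the exact semantics of any Power ISA or SVP64 instruction. Every inner-loop step needs one of each, and that interleaving is what a single REMAP schedule does not easily express.

```python
# Limbs are little-endian lists of 64-bit unsigned integers.
MASK64 = (1 << 64) - 1


def mul_add_carry(a, b, c):
    """Hypothetical model of a carrying wide multiply-add:
    returns (low 64 bits, high 64 bits) of a*b + c."""
    t = a * b + c
    return t & MASK64, t >> 64


def add_carry(a, b, carry_in):
    """Hypothetical model of a carrying add: returns (sum mod 2**64, carry out)."""
    t = a + b + carry_in
    return t & MASK64, t >> 64


def long_multiply(x, y):
    """Schoolbook O(n*m) product of two limb lists (Knuth Algorithm M style)."""
    result = [0] * (len(x) + len(y))
    for j, yj in enumerate(y):
        carry = 0
        for i, xi in enumerate(x):
            # kind 1: wide multiply-add, chaining in the previous high half
            lo, hi = mul_add_carry(xi, yj, carry)
            # kind 2: carrying add of the partial product into the accumulator
            result[i + j], c = add_carry(result[i + j], lo, 0)
            carry = hi + c  # never exceeds 2**64 - 1
        result[j + len(x)] = carry
    return result


def to_limbs(n, count):
    return [(n >> (64 * k)) & MASK64 for k in range(count)]


def from_limbs(limbs):
    return sum(limb << (64 * k) for k, limb in enumerate(limbs))


a, b = 2**200 + 12345, 2**130 - 1
assert from_limbs(long_multiply(to_limbs(a, 4), to_limbs(b, 3))) == a * b
```

The carry fits in one limb because `result[i + j] + xi*yj + carry` is at most `2**128 - 1`.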
Comment 3 Konstantinos Margaritis (markos) 2023-09-05 18:21:33 BST
(In reply to Jacob Lifshay from comment #2)
> unfortunately, because a long-multiply needs 2 kinds of insns
> (carrying-wide-madd and carrying-add), you can't easily do that as a REMAP
> schedule. Additionally, it is substantially faster to use Karatsuba
> multiplication once you get inputs more than a few hundred bits wide (and
> other more complex algorithms for wider multiplies).

I would pick the simplest and fastest to implement long-multiply method for this one, speed is not a requirement. We can always optimize later.
Comment 4 Jacob Lifshay 2023-09-05 18:50:47 BST
(In reply to Konstantinos Margaritis (markos) from comment #3)
> (In reply to Jacob Lifshay from comment #2)
> > unfortunately, because a long-multiply needs 2 kinds of insns
> > (carrying-wide-madd and carrying-add), you can't easily do that as a REMAP
> > schedule. Additionally, it is substantially faster to use Karatsuba
> > multiplication once you get inputs more than a few hundred bits wide (and
> > other more complex algorithms for wider multiplies).
> 
> I would pick the simplest and fastest to implement long-multiply method for
> this one, speed is not a requirement. We can always optimize later.

yes, except that the stuff that's going into the PowerISA spec. needs to actually be as fast as we can make it, since it's there forever, not just for the crypto-router.

imho doing REMAP for just O(n^2) multiply is fine (except for the complexity due to multiple different insns), since Karatsuba multiplication can just run those insns a bunch of times.
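A sketch of how Karatsuba can reuse the O(n^2) kernel. The recursion below splits operands by limb count and falls back to a schoolbook limb multiply below a cutoff. `KARATSUBA_THRESHOLD` and the helper names are hypothetical. Real cutoffs are tuned per implementation, and at the recursion level Python integers stand in for limb arrays.

```python
LIMB_BITS = 64
MASK64 = (1 << LIMB_BITS) - 1
KARATSUBA_THRESHOLD = 4  # hypothetical cutoff in limbs; real ones are tuned


def to_limbs(n):
    """Little-endian 64-bit limbs of a non-negative int (at least one limb)."""
    limbs = []
    while n:
        limbs.append(n & MASK64)
        n >>= LIMB_BITS
    return limbs or [0]


def from_limbs(limbs):
    return sum(limb << (LIMB_BITS * k) for k, limb in enumerate(limbs))


def schoolbook(x, y):
    """O(n*m) base case: the carrying multiply-add/add inner loop."""
    result = [0] * (len(x) + len(y))
    for j, yj in enumerate(y):
        carry = 0
        for i, xi in enumerate(x):
            t = result[i + j] + xi * yj + carry  # always < 2**128
            result[i + j] = t & MASK64
            carry = t >> LIMB_BITS
        result[j + len(x)] = carry
    return result


def karatsuba(a, b):
    """Multiply non-negative ints using Karatsuba over a schoolbook base case."""
    n = max(len(to_limbs(a)), len(to_limbs(b)))
    if n <= KARATSUBA_THRESHOLD:
        return from_limbs(schoolbook(to_limbs(a), to_limbs(b)))
    shift = LIMB_BITS * (n // 2)
    mask = (1 << shift) - 1
    a_hi, a_lo = a >> shift, a & mask
    b_hi, b_lo = b >> shift, b & mask
    z0 = karatsuba(a_lo, b_lo)
    z2 = karatsuba(a_hi, b_hi)
    # three recursive products instead of four:
    # (a_lo + a_hi)(b_lo + b_hi) - z0 - z2 == a_lo*b_hi + a_hi*b_lo
    z1 = karatsuba(a_lo + a_hi, b_lo + b_hi) - z0 - z2
    return (z2 << (2 * shift)) + (z1 << shift) + z0


a, b = 3**300, 7**250
assert karatsuba(a, b) == a * b
```

Only the base case touches individual limbs. This is one reading of "Karatsuba can just run those insns a bunch of times": a hardware O(n^2) schedule would replace `schoolbook`, and the recursion stays in software.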
Comment 5 Luke Kenneth Casson Leighton 2023-09-05 19:06:05 BST
(In reply to Konstantinos Margaritis (markos) from comment #3)

> I would pick the simplest and fastest to implement long-multiply method for
> this one, speed is not a requirement. We can always optimize later.

the top priority for the embedded application which is commercially
confidential is to fit within 1 to 2 L1 cache lines.

that is *real* tight.

optimisation for "speed" is very low priority indeed.

Knuth Algorithms D and M are perfectly fine, and Jacob and I already
did the conversion when doing the madd, dsld and divmod instructions.

but for ed25519 a totally different approach is needed, because they
used carry-save. please read the edited comment on that and raise the
bugreports so i can properly fill them in.
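This sketch assumes "carry-save" here means the unsaturated limb representation used by common ed25519 implementations. In that representation, field elements mod 2^255 - 19 are held as five 51-bit limbs in 64-bit words. It shows how this differs from a carry-chained long multiply: additions update each limb independently with no carry chain, and carries are resolved later in one separate pass. The function names are hypothetical. Python integers do not overflow, so the 64-bit headroom appears only in comments and an assertion.

```python
P = 2**255 - 19  # the ed25519 field prime
MASK51 = (1 << 51) - 1


def to_limbs51(n):
    """Split 0 <= n < 2**255 into five 51-bit limbs (little-endian)."""
    return [(n >> (51 * k)) & MASK51 for k in range(5)]


def from_limbs51(limbs):
    """Recombine limbs (which may exceed 51 bits) and reduce mod P."""
    return sum(limb << (51 * k) for k, limb in enumerate(limbs)) % P


def add_no_carry(f, g):
    # Limb-wise add with no carry chain: each limb simply grows past 51 bits.
    # In 64-bit words the 13 spare bits absorb thousands of such additions.
    return [a + b for a, b in zip(f, g)]


def carry_propagate(f):
    """One deferred carry pass; the result is congruent to f mod P."""
    f = list(f)
    for k in range(4):
        f[k + 1] += f[k] >> 51
        f[k] &= MASK51
    # the carry out of limb 4 has weight 2**255, and 2**255 % P == 19
    f[0] += 19 * (f[4] >> 51)
    f[4] &= MASK51
    return f


x, y = 2**254 + 12345, P - 1
acc = to_limbs51(x)
for _ in range(1000):  # many additions, no carries yet
    acc = add_no_carry(acc, to_limbs51(y))
assert max(acc) < 2**64  # still fits in 64-bit words
reduced = carry_propagate(acc)
assert from_limbs51(reduced) == (x + 1000 * y) % P
```

After one pass, limbs 1 to 4 are back below 2**51, and limb 0 may be slightly larger. In this style of representation that is acceptable until a final full reduction.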