To make the project understandable and useful, we need to document designs, code, processes, and anything else relevant. ISA Standard creation and submission is covered by bug #952.
konstantinos i am assigning this bugreport to you as a reminder to add an ed25519 sub-bug, and also to discuss who is going to add/document a "bigint long-multiply REMAP Schedule" that i need to sketch an outline for, as well. as jacob has done a Prefix-Sum REMAP a few months back he can guide on doing it.
(In reply to Luke Kenneth Casson Leighton from comment #1)
> konstantinos i am assigning this bugreport to you as a reminder to
> add an ed25519 sub-bug, and also to discuss who is going to
> add/document a "bigint long-multiply REMAP Schedule" that i
> need to sketch an outline for, as well. as jacob has done
> a Prefix-Sum REMAP a few months back he can guide on doing it.

unfortunately, because a long-multiply needs 2 kinds of insns (carrying-wide-madd and carrying-add), you can't easily do that as a REMAP schedule. Additionally, it is substantially faster to use Karatsuba multiplication once you get inputs more than a few hundred bits wide (and other more complex algorithms for wider multiplies).
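For readers following the thread, here is a rough Python model of the O(n^2) long-multiply and of why it needs two kinds of operation. It is only a sketch: `mul_add_carry` and `add_carry` are hypothetical stand-ins for the "carrying-wide-madd" and "carrying-add" insns mentioned above, not the actual instruction names or semantics, and the 64-bit limb size and little-endian limb order are assumptions.

```python
MASK64 = (1 << 64) - 1  # assumed limb width: 64 bits


def mul_add_carry(a, b, c):
    """Hypothetical carrying wide multiply-add: (low limb, high limb) of a*b + c."""
    t = a * b + c  # always fits in 128 bits when a, b, c are 64-bit
    return t & MASK64, t >> 64


def add_carry(a, b, carry_in):
    """Hypothetical carrying add: (sum limb, carry out) of a + b + carry_in."""
    t = a + b + carry_in
    return t & MASK64, t >> 64


def long_multiply(x, y):
    """Multiply two little-endian lists of 64-bit limbs, schoolbook style."""
    result = [0] * (len(x) + len(y))
    for j, yj in enumerate(y):
        # Operation kind 1: chained multiply-add builds one row, x * y[j].
        carry = 0
        row = []
        for xi in x:
            lo, carry = mul_add_carry(xi, yj, carry)
            row.append(lo)
        row.append(carry)
        # Operation kind 2: chained carrying add accumulates the row at limb offset j.
        carry = 0
        for i, r in enumerate(row):
            result[i + j], carry = add_carry(result[i + j], r, carry)
    return result


def to_limbs(n, count):
    """Split a non-negative int into `count` little-endian 64-bit limbs."""
    return [(n >> (64 * i)) & MASK64 for i in range(count)]


def from_limbs(limbs):
    """Reassemble a little-endian limb list into an int."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))
```

The two inner loops run different operations over differently-offset ranges of limbs, which is the difficulty with expressing the whole thing as a single REMAP schedule.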
(In reply to Jacob Lifshay from comment #2)
> unfortunately, because a long-multiply needs 2 kinds of insns
> (carrying-wide-madd and carrying-add), you can't easily do that as a REMAP
> schedule. Additionally, it is substantially faster to use Karatsuba
> multiplication once you get inputs more than a few hundred bits wide (and
> other more complex algorithms for wider multiplies).

I would pick the simplest and fastest to implement long-multiply method for this one, speed is not a requirement. We can always optimize later.
(In reply to Konstantinos Margaritis (markos) from comment #3)
> (In reply to Jacob Lifshay from comment #2)
> > unfortunately, because a long-multiply needs 2 kinds of insns
> > (carrying-wide-madd and carrying-add), you can't easily do that as a REMAP
> > schedule. Additionally, it is substantially faster to use Karatsuba
> > multiplication once you get inputs more than a few hundred bits wide (and
> > other more complex algorithms for wider multiplies).
>
> I would pick the simplest and fastest to implement long-multiply method for
> this one, speed is not a requirement. We can always optimize later.

yes, except that the stuff that's going into the PowerISA spec. needs to actually be as fast as we can make it since it's for forever, not just for the crypto-router. imho doing REMAP for just O(n^2) multiply is fine (except for the complexity due to multiple different insns), since Karatsuba multiplication can just run those insns a bunch of times.
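To illustrate the point that Karatsuba can sit on top of an O(n^2) multiply, here is a minimal Python sketch. It is not code from the project: the 256-bit cut-over is an assumed placeholder for "a few hundred bits", the inputs are assumed non-negative, and Python's built-in `*` stands in for the O(n^2) limb-level routine in the base case.

```python
def karatsuba(x, y, threshold_bits=256):
    """Multiply non-negative ints, recursing until operands are small.

    threshold_bits is an assumed cut-over point, not a measured one.
    """
    if x.bit_length() <= threshold_bits or y.bit_length() <= threshold_bits:
        # Base case: stands in for the O(n^2) long-multiply on limbs.
        return x * y

    # Split both operands at the same bit position.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_lo, x_hi = x & mask, x >> half
    y_lo, y_hi = y & mask, y >> half

    # Three recursive multiplies instead of four.
    lo = karatsuba(x_lo, y_lo, threshold_bits)
    hi = karatsuba(x_hi, y_hi, threshold_bits)
    mid = karatsuba(x_lo + x_hi, y_lo + y_hi, threshold_bits) - lo - hi

    # Recombine: x*y = hi * 2^(2*half) + mid * 2^half + lo
    return (hi << (2 * half)) + (mid << half) + lo
```

Every leaf of the recursion is an ordinary small multiply, so the same O(n^2) insn sequence is simply run several times on smaller operands.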
(In reply to Konstantinos Margaritis (markos) from comment #3)
> I would pick the simplest and fastest to implement long-multiply method for
> this one, speed is not a requirement. We can always optimize later.

the top priority for the embedded application (which is commercially confidential) is to fit within 1 to 2 L1 cache lines. that is *real* tight. optimisation for "speed" is very low priority indeed.

Knuth Algorithms D and M are perfectly fine, and Jacob and I already did the conversion when doing the madd, dsld and divmod instructions. but for ed25519 a totally different approach is needed because they did carry-save. please read the edited comment on that, and raise the bugreports so i can properly fill them in.