Bug 965 - implement chacha20
Summary: implement chacha20
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code (show other bugs)
Version: unspecified
Hardware: Other Linux
: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks: 1007
  Show dependency treegraph
 
Reported: 2022-10-23 11:39 BST by Luke Kenneth Casson Leighton
Modified: 2023-09-24 09:48 BST (History)
1 user (show)

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2022-10-23 11:39:31 BST
vector-efficient implementation of chacha20 is needed
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_chacha20.py;hb=HEAD
Comment 1 Luke Kenneth Casson Leighton 2022-10-23 11:52:01 BST
getting very interesting.  svindex is successful in doing the inner rounds,
svstep for the round inner loop, CTR mode for the outer.

there is however an opportunity to reorder the access to elements such that
the parallelism originally intended for chacha20 by bernstein is possible,
and it involves 3D REMAP.

an attempt to only deploy 2D REMAP was not successful, due to the fact that
the Indices are set up to cover both round-groups (the straight group of
16 followed by the rotated group):

       CHACHA_QUARTER_ROUND(w[0], w[4], w[8], w[12]);
       CHACHA_QUARTER_ROUND(w[1], w[5], w[9], w[13]);
       CHACHA_QUARTER_ROUND(w[2], w[6], w[10], w[14]);
       CHACHA_QUARTER_ROUND(w[3], w[7], w[11], w[15]);
  
       CHACHA_QUARTER_ROUND(w[0], w[5], w[10], w[15]);
       CHACHA_QUARTER_ROUND(w[1], w[6], w[11], w[12]);
       CHACHA_QUARTER_ROUND(w[2], w[7], w[8], w[13]);
       CHACHA_QUARTER_ROUND(w[3], w[4], w[9], w[14]);

the ordering needed was initially believed to be 2D: cycling through
by row (y) before moving to the next column (x)

unfortunately it is necessary to stop half-way down the rows (y=0-3)
before moving on to the next column (x+=1), then after all columns
are done repeat the process with the 2nd group (y=4-7)

this is perfectly possible but requires a 3D version of svindex
and svshape2, which is too much.