965 – implement chacha20

Summary: implement chacha20

Status:	RESOLVED FIXED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code
Version:	unspecified
Hardware:	Other Linux

Importance:	--- enhancement
Assignee:	Luke Kenneth Casson Leighton

URL:

Depends on:
Blocks:	1007

Reported:	2022-10-23 11:39 BST by Luke Kenneth Casson Leighton
Modified:	2023-09-24 09:48 BST
CC List:	1 user

See Also:	770 969 1157
NLnet milestone:	---
total budget (EUR) for completion of task and all subtasks:	0
budget (EUR) for this task, excluding subtasks' budget:	0


Description Luke Kenneth Casson Leighton 2022-10-23 11:39:31 BST

vector-efficient implementation of chacha20 is needed
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_chacha20.py;hb=HEAD

Comment 1 Luke Kenneth Casson Leighton 2022-10-23 11:52:01 BST

getting very interesting.  svindex handles the inner rounds successfully,
with svstep for the inner round loop and CTR mode for the outer loop.

there is however an opportunity to reorder the access to elements such that
the parallelism originally intended for chacha20 by bernstein is possible,
and it involves 3D REMAP.

an attempt to deploy only 2D REMAP was not successful, because the Indices
are set up to cover both round-groups (the straight group of 16 followed
by the rotated group):

       CHACHA_QUARTER_ROUND(w[0], w[4], w[8], w[12]);
       CHACHA_QUARTER_ROUND(w[1], w[5], w[9], w[13]);
       CHACHA_QUARTER_ROUND(w[2], w[6], w[10], w[14]);
       CHACHA_QUARTER_ROUND(w[3], w[7], w[11], w[15]);
  
       CHACHA_QUARTER_ROUND(w[0], w[5], w[10], w[15]);
       CHACHA_QUARTER_ROUND(w[1], w[6], w[11], w[12]);
       CHACHA_QUARTER_ROUND(w[2], w[7], w[8], w[13]);
       CHACHA_QUARTER_ROUND(w[3], w[4], w[9], w[14]);
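
for reference, a minimal plain-python sketch of what the eight calls above compute, assuming the standard ChaCha quarter-round (add, xor, rotate by 16, 12, 8, 7) on 32-bit words. the names `rotl32`, `quarter_round`, `QR_INDICES` and `double_round` are made up for illustration and are not taken from the openpower-isa test; this is not the SVP64 implementation.

```python
MASK32 = 0xFFFFFFFF

def rotl32(v, n):
    """Rotate a 32-bit value left by n bits."""
    return ((v << n) | (v >> (32 - n))) & MASK32

def quarter_round(w, a, b, c, d):
    """Standard ChaCha quarter-round on words w[a], w[b], w[c], w[d], in place."""
    w[a] = (w[a] + w[b]) & MASK32
    w[d] = rotl32(w[d] ^ w[a], 16)
    w[c] = (w[c] + w[d]) & MASK32
    w[b] = rotl32(w[b] ^ w[c], 12)
    w[a] = (w[a] + w[b]) & MASK32
    w[d] = rotl32(w[d] ^ w[a], 8)
    w[c] = (w[c] + w[d]) & MASK32
    w[b] = rotl32(w[b] ^ w[c], 7)

# the same index table as the eight CHACHA_QUARTER_ROUND calls:
# first the straight (column) group, then the rotated (diagonal) group
QR_INDICES = [
    (0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
    (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14),
]

def double_round(w):
    """One column round followed by one diagonal round, in place."""
    for a, b, c, d in QR_INDICES:
        quarter_round(w, a, b, c, d)
```

within each group of four the index tuples touch disjoint words, which is where the parallelism comes from.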

the ordering needed was initially believed to be 2D: cycling through
by row (y) before moving to the next column (x)

unfortunately it is necessary to stop half-way down the rows (y=0-3)
before moving on to the next column (x+=1), then after all columns
are done repeat the process with the 2nd group (y=4-7)

this is perfectly possible but requires a 3D version of svindex
and svshape2, which is too much.
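
a rough plain-python sketch of the two traversal orders, under the assumption (my reading, not confirmed by the code) that each row y is one of the eight quarter-rounds listed earlier and each column x is one of the four add/xor/rotate steps inside a quarter-round. `QR`, `ROT`, `step`, `run` and the three order lists are hypothetical names for illustration only; nothing here uses svindex, svshape2 or REMAP.

```python
MASK32 = 0xFFFFFFFF

# (a, b, c, d) word indices, one row per CHACHA_QUARTER_ROUND call
QR = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15),
      (0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]
ROT = [16, 12, 8, 7]  # rotate amount for each of the 4 steps

def step(w, y, x):
    """Apply step x (0-3) of quarter-round y (0-7): add, xor, rotate."""
    a, b, c, d = QR[y]
    # even steps: a += b; d = rotl(d ^ a).  odd steps: c += d; b = rotl(b ^ c)
    s, t, u = (a, b, d) if x % 2 == 0 else (c, d, b)
    w[s] = (w[s] + w[t]) & MASK32
    v = w[u] ^ w[s]
    w[u] = ((v << ROT[x]) | (v >> (32 - ROT[x]))) & MASK32

def run(w, order):
    """Return a copy of w after applying every (x, y) step in the given order."""
    w = list(w)
    for x, y in order:
        step(w, y, x)
    return w

# reference: each quarter-round is completed before the next one starts
sequential = [(x, y) for y in range(8) for x in range(4)]
# 2D: all 8 rows of a column, then the next column
order_2d = [(x, y) for x in range(4) for y in range(8)]
# 3D: rows 0-3 for every column, then rows 4-7 for every column
order_3d = [(x, 4 * z + y)
            for z in range(2) for x in range(4) for y in range(4)]
```

`run(w, order_3d)` gives the same state as `run(w, sequential)` because the four quarter-rounds inside each group touch disjoint words. `order_2d` starts the rotated rows before the straight rows have finished, so it is not expected to match in general.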