A recipe is needed that uses prefix-sum REMAP to do polynomial interpolation of non-linear equations, Taylor series, etc.

https://github.com/jiahao/parallel-prefix

<markos_> I've done a similar method in the past
<programmerjake> idk, but parallel prefix sum and parallel tree reduction can be used for polynomial evaluation:
<programmerjake> reduce(element_wise_mul(c, prefix_sum([1] + splat(x), op=mul)), op=add) ==
<programmerjake> c[0]+c[1]*x+c[2]*x^2+c[3]*x^3+c[4]*x^4...
<programmerjake> though it tends to have lower precision than using fma because of rounding before adding
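A minimal scalar Python sketch of programmerjake's recipe (the function name poly_eval_prefix is made up for illustration; itertools.accumulate stands in for the parallel prefix-multiply, and a plain sum-reduce stands in for the tree reduction):

```python
from functools import reduce
from itertools import accumulate
import operator

def poly_eval_prefix(c, x):
    # [1, x, x, x, ...] --prefix-multiply--> [1, x, x^2, x^3, ...]
    powers = list(accumulate([1] + [x] * (len(c) - 1), operator.mul))
    # element-wise multiply by the coefficients, then add-reduce:
    # c[0] + c[1]*x + c[2]*x^2 + ...
    return reduce(operator.add, (ci * p for ci, p in zip(c, powers)))
```

Note this rounds after the multiply and again at each add, which is the precision loss relative to fma mentioned above.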

(In reply to Luke Kenneth Casson Leighton from comment #2)
> <markos_> I've done a similar method in the past
> <programmerjake> idk, but parallel prefix sum and parallel tree reduction
> can be used for polynomial evaluation:
> <programmerjake> reduce(element_wise_mul(c, prefix_sum([1] + splat(x),
> op=mul)), op=add) ==
> <programmerjake> c[0]+c[1]*x+c[2]*x^2+c[3]*x^3+c[4]*x^4...
> <programmerjake> though it tends to have lower precision than using fma
> because of rounding before adding

ok writing this out myself so that my brain absorbs it:

* start with:
  [1, x, x, x, x...]
* perform prefix-sum multiply (parallel is fine):
  [1, x, x^2, x^3, ...]
* take the vector of coefficients c:
  [c0, c1, c2, c3, c4...]
* straight element-wise multiply:
  [c0, c1*x, c2*x^2, ...]
* perform the reduction:
  c0 + c1*x + c2*x^2 ...

ok, that is dead easy. so now what about parallel reduction using fmac? let us first try mapreduce on fmac:

    sv.madd/mr res, *c, *xv, res

that looks perfectly fine, and can get accuracy from fmac. now for parallel reduction, res would have to be a vector, but i don't believe it works, because *c (or *xv) would end up getting multiplied in more than once -- unless done as Vertical-First.
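A scalar Python model of the sv.madd/mr mapreduce above, assuming xv already holds the powers [1, x, x^2, ...] from the prefix-multiply (the function name is illustrative; plain Python does not fuse the multiply-add, the comment marks where a hardware fmac would avoid the intermediate rounding):

```python
def poly_eval_mapreduce_fmac(c, xv):
    # scalar model of: sv.madd/mr res, *c, *xv, res
    # res accumulates c[i]*xv[i] one element at a time; on hardware each
    # step is a single fused multiply-add (no rounding between mul and add)
    res = 0.0
    for ci, xi in zip(c, xv):
        res = ci * xi + res
    return res
```

This is sequential by construction, which is exactly why the parallel-reduction variant with a vector res runs into the double-multiplication problem noted above.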

(In reply to Luke Kenneth Casson Leighton from comment #3)
> * perform the reduction:
>   c0 + c1*x + c2*x^2 ...

parallel reduction is fine here too -- reduce using fadd, though as noted it gets less accuracy than using fma (because fma doesn't round between the mul and the add, and because higher powers of x tend to give monomials (coefficients included) that are smaller, so adding a smaller approximate value to a larger exact value (the coefficient) gives a value with higher relative precision).
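For concreteness, a Python sketch of the parallel tree reduction using fadd (pairwise sums per level, log2(n) levels instead of n-1 sequential adds; function name is illustrative):

```python
def tree_reduce_add(v):
    # parallel tree reduction with fadd: each level sums adjacent pairs,
    # halving the vector length until one element remains
    v = list(v)
    while len(v) > 1:
        pairs = [v[i] + v[i + 1] for i in range(0, len(v) - 1, 2)]
        if len(v) % 2:           # odd leftover carries to the next level
            pairs.append(v[-1])
        v = pairs
    return v[0]
```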

(In reply to Jacob Lifshay from comment #4)
> parallel reduction is fine here too -- reduce using fadd, though as noted
> it gets less accuracy than using fma

sounds to me like /mrr (reverse-gear) would be sensible to use, so that at the end the tinier values accumulate with better accuracy. exactly the kind of thing to put in an app note.
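A scalar Python model of the /mrr idea, assuming the higher powers give the smaller monomials as Jacob describes: iterate from the highest index down so the small terms are summed together before they meet the large low-order terms (a reversed loop is only a stand-in for the reverse-gear mapreduce semantics):

```python
def poly_eval_reverse(c, x):
    # /mrr-style reverse-gear accumulation: start from the highest power,
    # so the tiniest monomials accumulate first with better relative
    # accuracy, then the larger low-order terms are added last
    res = 0.0
    for i in reversed(range(len(c))):
        res += c[i] * x ** i
    return res
```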