Bug 1085 - parallel prefix sum polynomial interpolation
Summary: parallel prefix sum polynomial interpolation
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Documentation
Version: unspecified
Hardware: Other Linux
Importance: High enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks: 953 1019
Reported: 2023-05-19 09:47 BST by Luke Kenneth Casson Leighton
Modified: 2023-10-17 16:56 BST
CC: 2 users

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Luke Kenneth Casson Leighton 2023-05-19 09:47:04 BST
a recipe using prefix sum REMAP is needed that does polynomial
interpolation of non-linear equations, Taylor series, etc.
Comment 1 Luke Kenneth Casson Leighton 2023-05-19 09:48:28 BST
https://github.com/jiahao/parallel-prefix
Comment 2 Luke Kenneth Casson Leighton 2023-05-19 12:18:34 BST
<markos_> I've done a similar method in the past
<programmerjake> idk, but parallel prefix sum and parallel tree reduction can be used for polynomial evaluation:
<programmerjake> reduce(element_wise_mul(c, prefix_sum([1] + splat(x), op=mul)), op=add) ==
<programmerjake> c[0]+c[1]*x+c[2]*x^2+c[3]*x^3+c[4]*x^4...
<programmerjake> though it tends to have lower precision than using fma because of rounding before adding
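
(a rough plain-Python transliteration of that recipe, for reference --
prefix_sum, element_wise_mul and reduce in the chat log are pseudocode,
so the helpers below are just illustrative, not SVP64 operations:)

    from itertools import accumulate
    from operator import mul

    def poly_eval_prefix(c, x):
        # [1] + splat(x)              ->  [1, x, x, x, ...]
        # prefix_sum(..., op=mul)     ->  [1, x, x^2, x^3, ...]
        powers = list(accumulate([1.0] + [x] * (len(c) - 1), mul))
        # element_wise_mul(c, powers) ->  [c0, c1*x, c2*x^2, ...]
        monomials = [ci * xi for ci, xi in zip(c, powers)]
        # reduce(..., op=add)         ->  c0 + c1*x + c2*x^2 + ...
        return sum(monomials)

    # quick check against the written-out monomial form
    c = [1.0, -0.5, 0.25, -0.125]
    x = 0.3
    assert abs(poly_eval_prefix(c, x)
               - (c[0] + c[1]*x + c[2]*x**2 + c[3]*x**3)) < 1e-12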
Comment 3 Luke Kenneth Casson Leighton 2023-05-19 13:59:44 BST
(In reply to Luke Kenneth Casson Leighton from comment #2)
> <markos_> I've done a similar method in the past
> <programmerjake> idk, but parallel prefix sum and parallel tree reduction
> can be used for polynomial evaluation:
> <programmerjake> reduce(element_wise_mul(c, prefix_sum([1] + splat(x),
> op=mul)), op=add) ==
> <programmerjake> c[0]+c[1]*x+c[2]*x^2+c[3]*x^3+c[4]*x^4...
> <programmerjake> though it tends to have lower precision than using fma
> because of rounding before adding

ok writing this out myself so that my brain absorbs it

* start with:
    [1, x, x, x, x...]
* perform prefix sum multiply (parallel is fine)
    [1, x, x^2, x^3, ....]
* take vector of c
    [c0, c1, c2, c3, c4...]
* straight multiply:
    [c0, c1*x, c2*x^2, ....]
* perform reduction
  c0 + c1*x + c2*x^2 ...

ok that is dead easy.
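
(side note on the "parallel is fine" part: the prefix multiply can be
done in log2(n) sweeps. rough illustrative sketch in plain Python,
Hillis-Steele style -- the actual prefix-sum REMAP schedule may be
shaped differently:)

    def prefix_product_parallel(v):
        # log-depth prefix schedule (Hillis-Steele style); every update
        # inside a sweep is independent, so a sweep can run in parallel
        v = list(v)
        step = 1
        while step < len(v):
            nxt = v[:]
            for i in range(step, len(v)):
                nxt[i] = v[i - step] * v[i]
            v = nxt
            step *= 2
        return v

    x = 0.3
    print(prefix_product_parallel([1.0] + [x] * 4))
    # -> [1.0, x, x**2, x**3, x**4]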

so now what about parallel reduction using fmac?  let us first try
mapreduce on fmac:
   
    sv.madd/mr res, *c, *xv, res

that looks perfectly fine, and gets the extra accuracy from fmac
(no rounding between the multiply and the add).

now for parallel reduction, res would have to be a vector,
but i don't believe that works, because *c (or *xv) would
end up getting multiplied in more than once, unless done
as Vertical-First.
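
(for reference, what the /mr map-reduce fmac loop computes, written out
in plain Python: one scalar accumulator, one fused multiply-add per
element. the loop-carried dependency on res is exactly why the parallel
variant would pull *c or *xv in more than once. illustrative sketch
only, not actual SVP64 semantics:)

    def poly_eval_fmac_mapreduce(c, powers_of_x):
        # /mr map-reduce: res <- c[i]*x^i + res, one element at a time.
        # python's * and + round separately; a real fmac would not round
        # between the multiply and the add, which is where the extra
        # accuracy comes from.
        res = 0.0
        for ci, xi in zip(c, powers_of_x):
            res = ci * xi + res
        return res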
Comment 4 Jacob Lifshay 2023-05-19 17:58:38 BST
(In reply to Luke Kenneth Casson Leighton from comment #3)
> (In reply to Luke Kenneth Casson Leighton from comment #2)
> > <markos_> I've done a similar method in the past
> > <programmerjake> idk, but parallel prefix sum and parallel tree reduction
> > can be used for polynomial evaluation:
> > <programmerjake> reduce(element_wise_mul(c, prefix_sum([1] + splat(x),
> > op=mul)), op=add) ==
> > <programmerjake> c[0]+c[1]*x+c[2]*x^2+c[3]*x^3+c[4]*x^4...
> > <programmerjake> though it tends to have lower precision than using fma
> > because of rounding before adding
> 
> ok writing this out myself so that my brain absorbs it
> 
> * start with:
>     [1, x, x, x, x...]
> * perform prefix sum multiply (parallel is fine)
>     [1, x, x^2, x^3, ....]
> * take vector of c
>     [c0, c1, c2, c3, c4...]
> * straight multiply:
>     [c0, c1*x, c2*x^2, ....]
> * perform reduction
>   c0 + c1*x + c2*x^2 ...

parallel reduction is fine here too -- reduce using fadd, though as noted it gets less accuracy than using fma: fma doesn't round between the mul and the add, and the higher powers of x tend to give monomials (coefficients included) that are smaller, so adding a smaller approximate value to a larger exact value (the coefficient) gives a value with higher relative precision. (edit: add forgotten parenthesis)
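
(illustrative sketch of that fadd tree reduction over the already-formed
monomials, in plain Python -- the tree itself parallelises fine; the
accuracy gap vs the fmac loop comes from each c[i]*x^i being rounded
once before any adds happen:)

    def tree_reduce_add(v):
        # pairwise (log-depth) summation using plain fadds; every add
        # within a level is independent, so each level can run in parallel
        v = list(v)
        while len(v) > 1:
            if len(v) % 2:
                v.append(0.0)                 # pad odd lengths
            v = [a + b for a, b in zip(v[0::2], v[1::2])]
        return v[0]

    # polynomial value = tree_reduce_add of the monomials:
    #   tree_reduce_add([ci * xi for ci, xi in zip(c, powers_of_x)])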
Comment 5 Luke Kenneth Casson Leighton 2023-05-19 18:06:54 BST
(In reply to Jacob Lifshay from comment #4)

> parallel reduction is fine here too -- reduce using fadd, though as noted it
> gets less accuracy than using fma (because fma doesn't round between the mul
> and add and because higher powers of x tend to give monomials (includes
> coefficients) that are smaller, so adding a smaller approximate value to a
> larger exact value (the coefficient) gives a value with higher relative
> precision).

sounds to me like /mrr (reverse-gear mapreduce) would be sensible to use,
so that the tinier values at the end of the vector accumulate first,
with better accuracy.

exactly the kind of thing to put in an app note.
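
(rough plain-Python model of the reverse-gear idea, assuming |x| < 1 so
the high-order monomials are the tiny ones; names are illustrative, not
actual SVP64 syntax:)

    def poly_eval_fmac_reversed(c, powers_of_x):
        # /mrr-style: same fmac accumulation, but highest power first,
        # so the tiniest monomials get summed together before the large
        # low-order terms swamp them
        res = 0.0
        for ci, xi in reversed(list(zip(c, powers_of_x))):
            res = ci * xi + res
        return res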