analyse how to use mapreduce mode to do dot-product

reduce on FMA might do it. mapreduce on 3-op instructions is insane, so they're normally excluded; therefore we might as well change the meaning of fma in reduce mode to *be* dot-product.

alexandre says that 3-op reduce might actually make sense for fma, because of matrix multiply

jacob notes: fma reduction would be a polynomial reduction but it would be a Bad Idea (tm) to implement in hardware

fma can be used in reduce mode for dot-product, but matrix multiply amounts to multiple parallel dot-products, so you wouldn't want non-reduce fma for that. what I can't quite picture as useful (which is definitely not authoritative) is reduce on the multiply, rather than on the add.

fma reducing on the multiply can be used for the same case that one would use a reduce multiply followed by a scalar add, so it is likely useful too
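a quick python sketch of that equivalence. the helper names and the exact handling of the addend (applied only on the final step) are assumptions for illustration, not real instruction semantics:

```python
from math import prod

def mul_reduce_then_add(a, b):
    # reduce-multiply over the vector, then a single scalar add
    return prod(a) + b

def fma_reduce_on_mul(a, b):
    # hypothetical fma-reduce where the accumulator flows through the
    # *multiply* operand; assumption: the addend b is only applied on
    # the final step (zero on all earlier steps)
    v = 1
    for i, x in enumerate(a):
        addend = b if i == len(a) - 1 else 0
        v = x * v + addend  # one fma per element
    return v

# both compute (2 * 3 * 4) + 10 = 34
print(fma_reduce_on_mul([2, 3, 4], 10))
print(mul_reduce_then_add([2, 3, 4], 10))
```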

reminder of what dot-product is:

    result = 0
    for i in range(x):
        result += a[i] * b[i]

which makes sense for fma as long as RC starts out as zero.
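a runnable python model of that loop, purely illustrative (fma_reduce is a made-up helper name, not real SVP64 semantics):

```python
def fma(a, b, c):
    # fused multiply-add: a * b + c (modelled unfused here)
    return a * b + c

def fma_reduce(a, b, rc=0):
    # illustrative model of reduce-mode fma: the accumulator flows
    # through the *add* operand, giving rc + sum(a[i] * b[i])
    result = rc
    for x, y in zip(a, b):
        result = fma(x, y, result)
    return result

# matches the plain dot-product loop as long as RC starts at zero
print(fma_reduce([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```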

(In reply to Alexandre Oliva from comment #4)
> fma can be used in reduce mode for dot-product
>
> but matrix multiply amounts to multiple parallel dot-products, so you
> wouldn't want non-reduce fma for that

in my mind it would make sense simply to do an outer for-loop on a reduce-fma

> what I can't quite picture as useful (which is definitely not authoritative)
> is reduce on the multiply, rather than on the add.

can you write that out in pseudocode?
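a sketch of the "outer for-loop on a reduce-fma" idea in plain python. dot_fma here stands in for a single reduce-fma; the matrix layout (lists of rows) is just for illustration:

```python
def dot_fma(a, b):
    # stand-in for one reduce-fma: accumulate a[i] * b[i] via fma steps
    acc = 0
    for x, y in zip(a, b):
        acc = x * y + acc
    return acc

def matmul(A, B):
    # matrix multiply as an outer for-loop over reduce-fma dot products:
    # each result element is a row of A dotted with a column of B
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[dot_fma(A[r], [B[k][c] for k in range(inner)])
             for c in range(cols)]
            for r in range(rows)]

A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```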

(In reply to Luke Kenneth Casson Leighton from comment #3)
> jacob notes: fma reduction would be a polynomial reduction but it would be a
> Bad Idea (tm) to implement in hardware

*could* be a polynomial reduction:

    v = a
    v = x * v + b
    v = x * v + c
    v = x * v + d

produces:

    v == d + x * c + x^2 * b + x^3 * a

having fma reduction be a dot product is also valid, easier to implement in
hardware, and more useful:

    v = a
    v = b * c + v
    v = d * e + v
    v = f * g + v

    v == a + dot(<b, d, f>, <c, e, g>)
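both interpretations of an fma reduction can be modelled in a few lines of python (function names are illustrative only):

```python
def horner_fma_reduce(x, coeffs):
    # polynomial interpretation: v = coeffs[0], then v = x * v + c
    # for each remaining coefficient -- i.e. Horner evaluation
    v = coeffs[0]
    for c in coeffs[1:]:
        v = x * v + c
    return v

def dot_fma_reduce(init, pairs):
    # dot-product interpretation: v = init, then v = b * c + v
    # for each (b, c) pair -- i.e. init + dot(bs, cs)
    v = init
    for b, c in pairs:
        v = b * c + v
    return v

# polynomial: d + x*c + x^2*b + x^3*a with x=2, (a,b,c,d)=(3,5,7,11)
print(horner_fma_reduce(2, [3, 5, 7, 11]))  # 11 + 14 + 20 + 24 = 69

# dot product: a + dot(<b,d,f>, <c,e,g>) with a=1
print(dot_fma_reduce(1, [(2, 3), (4, 5), (6, 7)]))  # 1 + 6 + 20 + 42 = 69
```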

(In reply to Jacob Lifshay from comment #8)
> (In reply to Luke Kenneth Casson Leighton from comment #3)
> > jacob notes: fma reduction would be a polynomial reduction but it would
> > be a Bad Idea (tm) to implement in hardware
>
> *could* be a polynomial reduction:
>     v = a
>     v = x * v + b
>     v = x * v + c
>     v = x * v + d
>
> produces:
>     v == d + x * c + x^2 * b + x^3 * a

once a, b and c are factored out, yes. the above is, with substitution:

    d + (x * (c + (x * (b + (x * a)))))

which i think may be doable with some overlapping fmas (no reduce required).

the polynomial version: i love it. it's so cool that i think we should give
it a shot. interestingly it may be possible to detect from the src/dest
scalar/vector marking. this one is:

    dest=v (needed in case of intermediaries)
    src1=s
    src2=s
    src3=v

and also, note, RT == RB

> having fma reduction be a dot product is also valid, easier to implement in
> hardware,

well we are waay past the point where stuff is "easy" :) we are long into
FSMs and micro-coding.

> and more useful:
>     v = a
>     v = b * c + v
>     v = d * e + v
>     v = f * g + v
>
>     v == a + dot(<b, d, f>, <c, e, g>)

this one is:

    dest=v (needed for intermediary results)
    src1=v
    src2=v
    src3=s

and note, RT == RC
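a quick check, plain python just to confirm the algebra, that the fully-substituted form matches the step-by-step fma chain (concrete values chosen arbitrarily):

```python
x, a, b, c, d = 2, 3, 5, 7, 11

# step-by-step fma chain from comment #8
v = a
v = x * v + b
v = x * v + c
v = x * v + d

# fully-substituted (Horner) form
substituted = d + (x * (c + (x * (b + (x * a)))))

# expanded polynomial form
expanded = d + x * c + x**2 * b + x**3 * a

assert v == substituted == expanded
print(v)  # 69
```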