From the task http://bugs.libre-riscv.org/show_bug.cgi?id=217

(In reply to Luke Kenneth Casson Leighton from comment #42)
> (In reply to Jock Tanner from comment #41)
> > I think most ALUs just negate and add instead of subtract. I think this
> > would use a bit less logic, and negation in itself is also a useful
> > operation.
>
> yes. in the "real" alu we have... mmm... not sure, actually. probably -
>
> do me a favour and raise a bugreport as a reminder to look up what
> rocketchip does?

Actually I was thinking about what Harris & Harris suggested in their famous book. You just need an inverter on one operand input of an adder. The two's complement of that operand can then be completed via the carry input.
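The inverter-plus-carry trick can be sketched in a few lines of Python, with plain integers standing in for the hardware (the function name and the 64-bit width are my own illustration, not from the project's code):

```python
def add_or_sub(a, b, sub, width=64):
    """One adder serves both operations: for subtraction, invert
    operand b (the single inverter stage) and complete the two's
    complement by feeding 1 into the adder's carry input."""
    mask = (1 << width) - 1
    b_in = (~b & mask) if sub else (b & mask)  # optional inverter on b
    cin = 1 if sub else 0                      # carry-in supplies the "+1"
    return (a + b_in + cin) & mask
```

So `a - b` becomes `a + ~b + 1`, and no separate subtractor circuit is needed.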

(In reply to Jock Tanner from comment #0)
> Actually I was thinking about what Harris&Harris suggested in their famous
> book. You just have to have an inverter on one operand input of an adder.
> The two's complement of that operand can be achieved via the carry input.

yes, interestingly, michael and i are reviewing, in the simulator, various implementations (microwatt, qemu, pear_pc), and they all seem to do this:

http://bugs.libre-riscv.org/show_bug.cgi?id=186#c130

it looks like power was extremely well designed, and microwatt as well, to basically "translate" opcodes into "everything is an ADD, but you optionally do add1, add0, add-carry, occasionally invert op1, oh and farm out carry to different places, or set carry to 0, or 1" etc. etc. it's quite neat.

Another thing that is slightly bugging me here is lookahead carry units. Should we reconfigure them "on the fly" together with the adders/subtractors? Is that viable? Multiplexers can introduce enough delay to make the whole looking-ahead thing inefficient compared to simple serial carry propagation.

(In reply to Jock Tanner from comment #2)
> Another thing that is slightly bugging me here is lookahead carry units.

you'll need to send me some links so i can take a look at what you mean.

> Should we reconfigure them "on the fly" together with adders/subtractors?
> Is that viable? Multiplexers can introduce enough delay to make the whole
> looking-ahead thing inefficient compared to simple serial carry propagation.

if you can put some links here as to what you're comparing (descriptions, tutorials, gate-level diagrams) i can get up to speed and follow the conversation.

(In reply to Luke Kenneth Casson Leighton from comment #3)
> if you can put some links here as to what you're comparing (descriptions,
> tutorials, gate-level diagrams) i can get up to speed and follow the
> conversation.

I mostly rely on a college textbook by David & Sarah Harris, “Digital Design and Computer Architecture”. I have only read it in Russian, though. The book has a very detailed explanation of the carry-lookahead adder, with truth tables and schematics. Some of this can also be found on Wikipedia [1].

Very roughly speaking, an N-bit ripple-carry adder takes a*N transistors and b*N time to calculate, while a straight-out carry-lookahead adder takes a^N transistors and b time. AFAIK in practical ALU design there is always a compromise between the two, and I just wonder what kind of compromise could be suitable for a *reconfigurable*, high-speed ALU.

I imagine something cascadable, like 74181 bit-slice ALUs + 74182 carry generators [2], but with 'plexors in between. But the speed of such a design may be questionable.

[1] https://en.wikipedia.org/wiki/Adder_(electronics)
[2] https://en.wikipedia.org/wiki/74181
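The b*N delay of the ripple-carry case can be made concrete with a toy full-adder chain in Python (bit lists, LSB first; names are illustrative and not taken from any project code):

```python
def ripple_carry_add(a_bits, b_bits, cin=0):
    """Chain of full adders: bit i cannot finish until the carry
    from bit i-1 arrives, so the total delay grows linearly with
    the width N, even though the area is also only linear."""
    s_bits = []
    c = cin
    for a, b in zip(a_bits, b_bits):       # LSB first
        s_bits.append(a ^ b ^ c)           # full-adder sum bit
        c = (a & b) | (a & c) | (b & c)    # full-adder carry out
    return s_bits, c
```

Each loop iteration corresponds to one full-adder's worth of gate delay on the carry path.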

(In reply to Jock Tanner from comment #4)
> (In reply to Luke Kenneth Casson Leighton from comment #3)
> > if you can put some links here as to what you're comparing (descriptions,
> > tutorials, gate-level diagrams) i can get up to speed and follow the
> > conversation.
>
> I mostly rely on a college textbook by David & Sarah Harris, named “Digital
> Design and Computer Architecture”. I read it only in Russian though. The
> book has a very detailed explanation of carry-lookahead adder, with truth
> tables and schematics.

interesting. well, what we will be relying on is the yosys "synth" command turning things from high-level "we want an N-bit adder" into low-level "these 4-bit adders and etc. etc. get created". we are pretty much trusting yosys "synth" to have the optimal state-of-the-art latest industry-recognised "thing".

> Very roughly speaking, N-bit ripple-carry adder takes a*N transistors and
> b*N time for calculation, while straight-out carry-lookahead adder takes a^N
> transistors and b time. AFAIK in practical ALU design they always make a
> compromise between the two. And I just wonder what kind of compromise could
> be suitable for *reconfigurable*, high-speed ALU.

i honestly have no idea! :) the basic units will actually be a 64+8-bit add (yes, 72-bit):

https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part_mul_add/adder.py;h=e1849b4d25fc5ec4fc4473cc02b32f49dd5912b2;hb=77f150cff440bed025e2487b6f0fcda9f529290b#l108

see the comment description there of how it works. the "partitions" are done via the insertion of those "extra bits". those "extra bits" will be set to 0 to create a partition, and to 1 to get a roll-over. so we're not *actually* "partitioning" into completely separate 8-bit adders.

therefore i would hazard a guess that the (underlying 72-bit) adders are properly optimised by the yosys "synth" command to give a decent fast adder, thus giving us the "best of both worlds".

in other words, a quick analysis leads me to believe that what we're using is *not* adversely impacted just because it's a partitioned adder. it has the *functionality* of a partitioned adder but is in fact a 72-bit "straight" (normal) add.
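A rough behavioural sketch of the extra-bits trick, in plain Python integers rather than the project's nmigen code (lane layout, bit-pair convention and names are my own reading of the idea; the linked adder.py is the authoritative version):

```python
def partitioned_add(a, b, join, nlanes=8, lane=8):
    """Two 64-bit operands are widened to 72 bits by inserting one
    'extra bit' above each 8-bit lane. With join=0 the extra-bit pair
    is (0,0): a carry out of the lane is absorbed at that position
    (a partition). With join=1 the pair is (1,0): the extra position
    passes carry-in straight to carry-out (a roll-over), so the one
    wide add behaves as a single 64-bit add."""
    mask = (1 << lane) - 1
    wa = wb = 0
    for i in range(nlanes):
        shift = i * (lane + 1)
        wa |= ((a >> (i * lane)) & mask) << shift
        wb |= ((b >> (i * lane)) & mask) << shift
        wa |= join << (shift + lane)   # the inserted "extra bit"
    s = wa + wb                        # ONE ordinary 72-bit add
    out = 0
    for i in range(nlanes):            # strip the extra bits back out
        out |= ((s >> (i * (lane + 1))) & mask) << (i * lane)
    return out
```

With `join=0`, `partitioned_add(0x00FF, 0x0001, 0)` gives `0x0000` (the carry out of the bottom byte is killed at the boundary); with `join=1` the same inputs give `0x0100`, exactly as a plain 16-bit-wide add would.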

The idea is so simple yet so out-of-the-box! I feel like I'm in a Western movie, while my partner is devising a get-rich-fast scheme:

− And where's the catch?
− There is no catch.

BTW is it patented?

(In reply to Luke Kenneth Casson Leighton from comment #5)
> https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/
> part_mul_add/adder.py;h=e1849b4d25fc5ec4fc4473cc02b32f49dd5912b2;
> hb=77f150cff440bed025e2487b6f0fcda9f529290b#l108
>
> see the comment description there of how it works
>
> the "partitions" are done via the insertion of those "extra bits".
>
> those "extra bits" will be set to 0 to create a partition, and "1"
> to get a roll-over.
>
> so we're not *actually* "partitioning" into completely separate 8-bit adders.
>
> therefore i would hazard a guess that the (underlying 72-bit) adders are
> properly optimised by yosys "synth" command to give a decent fast adder,
> thus giving us the "best of both worlds".

(In reply to Jock Tanner from comment #4)
> Very roughly speaking, N-bit ripple-carry adder takes a*N transistors and
> b*N time for calculation, while straight-out carry-lookahead adder takes a^N
> transistors and b time. AFAIK in practical ALU design they always make a
> compromise between the two. And I just wonder what kind of compromise could
> be suitable for *reconfigurable*, high-speed ALU.
>
> I imagine something cascadable, like 74181 bit-slice ALUs + 74182 carry
> generators [2], but with 'plexors in between. But the speed of such a design
> may be questionable.

I haven't checked which specific design yosys produces for an adder, but I know there are divide-and-conquer carry-lookahead designs with an area of O(N*log(N)) and a delay of O(log(N)) when ignoring delay due to wire length, so that's much better than the above a^N value. I found a paper with an overview of different carry-lookahead adder designs:

https://www.ijrte.org/wp-content/uploads/papers/v7i5s4/E10920275S419.pdf

In fact, I'm planning on using the same kind of recursive divide-and-conquer structure for purposes other than addition, such as finding the first set bit or a prefix-sum.
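The log-depth divide-and-conquer idea can be sketched as a Kogge-Stone-style parallel-prefix adder in Python (bit lists, LSB first; this is the generic textbook construction, not code from the project, and all names are mine):

```python
def lookahead_add(a_bits, b_bits, cin=0):
    """Per-bit generate (g) and propagate (p) signals are combined
    at doubling distances, so after log2(N) levels G[i]/P[i] hold
    the group generate/propagate for bits 0..i.  All carries are
    then available in O(log N) depth instead of O(N)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # bit generates a carry
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # bit propagates a carry
    n = len(g)
    G, P = g[:], p[:]
    d = 1
    while d < n:
        # iterate downward so G[i-d]/P[i-d] still hold the previous level
        for i in range(n - 1, d - 1, -1):
            G[i] = G[i] | (P[i] & G[i - d])
            P[i] = P[i] & P[i - d]
        d *= 2
    # carry into bit i is the carry out of group 0..i-1 (plus cin)
    carries = [cin] + [G[i] | (P[i] & cin) for i in range(n)]
    s_bits = [p[i] ^ carries[i] for i in range(n)]
    return s_bits, carries[-1]
```

The same prefix network computes any associative combine, which is why it also suits find-first-set and prefix-sum: only the per-bit operator changes.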

(In reply to Jock Tanner from comment #6)
> The idea is so simple yet so out-of-the-box! I feel like I'm in a Western
> movie, while my partner is devising a get-rich-fast scheme:
>
> − And where's the catch?
> − There is no catch.
>
> BTW is it patented?

Not that I'm aware of; I invented it.

(In reply to Jock Tanner from comment #6)
> The idea is so simple yet so out-of-the-box! I feel like I'm in a Western
> movie, while my partner is devising a get-rich-fast scheme:
>
> − And where's the catch?
> − There is no catch.

a lot of what we're doing is like that. the 6600 scoreboards are *very* badly misunderstood and drastically underestimated right across the entire industry (thanks largely to Patterson's book on RISC architectures). consequently what's in Thornton's book "Design of a Computer" (google it) has had to be "reinvented", and often comes attached with unnecessary baggage such as "register renamers".

then again there are areas where we are "functional" but need serious improvement. we need a Dadda multiplier algorithm, for example, because we currently have a Wallace Tree one:

https://bugs.libre-soc.org/show_bug.cgi?id=136