The Wallace tree multiplier produces a 10-stage-long chain at 64-bit. Dadda tree multipliers use fewer gates. See https://github.com/jorisvr/gen_hdl_multiplier
Existing code that needs converting: https://git.libre-soc.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/part_mul_add/multiply.py;hb=HEAD
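
The 10-stage figure can be checked with a rough sketch like the one below. It is not taken from multiply.py or gen_hdl_multiplier: the function names are hypothetical, and the model assumes only 3:2 compressors (full adders) and tracks nothing but the height of the tallest partial-product column, taken to be 64 for a 64-bit multiply.

```python
def wallace_stages(height):
    """Count 3:2 reduction stages until at most two rows remain.

    Assumption: only the tallest column is modelled; each full group of
    three rows is reduced to two, leftover rows pass through unchanged.
    """
    stages = 0
    while height > 2:
        height = 2 * (height // 3) + height % 3
        stages += 1
    return stages


def dadda_heights(height):
    """Dadda target heights below `height`, smallest first.

    The sequence is d_1 = 2, d_(j+1) = floor(1.5 * d_j); a Dadda tree
    reduces to each of these targets in turn, largest first, so the
    length of the list is the number of reduction stages.
    """
    heights = []
    d = 2
    while d < height:
        heights.append(d)
        d = d * 3 // 2
    return heights


if __name__ == "__main__":
    print(wallace_stages(64))      # 10
    print(dadda_heights(64))       # [2, 3, 4, 6, 9, 13, 19, 28, 42, 63]
```

Under this model both schemes need 10 stages at 64 bits. The gate saving in a Dadda tree comes from placing only as many adders in each stage as are needed to reach the next target height, rather than compressing every available group of three.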