Carry-less multiply could be implemented relatively inexpensively by sharing HW with the SIMD integer/mantissa multiplier. All that would be needed is to mask out the carry signals in the multiplier, which shouldn't add much incremental cost since it would amount to converting 2-input AND gates to 3-input ones. Carry-less multiply is commonly used for cryptography, CRC calculation (used in networking and compression/decompression), and more. For reference, x86's version of carry-less multiply: https://en.wikipedia.org/wiki/CLMUL_instruction_set
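For anyone unfamiliar with the operation, here is a rough software reference for what a carry-less multiply computes. It is only a sketch of the arithmetic, not a proposal for the HW implementation or the opcode semantics; the function name `clmul` and the unbounded-width result are assumptions made for illustration.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less multiply of two non-negative integers (illustrative sketch).

    Works like schoolbook binary multiplication, except the shifted
    partial products are combined with XOR instead of addition, so no
    carries ever propagate between bit positions.
    """
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    result = 0
    shift = 0
    while b:
        if b & 1:
            # XOR in the partial product for this bit of b.
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


if __name__ == "__main__":
    # 0b11 clmul 0b11 = 0b011 ^ 0b110 = 0b101, whereas ordinary 3 * 3 = 9.
    print(bin(clmul(0b11, 0b11)))
```

The only difference from an ordinary shift-and-add multiplier is the `^=` in place of `+=`, which is the software analogue of masking out the carry signals.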
Nice idea. We'll need to justify an opcode for it; however, that's no reason not to actually put it into the ALU, as the ALU is for general-purpose use.
*** This bug has been marked as a duplicate of bug 784 ***