preliminary exploratory software emulation of FP SQRT function in python

idea. sqrt of num on 2-bit boundary always whole num. sqrt 1 -> 1 sqrt 4 -> 2 sqrt 16 -> 4 sqrt 64 -> 8 sqrt of num on *odd* 2-bit boundry *never* whole. why? represent as exponent / mantissa, basically exponent LSB has to be zero. to sqrt exponent, just divide exponent by two. except, if exp LSB == 1, CANNOT DO THAT. however... there is a trick: * add 1 to exponent * shift mantissa DOWN by 1 bit to compensate *NOW* exp LSB is guaranteed to be zero. *NOW* can divide exp by 2 without destroying (losing) info. mantissa shifted by 1 means, interestingly, 2-bit loop now has input shifted by 1 compared to before. fascinating. example exp = 0b1101, m = 01101011 ADD 1, SHIFT --> exp = 0b1110, m = 00110101 (lose 1 bit, must make room internally! this is what extra bits are for) before, m would be divided 01 | 10 | 10 | 11 now it is divided 00 | 11 | 01 | 01 sqrt: exp /= 2 --> 0b0111, mant = pipealgorithm(00 | 11 | 01 | 01) example exp = 0b0100, m = 01010101 exp LSB = 0, --> *NO SHIFT* sqrt: exp /= 2 --> 0b0010, mant = pipealgorithm(01 | 01 | 01 | 01)

This code I wrote quite a while ago in C++11/14 might be a useful reference: Integer sqrt: https://github.com/programmerjake/instruction-selector/blob/727a0c277d4bc97cd96bf15e6a895244cd309a07/int_types.h#L393-L420 Floating-point sqrt based on integer sqrt (don't think it rounds properly though): https://github.com/programmerjake/instruction-selector/blob/727a0c277d4bc97cd96bf15e6a895244cd309a07/float_types.h#L612-L625

(v.get_unbiased_exponent_value() & (signed_type)1) != (signed_type)0 ? from_bits(construct_float(false, fractional_sqrt(v.get_unsigned_mantissa_value() << 1), (v.get_unbiased_exponent_value() - (signed_type)1) / (signed_type)2)) : from_bits(construct_float(false, fractional_sqrt(v.get_unsigned_mantissa_value()), v.get_unbiased_exponent_value() / (signed_type)2)); } okaaay, so that's... really hard to decipher... :) however it's basically what i said in the comment above. if exponent & 1 != 0: # test LSB return ieeefloat(intsqrt(mantissa << 1), # shift mantissa up (exponent - 1) / 2)) # subtract 1 from exp to compensate else: return ieeefloat(intsqrt(mantissa), # mantissa as-is exponent / 2) # no compensating needed on exp basically exactly like is done in FPBase.op_normalise: op.e.eq(op.e - 1), # DECREASE exponent op.m.eq(op.m << 1), # shift mantissa UP

SO i was reading how the sqrt function in pythons math module works and found the alternative to it: anynumber ** (1/2) = sqrt(anynumber) basicly using exponent operator to devide number with 1/2 and you get sqrt of that number. checked with nmigen here: https://github.com/m-labs/nmigen/blob/master/nmigen/back/rtlil.py#L360 it supports ** operator so now i am not yet sure but i think this should be able to reduce the amount of work needed to produce sqrt function. Plan is to re-use get_a and get_b functions, add another func called sqrt in the FSM, then add normalise funcs what does everyone think?

(In reply to Aleksandar Kostovic from comment #4) > SO i was reading how the sqrt function in pythons math module works and > found the alternative to it: > > anynumber ** (1/2) = sqrt(anynumber) > > basicly using exponent operator to devide number with 1/2 and you get sqrt > of that number. > > checked with nmigen here: > https://github.com/m-labs/nmigen/blob/master/nmigen/back/rtlil.py#L360 > > it supports ** operator > > so now i am not yet sure but i think this should be able to reduce the > amount of work needed to produce sqrt function. > Plan is to re-use get_a and get_b functions, add another func called sqrt in > the FSM, then add normalise funcs > > what does everyone think? I highly doubt nmigen supports non-integer exponents.

(In reply to Aleksandar Kostovic from comment #4) > it supports ** operator > > so now i am not yet sure but i think this should be able to reduce the > amount of work needed to produce sqrt function. as jacob said: i think that's usually used to do 2**3 (integer powers, basically). hardware is so low-level, i was actually extremely surprised to find that verilog supports integer-divide. can i suggest the first step being to convert that c-code from the paper into python, or to convert the one from jacob's code (it's a lot simpler). if i remove the brackets and leave in the indentation it's almost done already! for(; bit != (fixed_width_uint)0; bit = bit >> 2) if(v >= bit + retval) v = v - (bit + retval); retval = (retval >> 1) | bit; else retval = retval >> 1; just literally clean that up, add it to the repo somewhere, run it, make it work, and play with it. put in some values, see what it does, run it again. then you will start to get confidence in python and in sqrt. > Plan is to re-use get_a and get_b functions, add another func called sqrt in > the FSM, then add normalise funcs yes that would do the trick - bear in mind that's the hardware, where this bugreport is for discussion of the software simulation side. so, that should be discussed further under bug #43

hi aleksander, saw the commits https://git.libre-riscv.org/?p=ieee754fpu.git;a=commitdiff;h=264ec6d3b61267429026b0090047e81fe85900d5 looks great, and pretty much the exact same thing as what jacob pointed to. i'm curious as to why "+ bit" instead of "| bit", can investigate that later (or if the algorithm doesn't work) now you can use something like this, to calculate a value in python: >>> int(pow(7, 0.5)) 2 or if you prefer, >>> int(7 ** 0.5) 2 yes, int rounds *DOWN*. that shouldn't be a problem initially, although we will need to investigate what to do with the remainder (local variable "num", in the test code that you wrote) so, try just throwing some numbers at the sqrt function, print them out in a loop: for i in range(1, 20): print(sqrt(i)) that should do the trick. then, modify that to include the python version (int(pow)) as well. you'll soon see if the function works :)

>looks great, and pretty much the exact same thing as what jacob >pointed to. thanks! i actually took some time to understand what it does, so took a while >so, try just throwing some numbers at the sqrt function, print >them out in a loop: > >for i in range(1, 20): > print(sqrt(i)) > >that should do the trick. then, modify that to include the python >version (int(pow)) as well. > >you'll soon see if the function works :) I see, it works, but only displays whole numbers. remeber its not a bug, its a feature ;) Next up, trying to make it work as it should

(In reply to Aleksandar Kostovic from comment #8) > >for i in range(1, 20): > > print(sqrt(i)) > I see, it works, but only displays whole numbers. > remeber its not a bug, its a feature ;) yehyeh! duUuuh, *now* it sinks in?? :) you're the one that wanted to implement floating-point, ya muppet! https://www.youtube.com/watch?v=B7UmUX68KtE ... starts to make sense what jacob was saying about hardware not supporting 2 ** 0.5? just keep iterating and exploring, ok? not sure of something, simply write a small bit of code to check if it's true. the faster you can do that, the quicker you'll understand, and the quicker you can write the code. i regularly (like... 15 times a day) run python3 interactively just to do some exploring, not even write some code, just type directly at the prompt. > Next up, trying to make it work as it should awesome.

>yehyeh! duUuuh, *now* it sinks in?? :) you're the one that wanted >to implement floating-point, ya muppet! >https://www.youtube.com/watch?v=B7UmUX68KtE LOL. Takes time to sink in but worth the struggle of doing it :) >... starts to make sense what jacob was saying about hardware >not supporting 2 ** 0.5? Yep, all clear >just keep iterating and exploring, ok? not sure of something, simply >write a small bit of code to check if it's true. > >the faster you can do that, the quicker you'll understand, and the >quicker you can write the code. > >i regularly (like... 15 times a day) run python3 interactively just >to do some exploring, not even write some code, just type directly >at the prompt Just like i did when learning verilog. Literaly had 100s of files for designs that i was trying to build. Everything takes practice to master, so i will practice as much as i can!

I pushed the commit of the python code of the previous C version of SQRT function found in that paper. BUT it doesnt work yet, as i had to init a variable with no value(it was done like that in c version) so now python trows an error because of it. also i commented the old version so there exists a "vorking version" of sqrt function. File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 28, in sqrt R = (R<<2)|((D>>(i+i))&3) TypeError: unsupported operand type(s) for >>: 'NoneType' and 'int' Any ideas on how to fix this are welcome.

this is how you do decrementing ranges in python: - for i in range(15): - i -= 1 + r = 0 # remainder + for i in range(15, -1, -1): # negative ranges are weird... from the c code: //initialize all the variables. a = num; q = 0; i = 0; left = 0; //input to adder/sub right = 0; //input to adder/sub r = 0; //remainder .... ---> python .... - r = None - D = None + D = num # D is input (from num) + r = 0 # remainder i also restored the "simple" version, so that it can be used as a comparative visual test against the new one. now if you do "assert sqrt(Q) == int(Q**0.5)" in that loop, and make the loop say... run to 65536, you'll get a good idea of whether it's going to work.

i see, works the same as the simpler function... really need to find an algorithms that would suit floating point math. any useful links or pseudo code i could use?

https://pdfs.semanticscholar.org/5060/4e9aff0e37089c4ab9a376c3f35761ffe28b.pdf Found this from google search- aslo includes a diagram for in hardware processing ;)

(In reply to Aleksandar Kostovic from comment #13) > i see, works the same as the simpler function... it does... except... try sqrt(65536)... try this: for Q in range(1, int(1e7)): print(Q, sqrt(Q), sqrtsimple(Q), int(Q**0.5)) assert int(Q**0.5) == sqrtsimple(Q), "Q sqrtsimpl fail %d" % Q assert int(Q**0.5) == sqrt(Q), "Q sqrt fail %d" % Q > really need to find an algorithms that would suit floating point math. that *is* the algorithm that suits FP math! congratulations, sqrt() and sqrtsimple() - if it worked - handle the mantissa. now you just need to do the exponent. > any useful links or pseudo code i could use? you're already doing it. don't worry, keep going, ok? you know how to do the square root of an exponent, right? you just... divide it by 2. yes, really! dang this is actually hard to find proper simple illustrations, online... a whole lot of modern total bullshit about a new-fangled concept called "radicals" (wtf??)... google search pulls up results that are such total bullshit i'm just going to write it out for you. the square root of a number when it is written in exponent form (e^x) is basically that you multiply the exponent (x) by 0.5. exactly as with that x ** 0.5 thing you discovered that python does, remember? so: sqrt( e ^ x ) == e ^ (x*0.5) therefore, very very simply (far simpler than you're imagining that it is), to take the square root of the exponent of an IEEE754 FP number... you DIVIDE THE EXPONENT BY TWO! that's just a shift operation! that's it. that's all it is. so with that in mind, does comment #3 now make a bit more sense? http://bugs.libre-riscv.org/show_bug.cgi?id=74#c3 obviously, in IEEE754 FP, exponent is also an integer, so if it is an odd integer, you lose information. to adjust for that, you add 1 to the exponent and shift the mantissa. now the exponent is even, *now* it will divide cleanly by 2 (shift). basically you are nearly there. it really is this simple. you should be able to write the function which takes e and m as arguments, (from comment #3) that handles the core algorithm, in about... 6 lines.

> try this: > >for Q in range(1, int(1e7)): > print(Q, sqrt(Q), sqrtsimple(Q), int(Q**0.5)) > assert int(Q**0.5) == sqrtsimple(Q), "Q sqrtsimpl fail %d" % Q > assert int(Q**0.5) == sqrt(Q), "Q sqrt fail %d" % Q Thats interesting, so as you said it may fail, it fails! Left out the sqrtsimple and let sqrt run for a while. It worked well >you should be able to write the function which takes e and m as arguments, > (from comment #3) that handles the core algorithm, in about... 6 lines. Got it. Made a function for that and pushed it. Take a look :)

(In reply to Aleksandar Kostovic from comment #16) > > try this: > > > >for Q in range(1, int(1e7)): > > print(Q, sqrt(Q), sqrtsimple(Q), int(Q**0.5)) > > assert int(Q**0.5) == sqrtsimple(Q), "Q sqrtsimpl fail %d" % Q > > assert int(Q**0.5) == sqrt(Q), "Q sqrt fail %d" % Q > > Thats interesting, so as you said it may fail, it fails! Left out the > sqrtsimple and let sqrt run for a while. It worked well great. so... bye bye sqrt for now. can always track that down later. > >you should be able to write the function which takes e and m as arguments, > > (from comment #3) that handles the core algorithm, in about... 6 lines. > Got it. Made a function for that and pushed it. Take a look :) great. ok so the call to Q() wasn't needed (or, we could do an Object named Q, but hey, let's keep it dead-simple for now) btw do you already have sfpy installed and built? if so, we can move to the next phase of "from sfpy import Float32" and use that to mangle/de-mangle FP numbers into sign/mantissa/exponent, using sfpy.Float32.get_bits plus use sfpy.Float32.sqrt function. i added some "convenience" functions which, given a 32-bit value, will allow extraction of mantissa, sign and exponent. and reconstruction. so, next stage, you want to write a test for-loop around the main() function? maybe run another for-loop "for e in range(25)" or something, and maaybe reduce the for-loop around mantissa just a tad? :) don't delete the existing for-loop and asserts, add *another* for-loop (this one with 2 nested for's, one for e one for m)

> great. so... bye bye sqrt for now. can always track that down later. I think you misunderstood. Its the sqrtsimple function that is failing: File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 88, in <module> assert int(Q**0.5) == sqrtsimple(Q), "Q sqrtsimpl fail %d" % Q So we use the regular sqrt func. >btw do you already have sfpy installed and built? if so, we can move >to the next phase of "from sfpy import Float32" and use that to mangle/de-mangle >FP numbers into sign/mantissa/exponent, using sfpy.Float32.get_bits Yes i have it installed >so, next stage, you want to write a test for-loop around the main() function? >maybe run another for-loop "for e in range(25)" or something, and maaybe >reduce the for-loop around mantissa just a tad? :) > >don't delete the existing for-loop and asserts, add *another* for-loop >(this one with 2 nested for's, one for e one for m) Will try to do it now

>def main(mantissa, exponent): > if exponent & 1 != 0: > return sqrt(mantissa << 1), # shift mantissa up > ((exponent - 1) / 2) # subtract 1 from exp to compensate > return sqrt(mantissa), # mantissa as-is > (exponent / 2) # no compensating needed on exp ((exponent - 1) / 2) # subtract 1 from exp to compensate ^ IndentationError: unexpected indent now i get indenattion error which i dont know how to resolve. added and removed bothh spaces and tabs but nothing works...

(In reply to Aleksandar Kostovic from comment #18) > > great. so... bye bye sqrt for now. can always track that down later. > I think you misunderstood. Its the sqrtsimple function that is failing: sqrtsimple fails because, by line 8, bit needs to be the smallest power of 4 that is not less than num. 65536 fails because 1 << 14 is too small. also, python doesn't let you wrap lines after a comma unless the comma is in one of () [] or {}.

fixed, on to the testing of for loop: as you can see here:https://git.libre-riscv.org/?p=ieee754fpu.git;a=blobdiff;f=src/add/fsqrt.py;h=285492c60f5ef11a5a90c6c11d4a44bdfbc13f89;hp=edb7d4ba17ad916b5043cc60a986d6fcc421ec66;hb=988c785b20b569cef6f9140401fab2ed413259a4;hpb=19fdaa150749a27d4a88d0e9ca3a36ae90c64291 I added two for loops, but when i run it i get the following: (0, 0) (1, 0) (1, 0) (1, 0) (2, 0) (2, 0) (2, 0) (2, 0) (2, 0) (3, 0) . . . . SO it isnt working

(In reply to Jacob Lifshay from comment #20) > (In reply to Aleksandar Kostovic from comment #18) > > > great. so... bye bye sqrt for now. can always track that down later. > > I think you misunderstood. Its the sqrtsimple function that is failing: whoops > sqrtsimple fails because, by line 8, bit needs to be the smallest power of 4 > that is not less than num. 65536 fails because 1 << 14 is too small. oh duh! - bit = 1 << 14 + bit = 1 << 32 fixed. > also, python doesn't let you wrap lines after a comma unless the comma is in > one of () [] or {}. i often use \ which i don't see many other people use. so: x = fred + joe + mary + \ shawn_the_sheep

(In reply to Aleksandar Kostovic from comment #21) > fixed, on to the testing of for loop: > > as you can see > here:https://git.libre-riscv.org/?p=ieee754fpu.git;a=blobdiff;f=src/add/ > fsqrt.py;h=285492c60f5ef11a5a90c6c11d4a44bdfbc13f89; > hp=edb7d4ba17ad916b5043cc60a986d6fcc421ec66; > hb=988c785b20b569cef6f9140401fab2ed413259a4; > hpb=19fdaa150749a27d4a88d0e9ca3a36ae90c64291 > > I added two for loops, but when i run it i get the following: > > (0, 0) > (1, 0) > (1, 0) > (1, 0) > (2, 0) > (2, 0) > (2, 0) > (2, 0) > (2, 0) > (3, 0) > . > . > . > . > > So it isnt working you're not looking far enough down the results... also, not having the original m/e to compare with doesn't help... i added this: + ms, es = main(m, e) + print("m:%d e:%d sqrt: m:%d e:%d" % (m, e, ms, es)) that gives the original values of the for-loop side-by-side with the sqrt'ed answer _now_ it's possible to see that yes, when the exponent is odd, mantissa is multiplied by 2 before being integer-sqrt'ed. so.... m:8 e:25 sqrt: m:4 e:12 because e==25, so it gets reduced to 24 and m gets multiplied by 2 which means, now e==24 and m==16 the sqrt of the exponent is 12 (just shift down by one aka "divide by 2") the sqrt of the mantissa (16) is 4. m:8 e:24 sqrt: m:2 e:12 because e==24, it does NOT get reduced by 1, so m remains the SAME. sqrt of exponent e=24 is 12 (div by 2) sqrt of mantissa (8) is 2 with a LOT of remainder missing. check that with python floating point.. >>> pow(8.0, 0.5) 2.8284271247461903 wow, so with the integer sqrt we lost a *LOT* of digits in the rounding, there. but that's ok, we can deal with that later.

(In reply to Aleksandar Kostovic from comment #18) > > great. so... bye bye sqrt for now. can always track that down later. > I think you misunderstood. Its the sqrtsimple function that is failing: > > File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 88, in > <module> > assert int(Q**0.5) == sqrtsimple(Q), "Q sqrtsimpl fail %d" % Q > > So we use the regular sqrt func. > > >so, next stage, you want to write a test for-loop around the main() function? > Will try to do it now looks great, works well. > >btw do you already have sfpy installed and built? if so, we can move > >to the next phase of "from sfpy import Float32" and use that to mangle/de-mangle >FP numbers into sign/mantissa/exponent, using sfpy.Float32.get_bits > Yes i have it installed ok great, so next phase will be to use "from sfpy import Float32" and create some arbitrary numbers, say... oh i dunno, pick one! 753.415926535897932 x = Float32(fred_the_number) xbits = x.bits then use the function decode_fp32(xbits) to get the s, m, e and from there you can throw those into the function main() and see what happens. theeeeennn... you can do sq_test = x.sqrt() above... and on the answer, use the function decodefp32(sq_test)... and print those out next to the answer from using main(). if those are even remotely close we have a winner!

>so next phase will be to use "from sfpy import Float32" and create some >arbitrary numbers, say... oh i dunno, pick one! 753.415926535897932 > >x = Float32(fred_the_number) >xbits = x.bits > >then use the function decode_fp32(xbits) to get the s, m, e > >and from there you can throw those into the function main() and see what >happens. > >theeeeennn... you can do sq_test = x.sqrt() above... and on the answer, >use the function decodefp32(sq_test)... > >and print those out next to the answer from using main(). > >if those are even remotely close we have a winner! Okay i understood up until the x = Float32(num) and xbits = x.bits. So i decided to print out the values of decode_fp32 function first, to check if its working.... aaaaand its not. x = Float32(1234.123456789) xbits = x.bits print(decode_fp32(x)) which gives: Traceback (most recent call last): File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 108, in <module> print(decode_fp32(x)) File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 74, in decode_fp32 return get_sign(x), get_exponent(x), get_mantissa(x) File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 64, in get_sign return ((x & 0x80000000) >> 31) TypeError: unsupported operand type(s) for &: 'sfpy.float.Float32' and 'int'

(In reply to Aleksandar Kostovic from comment #25) > Okay i understood up until the x = Float32(num) and xbits = x.bits. So i > decided to print out the values of decode_fp32 function first, to check if > its working.... aaaaand its not. > > x = Float32(1234.123456789) > xbits = x.bits > > > print(decode_fp32(x)) > TypeError: unsupported operand type(s) for &: 'sfpy.float.Float32' and 'int' yes, that is precisely and exactly the correct error, because you tried to decode x (an object) not xbits (an integer)

>theeeeennn... you can do sq_test = x.sqrt() above... and on the answer, >use the function decodefp32(sq_test)... > >and print those out next to the answer from using main(). Okay so i am confused by the thing i should do but tried to understand, so added: + sq_test = x.sqrt() + e, m = decode_fp32(sq_test) + print(main(e, m)) and get an error: File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 64, in get_sign return ((x & 0x80000000) >> 31) TypeError: unsupported operand type(s) for &: 'sfpy.float.Float32' and 'int'

(In reply to Aleksandar Kostovic from comment #27) > >theeeeennn... you can do sq_test = x.sqrt() above... and on the answer, > >use the function decodefp32(sq_test)... > > > >and print those out next to the answer from using main(). > > Okay so i am confused by the thing i should do but tried to understand, so > added: > > + sq_test = x.sqrt() > + e, m = decode_fp32(sq_test) > + print(main(e, m)) > > and get an error: > File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 64, in > get_sign > return ((x & 0x80000000) >> 31) > TypeError: unsupported operand type(s) for &: 'sfpy.float.Float32' and 'int' you did exactly the same thing, doh! you passed in sq_test (an object) *NOT* the bits *OF* the object, which are an integer. - sq_test = x.sqrt() + sq_test = x.sqrt.bits also, decode_fp32 returns a tuple of 3 items: sign, exponent, mantissa. you can't assign a tuple of length 3 to a tuple of length 2 (e, m) - e, m = decode_fp32(sq_test.bits) + s, e, m = decode_fp32(sq_test.bits) try git pull.

>you did exactly the same thing, doh! you passed in sq_test (an object) >*NOT* the bits *OF* the object, which are an integer. i think i get it now >try git pull. okay. i see. the question now remains how do i assemble it all?

(In reply to Aleksandar Kostovic from comment #29) > >you did exactly the same thing, doh! you passed in sq_test (an object) > >*NOT* the bits *OF* the object, which are an integer. > i think i get it now > >try git pull. > okay. i see. the question now remains how do i assemble it all? well... run the example, you'll see that it's almost there: 1234.1234130859375 <class 'sfpy.float.Float32'> 1150960627 <class 'int'> 0 10 1721331 0x1a43f3 our sqrt 4 860665 0xd21f9 sf32 sqrt 0 5 820535 0xc8537 okay! so now we can start exploring... and at this point, it's probably best to go look at softfloat source code and/or also hardfloat... 1 sec let me just commit some extra print/debug stuff...

https://github.com/ucb-bar/berkeley-softfloat-3/blob/master/source/f32_sqrt.c#L90 what on eeeearth? moo?? the fp16 version isn't much easier to understand: https://github.com/ucb-bar/berkeley-softfloat-3/blob/master/source/f16_sqrt.c#L115 https://github.com/ucb-bar/berkeley-hardfloat/blob/master/src/main/scala/DivSqrtRecF64_mulAddZ31.scala not understandable at all... RocketChip uses that hardfloat library, so no joy there... anyone else find anything? we might be stuck with trying to work out how f32_sqrt.c operates.

shock! near-total guess-work, and this works! m |= 1<<23 # set top bit (the missing "1" from mantissa) m <<= 25 sm, se = main(m, e) sm >>= 1 sm = get_mantissa(sm) ok so the |= 1<<23 wasn't total guess-work, that's what softfloat does: sigA = (sigA | 0x00800000)<<8; which is to set bit 23 (and then shift up by 8) i shifted up by 25 - as a guess - to give a couple extra bits in the mantissa, then shifted down by one (sm >>= 1) the call to get_mantissa() masks out the top bits... we should really call "normalise" to adjust exp/mantissa... so quite a lot to be done, however that's the hard part. dang that was lucky. x 1234.1234130859375 <class 'sfpy.float.Float32'> sqrt 35.13009262084961 1150960627 <class 'int'> x decode 0 10 1721331 0x1a43f3 our sqrt 0 5 820535 0xc8537 0b11001000010100110111 sf32 sqrt 0 5 820535 0xc8537 0b11001000010100110111 x 32.099998474121094 <class 'sfpy.float.Float32'> sqrt 5.665686130523682 1107322470 <class 'int'> x decode 0 5 26214 0x6666 our sqrt 0 2 3493196 0x354d4c 0b1101010100110101001100 sf32 sqrt 0 2 3493197 0x354d4d 0b1101010100110101001101 x 16.0 <class 'sfpy.float.Float32'> sqrt 4.0 1098907648 <class 'int'> x decode 0 4 0 0x0 our sqrt 0 2 0 0x0 0b0 sf32 sqrt 0 2 0 0x0 0b0 x 8.0 <class 'sfpy.float.Float32'> sqrt 2.8284270763397217 1090519040 <class 'int'> x decode 0 3 0 0x0 our sqrt 0 1 3474675 0x3504f3 0b1101010000010011110011 sf32 sqrt 0 1 3474675 0x3504f3 0b1101010000010011110011 x 8.5 <class 'sfpy.float.Float32'> sqrt 2.915475845336914 1091043328 <class 'int'> x decode 0 3 524288 0x80000 our sqrt 0 1 3839784 0x3a9728 0b1110101001011100101000 sf32 sqrt 0 1 3839784 0x3a9728 0b1110101001011100101000

>shock! near-total guess-work, and this works! YEY! Thats awesome :) >the call to get_mantissa() masks out the top bits... we should really >call "normalise" to adjust exp/mantissa... > >so quite a lot to be done, however that's the hard part. so now we create a function normalise or what?

On Tue, Apr 30, 2019 at 10:42 AM <bugzilla-daemon@libre-riscv.org> wrote: > > http://bugs.libre-riscv.org/show_bug.cgi?id=74 > > --- Comment #33 from Aleksandar Kostovic <alexandar.kostovic@gmail.com> --- > >shock! near-total guess-work, and this works! > YEY! Thats awesome :) > > >the call to get_mantissa() masks out the top bits... we should really > >call "normalise" to adjust exp/mantissa... > > > >so quite a lot to be done, however that's the hard part. > so now we create a function normalise or what? rounding and normalising, yes. rounding based on that new variable "lowbits" being >= 2, and normalise... mmm... m.d.sync += z.m.eq(z.m + 1) # mantissa rounds up with m.If(z.m == z.m1s): # all 1s m.d.sync += z.e.eq(z.e + 1) # exponent rounds up looks like we're supposed to just increment the exponent by one if the mantissa becomes all 1s. so just something like this: if (lowbits >= 2): m += 1 if get_mantissa(m) == ((1<<24)-1): e += 1 something like that. it might be sm and se not m and e then we can look at re-constructing the IEEE754 number, doing something similar to the unit test code. also, we need to work out the exceptions, NaN, Inf, etc.

https://github.com/ucb-bar/berkeley-softfloat-3/blob/master/source/f32_sqrt.c#L66 the NaN/Inf stuff looks quite straightforward

>the NaN/Inf stuff looks quite straightforward Yes, i will now create functions normalize. Lets see how that works...

Created the normalise func(without special cases for NaN and inf) Actually the more i look at it, it isnt clear to me how to create the special cases for NaN and inf

(In reply to Aleksandar Kostovic from comment #37) > Created the normalise func(without special cases for NaN and inf) > Actually the more i look at it, it isnt clear to me how to create the > special cases for NaN and inf Theyre a separate function anyway. Earlier in process, way earlier than rounding. Lets start with the c code, Convert to python. Can you add separate fn taking block of code from softfloat link Commit as-is first then try doing straight conversion. check jondawson verilog code for notes on what INF means, what NAN. Piece together. will help tomorrow.

* moved normalise up to where it's used (above fsqrt_test) * added lowbits arg (was missed out) * called normalise * seems to work on the 8-or-so examples so far. +#normalization function +def normalise(s, m, e, lowbits): + if (lowbits >= 2): + m += 1 + if get_mantissa(m) == ((1<<24)-1): + e += 1 + return s, m, e + + def fsqrt_test(x): xbits = x.bits @@ -99,6 +108,9 @@ def fsqrt_test(x): sm >>= 2 sm = get_mantissa(sm) #sm += 2 + + s, sm, se = normalise(s, sm, se, lowbits) + print("our sqrt", s, se, sm, hex(sm), bin(sm), "lowbits", lowbits, "rem", hex(sr)) if lowbits >= 2:

https://git.libre-riscv.org/?p=ieee754fpu.git;a=blobdiff;f=src/ieee754/fpsqrt/fsqrt.py;h=15f1555d969098246dc9df4bf1f3ffce812d6958;hp=02449b0f75b41b0ba2e004711d9570ccca49ca3d;hb=2a273381d3f843cd58b99cbc45728761a4d12d0d;hpb=23daa4a7dbe5d98b00bbd468b35e7b2f2c782ff4 Done this from FPU verilog code, but i still dont know how i would integrate it better. Can anyone help out?

(In reply to Aleksandar Kostovic from comment #40) > Done this from FPU verilog code, but i still dont know how i would integrate > it better. > > Can anyone help out? i added some comments that will help you to transition it from the IEEE754 format that z is in, into s/e/m format. basically bit 31 is s, bits 0..22 are the mantissa, and 23 to 30 are the exponent. also the IEEE754 format adds 128 to the exponent. by working in that 32-bit format, you have nothing that can be returned from that function. so you need to *translate* the z-stuff *back* into s / e / m and i've provided some comments which will allow you to do that.

>i added some comments that will help you to transition it from the > IEEE754 format that z is in, into s/e/m format. Thanks very much. I think i understood ;) https://git.libre-riscv.org/?p=ieee754fpu.git;a=blobdiff;f=src/ieee754/fpsqrt/fsqrt.py;h=f7410703643e763c0ad5c43c57e6c39b3c12e71e;hp=5ae2bb253fa1356ec89f2960930b884faf51a2db;hb=75f9bd28e14c789872060997a655335bbdaed1aa;hpb=5690d1f94f158cbc8e715b9943d3eaba32cec9e3 Take a look

(In reply to Aleksandar Kostovic from comment #42) > >i added some comments that will help you to transition it from the > > IEEE754 format that z is in, into s/e/m format. > Thanks very much. I think i understood ;) > > https://git.libre-riscv.org/?p=ieee754fpu.git;a=blobdiff;f=src/ieee754/ > fpsqrt/fsqrt.py;h=f7410703643e763c0ad5c43c57e6c39b3c12e71e; > hp=5ae2bb253fa1356ec89f2960930b884faf51a2db; > hb=75f9bd28e14c789872060997a655335bbdaed1aa; > hpb=5690d1f94f158cbc8e715b9943d3eaba32cec9e3 > > Take a look s = 1 # sign (so, s=1) e = 255 # exponent (minus 128, so e = 127 note the comment that the format of IEEE offsets by 128? so when this value 255 goes through the IEEE format processing, 128 will be added and the value attempted to be stored will be 255+128. m = 1<<22 # high bit of mantissa, so m = 1<<22 i think good... m = 1 ... so... err... why are you trying to then destroy the value? m = 1<<22 # rest of mantissa is zero, so m = 1<<22 is good. ... and then set it again.... m = 0 ... and then destroy it again?

oh i made so stupid errors... so check out now: https://git.libre-riscv.org/?p=ieee754fpu.git;a=blobdiff;f=src/ieee754/fpsqrt/fsqrt.py;h=0c8aa1e539f1e9e61d6d8b89610900cda9312a0b;hp=f7410703643e763c0ad5c43c57e6c39b3c12e71e;hb=eb7c98a477798c050c7c3103296e3f8f9d6e4bff;hpb=c9a2ef1e7e5510db37ebde1c3a5faa9efb9a305f

(In reply to Aleksandar Kostovic from comment #44) > oh i made so stupid errors... > > so check out now: > https://git.libre-riscv.org/?p=ieee754fpu.git;a=blobdiff;f=src/ieee754/ > fpsqrt/fsqrt.py;h=0c8aa1e539f1e9e61d6d8b89610900cda9312a0b; > hp=f7410703643e763c0ad5c43c57e6c39b3c12e71e; > hb=eb7c98a477798c050c7c3103296e3f8f9d6e4bff; > hpb=c9a2ef1e7e5510db37ebde1c3a5faa9efb9a305f looks great, and would be fantastic if i hadn't got the offset wrong. it's supposed to be "subtract 127" not "subtract 128" (*sigh*), have a look at the FPNumDecode class. once that's sorted, the next task is in comment #35 above.

> once that's sorted, the next task is in comment #35 above. hi alexander if you can sort out the NaN and INF (special cases), we can declare this one done and you can get paid.

Should we declare this complete enough, pay Aleksandar, and close it? We have a working HW implementation (div/mod/sqrt/rsqrt pipeline) and working SW emulation (simple-soft-float)

(In reply to Jacob Lifshay from comment #47) > Should we declare this complete enough, pay Aleksandar, and close it? We > have a working HW implementation (div/mod/sqrt/rsqrt pipeline) and working > SW emulation (simple-soft-float) yes good idea. aleksander are you around? we have $ for you :)

(In reply to Luke Kenneth Casson Leighton from comment #48) > (In reply to Jacob Lifshay from comment #47) > > Should we declare this complete enough, pay Aleksandar, and close it? We > > have a working HW implementation (div/mod/sqrt/rsqrt pipeline) and working > > SW emulation (simple-soft-float) > > yes good idea. aleksander are you around? we have $ for you :) So I saw this is marked as done huh. Haven't been on the mailing list in ages. See you guys are making good progress all-arround. I am more than happy to accept. Will email you at lkcl@lkcl.net with acc details :)