Bug 74 - preliminary exploratory software emulation of FP SQRT
Summary: preliminary exploratory software emulation of FP SQRT
Status: RESOLVED FIXED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: ALU (including IEEE754 16/32/64-bit FPU) (show other bugs)
Version: unspecified
Hardware: PC Linux
: --- enhancement
Assignee: Aleksandar Kostovic
URL:
Depends on:
Blocks: 48
  Show dependency treegraph
 
Reported: 2019-04-25 20:15 BST by Aleksandar Kostovic
Modified: 2022-06-20 10:03 BST (History)
3 users (show)

See Also:
NLnet milestone: NLnet.2019.02.012
total budget (EUR) for completion of task and all subtasks: 300
budget (EUR) for this task, excluding subtasks' budget: 300
parent task for budget allocation: 48
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:
aleksandar_kostovic={amount=300, submitted=2020-01-28, paid=2020-01-28}


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Aleksandar Kostovic 2019-04-25 20:15:49 BST
preliminary exploratory software emulation of FP SQRT function in python
Comment 1 Luke Kenneth Casson Leighton 2019-04-25 20:27:06 BST
idea.

sqrt of num on 2-bit boundary always whole num.

sqrt 1 -> 1
sqrt 4 -> 2
sqrt 16 -> 4
sqrt 64 -> 8

sqrt of num on *odd* 2-bit boundry *never* whole.

why?

represent as exponent / mantissa, basically exponent LSB has to be zero.

to sqrt exponent, just divide exponent by two.

except, if exp LSB == 1, CANNOT DO THAT.

however... there is a trick:

* add 1 to exponent
* shift mantissa DOWN by 1 bit to compensate

*NOW* exp LSB is guaranteed to be zero.

*NOW* can divide exp by 2 without destroying (losing) info.

mantissa shifted by 1 means, interestingly, 2-bit loop now has input
shifted by 1 compared to before.

fascinating.

example

exp = 0b1101, m = 01101011
ADD 1, SHIFT -->
exp = 0b1110, m = 00110101 (lose 1 bit, must make room internally!
                           this is what extra bits are for)

before, m would be divided 01 | 10 | 10 | 11
now it is divided 00 | 11 | 01 | 01

sqrt: exp /= 2 --> 0b0111, mant = pipealgorithm(00 | 11 | 01 | 01)

example

exp = 0b0100, m = 01010101
exp LSB = 0, -->
*NO SHIFT*

sqrt: exp /= 2 --> 0b0010, mant = pipealgorithm(01 | 01 | 01 | 01)
Comment 2 Jacob Lifshay 2019-04-27 01:29:19 BST
This code I wrote quite a while ago in C++11/14 might be a useful reference:

Integer sqrt:
https://github.com/programmerjake/instruction-selector/blob/727a0c277d4bc97cd96bf15e6a895244cd309a07/int_types.h#L393-L420

Floating-point sqrt based on integer sqrt (don't think it rounds properly though):
https://github.com/programmerjake/instruction-selector/blob/727a0c277d4bc97cd96bf15e6a895244cd309a07/float_types.h#L612-L625
Comment 3 Luke Kenneth Casson Leighton 2019-04-27 04:13:50 BST
            (v.get_unbiased_exponent_value() & (signed_type)1) != (signed_type)0 ?
                from_bits(construct_float(false, fractional_sqrt(v.get_unsigned_mantissa_value() << 1), (v.get_unbiased_exponent_value() - (signed_type)1) / (signed_type)2)) :
            from_bits(construct_float(false, fractional_sqrt(v.get_unsigned_mantissa_value()), v.get_unbiased_exponent_value() / (signed_type)2));
    }

okaaay, so that's... really hard to decipher... :)
however it's basically what i said in the comment above.

if exponent & 1 != 0: # test LSB
   return ieeefloat(intsqrt(mantissa << 1), # shift mantissa up
                    (exponent - 1) / 2))    # subtract 1 from exp to compensate
else:
   return ieeefloat(intsqrt(mantissa),      # mantissa as-is
                    exponent / 2)           # no compensating needed on exp

basically exactly like is done in FPBase.op_normalise:

                op.e.eq(op.e - 1),  # DECREASE exponent
                op.m.eq(op.m << 1), # shift mantissa UP
Comment 4 Aleksandar Kostovic 2019-04-27 08:33:17 BST
SO i was reading how the sqrt function in pythons math module works and found the alternative to it:

anynumber ** (1/2) = sqrt(anynumber)

basicly using exponent operator to devide number with 1/2 and you get sqrt of that number.

checked with nmigen here:
https://github.com/m-labs/nmigen/blob/master/nmigen/back/rtlil.py#L360 

it supports ** operator

so now i am not yet sure but i think this should be able to reduce the amount of work needed to produce sqrt function.
Plan is to re-use get_a and get_b functions, add another func called sqrt in the FSM, then add normalise funcs

what does everyone think?
Comment 5 Jacob Lifshay 2019-04-27 08:36:28 BST
(In reply to Aleksandar Kostovic from comment #4)
> SO i was reading how the sqrt function in pythons math module works and
> found the alternative to it:
> 
> anynumber ** (1/2) = sqrt(anynumber)
> 
> basicly using exponent operator to devide number with 1/2 and you get sqrt
> of that number.
> 
> checked with nmigen here:
> https://github.com/m-labs/nmigen/blob/master/nmigen/back/rtlil.py#L360 
> 
> it supports ** operator
> 
> so now i am not yet sure but i think this should be able to reduce the
> amount of work needed to produce sqrt function.
> Plan is to re-use get_a and get_b functions, add another func called sqrt in
> the FSM, then add normalise funcs
> 
> what does everyone think?

I highly doubt nmigen supports non-integer exponents.
Comment 6 Luke Kenneth Casson Leighton 2019-04-27 10:57:51 BST
(In reply to Aleksandar Kostovic from comment #4)

> it supports ** operator
> 
> so now i am not yet sure but i think this should be able to reduce the
> amount of work needed to produce sqrt function.

as jacob said: i think that's usually used to do 2**3 (integer powers,
basically).

hardware is so low-level, i was actually extremely surprised to find
that verilog supports integer-divide.

can i suggest the first step being to convert that c-code from the paper
into python, or to convert the one from jacob's code (it's a lot simpler).
if i remove the brackets and leave in the indentation it's almost done
already!

        for(; bit != (fixed_width_uint)0; bit = bit >> 2)
            if(v >= bit + retval)
                v = v - (bit + retval);
                retval = (retval >> 1) | bit;
            else
                retval = retval >> 1;

just literally clean that up, add it to the repo somewhere, run it,
make it work, and play with it.

put in some values, see what it does, run it again.

then you will start to get confidence in python and in sqrt.


> Plan is to re-use get_a and get_b functions, add another func called sqrt in
> the FSM, then add normalise funcs

yes that would do the trick - bear in mind that's the hardware, where
this bugreport is for discussion of the software simulation side.
so, that should be discussed further under bug #43
Comment 7 Luke Kenneth Casson Leighton 2019-04-27 15:04:40 BST
hi aleksander, saw the commits 
https://git.libre-riscv.org/?p=ieee754fpu.git;a=commitdiff;h=264ec6d3b61267429026b0090047e81fe85900d5

looks great, and pretty much the exact same thing as what jacob
pointed to.  i'm curious as to why "+ bit" instead of "| bit", can
investigate that later (or if the algorithm doesn't work)

now you can use something like this, to calculate a value in
python:

>>> int(pow(7, 0.5))
2

or if you prefer,
>>> int(7 ** 0.5)
2

yes, int rounds *DOWN*.  that shouldn't be a problem initially,
although we will need to investigate what to do with the remainder
(local variable "num", in the test code that you wrote)

so, try just throwing some numbers at the sqrt function, print
them out in a loop:

for i in range(1, 20):
    print(sqrt(i))

that should do the trick.  then, modify that to include the python
version (int(pow)) as well.

you'll soon see if the function works :)
Comment 8 Aleksandar Kostovic 2019-04-27 15:24:29 BST
>looks great, and pretty much the exact same thing as what jacob
>pointed to.
thanks! i actually took some time to understand what it does, so took a while

>so, try just throwing some numbers at the sqrt function, print
>them out in a loop:
>
>for i in range(1, 20):
>    print(sqrt(i))
>
>that should do the trick.  then, modify that to include the python
>version (int(pow)) as well.
>
>you'll soon see if the function works :)

I see, it works, but only displays whole numbers. remeber its not a bug, its a feature ;)

Next up, trying to make it work as it should
Comment 9 Luke Kenneth Casson Leighton 2019-04-27 15:32:55 BST
(In reply to Aleksandar Kostovic from comment #8)

> >for i in range(1, 20):
> >    print(sqrt(i))

> I see, it works, but only displays whole numbers.
> remeber its not a bug, its a feature ;)

 yehyeh! duUuuh, *now* it sinks in?? :)  you're the one that wanted
 to implement floating-point, ya muppet!
 https://www.youtube.com/watch?v=B7UmUX68KtE

 ... starts to make sense what jacob was saying about hardware
 not supporting 2 ** 0.5?

 just keep iterating and exploring, ok? not sure of something, simply
 write a small bit of code to check if it's true.

 the faster you can do that, the quicker you'll understand, and the
 quicker you can write the code.

 i regularly (like... 15 times a day) run python3 interactively just
 to do some exploring, not even write some code, just type directly
 at the prompt.


> Next up, trying to make it work as it should

awesome.
Comment 10 Aleksandar Kostovic 2019-04-27 16:08:07 BST
>yehyeh! duUuuh, *now* it sinks in?? :)  you're the one that wanted
>to implement floating-point, ya muppet!
>https://www.youtube.com/watch?v=B7UmUX68KtE
LOL. Takes time to sink in but worth the struggle of doing it :)


>... starts to make sense what jacob was saying about hardware
>not supporting 2 ** 0.5?
Yep, all clear


>just keep iterating and exploring, ok? not sure of something, simply
>write a small bit of code to check if it's true.
>
>the faster you can do that, the quicker you'll understand, and the
>quicker you can write the code.
>
>i regularly (like... 15 times a day) run python3 interactively just
>to do some exploring, not even write some code, just type directly
>at the prompt
Just like i did when learning verilog. Literaly had 100s of files for designs that i was trying to build.

Everything takes practice to master, so i will practice as much as i can!
Comment 11 Aleksandar Kostovic 2019-04-28 10:53:00 BST
I pushed the commit of the python code of the previous C version of SQRT function found in that paper. BUT it doesnt work yet, as i had to init a variable with no value(it was done like that in c version) so now python trows an error because of it. also i commented the old version so there exists a "vorking version" of sqrt function.

  File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 28, in sqrt
    R = (R<<2)|((D>>(i+i))&3)
TypeError: unsupported operand type(s) for >>: 'NoneType' and 'int'

Any ideas on how to fix this are welcome.
Comment 12 Luke Kenneth Casson Leighton 2019-04-28 11:23:20 BST
this is how you do decrementing ranges in python:

-    for i in range(15):
-        i -= 1
+    r = 0 # remainder
+    for i in range(15, -1, -1): # negative ranges are weird...

from the c code:

    //initialize all the variables.
    a = num;
    q = 0;
    i = 0;
    left = 0;   //input to adder/sub
    right = 0;  //input to adder/sub
    r = 0;  //remainder

.... ---> python ....

-    r = None
-    D = None
+    D = num # D is input (from num)
+    r = 0 # remainder

i also restored the "simple" version, so that it can be used as
a comparative visual test against the new one.

now if you do "assert sqrt(Q) == int(Q**0.5)" in that loop, and make
the loop say... run to 65536, you'll get a good idea of whether it's
going to work.
Comment 13 Aleksandar Kostovic 2019-04-28 12:21:45 BST
i see, works the same as the simpler function...

really need to find an algorithms that would suit floating point math. any useful links or pseudo code i could use?
Comment 14 Aleksandar Kostovic 2019-04-28 12:34:43 BST
https://pdfs.semanticscholar.org/5060/4e9aff0e37089c4ab9a376c3f35761ffe28b.pdf
Found this from google search- aslo includes a diagram for in hardware processing ;)
Comment 15 Luke Kenneth Casson Leighton 2019-04-28 13:30:05 BST
(In reply to Aleksandar Kostovic from comment #13)
> i see, works the same as the simpler function...

 it does... except... try sqrt(65536)...

 try this:

for Q in range(1, int(1e7)):
    print(Q, sqrt(Q), sqrtsimple(Q), int(Q**0.5))
    assert int(Q**0.5) == sqrtsimple(Q), "Q sqrtsimpl fail %d" % Q
    assert int(Q**0.5) == sqrt(Q), "Q sqrt fail %d" % Q

> really need to find an algorithms that would suit floating point math.

 that *is* the algorithm that suits FP math!

 congratulations, sqrt() and sqrtsimple() - if it worked - handle the
 mantissa.

 now you just need to do the exponent.


> any useful links or pseudo code i could use?

 you're already doing it.  don't worry, keep going, ok?


 you know how to do the square root of an exponent, right?

 you just... divide it by 2.  yes, really!

 dang this is actually hard to find proper simple illustrations, online...
 a whole lot of modern total bullshit about a new-fangled concept called
 "radicals" (wtf??)...

 google search pulls up results that are such total bullshit i'm just going
 to write it out for you.


 the square root of a number when it is written in exponent form (e^x)
 is basically that you multiply the exponent (x) by 0.5.

 exactly as with that x ** 0.5 thing you discovered that python does,
 remember?


 so:

 sqrt( e ^ x ) == e ^ (x*0.5)

 therefore, very very simply (far simpler than you're imagining that it is),
 to take the square root of the exponent of an IEEE754 FP number...

 you DIVIDE THE EXPONENT BY TWO!

 that's just a shift operation!

 that's it.

 that's all it is.

 so with that in mind, does comment #3 now make a bit more sense?
 http://bugs.libre-riscv.org/show_bug.cgi?id=74#c3

 obviously, in IEEE754 FP, exponent is also an integer, so if it is
 an odd integer, you lose information.  to adjust for that, you add
 1 to the exponent and shift the mantissa.  now the exponent is
 even, *now* it will divide cleanly by 2 (shift).


 basically you are nearly there.  it really is this simple.

 you should be able to write the function which takes e and m as arguments,
 (from comment #3) that handles the core algorithm, in about... 6 lines.
Comment 16 Aleksandar Kostovic 2019-04-28 14:10:19 BST
> try this:
>
>for Q in range(1, int(1e7)):
>    print(Q, sqrt(Q), sqrtsimple(Q), int(Q**0.5))
>    assert int(Q**0.5) == sqrtsimple(Q), "Q sqrtsimpl fail %d" % Q
>    assert int(Q**0.5) == sqrt(Q), "Q sqrt fail %d" % Q

Thats interesting, so as you said it may fail, it fails! Left out the sqrtsimple and let sqrt run for a while. It worked well

>you should be able to write the function which takes e and m as arguments,
> (from comment #3) that handles the core algorithm, in about... 6 lines.
Got it. Made a function for that and pushed it. Take a look :)
Comment 17 Luke Kenneth Casson Leighton 2019-04-28 14:54:29 BST
(In reply to Aleksandar Kostovic from comment #16)
> > try this:
> >
> >for Q in range(1, int(1e7)):
> >    print(Q, sqrt(Q), sqrtsimple(Q), int(Q**0.5))
> >    assert int(Q**0.5) == sqrtsimple(Q), "Q sqrtsimpl fail %d" % Q
> >    assert int(Q**0.5) == sqrt(Q), "Q sqrt fail %d" % Q
> 
> Thats interesting, so as you said it may fail, it fails! Left out the
> sqrtsimple and let sqrt run for a while. It worked well

 great.  so... bye bye sqrt for now.  can always track that down later.

> >you should be able to write the function which takes e and m as arguments,
> > (from comment #3) that handles the core algorithm, in about... 6 lines.
> Got it. Made a function for that and pushed it. Take a look :)

great.  ok so the call to Q() wasn't needed (or, we could do an Object
named Q, but hey, let's keep it dead-simple for now)


btw do you already have sfpy installed and built?  if so, we can move
to the next phase of "from sfpy import Float32" and use that to mangle/de-mangle FP numbers into sign/mantissa/exponent, using sfpy.Float32.get_bits

plus use sfpy.Float32.sqrt function.

i added some "convenience" functions which, given a 32-bit value, will
allow extraction of mantissa, sign and exponent.  and reconstruction.

so, next stage, you want to write a test for-loop around the main() function?
maybe run another for-loop "for e in range(25)" or something, and maaybe
reduce the for-loop around mantissa just a tad? :)

don't delete the existing for-loop and asserts, add *another* for-loop
(this one with 2 nested for's, one for e one for m)
Comment 18 Aleksandar Kostovic 2019-04-28 16:27:30 BST
> great.  so... bye bye sqrt for now.  can always track that down later.
I think you misunderstood. Its the sqrtsimple function that is failing:

File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 88, in <module>
    assert int(Q**0.5) == sqrtsimple(Q), "Q sqrtsimpl fail %d" % Q

So we use the regular sqrt func.

>btw do you already have sfpy installed and built?  if so, we can move
>to the next phase of "from sfpy import Float32" and use that to mangle/de-mangle >FP numbers into sign/mantissa/exponent, using sfpy.Float32.get_bits
Yes i have it installed


>so, next stage, you want to write a test for-loop around the main() function?
>maybe run another for-loop "for e in range(25)" or something, and maaybe
>reduce the for-loop around mantissa just a tad? :)
>
>don't delete the existing for-loop and asserts, add *another* for-loop
>(this one with 2 nested for's, one for e one for m)
Will try to do it now
Comment 19 Aleksandar Kostovic 2019-04-28 16:36:20 BST
>def main(mantissa, exponent):
>    if exponent & 1 != 0:
>        return sqrt(mantissa << 1), # shift mantissa up
>                ((exponent - 1) / 2) # subtract 1 from exp to compensate
>    return sqrt(mantissa),      # mantissa as-is
>           (exponent / 2)       # no compensating needed on exp

    ((exponent - 1) / 2) # subtract 1 from exp to compensate
    ^
IndentationError: unexpected indent

now i get indenattion error which i dont know how to resolve. added and removed bothh spaces and tabs but nothing works...
Comment 20 Jacob Lifshay 2019-04-28 16:39:38 BST
(In reply to Aleksandar Kostovic from comment #18)
> > great.  so... bye bye sqrt for now.  can always track that down later.
> I think you misunderstood. Its the sqrtsimple function that is failing:
sqrtsimple fails because, by line 8, bit needs to be the smallest power of 4 that is not less than num. 65536 fails because 1 << 14 is too small.

also, python doesn't let you wrap lines after a comma unless the comma is in one of () [] or {}.
Comment 21 Aleksandar Kostovic 2019-04-28 16:50:05 BST
fixed, on to the testing of for loop:

as you can see here:https://git.libre-riscv.org/?p=ieee754fpu.git;a=blobdiff;f=src/add/fsqrt.py;h=285492c60f5ef11a5a90c6c11d4a44bdfbc13f89;hp=edb7d4ba17ad916b5043cc60a986d6fcc421ec66;hb=988c785b20b569cef6f9140401fab2ed413259a4;hpb=19fdaa150749a27d4a88d0e9ca3a36ae90c64291

I added two for loops, but when i run it i get the following:

(0, 0)
(1, 0)
(1, 0)
(1, 0)
(2, 0)
(2, 0)
(2, 0)
(2, 0)
(2, 0)
(3, 0)
.
.
.
.

SO it isnt working
Comment 22 Luke Kenneth Casson Leighton 2019-04-28 17:48:57 BST
(In reply to Jacob Lifshay from comment #20)
> (In reply to Aleksandar Kostovic from comment #18)
> > > great.  so... bye bye sqrt for now.  can always track that down later.
> > I think you misunderstood. Its the sqrtsimple function that is failing:

 whoops

> sqrtsimple fails because, by line 8, bit needs to be the smallest power of 4
> that is not less than num. 65536 fails because 1 << 14 is too small.

oh duh!

-     bit = 1 << 14
+     bit = 1 << 32

fixed.

> also, python doesn't let you wrap lines after a comma unless the comma is in
> one of () [] or {}.

i often use \ which i don't see many other people use.  so:

x = fred + joe + mary + \
    shawn_the_sheep
Comment 23 Luke Kenneth Casson Leighton 2019-04-28 18:03:59 BST
(In reply to Aleksandar Kostovic from comment #21)
> fixed, on to the testing of for loop:
> 
> as you can see
> here:https://git.libre-riscv.org/?p=ieee754fpu.git;a=blobdiff;f=src/add/
> fsqrt.py;h=285492c60f5ef11a5a90c6c11d4a44bdfbc13f89;
> hp=edb7d4ba17ad916b5043cc60a986d6fcc421ec66;
> hb=988c785b20b569cef6f9140401fab2ed413259a4;
> hpb=19fdaa150749a27d4a88d0e9ca3a36ae90c64291
> 
> I added two for loops, but when i run it i get the following:
> 
> (0, 0)
> (1, 0)
> (1, 0)
> (1, 0)
> (2, 0)
> (2, 0)
> (2, 0)
> (2, 0)
> (2, 0)
> (3, 0)
> .
> .
> .
> .
> 
> So it isnt working

you're not looking far enough down the results... also, not having
the original m/e to compare with doesn't help... i added this:

+            ms, es = main(m, e)
+            print("m:%d e:%d sqrt: m:%d e:%d" % (m, e, ms, es))

that gives the original values of the for-loop side-by-side with
the sqrt'ed answer

_now_ it's possible to see that yes, when the exponent is odd,
mantissa is multiplied by 2 before being integer-sqrt'ed.

so.... 

m:8 e:25 sqrt: m:4 e:12

because e==25, so it gets reduced to 24 and m gets multiplied by 2
which means, now e==24 and m==16

the sqrt of the exponent is 12 (just shift down by one aka "divide by 2")
the sqrt of the mantissa (16) is 4.


m:8 e:24 sqrt: m:2 e:12

because e==24, it does NOT get reduced by 1, so m remains the SAME.

sqrt of exponent e=24 is 12 (div by 2)
sqrt of mantissa (8) is 2 with a LOT of remainder missing.

check that with python floating point..

>>> pow(8.0, 0.5)
2.8284271247461903

wow, so with the integer sqrt we lost a *LOT* of digits in the rounding, there. but that's ok, we can deal with that later.
Comment 24 Luke Kenneth Casson Leighton 2019-04-28 18:09:03 BST
(In reply to Aleksandar Kostovic from comment #18)
> > great.  so... bye bye sqrt for now.  can always track that down later.
> I think you misunderstood. Its the sqrtsimple function that is failing:
> 
> File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 88, in
> <module>
>     assert int(Q**0.5) == sqrtsimple(Q), "Q sqrtsimpl fail %d" % Q
> 
> So we use the regular sqrt func.
> 
> >so, next stage, you want to write a test for-loop around the main() function?
> Will try to do it now

looks great, works well.

> >btw do you already have sfpy installed and built?  if so, we can move
> >to the next phase of "from sfpy import Float32" and use that to mangle/de-mangle >FP numbers into sign/mantissa/exponent, using sfpy.Float32.get_bits
> Yes i have it installed

ok great, 

so next phase will be to use "from sfpy import Float32" and create some
arbitrary numbers, say... oh i dunno, pick one! 753.415926535897932

x = Float32(fred_the_number)
xbits = x.bits

then use the function decode_fp32(xbits) to get the s, m, e

and from there you can throw those into the function main() and see what
happens.

theeeeennn... you can do sq_test = x.sqrt() above... and on the answer,
use the function decodefp32(sq_test)...

and print those out next to the answer from using main().

if those are even remotely close we have a winner!
Comment 25 Aleksandar Kostovic 2019-04-29 07:52:00 BST
>so next phase will be to use "from sfpy import Float32" and create some
>arbitrary numbers, say... oh i dunno, pick one! 753.415926535897932
>
>x = Float32(fred_the_number)
>xbits = x.bits
>
>then use the function decode_fp32(xbits) to get the s, m, e
>
>and from there you can throw those into the function main() and see what
>happens.
>
>theeeeennn... you can do sq_test = x.sqrt() above... and on the answer,
>use the function decodefp32(sq_test)...
>
>and print those out next to the answer from using main().
>
>if those are even remotely close we have a winner!

Okay i understood up until the x = Float32(num) and xbits = x.bits. So i decided to print out the values of decode_fp32 function first, to check if its working.... aaaaand its not.

x = Float32(1234.123456789)
xbits = x.bits
    

print(decode_fp32(x))

which gives:
Traceback (most recent call last):
  File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 108, in <module>
    print(decode_fp32(x))
  File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 74, in decode_fp32
    return get_sign(x), get_exponent(x), get_mantissa(x)
  File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 64, in get_sign
    return ((x & 0x80000000) >> 31)
TypeError: unsupported operand type(s) for &: 'sfpy.float.Float32' and 'int'
Comment 26 Luke Kenneth Casson Leighton 2019-04-29 10:29:03 BST
(In reply to Aleksandar Kostovic from comment #25)

> Okay i understood up until the x = Float32(num) and xbits = x.bits. So i
> decided to print out the values of decode_fp32 function first, to check if
> its working.... aaaaand its not.
> 
> x = Float32(1234.123456789)
> xbits = x.bits
>     
> 
> print(decode_fp32(x))
> TypeError: unsupported operand type(s) for &: 'sfpy.float.Float32' and 'int'

yes, that is precisely and exactly the correct error, because you tried
to decode x (an object) not xbits (an integer)
Comment 27 Aleksandar Kostovic 2019-04-29 11:18:47 BST
>theeeeennn... you can do sq_test = x.sqrt() above... and on the answer,
>use the function decodefp32(sq_test)...
>
>and print those out next to the answer from using main().

Okay so i am confused by the thing i should do but tried to understand, so  added:

+    sq_test = x.sqrt()
+    e, m = decode_fp32(sq_test)
+    print(main(e, m))

and get an error:
File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 64, in get_sign
    return ((x & 0x80000000) >> 31)
TypeError: unsupported operand type(s) for &: 'sfpy.float.Float32' and 'int'
Comment 28 Luke Kenneth Casson Leighton 2019-04-29 11:25:42 BST
(In reply to Aleksandar Kostovic from comment #27)
> >theeeeennn... you can do sq_test = x.sqrt() above... and on the answer,
> >use the function decodefp32(sq_test)...
> >
> >and print those out next to the answer from using main().
> 
> Okay so i am confused by the thing i should do but tried to understand, so 
> added:
> 
> +    sq_test = x.sqrt()
> +    e, m = decode_fp32(sq_test)
> +    print(main(e, m))
> 
> and get an error:
> File "/home/aleksandar/projects/ieee754fpu/src/add/fsqrt.py", line 64, in
> get_sign
>     return ((x & 0x80000000) >> 31)
> TypeError: unsupported operand type(s) for &: 'sfpy.float.Float32' and 'int'

you did exactly the same thing, doh!  you passed in sq_test (an object)
*NOT* the bits *OF* the object, which are an integer.

-    sq_test = x.sqrt()
+    sq_test = x.sqrt.bits

also, decode_fp32 returns a tuple of 3 items: sign, exponent, mantissa.
you can't assign a tuple of length 3 to a tuple of length 2 (e, m)

-    e, m = decode_fp32(sq_test.bits)
+    s, e, m = decode_fp32(sq_test.bits)

try git pull.
Comment 29 Aleksandar Kostovic 2019-04-29 11:33:06 BST
>you did exactly the same thing, doh!  you passed in sq_test (an object)
>*NOT* the bits *OF* the object, which are an integer.
i think i get it now
>try git pull.
okay. i see. the question now remains how do i assemble it all?
Comment 30 Luke Kenneth Casson Leighton 2019-04-29 12:23:09 BST
(In reply to Aleksandar Kostovic from comment #29)
> >you did exactly the same thing, doh!  you passed in sq_test (an object)
> >*NOT* the bits *OF* the object, which are an integer.
> i think i get it now
> >try git pull.
> okay. i see. the question now remains how do i assemble it all?

well... run the example, you'll see that it's almost there:

1234.1234130859375 <class 'sfpy.float.Float32'>
1150960627 <class 'int'>
0 10 1721331 0x1a43f3
our sqrt 4 860665 0xd21f9
sf32 sqrt 0 5 820535 0xc8537

okay!  so now we can start exploring... and at this point, it's
probably best to go look at softfloat source code and/or also
hardfloat... 1 sec let me just commit some extra print/debug
stuff...
Comment 31 Luke Kenneth Casson Leighton 2019-04-29 12:37:51 BST
https://github.com/ucb-bar/berkeley-softfloat-3/blob/master/source/f32_sqrt.c#L90

what on eeeearth?  moo??  the fp16 version isn't much easier to understand:
https://github.com/ucb-bar/berkeley-softfloat-3/blob/master/source/f16_sqrt.c#L115

https://github.com/ucb-bar/berkeley-hardfloat/blob/master/src/main/scala/DivSqrtRecF64_mulAddZ31.scala

not understandable at all...

RocketChip uses that hardfloat library, so no joy there...

anyone else find anything?  we might be stuck with trying to work out
how f32_sqrt.c operates.
Comment 32 Luke Kenneth Casson Leighton 2019-04-29 15:42:40 BST
shock! near-total guess-work, and this works!

    m |= 1<<23 # set top bit (the missing "1" from mantissa)
    m <<= 25

    sm, se = main(m, e)
    sm >>= 1
    sm = get_mantissa(sm)

ok so the |= 1<<23 wasn't total guess-work, that's what softfloat does:

sigA = (sigA | 0x00800000)<<8;

which is to set bit 23 (and then shift up by 8)

i shifted up by 25 - as a guess - to give a couple extra bits in the
mantissa, then shifted down by one (sm >>= 1)

the call to get_mantissa() masks out the top bits... we should really
call "normalise" to adjust exp/mantissa...

so quite a lot to be done, however that's the hard part.

dang that was lucky.


x 1234.1234130859375 <class 'sfpy.float.Float32'>
sqrt 35.13009262084961
1150960627 <class 'int'>
x decode 0 10 1721331 0x1a43f3
our  sqrt 0 5 820535 0xc8537 0b11001000010100110111
sf32 sqrt 0 5 820535 0xc8537 0b11001000010100110111

x 32.099998474121094 <class 'sfpy.float.Float32'>
sqrt 5.665686130523682
1107322470 <class 'int'>
x decode 0 5 26214 0x6666
our  sqrt 0 2 3493196 0x354d4c 0b1101010100110101001100
sf32 sqrt 0 2 3493197 0x354d4d 0b1101010100110101001101

x 16.0 <class 'sfpy.float.Float32'>
sqrt 4.0
1098907648 <class 'int'>
x decode 0 4 0 0x0
our  sqrt 0 2 0 0x0 0b0
sf32 sqrt 0 2 0 0x0 0b0

x 8.0 <class 'sfpy.float.Float32'>
sqrt 2.8284270763397217
1090519040 <class 'int'>
x decode 0 3 0 0x0
our  sqrt 0 1 3474675 0x3504f3 0b1101010000010011110011
sf32 sqrt 0 1 3474675 0x3504f3 0b1101010000010011110011

x 8.5 <class 'sfpy.float.Float32'>
sqrt 2.915475845336914
1091043328 <class 'int'>
x decode 0 3 524288 0x80000
our  sqrt 0 1 3839784 0x3a9728 0b1110101001011100101000
sf32 sqrt 0 1 3839784 0x3a9728 0b1110101001011100101000
Comment 33 Aleksandar Kostovic 2019-04-30 10:42:31 BST
>shock! near-total guess-work, and this works!
YEY! Thats awesome :)

>the call to get_mantissa() masks out the top bits... we should really
>call "normalise" to adjust exp/mantissa...
>
>so quite a lot to be done, however that's the hard part.
so now we create a function normalise or what?
Comment 34 Luke Kenneth Casson Leighton 2019-04-30 11:58:23 BST

On Tue, Apr 30, 2019 at 10:42 AM <bugzilla-daemon@libre-riscv.org> wrote:
>
> http://bugs.libre-riscv.org/show_bug.cgi?id=74
>
> --- Comment #33 from Aleksandar Kostovic <alexandar.kostovic@gmail.com> ---
> >shock! near-total guess-work, and this works!
> YEY! Thats awesome :)
>
> >the call to get_mantissa() masks out the top bits... we should really
> >call "normalise" to adjust exp/mantissa...
> >
> >so quite a lot to be done, however that's the hard part.
> so now we create a function normalise or what?

rounding and normalising, yes.  rounding based on that new variable "lowbits" being >= 2, and normalise... mmm...

             m.d.sync += z.m.eq(z.m + 1) # mantissa rounds up
            with m.If(z.m == z.m1s): # all 1s
                m.d.sync += z.e.eq(z.e + 1) # exponent rounds up

looks like we're supposed to just increment the exponent by one if the mantissa becomes all 1s.

so just something like this:

if (lowbits >= 2):
    m += 1
    if get_mantissa(m) == ((1<<24)-1):
       e += 1

something like that.  it might be sm and se not m and e

then we can look at re-constructing the IEEE754 number, doing something similar
to the unit test code.

also, we need to work out the exceptions, NaN, Inf, etc.
Comment 35 Luke Kenneth Casson Leighton 2019-04-30 14:58:26 BST
https://github.com/ucb-bar/berkeley-softfloat-3/blob/master/source/f32_sqrt.c#L66

the NaN/Inf stuff looks quite straightforward
Comment 36 Aleksandar Kostovic 2019-04-30 15:18:07 BST
>the NaN/Inf stuff looks quite straightforward
Yes, i will now create functions normalize. Lets see how that works...
Comment 37 Aleksandar Kostovic 2019-04-30 15:43:51 BST
Created the normalise func(without special cases for NaN and inf)
Actually the more i look at it, it isnt clear to me how to create the special cases for NaN and inf
Comment 38 Luke Kenneth Casson Leighton 2019-04-30 17:24:40 BST
(In reply to Aleksandar Kostovic from comment #37)
> Created the normalise func(without special cases for NaN and inf)
> Actually the more i look at it, it isnt clear to me how to create the
> special cases for NaN and inf

Theyre a separate function anyway.
Earlier in process, way earlier than
rounding.

Lets start with the c code,
Convert to python.
Can you add separate fn taking
block of code from softfloat link
Commit as-is first then try doing
straight conversion.
check jondawson verilog code
for notes on what INF means,
what NAN.
Piece together. will help tomorrow.
Comment 39 Luke Kenneth Casson Leighton 2019-04-30 21:55:42 BST
* moved normalise up to where it's used (above fsqrt_test)
* added lowbits arg (was missed out)
* called normalise
* seems to work on the 8-or-so examples so far.


+#normalization function
+def normalise(s, m, e, lowbits):
+    if (lowbits >= 2):
+        m += 1
+    if get_mantissa(m) == ((1<<24)-1):
+        e += 1
+    return s, m, e
+
+
 def fsqrt_test(x):
 
     xbits = x.bits
@@ -99,6 +108,9 @@ def fsqrt_test(x):
     sm >>= 2
     sm = get_mantissa(sm)
     #sm += 2
+
+    s, sm, se = normalise(s, sm, se, lowbits)
+
     print("our  sqrt", s, se, sm, hex(sm), bin(sm), "lowbits", lowbits,
                                                     "rem", hex(sr))
     if lowbits >= 2:
Comment 41 Luke Kenneth Casson Leighton 2019-05-07 01:27:31 BST
(In reply to Aleksandar Kostovic from comment #40)
 
> Done this from FPU verilog code, but i still dont know how i would integrate
> it better.
> 
> Can anyone help out?

 i added some comments that will help you to transition it from the
 IEEE754 format that z is in, into s/e/m format.

 basically bit 31 is s, bits 0..22 are the mantissa, and 23 to 30 are the
 exponent.

 also the IEEE754 format adds 128 to the exponent.

 by working in that 32-bit format, you have nothing that can be returned
 from that function.

 so you need to *translate* the z-stuff *back* into s / e / m and i've
 provided some comments which will allow you to do that.
Comment 42 Aleksandar Kostovic 2019-05-07 14:19:54 BST
>i added some comments that will help you to transition it from the
> IEEE754 format that z is in, into s/e/m format.
Thanks very much. I think i understood ;)

https://git.libre-riscv.org/?p=ieee754fpu.git;a=blobdiff;f=src/ieee754/fpsqrt/fsqrt.py;h=f7410703643e763c0ad5c43c57e6c39b3c12e71e;hp=5ae2bb253fa1356ec89f2960930b884faf51a2db;hb=75f9bd28e14c789872060997a655335bbdaed1aa;hpb=5690d1f94f158cbc8e715b9943d3eaba32cec9e3

Take a look
Comment 43 Luke Kenneth Casson Leighton 2019-05-08 00:04:41 BST
(In reply to Aleksandar Kostovic from comment #42)
> >i added some comments that will help you to transition it from the
> > IEEE754 format that z is in, into s/e/m format.
> Thanks very much. I think i understood ;)
> 
> https://git.libre-riscv.org/?p=ieee754fpu.git;a=blobdiff;f=src/ieee754/
> fpsqrt/fsqrt.py;h=f7410703643e763c0ad5c43c57e6c39b3c12e71e;
> hp=5ae2bb253fa1356ec89f2960930b884faf51a2db;
> hb=75f9bd28e14c789872060997a655335bbdaed1aa;
> hpb=5690d1f94f158cbc8e715b9943d3eaba32cec9e3
> 
> Take a look

        s = 1           # sign (so, s=1)
        e = 255         # exponent (minus 128, so e = 127

note the comment that the format of IEEE offsets by 128?  so when this value 255 goes through the IEEE format processing, 128 will be added and the value attempted to be stored will be 255+128.


        m = 1<<22     # high bit of mantissa, so m = 1<<22 i think

good...

        m = 1

... so... err... why are you trying to then destroy the value?

        m = 1<<22     # rest of mantissa is zero, so m = 1<<22 is good.

... and then set it again....

        m = 0

... and then destroy it again?
Comment 45 Luke Kenneth Casson Leighton 2019-05-08 15:59:24 BST
(In reply to Aleksandar Kostovic from comment #44)
> oh i made so stupid errors...
> 
> so check out now:
> https://git.libre-riscv.org/?p=ieee754fpu.git;a=blobdiff;f=src/ieee754/
> fpsqrt/fsqrt.py;h=0c8aa1e539f1e9e61d6d8b89610900cda9312a0b;
> hp=f7410703643e763c0ad5c43c57e6c39b3c12e71e;
> hb=eb7c98a477798c050c7c3103296e3f8f9d6e4bff;
> hpb=c9a2ef1e7e5510db37ebde1c3a5faa9efb9a305f

looks great, and would be fantastic if i hadn't got the offset wrong.
it's supposed to be "subtract 127" not "subtract 128" (*sigh*),
have a look at the FPNumDecode class.

once that's sorted, the next task is in comment #35 above.
Comment 46 Luke Kenneth Casson Leighton 2019-05-08 15:59:46 BST
> once that's sorted, the next task is in comment #35 above.

hi alexander if you can sort out the NaN and INF (special cases),
we can declare this one done and you can get paid.
Comment 47 Jacob Lifshay 2020-01-28 08:06:44 GMT
Should we declare this complete enough, pay Aleksandar, and close it? We have a working HW implementation (div/mod/sqrt/rsqrt pipeline) and working SW emulation (simple-soft-float)
Comment 48 Luke Kenneth Casson Leighton 2020-01-28 12:44:50 GMT
(In reply to Jacob Lifshay from comment #47)
> Should we declare this complete enough, pay Aleksandar, and close it? We
> have a working HW implementation (div/mod/sqrt/rsqrt pipeline) and working
> SW emulation (simple-soft-float)

yes good idea.  aleksander are you around?  we have $ for you :)
Comment 49 Aleksandar Kostovic 2020-01-28 15:25:22 GMT
(In reply to Luke Kenneth Casson Leighton from comment #48)
> (In reply to Jacob Lifshay from comment #47)
> > Should we declare this complete enough, pay Aleksandar, and close it? We
> > have a working HW implementation (div/mod/sqrt/rsqrt pipeline) and working
> > SW emulation (simple-soft-float)
> 
> yes good idea.  aleksander are you around?  we have $ for you :)

So I saw this is marked as done huh. Haven't been on the mailing list in ages. See you guys are making good progress all-arround.

I am more than happy to accept. Will email you at lkcl@lkcl.net with acc details :)