Bug 1087 - add pseudocode to properly setup for fp traps according to PowerISA spec convention
Summary: add pseudocode to properly setup for fp traps according to PowerISA spec conv...
Status: IN_PROGRESS
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Specification (show other bugs)
Version: unspecified
Hardware: Other Linux
: --- normal
Assignee: Jacob Lifshay
URL:
Depends on:
Blocks: 1076 1072
  Show dependency treegraph
 
Reported: 2023-05-21 10:41 BST by Jacob Lifshay
Modified: 2023-10-03 09:11 BST (History)
3 users (show)

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jacob Lifshay 2023-05-21 10:41:19 BST
see table 63 on v3.1B page 670(696) for a full list of how fp trap/exception handling should be done (at least for vsx, tbd if that matches sffs ignoring VSR vs. FPR)

* DONE: adjust fcvt*/fmv*
* TODO: adjust fadd/fsub/etc. existing PowerISA instructions without spec. pseudocode
* TODO: adjust fptrans/fft/dct/etc. proposed instructions that can cause fp traps.

original comment:

https://libre-soc.org/irclog/%23libre-soc.2023-05-21.log.html#t2023-05-21T10:29:37

things like fcvtfg don't need to prevent writing the output since the output is in a different regfile and thus can't possibly overlap with the input reg.

things like fsinpi f3, f3 need to prevent writing the output reg when they trap so the trap can see what the input was.
Comment 1 Jacob Lifshay 2023-05-21 11:16:57 BST
basically we need to add/remove the pseudocode that explicitly sets up register state (FRT/RT and FPSCR, not SRR0 or other trap machinery) for taking a fp trap.
see xscvdpsxds for an example -- we need to add/remove the `if vex_flag`
Comment 2 Luke Kenneth Casson Leighton 2023-05-21 11:50:12 BST
(In reply to Jacob Lifshay from comment #0)
> https://libre-soc.org/irclog/%23libre-soc.2023-05-21.log.html#t2023-05-21T10:
> 29:37
> 
> things like fcvtfg don't need to prevent writing the output since the output
> is in a different regfile and thus can't possibly overlap with the input reg.

this is a misunderstanding of how the simulator works. a register that
requires writing *is* a local variable, *is* a return result from the
function.

    def op_fcvtfg(self, RB, FPSCR):
        if eq(IT[0], 0):
            FRT = copy_assign_rhs(result)
        return (FRT, FPSCR,)

> things like fsinpi f3, f3 need to prevent writing the output reg when they
> trap so the trap can see what the input was.

this is automatic and inherent.  the write of the results *is* automatically
inherently 100% without fail absolute without fail prevented.

there is no other choice.  declaring this bugreport invalid as there is
no change required (and no change ever going to happen) and it was raised
based on a misunderstanding.
Comment 3 Luke Kenneth Casson Leighton 2023-05-21 11:59:40 BST
(In reply to Jacob Lifshay from comment #1)
> basically we need to add/remove the pseudocode that explicitly sets up
> register state (FRT/RT and FPSCR, not SRR0 or other trap machinery) for
> taking a fp trap.
> see xscvdpsxds for an example -- we need to add/remove the `if vex_flag`

i have absolutely no idea why on earth that would be necessary.

can you please confirm that you understand that there is not and will never be
a change to action what you have recommended?  and that you understand
that FRT is a *local* variable returned as a *local* result, where ISACaller
has control *after* the pseudocode function is called as to whether the
result is written to the regfile?

what you should *not* have done is bypass the mechanism that was established
2 years ago for passing in registers and returning them

        if name in ('XLEN', 'FPSCR') or name in BFP_FLAG_NAMES:
            attr = ast.Name("self", ast.Load())
            p[0] = ast.Attribute(attr, name, ast.Load(), lineno=p.lineno(1))

that is *NOT* the correct behaviour.

look at the pseudocode

    def op_fcvttg_(self, FRB, ****>>>>> FPSCR <<<<<<*****):

        if eq(vex_flag, 0):
            RT = copy_assign_rhs(result)
            >>> NO! self. <<<< FPSCR.FPRF = copy_assign_rhs(self.undefined(
                SelectableInt(value=0x0, bits=5)))

        return (RT, *****>>>>> FPSCR <<<<<*****, overflow,)

you should *NOT* have attempted to bypass FPSCR being passed in as
a local variable, bypassed FPSCR as a return result.

updating of FPSCR goes into *check_write()*
Comment 4 Luke Kenneth Casson Leighton 2023-05-21 12:07:16 BST
> change pseudocode to prevent output register write only when 
> causing a fp trap and output is in same regfile as input

whatever you do, do *not* do this.  i will change the bugreport
to the correct (required) action.
Comment 5 Luke Kenneth Casson Leighton 2023-05-21 12:13:18 BST
commit 46690bb38bf2534ea200503d753a422dc866a000
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sun May 21 12:01:45 2023 +0100

    FPSCR should never have been added to "bypass" the incoming
    local parameter or its return result
Comment 6 Luke Kenneth Casson Leighton 2023-05-21 12:18:15 BST
done.

+        # write FPSCR
+        if name in ['FPSCR', ]:
+            log("write FPSCR 0x%x" % (output.value))
+            self.FPSCR.eq(output)
+            return


commit 20cf6adbcb6800f5d9e15c59a5f7e280ebef1352 (HEAD -> master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sun May 21 12:17:39 2023 +0100

    explicitly update FPSCR from list of return results
Comment 7 Luke Kenneth Casson Leighton 2023-05-21 12:31:23 BST
(In reply to Luke Kenneth Casson Leighton from comment #4)
> > change pseudocode to prevent output register write only when 
> > causing a fp trap and output is in same regfile as input
> 
> whatever you do, do *not* do this.  i will change the bugreport
> to the correct (required) action.

so the bugreport is invalid (but should not be closed as such
until it has been understood why).

to "effect" a "no change", the variable that requires "not to be changed"
must:

1) be passed in as an INPUT parameter
2) not be modified by the pseudocode
3) be passed OUT as a return result.

the "no change" is achieved by simply not making any modifications to the
local variable (parameter)

the return result **WILL** get written (caveats: VLi=0) but the contents
will **HAPPEN** to be indentical... resulting in "the appearance of no
change having taken place"

CPU efficiency of the Simulator is *not* a high priority here.
Comment 8 Luke Kenneth Casson Leighton 2023-05-21 14:36:57 BST
ok you'll need to pass in FPSCR into all helper routines,
or better we use @inject().  i think that is safer/better,
a lot less hassle - let me try it out
Comment 9 Luke Kenneth Casson Leighton 2023-05-21 15:03:13 BST
(In reply to Luke Kenneth Casson Leighton from comment #8)
> ok you'll need to pass in FPSCR into all helper routines,
> or better we use @inject().  i think that is safer/better,
> a lot less hassle - let me try it out

done - sorted

commit 7df4bbfd7b87dde3d1c2b82e1e7ffbb58e075f02 (HEAD -> master, origin/master)
Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date:   Sun May 21 15:02:44 2023 +0100

    hack-add @inject() into pyfnwriter, also take the
    opportunity to stop doing "import ISACallerFnHelper as something"
    by doing a hack-substitute on the class name
Comment 10 Jacob Lifshay 2023-05-21 19:38:15 BST
(In reply to Luke Kenneth Casson Leighton from comment #2)
> (In reply to Jacob Lifshay from comment #0)
> > https://libre-soc.org/irclog/%23libre-soc.2023-05-21.log.html#t2023-05-21T10:
> > 29:37
> > 
> > things like fcvtfg don't need to prevent writing the output since the output
> > is in a different regfile and thus can't possibly overlap with the input reg.
> 
> this is a misunderstanding of how the simulator works.

this bug is not motivated by how the simulator works, but by how the PowerISA spec has decided to setup state in preparation for fp traps. we need to match the PowerISA spec. the simulator must be adjusted to match pre-existing convention, not convention adjusted to match our arbitrary simulator internals.

(In reply to Luke Kenneth Casson Leighton from comment #3)
> (In reply to Jacob Lifshay from comment #1)
> > see xscvdpsxds for an example -- we need to add/remove the `if vex_flag`
> 
> i have absolutely no idea why on earth that would be necessary.

because that's the convention PowerISA decided on and, unless you want to propose to the ISA WG that they rewrite the pseudocode for all their existing fp ops (both Appendix A (see SNaN Operand for examples) and all fp vmx/vsx ops and probably decimal fp too), we need to match their existing convention.

please confirm you understand that we need to solve the issue of getting the right pseudocode for the spec to match existing conventions, *because those conventions will 99.9% likely not change to suit our existing simulator because of how massive the change would be*.

(In reply to Luke Kenneth Casson Leighton from comment #4)
> > change pseudocode to prevent output register write only when 
> > causing a fp trap and output is in same regfile as input
> 
> whatever you do, do *not* do this.  i will change the bugreport
> to the correct (required) action.

please *don't repurpose bugs* like this, open a new bug and close the old one instead. it is especially problematic when the original bug is still a bug that needs fixing (like in this case), but also just confusing for people who see an email for a new bug and click on the link but are lead to a completely different bug because the entry was usurped.

please open a new bug for "change ISACaller and correct bug introduced in parser.py where it bypasses FPSCR as a local parameter and a return result" and 
change this bug's title back to "change pseudocode to prevent output register write only when causing a fp trap and output is in same regfile as input" or at least "add pseudocode to properly setup for fp traps according to PowerISA spec convention"
Comment 11 Luke Kenneth Casson Leighton 2023-05-21 20:49:44 BST
(In reply to Jacob Lifshay from comment #10)

> this bug is not motivated by how the simulator works, but by how the
> PowerISA spec has decided to setup state in preparation for fp traps. we
> need to match the PowerISA spec.

yes? and that's being done, yes? i see no difference between this
(xscvdpsxds)

if vex_flag=0 then do
   VSR[32xTX+T].dword[1] <- result
   VSR[32xTX+T].dword[2] <- 0x0000_0000_0000_0000
   FPSCR.FPRF <- 0bUUUUU
   FPSCR.FR <- inc_flag
   FPSCR.FI <- xx_flag
end
else do
   FPSCR.FR <- 0b0
   FPSCR.FI <- 0b0
end

and this (fcvttg):

    vex_flag <- FPSCR.VE & vx_flag
    if vex_flag = 0 then
        RT <- result
        FPSCR.FPRF <- undefined(0b00000)
        FPSCR.FR <- inc_flag
        FPSCR.FI <- xx_flag
    else
        FPSCR.FR <- 0
        FPSCR.FI <- 0

i see no difference between those in any way.


> the simulator must be adjusted to match
> pre-existing convention, 

yes? and i have not seen anything that needs to be changed *to* match?


> not convention adjusted to match our arbitrary
> simulator internals.

agreed... so what precisely and exactly are you envisioning needs to "change"?

you have not shown me where anything is that actually *needs to change*


> please confirm you understand that we need to solve the issue of getting the
> right pseudocode for the spec to match existing conventions,

i see neither an error in the pseudocode nor anything that needs to
change in the simulator to support a change that i cannot see is even
needed.



> please *don't repurpose bugs* like this, open a new bug and close the old
> one instead. it is especially problematic when the original bug is still a
> bug that needs fixing (like in this case),

i see no bug.

you need to tell me *precisely and exactly* what change is needed, and
where.  at present as i literally see no difference in the pseudocode
i am completely unable to understand why there is any change in the
simulator's behaviour needed.

> change this bug's title back to "change pseudocode to prevent output
> register write only when causing a fp trap and output is in same regfile as
> input" or at least "add pseudocode to properly setup for fp traps according
> to PowerISA spec convention"

sorry, i'm not doing that until - unless - you can demonstrate that there
is an actual change needed.

i cannot see any functional difference between the two pieces of text, above.
the only difference i see is the register target VSR instead of RT and that
is immaterial.
Comment 12 Jacob Lifshay 2023-05-21 21:09:21 BST
(In reply to Luke Kenneth Casson Leighton from comment #11)
> (In reply to Jacob Lifshay from comment #10)
> 
> > this bug is not motivated by how the simulator works, but by how the
> > PowerISA spec has decided to setup state in preparation for fp traps. we
> > need to match the PowerISA spec.
> 
> yes? and that's being done, yes? i see no difference between this
> (xscvdpsxds)
> 
<snip>
> 
> and this (fcvttg):
> 
<snip>
> 
> i see no difference between those in any way.

yes, because the places that we must change are all our other fp instructions (fptrans, fft ops, etc.).

e.g. fatan2s needs to be changed to something like:
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/fptrans.mdwn;h=e6799bb1a72767793330bfae70e5a17b6c6d6af5;hb=371d91b299c0e4bd7b23e660b9936ed40debb824#l14

result <- DOUBLE(bfp32_ATAN2(SINGLE((FRA)), SINGLE((FRB))))
if FPSCR.FEX = 0 then # computed as vex_flag in fcvt*
    FRT <- result
else
    # don't write FRT
    # do other trap setup -- TBD

ops like fmv don't need to change since they never cause fp traps.

additionally, fcvt* might change to always write [F]RT (since that simplifies hardware and can't overwrite the input, unlike basically all other fp ops) but that change isn't technically necessary.
Comment 13 Jacob Lifshay 2023-05-21 21:16:00 BST
(In reply to Jacob Lifshay from comment #12)
> e.g. fatan2s needs to be changed to something like:
> https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/
> fptrans.mdwn;h=e6799bb1a72767793330bfae70e5a17b6c6d6af5;
> hb=371d91b299c0e4bd7b23e660b9936ed40debb824#l14
> 
> result <- DOUBLE(bfp32_ATAN2(SINGLE((FRA)), SINGLE((FRB))))
> if FPSCR.FEX = 0 then # computed as vex_flag in fcvt*

actually, rather than testing FEX, we need to test if *this instruction* sets FEX (hence vex_flag), since FEX can be 1 on entry to the instruction (if MSR exceptions are disabled or in non-precise exception modes)

>     FRT <- result
> else
>     # don't write FRT
>     # do other trap setup -- TBD
> 
> ops like fmv don't need to change since they never cause fp traps.
> 
> additionally, fcvt* might change to always write [F]RT (since that
> simplifies hardware and can't overwrite the input, unlike basically all
> other fp ops) but that change isn't technically necessary.
Comment 14 Luke Kenneth Casson Leighton 2023-05-21 21:18:56 BST
(In reply to Jacob Lifshay from comment #12)

> yes, because the places that we must change are all our other fp
> instructions (fptrans, fft ops, etc.).

ah ok.  so given that the places that *are* in the right pattern
(the if FEX then FRT/FPSCR else FPSCR parts) require no change
to the simulator, and that *works* right now (does the right
thing and requires - required - no change), i still do not see
why any change would be needed once all of those instructions
are modified to follow the exact same pattern

> e.g. fatan2s needs to be changed to something like:
>
> result <- DOUBLE(bfp32_ATAN2(SINGLE((FRA)), SINGLE((FRB))))
> if FPSCR.FEX = 0 then # computed as vex_flag in fcvt*
>     FRT <- result
> else
>     # don't write FRT
>     # do other trap setup -- TBD

where i assume by "do other trap setup" you mean "modify FPSCR flags
just like in the other locations"?
Comment 15 Jacob Lifshay 2023-05-21 21:22:56 BST
(In reply to Luke Kenneth Casson Leighton from comment #14)
> once all of those instructions
> are modified to follow the exact same pattern

this bug is for modifying all those instructions' pseudocode
> 
> > e.g. fatan2s needs to be changed to something like:
> >
> > result <- DOUBLE(bfp32_ATAN2(SINGLE((FRA)), SINGLE((FRB))))
> > if FPSCR.FEX = 0 then # computed as vex_flag in fcvt*
> >     FRT <- result
> > else
> >     # don't write FRT
> >     # do other trap setup -- TBD
> 
> where i assume by "do other trap setup" you mean "modify FPSCR flags
> just like in the other locations"?

yes, basically.
Comment 16 Jacob Lifshay 2023-05-21 21:36:09 BST
(In reply to Jacob Lifshay from comment #15)
> > where i assume by "do other trap setup" you mean "modify FPSCR flags
> > just like in the other locations"?
> 
> yes, basically.

see table 63 on v3.1B page 670(696) for a full list of how fp trap/exception handling should be done
Comment 17 Jacob Lifshay 2023-05-21 21:42:00 BST
(In reply to Luke Kenneth Casson Leighton from comment #11)
> > please *don't repurpose bugs* like this, open a new bug and close the old
> > one instead. it is especially problematic when the original bug is still a
> > bug that needs fixing (like in this case),
> 
> i see no bug.

even if you see no bug in the future, since I obviously think there is a bug and it's better to not mix mostly unrelated issues, please don't repurpose bugs like this. please just open a new one.
> 
> > change this bug's title back to "change pseudocode to prevent output
> > register write only when causing a fp trap and output is in same regfile as
> > input" or at least "add pseudocode to properly setup for fp traps according
> > to PowerISA spec convention"
> 
> sorry, i'm not doing that until - unless - you can demonstrate that there
> is an actual change needed.

since that has been demonstrated in comment #12 and comment #15, please do make the corresponding title change
Comment 18 Luke Kenneth Casson Leighton 2023-05-21 23:38:11 BST
(In reply to Jacob Lifshay from comment #17)

> since that has been demonstrated in comment #12 and comment #15, please do
> make the corresponding title change

willdo, tomorrow.
Comment 19 Jacob Lifshay 2023-05-24 02:16:10 BST
(In reply to Luke Kenneth Casson Leighton from comment #18)
> (In reply to Jacob Lifshay from comment #17)
> 
> > since that has been demonstrated in comment #12 and comment #15, please do
> > make the corresponding title change
> 
> willdo, tomorrow.

it's been a few days, so I just made the change and opened a new bug (bug #1088) for "change ISACaller and correct bug introduced in parser.py where it bypasses FPSCR as a local parameter and a return result" since i have some comments on that.
Comment 20 Jacob Lifshay 2023-05-24 02:30:29 BST
other *fun* side-effects of fcvtfg not writing RT (if we decide to not accept my proposal) is that we either have to manually calculate CR0 in the pseudocode or not change CR0 (messes with data-dependent fail-first) or (probably the worst option) read RT and calculate CR0 from the read-in value. (CR1 doesn't have that problem since it's input always comes from FPSCR).

I think the best option is to just have fcvtfg always write RT as I proposed, since suppressing writes (so the trap handler can see the input) is only necessary when targeting the same register file as inputs.
Comment 21 Jacob Lifshay 2023-05-24 02:34:20 BST
(In reply to Jacob Lifshay from comment #20)
> other *fun* side-effects of fcvtfg

whoops, I meant fcvttgo.

> not writing RT (if we decide to not
> accept my proposal) is that we either have to manually calculate CR0 in the
> pseudocode or not change CR0 (messes with data-dependent fail-first) or
> (probably the worst option) read RT and calculate CR0 from the read-in
> value. (CR1 doesn't have that problem since it's input always comes from
> FPSCR).
> 
> I think the best option is to just have fcvtfg

same here

> always write RT as I
> proposed, since suppressing writes (so the trap handler can see the input)
> is only necessary when targeting the same register file as inputs.
Comment 22 Luke Kenneth Casson Leighton 2023-05-24 10:28:33 BST
(In reply to Jacob Lifshay from comment #20)

jacob: i think you keep misunderstanding: RT *is* always written by
the simulator. i am NEVER going to change the Simulator to stop the
value returned by the pseudocode function from being
written.

*regardless* of what the contents are the contents of FRT
are **ALWAYS** passed to the function ISACaller.do_outregs_nia().

whatever goes in to RT *is* written out back into FPR, period.
this is NEVER going to change.

it is therefore up to you to make a choice in the pseudocode
whether to CHANGE the value of FRT or NOT change the value of
FRT.

but the value regardless of whether it is or is not changed
by the pseudocode WILL ALWAYS BE WRITTEN BACK INTO THE REGISTER.

now, what you *might* be assuming is that "there is a flag
in the operator "<-" which if used on a return result it
indicates to the Simulator to store the result in the register",
this is false: you are likely thinking of *hardware* here
(which is perfectly reasonable).

comments in the unit test such as this are therefore misleading
or at least ambiguous:

+                # FIXME: #1087 proposes to change pseudocode of fcvt* to
+                # always write output, this implements reading RT when output
+                # isn't written, which is terrible

it's not terrible at all for the *Simulator*, it doesn't care.

what you might be thinking of is *hardware* always writing output,
and/or always reading input, which yes that would waste resources
if it was unnecessary.  when writing the HDL it would be one of
the jobs of the implementation to set the "output-enable" bit,
indicating that the register is to be written (or cancelled if
not set).

with the code-reuse from the unit tests it's probably a good
idea to clarify that hardware should watch out for that.
the most important bit there will be if it is extremely easy
(based preferably on immediate-flags and nothing else,
certainly not reading the contents of the register) to
determine if RT needs to be read at all.

also, i have to say, i don't fully understand what you mean
"reading RT when output isn't written" - are you saying that
the instruction has to become a READ-MODIFY-WRITE?  or that
in *hardware* it is necessary to read RT, perform some
checks, and then decide "actually RT need not be written"?
this makes no sense: there really should only be MODIFY-WRITE
(no read)

(there are only a few read-modify-write instructions, they're
 explicitly listed as such in the spec, hence why i ask)
Comment 23 Jacob Lifshay 2023-05-24 10:48:34 BST
(In reply to Luke Kenneth Casson Leighton from comment #22)
> (In reply to Jacob Lifshay from comment #20)
> 
> jacob: i think you keep misunderstanding: RT *is* always written by
> the simulator.

i understand that perfectly fine, my proposal has nothing to do with how the simulator works, it is to make the pseudocode always write to RT so hw doesn't have to mask off writes to CR0 (causes problems for ddff) or read RT from the regfile to compute CR0.

> comments in the unit test such as this are therefore misleading
> or at least ambiguous:
> 
> +                # FIXME: #1087 proposes to change pseudocode of fcvt* to
> +                # always write output, this implements reading RT when
> output
> +                # isn't written, which is terrible

this is terrible for hw, it means it neads to read another register just for the very rare case when it traps so it can properly calculate CR0 from the RT value it read.
> 
> it's not terrible at all for the *Simulator*, it doesn't care.

yup.

> also, i have to say, i don't fully understand what you mean
> "reading RT when output isn't written" - are you saying that
> the instruction has to become a READ-MODIFY-WRITE?

technically it'd be read-or-write in hardware

>  or that
> in *hardware* it is necessary to read RT, perform some
> checks, and then decide "actually RT need not be written"?
> this makes no sense: there really should only be MODIFY-WRITE
> (no read)

read needed to calculate CR0 when trapping, since then RT is not written

since we all agree that's terrible, I want to change the pseudocode to always write RT even if it traps, that way it'd never need to read RT or mask off writes (ignoring SVP64 predicates). since you had said to not make that change (before you understood what I was talking about) i'm trying to get confirmation that making fcvttg's pseudocode always write RT is fine with you.
Comment 24 Luke Kenneth Casson Leighton 2023-05-24 10:56:24 BST
wait... are you saying that, at the moment, a zero is written into
the regfile when a TRAP is called?

are you then saying "that is bad"?  because if so don't believe it is:
compilers have to expect that register contents to be (statically)
allocated regardless of the exception.

IBM declares the result register "UNDEFINED" under these circumstances
Comment 25 Jacob Lifshay 2023-05-24 11:02:40 BST
(In reply to Luke Kenneth Casson Leighton from comment #24)
> wait... are you saying that, at the moment, a zero is written into
> the regfile when a TRAP is called?

no, at the moment, the GPR regfile is unmodified when a fp trap occurs. the value of RT must be read from the GPR regfile so Rc=1 can compute the correct value of CR0.
> 
> are you then saying "that is bad"?

yes, for hw.

>  because if so don't believe it is:
> compilers have to expect that register contents to be (statically)
> allocated regardless of the exception.
> 
> IBM declares the result register "UNDEFINED" under these circumstances

imho that's unacceptable here, RT must either be unmodified (not written; for consistency with fp ops even though the reason that's done for fp ops doesn't apply here) or have the correct output (following the defined conversion semantics).
Comment 26 Jacob Lifshay 2023-05-24 11:10:07 BST
current semantics of fcvttgo.:
v = fptoint(FRB)
if overflow and enabled:
    # do *not* modify RT
    CR0 = compute_rc(RT)
    trap()
else:
    RT = v
    CR0 = compute_rc(RT)

proposed semantics of fcvttgo.:
v = fptoint(FRB)
if overflow and enabled:
    RT = v # writing is fine since it can't overwrite FRB due to FPR vs. GPR
    CR0 = compute_rc(RT)
    trap()
else:
    RT = v
    CR0 = compute_rc(RT)
Comment 27 Luke Kenneth Casson Leighton 2023-05-24 11:16:49 BST
(In reply to Jacob Lifshay from comment #23)
> (In reply to Luke Kenneth Casson Leighton from comment #22)
> > (In reply to Jacob Lifshay from comment #20)
> > 
> > jacob: i think you keep misunderstanding: RT *is* always written by
> > the simulator.
> 
> i understand that perfectly fine, my proposal has nothing to do with how the
> simulator works, it is to make the pseudocode always write to RT so hw
> doesn't have to mask off writes to CR0 (causes problems for ddff) or read RT
> from the regfile to compute CR0.

no you cannot do that, it entirely breaks how CR-field co-results work.

such a change "in some cases Rc=1 means test the previous value" would
not go down well with the ISA WG.


> this is terrible for hw, it means it neads to read another register just for
> the very rare case when it traps so it can properly calculate CR0 from the
> RT value it read.

but it should not have read RT [in order to do that] as that is totally
the wrong behaviour and a misunderstanding of what CRf co-results actually
are.

co-results are associated with the result being computed by the
hardware (ok pseudocode).

and in the Simulator, the *return* results are what are tested - even
before they're put into the regfile.

that's why handle_condition goes to the trouble of hunting inside the
list of return results, not re-reading the contents of any registers.

> read needed to calculate CR0 when trapping, since then RT is not written

then perform the change to CR0 explicitly, and leave RT alone.
(btw it's perfectly valid to have RT "undefined" and CR0 to be
"defined").

parser.py is smart enough to spot that the pseudocode contains
CR0, and tell ISACaller *not* attempt to perform the [usual] computation,
because parser.py places CR0 into the list of return results.

there is at least one place where this is done (hence why i had to
add the feature).  oh!  yes - prefix_codes!

    openpower/isa/prefix_codes.mdwn:
    CR0 <- ra_used || (tree_index >= 64) || found || hit_end

yes this would mean that in the *non*-exceptional case you have to
duplicate the functionality of Rc=1.  yes it's a pain.
Comment 28 Luke Kenneth Casson Leighton 2023-05-24 11:20:33 BST
(In reply to Jacob Lifshay from comment #26)
> current semantics of fcvttgo.:
> v = fptoint(FRB)
> if overflow and enabled:
>     # do *not* modify RT
>     CR0 = compute_rc(RT)
>     trap()

no, really don't do that.  it should be this:

v = fptoint(FRB)

if overflow and enabled:
    CR0.SO = 1  CR.EQ/LT/GT = UNDEFINED
    trap()

actually, when overflow-bit is set, ISACaller already
spots that and already sets CR.SO.

and i think it reasonable to simply leave the rest of
the contents of CR0 completely alone (in the pseudocode)
and have english-language text stating "if overflow
all bits of CR0 except for CR0.SO are UNDEFINED"
Comment 29 Jacob Lifshay 2023-05-24 11:23:26 BST
(In reply to Luke Kenneth Casson Leighton from comment #27)
> (In reply to Jacob Lifshay from comment #23)
> > (In reply to Luke Kenneth Casson Leighton from comment #22)
> > > (In reply to Jacob Lifshay from comment #20)
> > > 
> > > jacob: i think you keep misunderstanding: RT *is* always written by
> > > the simulator.
> > 
> > i understand that perfectly fine, my proposal has nothing to do with how the
> > simulator works, it is to make the pseudocode always write to RT so hw
> > doesn't have to mask off writes to CR0 (causes problems for ddff) or read RT
> > from the regfile to compute CR0.
> 
> no you cannot do that, it entirely breaks how CR-field co-results work.

which part can i not do? the part with the current semantics which i thoroughly dislike with reading RT? or the part with changing the pseudocode to always write RT which I think is the best option overall?
Comment 30 Luke Kenneth Casson Leighton 2023-05-24 11:25:05 BST
just leave it as this:

   v = fptoint(FRB)
   if overflow and enabled:
       trap()
   else:
       RT = v

* overflow will be written into CR0 by ISACaller (post-pseudocode-execution)
* CR0's remaining bits are undefined (to be mentioned in the english language
  spec section)
* even if RT is analysed by ISACaller (post-pseudocode-execution) it is
  "Not Your Problem(tm)" - because RT is undefined.
* software will need to be told "on exception the only CR0 bit you can
  trust is CR0.SO".
Comment 31 Jacob Lifshay 2023-05-24 11:26:54 BST
for comparison, fcmpo always writes to the destination since it is in a different register file and therefore can't be overwriting any inputs even when fcmpo causes an identical trap.
Comment 32 Luke Kenneth Casson Leighton 2023-05-24 11:27:42 BST
(In reply to Jacob Lifshay from comment #29)

> which part can i not do? the part with the current semantics which i
> thoroughly dislike with reading RT? or the part with changing the pseudocode
> to always write RT which I think is the best option overall?

neither.  just mention in the english language of the spec page

   "on a FP exception, RT is UNDEFINED.  Additionally, if Rc=1,
    CR0.SO is set to 1, all other bits of CR0 are UNDEFINED"

and don't touch CR0 in the psuedocode at all.
Comment 33 Jacob Lifshay 2023-05-24 11:31:11 BST
(In reply to Luke Kenneth Casson Leighton from comment #30)
> just leave it as this:
> 
>    v = fptoint(FRB)
>    if overflow and enabled:
>        trap()
>    else:
>        RT = v
> 
> * overflow will be written into CR0 by ISACaller (post-pseudocode-execution)
> * CR0's remaining bits are undefined (to be mentioned in the english language
>   spec section)
> * even if RT is analysed by ISACaller (post-pseudocode-execution) it is
>   "Not Your Problem(tm)" - because RT is undefined.

that's not how pseudocode works in the PowerISA spec, if it doesn't explicitly write `undefined` to RT, then RT is not undefined. in this case RT is unmodified.

plus, this doesn't match fcmpo, the only other kind of inter-regfile fp op with traps i could find.
Comment 34 Luke Kenneth Casson Leighton 2023-05-24 11:34:12 BST
(In reply to Jacob Lifshay from comment #31)
> for comparison, fcmpo always writes to the destination since it is in a
> different register file and therefore can't be overwriting any inputs even
> when fcmpo causes an identical trap.

fcmpo does not provide a good precedent here to follow because it is not
a "co-result" instruction (not an Rc=1) it is a 5-bit instruction targetting
a specific CR-bit (not a CR-field bit)

fcmpo:

* reads two FPRs
* writes one bit to CR

these instructions are an *arithmetic* profile

* read one FPR
* write to one GPR *and a co-result CR0*

in other words, the issue you think is an issue is actually invalid.

this *is* doing the right thing:

    if vex_flag = 0 then
        RT <- result
        FPSCR.FPRF <- undefined(0b00000)
        FPSCR.FR <- inc_flag
        FPSCR.FI <- xx_flag
    else
        FPSCR.FR <- 0
        FPSCR.FI <- 0

RT *must not* be written to [in the pseudocode]
CR0 *must not* be written to [in the pseudocode]
writing to overflow *is* correct

there is no change required to the pseudocode.
Comment 35 Luke Kenneth Casson Leighton 2023-05-24 11:39:59 BST
(In reply to Jacob Lifshay from comment #33)

> that's not how pseudocode works in the PowerISA spec, if it doesn't
> explicitly write `undefined` to RT, then RT is not undefined. in this case
> RT is unmodified.

great. that's fine, good catch.  CR0 should still be UNDEFINED in all its 3 bits
except overflow, with words to that effect written into the english language
part of the spec (not the pseudocode).


 
> plus, this doesn't match fcmpo, the only other kind of inter-regfile fp op
> with traps i could find.

as i said that is not an Rc=1 co-result instruction and comparing with
it is invalid.


action required:

* NONE (no change) on the pseudocode
* English language text stating "on exception CR0 fields UNDEFINED except CR0.SO"
Comment 36 Jacob Lifshay 2023-05-26 06:19:35 BST
(In reply to Luke Kenneth Casson Leighton from comment #35)
> action required:
> 
> * NONE (no change) on the pseudocode
> * English language text stating "on exception CR0 fields UNDEFINED except
> CR0.SO"

I phrased it differently since that's not exactly correct (since I was glossing over one case when discussing with you):

> When `RT` is not written (`vex_flag = 1`), all CR0 bits
> except SO are undefined.

https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;h=72076851a4927c15c077db046d2ec0a234033fa0

commit 72076851a4927c15c077db046d2ec0a234033fa0
Author: Jacob Lifshay <programmerjake@gmail.com>
Date:   Thu May 25 22:09:31 2023 -0700

    fcvttg CR0 fields (except SO) are undefined when RT is not written
    
    this can occur even when fp traps are disabled in MSR, so writing
    "on exceptions" is incorrect.
Comment 37 Luke Kenneth Casson Leighton 2023-10-03 09:11:29 BST
(In reply to Jacob Lifshay from comment #36)

> > When `RT` is not written (`vex_flag = 1`), all CR0 bits
> > except SO are undefined.
> 
> https://git.libre-soc.org/?p=libreriscv.git;a=commitdiff;
> h=72076851a4927c15c077db046d2ec0a234033fa0

ok that's really clear. good catch that the wording also needed fixing.
interesting that it's not made clear in the Power ISA spec itself, i will
have to do a quick search for vex_flag

> commit 72076851a4927c15c077db046d2ec0a234033fa0
> Author: Jacob Lifshay <programmerjake@gmail.com>
> Date:   Thu May 25 22:09:31 2023 -0700
> 
>     fcvttg CR0 fields (except SO) are undefined when RT is not written

ok yes i forgot. been too long.  ok so implementation-wise vex_flag
needs to be done like overflow in ISACaller. see bug #1177.