TODO: SFFS ISACaller userspace ELF support for dynamic linking and PIC, also add a lot more system calls. * TODO: syscall "write to file using mmap" test (needs the default-overrides, bug #1173, bug #1174 * WIP: statically-linked-to-glibc * BASICALLY DONE: argv, auxv, envp. (current code makes glibc crash still, hence not completely done) * DONE: brk, read, writev, openat, uname, newuname, readlink, readlinkat * WIP: mmap, mmap2 -- private mmaps work, shared not yet
I implemented brk and a bunch of smaller misc. syscalls and added a test of a hello world statically linked to glibc, it fails when it reaches stwcx. https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=ca1a0b0061f287c615f45b61ea9f6a70e38ccf69 error: if rc_en and ins_name not in ['svstep']: if outs_ok.get('FPSCR', False): FPSCR = outs['FPSCR'] else: FPSCR = self.FPSCR yield from self.do_rc_ov( > ins_name, results[0], overflow, cr0, cr1, FPSCR) E TypeError: 'NoneType' object is not subscriptable <snip> results = None <snip> src/openpower/decoder/isa/caller.py:2474: TypeError
(In reply to Jacob Lifshay from comment #1) > I implemented brk and a bunch of smaller misc. syscalls and added a test of > a hello world statically linked to glibc, it fails when it reaches stwcx. > > https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog; > h=ca1a0b0061f287c615f45b61ea9f6a70e38ccf69 > > > error: > if rc_en and ins_name not in ['svstep']: > if outs_ok.get('FPSCR', False): > FPSCR = outs['FPSCR'] > else: > FPSCR = self.FPSCR > yield from self.do_rc_ov( > > ins_name, results[0], overflow, cr0, cr1, FPSCR) > E TypeError: 'NoneType' object is not subscriptable > > <snip> > results = None > <snip> > > src/openpower/decoder/isa/caller.py:2474: TypeError errm... ermerm... ok. good catch. https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=ca1a0b0061f287c615f45b61ea9f6a70e38ccf69 we need to isolate that a bit more into a MVCE. (it is 09:45 UTC therefore 4am for you so i may do it later today). i don't think i ever implemented stwcx in ISACaller.
(In reply to Luke Kenneth Casson Leighton from comment #2) > i don't think i ever implemented stwcx in ISACaller. nope. it's in TestIssuer but there is no pseudocode. raise as bugreport? # Store word Conditional Indexed X-Form * stwcx. RS,RA,RB Pseudo-code: # TODO undefined(0) Special Registers Altered: CR0
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=58773876a09183484c47f0594a8a880bc30e2b5f found the pseudocode, just adding it now. will need lwarx as well
commit 56463beacd0691ae1b5fa73fd9763dbff9f504ad (HEAD -> 1228-elf-dynamic, origin/1228-elf-dynamic) Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Sun Dec 3 20:30:27 2023 +0000 add initial lwarx unit test and pseudocode first version, gets you started, tired will do stwcx tomorrow, go for it if holdding you up ok? just did "enough" feel free to codemorph/fix
(In reply to Luke Kenneth Casson Leighton from comment #5) > first version, gets you started, tired will do > stwcx tomorrow, go for it if holdding you up ok? > just did "enough" feel free to codemorph/fix thanks, I'll fix the pseudocode, probably later today. (In reply to Luke Kenneth Casson Leighton from comment #2) > we need to isolate that a bit more into a MVCE. > (it is 09:45 UTC therefore 4am for you so i may do it > later today). actually, i'm in UTC-8 for the winter, so that was 1:45 am.
I fixed lwarx and stwcx., and added the b/h/d versions while I was at it and added some tests for them. I also added a stub sync instruction, since it was missing. I also added lqarx/stqcx. since i was adding the others anyway, but I added no tests since I expect them to be broken due to RTp/RSp handling. I also added some FIXME `comment2`s in the CSVs to them and lq/stq since those should be 16-byte load/stores (they're atomic after all) and use RTp/RSp, but the csv entries don't have that. I implemented the writev syscall, and the statically-linked glibc binary finally printed a message: malloc(): corrupted top size this indicates memory corruption somewhere, which will be *fun* to debug -- or maybe I just need to implement relocations and that'll fix it... I rebased on master: https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=7a91bc2921e3b5fff1d647d6dba0573d259fb95e
(In reply to Jacob Lifshay from comment #7) > I fixed lwarx and stwcx., and added the b/h/d versions while I was at it and > added some tests for them. oh, I also added support for memory to ExpectedState.
(In reply to Jacob Lifshay from comment #7) > I fixed lwarx and stwcx., and added the b/h/d versions while I was at it and > added some tests for them. fantastic. > I also added a stub sync instruction, since it was missing. good idea. quad instructions are out of scope at the moment but see how that goes, if there is a compile-time option to exclude them then take it. > I implemented the writev syscall, and the statically-linked glibc binary > finally printed a message: > malloc(): corrupted top size wha-hey! > this indicates memory corruption somewhere, which will be *fun* to debug -- > or maybe I just need to implement relocations and that'll fix it... yyeah the heap/stack may just not be big enough. (In reply to Jacob Lifshay from comment #8) > oh, I also added support for memory to ExpectedState. bizarre, had i not already added it? well it is a good addition. my thoughts: single-stepping through the entire program under qemu using the Test API would give a FULL instruction AND register AND memory-access trace that would allow instruction-by-instruction EXACT and i do mean EXACT replication such that "diff -u" tells you precisely and immediately the deviation. this was the technique i used (in HDL, under verilator, using the DMI interface) to get precise and exact interoperability between TestIssuer and Microwatt. tens of millions of instructions, and yet spotting the one that went wrong was trivial due to nothing more complex than running "diff -u"
anyway let's park this one as weare mot getting paid!
(In reply to Luke Kenneth Casson Leighton from comment #10) > anyway let's park this one as weare mot getting paid! i'm waiting till dec 7 since that's the deadline you previously gave. I added argv and envp support, and partial auxv support. I also added checking for invalid memory accesses when in emulating-mmap mode, turns out the initial program state is missing something so glibc does a null dereference, but I never knew until I tried to run the same program from approximately the same state in gdb in a ppc64le VM. I just copied the stack pointer and argv/envp/auxv state to gdb, so this means it probably isn't relocations causing the issue (since those were applied when running in gdb), I'm guessing a missing auxv entry somewhere. https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=1257b4f1fcfca75f80f5e5348feefe2d7e4eb153
(In reply to Luke Kenneth Casson Leighton from comment #9) > my thoughts: single-stepping through the entire program under qemu > using the Test API would give a FULL instruction AND register AND > memory-access trace that would allow instruction-by-instruction EXACT and > i do mean EXACT replication such that "diff -u" tells you precisely > and immediately the deviation. two issues with this idea: the code we have uses qemu in system mode, in system mode qemu doesn't emulate syscalls, so it won't match. in user mode, qemu uses an arbitrary non-default code base address of 0x4000...something (i tried changing it, didn't seem to work), this is not at all what happens when just running it in gdb under a ppc64le kernel. to match qemu would require implementing relocations and figuring out what all qemu does, and then we would match linux itself less well. so, because of that, I'm just running under gdb with `record full` enabled: on a ppc64le system/VM (qemu-system with a linux install works): gdb my-program.elf starti record full <change stack pointer and insert argv/envp/auxv if necessary> c then you can use gdb's reverse debugging features to try and find where it matches ISACaller or not.
I added the rest of auxv and afaict it matches linux (with some adjustment for program args and name and different AT_HWCAP[2] values since we don't support altivec, vsx, etc.) I also fixed brk to not try to return page-aligned values when the input isn't page-aligned. I also added some code to dump gdb commands that can be copy-pasted for easy debugging while matching load_elf's initial state. https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=e99f2f2679cc04fd7d0a63b6db29e61d4dc1bcdf now, I'm running into the issue that ISACaller is causing an unaligned-memory-access exception: 0x1002FBFC: 7D20FC28 call ldbrx ldbrx read reg 0/0: 0x0 read reg 31/0: 0x7fffffffffd9 read reg 9/0: 0xfd9 memory unaligned exception, DAR 140737488355289 MemException('unaligned', 'Unaligned access: remainder 1 width 8') This exception needs to be suppressed and the load performed like normal, as is required by the programming note on: PowerISA v3.1B Book III section 7.5.9 page 1287 (1313) > If an Alignment interrupt occurs for a case in the > second bulleted list above, the Alignment interrupt > handler should emulate the instruction. The emula- > tion must satisfy the atomicity requirements > described in Section 1.4 of Book II. ldbrx falls under the second list, so user-mode emulation must execute the instruction. (actually, all instructions in the second list *may* cause alignment faults rather than being required to, so I think we just need to add a mode flag to suppress alignment faults for those instructions and then we can set that flag by default)
(In reply to Jacob Lifshay from comment #12) > (In reply to Luke Kenneth Casson Leighton from comment #9) > > my thoughts: single-stepping through the entire program under qemu > > using the Test API would give a FULL instruction AND register AND > > memory-access trace that would allow instruction-by-instruction EXACT and > > i do mean EXACT replication such that "diff -u" tells you precisely > > and immediately the deviation. > > two issues with this idea: the code we have uses qemu in system mode, in > system mode qemu doesn't emulate syscalls, so it won't match. https://git.libre-soc.org/?p=kvm-minippc.git;a=summary installed and runnabe on talos1.libre-soc.org qemu-user *should* be single-steppable. talos1 has been placed into "HASH VM" not RADIX mode in order to take advantage of hypervisor single-step debugging. i don't know if that will help? > in user mode, qemu uses an arbitrary non-default code base address of > 0x4000...something (i tried changing it, didn't seem to work), this is not > at all what happens when just running it in gdb under a ppc64le kernel. to > match qemu would require implementing relocations and figuring out what all > qemu does, and then we would match linux itself less well. > > so, because of that, I'm just running under gdb with `record full` enabled: pffh, if it works it works. awesome. > then you can use gdb's reverse debugging features to try and find where it > matches ISACaller or not. combined with the ISACaller instruction logging. which hm should have a more "verbose dump" mode to correspond with the verilator/microwatt debug (DMI interface) work i did 3+ years ago.
(In reply to Luke Kenneth Casson Leighton from comment #14) > > i don't know if that will help? not really, again, it's system mode, not user mode afaict. today I tried to create a simple ppc64le VM iso where init immediately runs gdbserver on the serial port, the idea is that there would be a new class like SimRunner that would run qemu-system on that iso and then connect either using the gdb-remote protocol or through gdb/mi using powerpc64le-linux-gnu-gdb on the host. this would mean we have an actual linux kernel doing the ELF loading, so we have a reference to compare against. turns out creating a ppc64le VM iso is much harder than you might think if you haven't done it before and are trying to not require building the iso from a ppc64le host (cuz then you don't even need the VM). I have a half-baked script that doesn't really work, I can commit it if you like, but there might be a much better way to do it. extra fun: I tried live-build, but apparently ppc64el debian 10's live-build doesn't support ppc64le, and I can't get cross-building from amd64 to work on ubuntu 20.04, and it turns out that the ppc64el debian 12 installer iso won't boot in qemu...so I'm trying installing debian 10 in a VM and upgrading to debian 12. I'll just be done with this bug for now. on to other things...
(In reply to Jacob Lifshay from comment #15) > I'll just be done with this bug for now. on to other things... yeah probably best :) but at least we have a really good starting point for the future continuation.
(In reply to Luke Kenneth Casson Leighton from comment #16) > (In reply to Jacob Lifshay from comment #15) > > I'll just be done with this bug for now. on to other things... > > yeah probably best :) but at least we have a really good starting point > for the future continuation. oh, actually, i think i should change the failing test to skip and merge the branch, this will help it from getting horribly out-of-date or forgotten. sound good?
(In reply to Jacob Lifshay from comment #17) > oh, actually, i think i should change the failing test to skip and merge the > branch, this will help it from getting horribly out-of-date or forgotten. > sound good? good idea. or just leave it as a reminder. it is perfectly fine to have tests that fail. that's what they are there for, as reminders. disabling tests "because fail" is actually extremely bad practice. think it through as to why.
(In reply to Luke Kenneth Casson Leighton from comment #18) > (In reply to Jacob Lifshay from comment #17) > > > oh, actually, i think i should change the failing test to skip and merge the > > branch, this will help it from getting horribly out-of-date or forgotten. > > sound good? > > good idea. or just leave it as a reminder. > > it is perfectly fine to have tests that fail. that's what > they are there for, as reminders. well, the problem with that is that CI won't show more than a few failing test cases (otherwise the log is too long and gitlab truncates it, removing the test summary), and also if we have too many, it's hard to find the tests we want because they get lost in the noise of known-failing test cases. > > disabling tests "because fail" is actually extremely bad > practice. think it through as to why. disabling tests because you broke them and won't fix them is bad practice, disabling tests because you never completed them and know that and have a bug tracking them is imo fine. I also decided to commit the script for building the gdb vm (marked as not working), we can just ignore it until we get around to fixing it. better than losing all the progress completely. I'm pushing un-rebased to 1228-elf-dynamic: https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=41f1911e9191b2e031e91f645c7e65884878ef7a I tried to rebase on master, but something on master broke tons of tests recently, somewhere between: https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=8e9fdf5b4f9dfaee84665286c00b521c1f820b53 https://salsa.debian.org/Kazan-team/mirrors/openpower-isa/-/jobs/4998615 and https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=5c672ca8e3e02fe4d441b48e8a9b35aa46150038 https://salsa.debian.org/Kazan-team/mirrors/openpower-isa/-/jobs/5013498 commit 41f1911e9191b2e031e91f645c7e65884878ef7a Author: Jacob Lifshay <programmerjake@gmail.com> Date: Tue Dec 12 10:20:03 2023 -0800 elf/simple_cases: disable case_static_glibc for now, re-enable when we work on it again. commit 94e0bc79a57037f41ee2410bfc70802815c22884 Author: Jacob Lifshay <programmerjake@gmail.com> Date: Tue Dec 12 10:11:04 2023 -0800 add make_gdb_vm_image.sh script, it doesn't work yet but could be useful see https://bugs.libre-soc.org/show_bug.cgi?id=1228#c15
(In reply to Jacob Lifshay from comment #19) > (In reply to Luke Kenneth Casson Leighton from comment #18) > > (In reply to Jacob Lifshay from comment #17) > > > > > oh, actually, i think i should change the failing test to skip and merge the > > > branch, this will help it from getting horribly out-of-date or forgotten. > > > sound good? > > > > good idea. or just leave it as a reminder. > > > > it is perfectly fine to have tests that fail. that's what > > they are there for, as reminders. > > well, the problem with that is that CI won't show more than a few failing > test cases (otherwise the log is too long and gitlab truncates it, removing > the test summary), and also if we have too many, it's hard to find the tests > we want because they get lost in the noise of known-failing test cases. > > > > disabling tests "because fail" is actually extremely bad > > practice. think it through as to why. > > disabling tests because you broke them and won't fix them is bad practice, agreed. and would not be permitted, and is not the case here. > > disabling tests because you never completed them and know that and have a > bug tracking them is imo fine. ah no. *attempting to run them and putting the bug URL in the @messge* is the correct action there. reminds everyone except the assignee to ignore it. > I also decided to commit the script for building the gdb vm (marked as not > working), we can just ignore it until we get around to fixing it. better > than losing all the progress completely. indeed. i trust it is based on the project standard dstro, debian/10? (official use of anything else is prohibited, but if you *are* using anything else *do not* encourage people to follow what you do. if it includes anything other than debian/10 please remove it until it does)
(In reply to Luke Kenneth Casson Leighton from comment #20) > indeed. i trust it is based on the project standard dstro, debian/10? > (official use of anything else is prohibited, but if you *are* using > anything else *do not* encourage people to follow what you do. > if it includes anything other than debian/10 please remove it until > it does) no, it's based on debian 11's initrd (it will just run gdbserver in initrd for faster boot, also cuz there's no root fs), iirc i ran into some issue with debian 10, icr what.
(In reply to Jacob Lifshay from comment #21) > no, it's based on debian 11's initrd (it will just run gdbserver in initrd > for faster boot, also cuz there's no root fs), iirc i ran into some issue > with debian 10, icr what. sigh. sounds about righ5 should be abe to use the initrd/kernl of debian/11 with deban10 rootfs by doing a "debootstrap" (1st stage). qemu-user used to chroot in, avoids kernel. can then finish d10 chroot and go from there. https://wiki.debian.org/EmDebian/CrossDebootstrap#Cross-installing_Debian_using_debootstrap.2Fmultistrap needs multiarch and binfmt foriegn... if you can execute ppc64le helloworld on x86 you already have everything needed incl. qemu-user-ppc64le.
(In reply to Luke Kenneth Casson Leighton from comment #22) > (In reply to Jacob Lifshay from comment #21) > > > no, it's based on debian 11's initrd (it will just run gdbserver in initrd > > for faster boot, also cuz there's no root fs), iirc i ran into some issue > > with debian 10, icr what. > > sigh. sounds about righ5 > should be abe to use the initrd/kernl of debian/11 with > deban10 rootfs by doing a "debootstrap" (1st stage). no need, the script creates an .iso with *only* grub, kernel and initrd, nothing else.
(In reply to Jacob Lifshay from comment #23) > no need, the script creates an .iso with *only* grub, kernel and initrd, > nothing else. ahh awesome then i'm perfectly happy with that.
(In reply to Jacob Lifshay from comment #19) > I tried to rebase on master, but something on master broke tons of tests > recently, somewhere between: > https://git.libre-soc.org/?p=openpower-isa.git;a=commit; > h=8e9fdf5b4f9dfaee84665286c00b521c1f820b53 > https://salsa.debian.org/Kazan-team/mirrors/openpower-isa/-/jobs/4998615 > > and > > https://git.libre-soc.org/?p=openpower-isa.git;a=commit; > h=5c672ca8e3e02fe4d441b48e8a9b35aa46150038 > https://salsa.debian.org/Kazan-team/mirrors/openpower-isa/-/jobs/5013498 according to git bisect: 5025354342d8e92f200b9c0c0a04f96a53d7f0e3 is the first bad commit commit 5025354342d8e92f200b9c0c0a04f96a53d7f0e3 Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> Date: Tue Nov 28 20:41:01 2023 +0000 fix elwidth overrides when sw=8 the way that XLEN works is it must be MAX(sw,dw) which is not what was happening, it was fixed at sw (source width) Luke, please don't push stuff to master unless the tests have been run and pass! if you don't want to run tests, please just push to a branch, then you can wait for CI.
(In reply to Jacob Lifshay from comment #25) > 5025354342d8e92f200b9c0c0a04f96a53d7f0e3 is the first bad commit > commit 5025354342d8e92f200b9c0c0a04f96a53d7f0e3 > Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> > Date: Tue Nov 28 20:41:01 2023 +0000 > assert ew_src == XLEN, "TODO fix elwidth conversion" E NameError: name 'XLEN' is not defined src/openpower/decoder/isa/caller.py:2582: NameError I fixed it: https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=5e085b368fbd0a95590b6aeaedcdd796afe92d60 commit d69b8c15517dd22e782d2ff0825bc4375f950de6 Author: Jacob Lifshay <programmerjake@gmail.com> Date: Tue Dec 12 16:49:24 2023 -0800 caller.py: XLEN must be accessed as self.XLEN also later found: > if insn_name.startswith("sv.bc") or ffirst: E NameError: name 'ffirst' is not defined src/openpower/decoder/isa/caller.py:2927: NameError fixed: commit ae8bacd2eb456b0b83a4d29f7de23474eff90eed Author: Jacob Lifshay <programmerjake@gmail.com> Date: Tue Dec 12 16:50:27 2023 -0800 caller.py: fix undefined ffirst, hope I guessed the correct value I discovered that is_ffirst_mode is a generator function, yet is usually called without yield from, e.g.: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/caller.py;h=2bb4b6086eab4ed48b70180c677cd77237065243;hb=d3c7a0ddc169877e8daf38a347091c366e85624b#l2444 2444 if not rc_en or not is_ffirst_mode(self.dec2): 2445 return False, False fixed: commit 5e085b368fbd0a95590b6aeaedcdd796afe92d60 Author: Jacob Lifshay <programmerjake@gmail.com> Date: Tue Dec 12 16:51:33 2023 -0800 caller.py: use yield from on is_ffirst_mode since it's a generator other new errors: > expected = deepcopy(vec) E NameError: name 'vec' is not defined src/openpower/decoder/isa/test_caller_svp64_dd_ffirst.py:149: NameError I'll let luke/shriya fix the last error, since it's in the test case you just recently worked on. test results (only new failure is test_caller_svp64_dd_ffirst.py): > FAILED src/openpower/decoder/isa/test_caller_svp64_dd_ffirst.py::DDFFirstTestCase::test_1 > FAILED src/openpower/decoder/isa/test_caller_svp64_ldst.py::DecoderTestCase::test_sv_load_dd_ffirst_excl > [case_sc] SUBFAIL src/openpower/decoder/isa/test_caller_syscall.py::TestSysCall::test > [case_2_rfid] SUBFAIL src/openpower/decoder/isa/test_caller_trap.py::TrapTest::test > [7:sv.cmp/zz/ff=gt/m=r3] SUBFAIL src/openpower/sv/trans/test_pysvp64dis.py::SVSTATETestCase::test_20_cmp > [0:rldimi] SUBFAIL src/openpower/sv/trans/test_pysvp64dis.py::SVSTATETestCase::test_37_extras_rldimi > [0:sv.rldimi] SUBFAIL src/openpower/sv/trans/test_pysvp64dis.py::SVSTATETestCase::test_36_extras_rldimi > [0:sv.rldimi.] SUBFAIL src/openpower/sv/trans/test_pysvp64dis.py::SVSTATETestCase::test_36_extras_rldimi_
rebased and merged 1228-elf-dynamic to master, tests pass (except I canceled the last two tests, I don't want to wait 60min, CI will catch them). https://git.libre-soc.org/?p=openpower-isa.git;a=shortlog;h=86e18e3a95a3cebacdf9c55e041408d4491a6eaf
(In reply to Jacob Lifshay from comment #26) > (In reply to Jacob Lifshay from comment #25) > > 5025354342d8e92f200b9c0c0a04f96a53d7f0e3 is the first bad commit > > commit 5025354342d8e92f200b9c0c0a04f96a53d7f0e3 > > Author: Luke Kenneth Casson Leighton <lkcl@lkcl.net> > > Date: Tue Nov 28 20:41:01 2023 +0000 > > > assert ew_src == XLEN, "TODO fix elwidth conversion" > E NameError: name 'XLEN' is not defined > > src/openpower/decoder/isa/caller.py:2582: NameError > > I fixed it: WHOAWHOAWHOA STOP. rhat looks like REALLY OLD code. shit shit shit. something's gone badly wrong. don't do ANYTHING.
(In reply to Luke Kenneth Casson Leighton from comment #28) > (In reply to Jacob Lifshay from comment #26) > don't do ANYTHING. ok fortunately this was resolved.