Algorithms we want to demo: * DONE: UTF-8 validation https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=7217fe80d54a5dab33566e6d8fff949b84ce433e Links: https://www.json.org/JSON_checker/utf8_decode.c https://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html
additional useful links: converting utf-8 <-> utf-16 (useful for JS and Java) https://web.archive.org/web/20210625032530/https://researcher.watson.ibm.com/researcher/files/jp-INOUEHRS/IPSJPRO2008_SIMDdecoding.pdf validating UTF-8 (useful for JSON decoding and many many other things) https://github.com/rusticstuff/simdutf8 it's very common to only care if you have correct utf-8 and where the first error is rather than needing to decode the unicode codepoints -- the unicode codepoints aren't that much more useful than the bytes for many purposes -- parsing (e.g. JSON) is nearly always faster on just the utf-8 bytes rather than having to decode to utf-32 first.
(In reply to Jacob Lifshay from comment #1) > additional useful links: also: https://github.com/simd-lite/simd-json
ironically whilst everyone else is desperately trying to smash their heads against a SIMD wall we need to track down simple scalar versions of algorithms because the insistence "But SIMD Makes It Fast" makes it astoundingly difficult to comprehend. it took 3 weeks for example to track down easy-to-read DCT source code. assessing this one therefore needs a similar (the usual) strategy: 1) work out the hotspots (good to hear UTF8 to UTF16 is common for example) 2) find *readable* non-assembler non-optimised non-parallelised reference implementations.
(In reply to Jacob Lifshay from comment #1) > additional useful links: > converting utf-8 <-> utf-16 (useful for JS and Java) > https://web.archive.org/web/20210625032530/https://researcher.watson.ibm.com/ > researcher/files/jp-INOUEHRS/IPSJPRO2008_SIMDdecoding.pdf this will be useful to know about only that 8-16 is desirable (which is great). trying to understand what on earth the SIMD assembly is doing, not so much. > validating UTF-8 (useful for JSON decoding and many many other things) > https://github.com/rusticstuff/simdutf8 this is about as bad as it gets: https://github.com/rusticstuff/simdutf8/blob/main/src/implementation/x86/avx2.rs hopelessly unreadable, the level of "optimisations" is so deeply embedded within that code that it is worse than useless! with SVP64 being based in the abstract on "Multi-Issue parallelisation of Scalar operations by dropping hardware for-loops around them", time and time again it has been shown that progress is made by starting from a *scalar* proof-of-concept, never from someone's heavily optimised SIMD Assembler.
(In reply to Jacob Lifshay from comment #2) > also: > https://github.com/simd-lite/simd-json https://github.com/simd-lite/simd-json/blob/main/src/neon/deser.rs i can't even begin to comprehend what that is doing :) whereas the c code from comment #0 is both dead simple, well documented, and serial in nature. it is the serial nature which makes mapping it straight to SVP64 so easy, the comments are a bonus. pretty ironic, huh? you'd think "oh yeah it's fast with NEON therefore it MUST contain useful inspiration", right? turns out this instinct is dead wrong, every single time. sigh.
additional links:
(WTF-8 is UTF-8 but modified to also represent unpaired surrogates, like in ill-formed UTF-16. this is useful for Windows File Names, Java/JS Strings, etc.)
https://simonsapin.github.io/wtf-8/

https://www.unicode.org/versions/Unicode14.0.0/ch03.pdf
Table 3-7 (modified to put a star next to where the original used bold text)
Well-Formed UTF-8 Byte Sequences

Code Points          First Byte  Second Byte  Third Byte  Fourth Byte
U+0000..U+007F       00..7F
U+0080..U+07FF       C2..DF      80..BF
U+0800..U+0FFF       E0          *A0..BF      80..BF
U+1000..U+CFFF       E1..EC      80..BF       80..BF
U+D000..U+D7FF       ED          80..*9F      80..BF
U+E000..U+FFFF       EE..EF      80..BF       80..BF
U+10000..U+3FFFF     F0          *90..BF      80..BF      80..BF
U+40000..U+FFFFF     F1..F3      80..BF       80..BF      80..BF
U+100000..U+10FFFF   F4          80..*8F      80..BF      80..BF
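a minimal scalar sketch of Table 3-7, written as a straight table lookup. this is only an illustration of the table, not code from openpower-isa; the names TABLE_3_7 and first_error are made up for this example.

```python
# Rows of Table 3-7: first-byte range, then one range per following byte.
# Illustrative transcription only; names here are hypothetical.
CONT = (0x80, 0xBF)
TABLE_3_7 = [
    ((0x00, 0x7F),),
    ((0xC2, 0xDF), CONT),
    ((0xE0, 0xE0), (0xA0, 0xBF), CONT),
    ((0xE1, 0xEC), CONT, CONT),
    ((0xED, 0xED), (0x80, 0x9F), CONT),
    ((0xEE, 0xEF), CONT, CONT),
    ((0xF0, 0xF0), (0x90, 0xBF), CONT, CONT),
    ((0xF1, 0xF3), CONT, CONT, CONT),
    ((0xF4, 0xF4), (0x80, 0x8F), CONT, CONT),
]


def first_error(data):
    """Return the index of the first ill-formed sequence, or None if valid."""
    i = 0
    while i < len(data):
        # find the table row whose first-byte range contains data[i]
        for row in TABLE_3_7:
            lo, hi = row[0]
            if lo <= data[i] <= hi:
                break
        else:
            return i  # byte can never start a sequence (80..C1, F5..FF)
        seq = data[i:i + len(row)]
        if len(seq) < len(row):
            return i  # sequence truncated by end of input
        # check each following byte against its column in the row
        for byte, (lo, hi) in zip(seq[1:], row[1:]):
            if not lo <= byte <= hi:
                return i
        i += len(row)
    return None
```

returning the index of the first bad sequence rather than a bool matches the "is it valid, and where is the first error" use case from comment #1.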
(In reply to Luke Kenneth Casson Leighton from comment #4) > this is about as bad as it gets: > https://github.com/rusticstuff/simdutf8/blob/main/src/implementation/x86/ > avx2.rs > > hopelessly unreadable, that's cuz you're reading the isa abstraction layer, not the core algorithm. the algorithm is here: https://github.com/rusticstuff/simdutf8/blob/main/src/implementation/algorithm.rs
(In reply to Jacob Lifshay from comment #7) > that's cuz you're reading the isa abstraction layer, not the core algorithm. > the algorithm is here: > https://github.com/rusticstuff/simdutf8/blob/main/src/implementation/ > algorithm.rs the papers describing the algorithms: https://github.com/simdjson/simdjson#about-simdjson links from above link: * enjoy reading our paper https://arxiv.org/abs/1902.08318 * Parsing Gigabytes of JSON per Second https://arxiv.org/abs/1902.08318 * Validating UTF-8 In Less Than One Instruction Per Byte https://arxiv.org/abs/2010.03090 * blog post providing some background and context https://branchfree.org/2019/02/25/paper-parsing-gigabytes-of-json-per-second/ * simdjson at QCon San Francisco 2019 http://www.youtube.com/watch?v=wlvKAT7SZIQ
still unintelligible at an algorithmic level due to this:

    idx += SIMD_CHUNK_SIZE

no explanations at all:

    let byte_1_low = prev1.and(SimdU8Value::splat(0x0F)).lookup_16(
        CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
        CARRY | OVERLONG_2,

and there are zero code comments. the very attempt to include lookup tables and to perform SIMD-ification is precisely what makes this code 100% hostile. no comments just buries what is already dead another 6ft under :)

i have some ideas floating around but until appropriate *scalar* non-optimised *simple* implementations are found i cannot nail those ideas down. i expect finding such implementations to be just as hard as for DCT because "why would you bother, like, y'know, that's so slow ya wasting time, man"

i need to understand the *principle* behind utf8, and when doing REMAP it needs the *REMAP* system to perform the looping, not the "concept called packed SIMD where you throw SIMD_CHUNK_SIZEs of data at a wall and hope for the best".

anything that uses Packed SIMD catastrophically interferes with REMAP, and with Data-Dependent FailFirst and Predicate-Result Modes. just like how strncpy in RVV with fail-first LDST is only 13 assembler instructions but when using Power ISA Packed SIMD it requires 240.
(In reply to Jacob Lifshay from comment #8) > * blog post providing some background and context > https://branchfree.org/2019/02/25/paper-parsing-gigabytes-of-json-per-second/ ok, that's about parsing JSON, not about parsing UTF8. although, parsing of {Insert Graph-based Data Format} is part of what Extra-V was designed for.
there is a hardware design concept i would like to consider here, it is an advancement of the Eth Zurich Snitch core https://arxiv.org/pdf/2002.10143 specifically the idea of putting an intercept in to register usage which instead connects to a synchronous FIFO. reading or writing the FIFO would be wired to an advancement of svstep. if also connected to Memory LDST just like in Snitch but also Data Dependent failfirst and REMAP then there is the possibility to cover strange algorithms like UTF8 and JSON parsing i had a think, i see the value of identifying starting points and end points, creating a DOM from a sequential stream, that is BIG. could even be used for Message Passing between processors or processes. must look at design of OpenCAPI properly.
found one that is obvious and simple to understand.

https://codereview.stackexchange.com/questions/159814/utf-8-validation/159832#159832

    class Solution(object):
        def validUtf8(self, data):
            """
            Check that a sequence of byte values follows the UTF-8 encoding
            rules.  Does not check for canonicalization (i.e. overlong encodings
            are acceptable).

            >>> s = Solution()
            >>> s.validUtf8([197, 130, 1])
            True
            >>> s.validUtf8([235, 140, 4])
            False
            """
            data = iter(data)
            for leading_byte in data:
                leading_ones = self._count_leading_ones(leading_byte)
                if leading_ones in [1, 7, 8]:
                    return False        # Illegal leading byte
                for _ in range(leading_ones - 1):
                    trailing_byte = next(data, None)
                    if trailing_byte is None or trailing_byte >> 6 != 0b10:
                        return False    # Missing or illegal trailing byte
            return True

        @staticmethod
        def _count_leading_ones(byte):
            for i in range(8):
                if byte >> (7 - i) == 0b11111111 >> (7 - i) & ~1:
                    return i
            return 8

*now* it is obvious that validation starts by counting the number of leading 1s in the first byte of a sequence, then checking that the top 2 bits of each following byte in that sequence are 0b10.

this simplicity is utterly destroyed by efforts made by optimised SIMD. attempting to even understand the validation algorithm from looking at optimised SIMD is not only wasting time, it risks making mistakes. SimpleV is such a different paradigm we literally have to go back to scalar unoptimised implementations.

this algorithm is quite fascinating: one byte will contain a count of the number of bytes that need to be checked for a match with 0b10------

needs some thought.
(In reply to Luke Kenneth Casson Leighton from comment #12) > Check that a sequence of byte values follows the UTF-8 encoding > rules. Does not check for canonicalization (i.e. overlong encodings > are acceptable). canonicalization and surrogate encodings needs to be checked, otherwise you can have security flaws such as smuggling / characters through a http server by encoding them as 0xC0 0xAF rather than 0x2F, which then allows you to access stuff outside the /var/www/html directory, e.g. by accessing https://example.com/..%C0%AF..%C0%AF..%C0%AFetc/passwd
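to make the 0xC0 0xAF case concrete, here is a small sketch. lenient_decode_2byte is a hypothetical naive decoder (no overlong check) written only for this example; strict_ok just asks Python's own strict UTF-8 codec.

```python
def lenient_decode_2byte(b0, b1):
    """Decode a 2-byte sequence the naive way, with no overlong check."""
    return ((b0 & 0x1F) << 6) | (b1 & 0x3F)


def strict_ok(data):
    """True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        bytes(data).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# the overlong pair decodes to the same code point as a plain '/'
smuggled = lenient_decode_2byte(0xC0, 0xAF)
print(hex(smuggled), chr(smuggled))   # 0x2f /
print(strict_ok(b"\xc0\xaf"))         # False
```

a path filter that looks for the byte 0x2F before decoding never sees it, while a lenient decoder afterwards produces "/" anyway, which is why the validator has to reject C0 and C1 outright.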
ok i think i have a strategy.

firstly, note that the leading 1s == QTY(1) is the pattern 0b10------ which is also the "invalid" pattern.

secondly, there is a cntleadones scalar instruction in v3.1. a vector of the 1s_count can be created.

thirdly, some sv.cmpi on that tells us where utf8 starts and ends, where the end points may go into the next instruction as a mask

fourthly, a sv.addi/satu/ew=8/m=eq *RT,*RA,0xff where RT is 1 greater than RA will perform a cascading non-rollover subtract of 1 from each element. anything that started as a count of 2 3 4 5 or 6 will count down *overwriting* the next register, but due to unsigned saturate it will not wrap back to 0xff 0xfe etc. furthermore due to the predicate mask the cascade *only* starts and continues from non-terminating points. it may be necessary to shift the mask down by one as you want the subtract-cascade to stop at the character *before* the beginning of the next utf8 sequence.

if there are only zeros at these last characters, then the expected length is equal to the observed length.

setting VL to 64 would get you about... maybe 14-18 instructions per 64 bytes?

the only thing about those cascading subtracts is, they could create some horrendous hazard dependencies. therefore another potential way to do it would be to have a loop-unrolled sequence of sv.addi operations, bouncing back and forth between two pairs of registers. maybe three with a shift-incremented mask?

another potential way would be to use bmask, to analyse the start and end points.
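a loose scalar model of the count-then-count-down idea, to check the principle only. it is not SVP64 code and makes no claim about how sv.addi/satu or predication actually behave; it also assumes sequences are at most 4 bytes (counts of 5 and 6 are rejected here), and it checks lengths only, not overlong or surrogate encodings.

```python
def count_leading_ones(byte):
    """Number of consecutive 1 bits starting at the MSB of an 8-bit value."""
    n = 0
    while n < 8 and byte & (0x80 >> n):
        n += 1
    return n


def structure_ok(data):
    """Check that every lead byte is followed by the right number of
    0b10------ bytes, using only the per-byte leading-ones counts."""
    clo = [count_leading_ones(b) for b in data]   # the "1s_count" vector
    remaining = 0     # continuation bytes still owed by the last lead byte
    for n in clo:
        if remaining:
            if n != 1:            # expected the 0b10------ pattern
                return False
            remaining -= 1        # the cascading "subtract 1"
        elif n == 0:
            pass                  # plain ASCII byte
        elif 2 <= n <= 4:
            remaining = n - 1     # lead byte starts a new countdown
        else:
            return False          # stray continuation or illegal lead
    return remaining == 0         # countdown must have reached zero
```

the final `remaining == 0` is the "expected length equals observed length" test from the comment above.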
I have a different strategy that I think will work well, I started adding it as a test case in openpower-isa.git, but ran out of time today because I started writing a super simple svp64 emulator since I thought I'd need elwidth overrides (which iirc the simulator doesn't yet support), but turns out I don't so I converted it to a TestAccumulatorBase test case. basic idea: load current chunk of bytes to regs 64-95, expanding to 1 byte per register, zero pad (because nul is always a 1-byte utf-8 char) to 32 regs. put previous iteration's chunk in regs 32-63. now match the regs against the valid utf-8 patterns (see comment #6), accessing previous bytes by using regs 63-94, 62-93, 61-92, etc. https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=f640d6b5c0ca5ae72d70cdaa95cda4f7e68e7e60
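a rough Python model of the "look back at previous bytes through offset views" idea, for anyone following along. it is not the code in the linked commit: the function name, the single NUL of padding, and the exact set of range tests are assumptions made for this sketch, with the ranges taken from Table 3-7 in comment #6.

```python
def validate_windows(data):
    """Check every byte against its three predecessors via shifted views."""
    cur = bytes(data) + b"\0"        # NUL padding exposes a truncated tail
    prev1 = b"\0" + cur[:-1]         # same bytes, offset by 1
    prev2 = b"\0\0" + cur[:-2]       # offset by 2
    prev3 = b"\0\0\0" + cur[:-3]     # offset by 3
    for b, p1, p2, p3 in zip(cur, prev1, prev2, prev3):
        if b in (0xC0, 0xC1) or b >= 0xF5:
            return False             # bytes that never appear in UTF-8
        is_cont = 0x80 <= b <= 0xBF
        must_be_cont = ((0xC2 <= p1 <= 0xF4) or    # 2nd byte of any sequence
                        (0xE0 <= p2 <= 0xF4) or    # 3rd byte of 3/4-byte seq
                        (0xF0 <= p3 <= 0xF4))      # 4th byte of 4-byte seq
        if is_cont != must_be_cont:
            return False
        # starred entries of Table 3-7: restricted second-byte ranges
        if p1 == 0xE0 and b < 0xA0:
            return False             # overlong 3-byte encoding
        if p1 == 0xED and b > 0x9F:
            return False             # UTF-16 surrogate
        if p1 == 0xF0 and b < 0x90:
            return False             # overlong 4-byte encoding
        if p1 == 0xF4 and b > 0x8F:
            return False             # above U+10FFFF
    return True
```

every test in the loop body depends only on the current byte and fixed-offset neighbours, so each one could in principle be applied to a whole chunk at once, which is the point of the register-offset trick.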
(In reply to Jacob Lifshay from comment #15) > now match the regs against the valid utf-8 patterns (see comment #6), > accessing previous bytes by using regs 63-94, 62-93, 61-92, etc. using sv.cmprb should help a bunch.
(In reply to Jacob Lifshay from comment #15) > I have a different strategy that I think will work well, I started adding it > as a test case in openpower-isa.git, but ran out of time today because I > started writing a super simple svp64 emulator for goodness sake don't waste time doing that. > since I thought I'd need > elwidth overrides (which iirc the simulator doesn't yet support), you can help add it! any time spent working on ISACaller no matter how small is infinitely more useful than any amount of time spent on duplication of effort. > now match the regs against the valid utf-8 patterns (see comment #6), > accessing previous bytes by using regs 63-94, 62-93, 61-92, etc. i'll be fascinated to see how that goes (using sv.cmprb, offset by one each time). estimating the clocks/byte is challenging as it will depend fundamentally on the micro-architecture. there *may* be yet *another* way - to use Vertical-First Mode and rely on a Multi-issue Engine. due to the branches though i don't think it would be beneficial.
(In reply to Luke Kenneth Casson Leighton from comment #9) > i have some ideas floating around but until appropriate *scalar* > non-optimised > *simple* implementations are found i cannot nail those ideas down. a good simple scalar algorithm is Algorithm 1 in: https://arxiv.org/pdf/2010.03090.pdf
the branch-range one? yyeah... i wonder, if it would work to do a sequence of cmps (and cmprbs), every one of the tests in each case statement, then transfer them into INTs (crweirds), do a 1-bit 2-bit 3-bit and 4-bit shift on them, then use ANDs ORs and BMASKs to perform a parallel bitlevel version of that switch statement? (no branches at all) what would be insane is that by doing sv.ANDs, sv.ors and sv.bmasks you could, with say 8-way multi-issue, be doing the equivalent of 64x8 switch statements all simultaneously.
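a sketch of the branch-free bit-level idea, using Python integers as bit vectors (bit i stands for byte i). the compares become masks, the shifts by 1, 2 and 3 line each lead byte up with the positions it owns, and the final answer is a few ANDs, ORs and one XOR. function names are made up for this example, and nothing here claims to match what crweird or bmask actually do.

```python
def mask(data, pred):
    """Pack pred(byte) for each byte into an integer, bit i = byte i."""
    m = 0
    for i, b in enumerate(data):
        if pred(b):
            m |= 1 << i
    return m


def validate_bitmasks(data):
    data = bytes(data) + b"\0"       # NUL padding exposes a truncated tail
    full = (1 << len(data)) - 1      # one bit per byte position
    cont = mask(data, lambda b: 0x80 <= b <= 0xBF)
    lead2 = mask(data, lambda b: 0xC2 <= b <= 0xDF)
    lead3 = mask(data, lambda b: 0xE0 <= b <= 0xEF)
    lead4 = mask(data, lambda b: 0xF0 <= b <= 0xF4)
    bad = mask(data, lambda b: b in (0xC0, 0xC1) or b >= 0xF5)
    # positions that must hold a continuation byte
    need = (((lead2 | lead3 | lead4) << 1) |
            ((lead3 | lead4) << 2) |
            (lead4 << 3)) & full
    # restricted second-byte ranges (starred entries in Table 3-7)
    special = (
        ((mask(data, lambda b: b == 0xE0) << 1) & mask(data, lambda b: b < 0xA0)) |
        ((mask(data, lambda b: b == 0xED) << 1) & mask(data, lambda b: b > 0x9F)) |
        ((mask(data, lambda b: b == 0xF0) << 1) & mask(data, lambda b: b < 0x90)) |
        ((mask(data, lambda b: b == 0xF4) << 1) & mask(data, lambda b: b > 0x8F))
    )
    errors = bad | (need ^ cont) | special
    return errors == 0
```

the lowest set bit of `errors` would give the position where validation first goes wrong, if that is wanted instead of a yes/no answer.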
(In reply to Luke Kenneth Casson Leighton from comment #19) > the branch-range one? > yyeah... i wonder, if it would work to do a sequence of cmps (and cmprbs), [cmps cmprbs and countleading1s].
thinking about it, it would be very useful to have a quick way to do what risc-v v's vslideup/vslidedown do: https://github.com/riscv/riscv-v-spec/blob/b6368b3c44d775f8eb01c7ce0ad017db19944aa7/v-spec.adoc#163-vector-slide-instructions it can be done using remap, but takes several instructions to be set up. imho we should use one of the svshape reserved combinations for this. it should not set vl and mvl as part of svshape. the example code I've been writing works around that by expanding each byte to fill a whole 64-bit register -- pretty wasteful.
(In reply to Jacob Lifshay from comment #21) > thinking about it, it would be very useful to have a quick way to do what > risc-v v's vslideup/vslidedown do: > https://github.com/riscv/riscv-v-spec/blob/ > b6368b3c44d775f8eb01c7ce0ad017db19944aa7/v-spec.adoc#163-vector-slide- > instructions if register aligned a simple sv.ori rt, rt+1, 0 does that. /mrr inverts the loop order. except if nonaligned then REMAP offset is needed. > it can be done using remap, but takes several instructions to be set up. two. that's hardly "several", is it. and if in a loop and there are no other uses, the svshape can be set once, outside, and the svremap on-demand as needed. leave SVSHAPE0-3 alone, activate them when needed, not setting the "persist" bit. then the remaps apply to the next instruction only and switch off again... *without* changing SVSHAPE0-3 though. > imho we should use one of the svshape reserved combinations for this. it > should not set vl and mvl as part of svshape. the 3 current purposes for svshape at the moment are to absolutely minimise those 3 uses: matrix dct fft. anything else is a welcome bonus. > the example code I've been writing works around that by expanding each byte > to fill a whole 64-bit register -- pretty wasteful. First approximation, good enough, then work out what can be done better. for example by doing 64-bit svshape (sv.svshape) an extra 24 bits magically becomes available. i have no problem at all in some of those bits expanding the options that had to be limited or missed entirely for the 32 bit svshape, such as the offset. matrix mode is perfectly capable of being set to 1D which when combined with offset gives the desired result here. 5+1 bits are also enough to set a small range of remap options as well (see svindex for how that can be done)
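a toy model of why loop order matters for the overlapping register copy. it treats the register file as a Python list and models /mrr as nothing more than running the element loop backwards, which is an assumption based on "inverts the loop order" above; vector_copy is a made-up helper.

```python
def vector_copy(regs, dest, src, vl, reverse=False):
    """Element loop regs[dest+i] = regs[src+i], optionally in reverse order."""
    order = range(vl - 1, -1, -1) if reverse else range(vl)
    for i in order:
        regs[dest + i] = regs[src + i]
    return regs


# slide down by one (dest below src): forward order is safe
down = vector_copy(list(range(8)), dest=0, src=1, vl=7)

# slide up by one (dest above src): forward order smears element 0 ...
smeared = vector_copy(list(range(8)), dest=1, src=0, vl=7)

# ... reverse order reads each source element before it is overwritten
up = vector_copy(list(range(8)), dest=1, src=0, vl=7, reverse=True)

print(down)     # [1, 2, 3, 4, 5, 6, 7, 7]
print(smeared)  # [0, 0, 0, 0, 0, 0, 0, 0]
print(up)       # [0, 0, 1, 2, 3, 4, 5, 6]
```

this is the same hazard that memmove handles by picking its copy direction.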
(In reply to Luke Kenneth Casson Leighton from comment #22) > (In reply to Jacob Lifshay from comment #21) > > thinking about it, it would be very useful to have a quick way to do what > > risc-v v's vslideup/vslidedown do: > > https://github.com/riscv/riscv-v-spec/blob/ > > b6368b3c44d775f8eb01c7ce0ad017db19944aa7/v-spec.adoc#163-vector-slide- > > instructions > > if register aligned a simple sv.ori rt, rt+1, 0 does > that. /mrr inverts the loop order. > > except if nonaligned then REMAP offset is needed. > > > it can be done using remap, but takes several instructions to be set up. > > two. that's hardly "several", is it. you forgot the setvl again since svshape put junk in it... I didn't see how to get the svshape instruction to set offset... Also, the algorithm constantly needs to switch between several offsets, making a dedicated mode desirable.
I wrote out the full algorithm, but was stymied trying to get `sv.andi.` to assemble:

https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=e347fb846bba92dbec07b33f08e185daad9df68b

  File "/home/jacob/projects/libreriscv/openpower-isa/src/openpower/test/algorithms/svp64_utf_8_validation.py", line 234, in run_case
    lst = list(isa)
  File "/home/jacob/projects/libreriscv/openpower-isa/src/openpower/sv/trans/svp64.py", line 617, in __iter__
    yield from self.trans
  File "/home/jacob/projects/libreriscv/openpower-isa/src/openpower/sv/trans/svp64.py", line 1358, in translate
    yield from self.translate_one(insn)
  File "/home/jacob/projects/libreriscv/openpower-isa/src/openpower/sv/trans/svp64.py", line 667, in translate_one
    raise Exception("opcode %s of '%s' not supported" %
Exception: opcode andi of 'sv.andi. *80, *47, 15' not supported

I'll debug that more later.
fixed assembling `sv.andi.`: https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=6a79227deb29927ad71115ab99d9ff054173bd84 rewrote a lot of the utf-8 validation code to workaround simulator quirks/unimplemented-stuff, the utf-8 validation code is still not working yet -- now, to figure out if that's due to flaws in my code, or flaws in the svp64 implementation... https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=1e445b5efce833d158950c5084d8ee1dce0be0f8 I added code so you can use self.subTest(...) with TestAccumulatorBase, as well as adding src_loc_at so you can specify which function in the backtrace is the one you care about. https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=b64cafd74bd05c6d5cf42ffb224f3227395bc796
(In reply to Jacob Lifshay from comment #23)
> I didn't see how to get the svshape instruction to set offset...

it can't. although i may have worked out a way to do it, by using these SVRM modes https://libre-soc.org/openpower/sv/remap/

    0b1000 reserved
    0b1001 reserved

it would mean sacrificing 3 out of 3D (when setting offset) i.e. only being able to do 1 or 2D REMAP, because

    svshape SVxd,SVyd,SVzd,SVRM,vf

* SVxd would be interpreted as the offset
* SVyd as an rmm (see svindex instruction)
* SVzd as-is (the dimension)

so by sort-of combining what's already been done in svindex with svshape it *should* be possible.

> Also, the algorithm constantly needs to switch between several offsets,
> making a dedicated mode desirable.

interesting. ok so that also means having the "nonpersist" mode is also a priority, and being able to set up several SVSHAPEs simultaneously. ok this is all doable.

(In reply to Jacob Lifshay from comment #24)
> raise Exception("opcode %s of '%s' not supported" %
> Exception: opcode andi of 'sv.andi. *80, *47, 15' not supported

oink.

--- a/src/openpower/sv/trans/svp64.py
+++ b/src/openpower/sv/trans/svp64.py
@@ -1535,6 +1535,9 @@ if __name__ == '__main__':
         'fmvis 5,64',
         'fmvis 5,32768',
     ]
+    lst = [
+        'sv.and. *80, *80, 1',
+    ]
     isa = SVP64Asm(lst, macros=macros)
     log("list", list(isa))
     asm_process()

sv.and is detected/supported but sv.andi is not. moo? i bet that's just entirely missing from the RM*.csv files i.e. missing entirely from sv_analysis.py as a recognised pattern.

../openpower/isatables/RM-2P-1S1D.csv:andi.,NORMAL,,2P,EXTRA3, d:RA;d:CR0,s:RS,0,0,RS,0,0,RA,0,CR0,0

oink. noo, it's there - that's even weirder. leave it with me.
(In reply to Jacob Lifshay from comment #24) > Exception: opcode andi of 'sv.andi. *80, *47, 15' not supported sorted.
https://libre-soc.org/openpower/sv/remap/discussion/ drat. i think that's going to need a new instruction. svoffset or svshape2 or something. it's almost the same but not close enough. in HDL it can be covered by svshape but it is sufficiently different to likely need a new instruction. sigh
194 # set bit 0x80 (TwoContinuations) if input is >= 0xF0 195 f"sv.subi/satu *80, *45, {0xF0 - 0x80}", saturation isn't implemented yet, use minu/maxu with a constant scalar RB=0. https://libre-soc.org/openpower/isa/av/ hm, thought just occurred to me, would (RB|0) be useful in mins/maxs?
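the arithmetic behind swapping a saturating subtract for a max followed by a plain subtract can be checked exhaustively for 8-bit values. this is only a check of the identity in Python; how it maps onto actual maxu and subtract instructions (and which operand holds the constant) is left open, and both function names are made up.

```python
IMM = 0xF0 - 0x80   # the immediate from the quoted line, i.e. 0x70


def subi_satu(x, imm):
    """Unsigned saturating subtract: clamps at zero instead of wrapping."""
    return max(x - imm, 0)


def maxu_then_sub(x, imm):
    """Same result without saturation: raise x to at least imm first."""
    return max(x, imm) - imm


# exhaustive check over every 8-bit input
identity_holds = all(subi_satu(x, IMM) == maxu_then_sub(x, IMM)
                     for x in range(256))

# bit 0x80 of the result is set exactly when the input byte is >= 0xF0
flag_matches = all(bool(subi_satu(x, IMM) & 0x80) == (x >= 0xF0)
                   for x in range(256))
```

the second check is the property the quoted source comment relies on: after the subtract, bit 0x80 is set only for bytes 0xF0 and above.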
ok so whilst svshape2 doesn't yet exist you can use svindex: setvl svstep sv.addi svindex blah so you set the length, then get svstep to output the indices into an array, then add one to them, then use them. once the array of offsets is set up as long as you don't overwrite them obviously they are reusable, it only takes one instruction (svindex) to activate them. there is a persistent mode for svindex and a nonpersistent. you almost certainly want the nonpersistent one, for which the rmm argument is a bitmask which specifies whether, in lsb to msb order, RA RB RC RT EA/2nd-outputreg is to be REMAPped. so if you want only RB of sv.add *RT,*RA,*RB to be REMAPped set rmm=0b00010. RT and RA, set rmm=0b01001 this same rmm field will be in the svshape2 instruction as well.
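a tiny helper to avoid getting the rmm bit order wrong when generating assembly from Python. the bit order (RA RB RC RT EA, lsb first) is taken from the description above; the helper itself and the short name "EA" for the EA/2nd-output slot are made up for this example.

```python
# operand order from lsb to msb, as described in the comment above
RMM_BITS = ["RA", "RB", "RC", "RT", "EA"]


def rmm(*operands):
    """Build the 5-bit rmm mask selecting which operands get REMAPped."""
    value = 0
    for name in operands:
        value |= 1 << RMM_BITS.index(name)
    return value


print(format(rmm("RB"), "05b"))        # 00010
print(format(rmm("RT", "RA"), "05b"))  # 01001
```

both printed values match the two examples given in the comment.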
utf-8 <-> utf-16 https://woboq.com/blog/utf-8-processing-using-simd.html
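for reference, a plain scalar utf-8 to utf-16 conversion, in the "readable, non-optimised" spirit of comment #3. it assumes the input has already been validated (it does no error checking at all), and the function name is made up; it is not taken from the linked blog post.

```python
def utf8_to_utf16(data):
    """Convert well-formed UTF-8 bytes to a list of UTF-16 code units."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        # the lead byte gives the sequence length and the top payload bits
        if b < 0x80:
            cp, n = b, 1
        elif b < 0xE0:
            cp, n = b & 0x1F, 2
        elif b < 0xF0:
            cp, n = b & 0x0F, 3
        else:
            cp, n = b & 0x07, 4
        # each continuation byte contributes its low 6 bits
        for c in data[i + 1:i + n]:
            cp = (cp << 6) | (c & 0x3F)
        i += n
        if cp < 0x10000:
            out.append(cp)                      # one code unit
        else:
            cp -= 0x10000                       # split into a surrogate pair
            out.append(0xD800 | (cp >> 10))     # high surrogate
            out.append(0xDC00 | (cp & 0x3FF))   # low surrogate
    return out
```

like the validator, the control flow is driven entirely by the lead byte, so the same "count from the first byte" structure shows up again.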
did more work on utf-8 validation using svp64. I got it to run successfully for validating the empty string! It's still broken for " " though... I'm having to fix the simulator a bunch, setvl. didn't correctly output CR0. I changed log() to support more granular enable/disable: https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=774ed2fd6547e7dc7ebea89e6f522b4c21792108 I added support to the svp64 assembler for comments. I also added the original instruction as comments on the `.long`s generated by the svp64 assembler -- it makes debugging much easier to see: .long 0x580005B6 # setvl 0, 0, 3, 0, 1, 1 rather than: .long 0x580005B6
As an example of log filtering:
> SILENCELOG='!instr_in_outs' python src/openpower/decoder/isa/test_caller_svp64_utf_8_validation.py

outputs:
<LogKind.Default: 'default'> silenced
<LogKind.InstrInOuts: 'instr_in_outs'> active
<LogKind.SkipCase: 'skip_case'> silenced
running test: case_empty {'data': b'', 'expected': 1}
<snip>
0x003C: 58A40FB7 .long 0x58A40FB7 # setvl. 5, 4, 8, 0, 1, 1
    read reg r4: 0x0
    write reg CR: 0x20000000
    write reg SVSTATE: 0x1000000000000000
    write reg CTR: 0x0
0x0040: 418200BC bc 12, 2, final_check # beq final_check
    write reg CTR: 0x0
    write reg CR: 0x20000000
    write reg LR: 0x10000000
0x00FC: 580001B6 .long 0x580001B6 # setvl 0, 0, 1, 0, 1, 1
    read reg r0: 0x15CEE3293AA9BFBE
    write reg CR: 0x20000000
    write reg SVSTATE: 0x204000000000000
    write reg CTR: 0x0
0x0100: 05400100 .long 0x05400100 # sv.cmpli 0, 1, 45, 240
    read reg r45: 0x0
    write reg CR: 0x80000000
0x0108: 4080FFEC bc 4, 0, fail # bge fail
    write reg CTR: 0x0
    write reg CR: 0x80000000
    write reg LR: 0x10000000
<snip>
44 else if _RA != 0 then
45     if (RA) >u 0b1111111 then VL <- 0b1111111
46     else VL <- (RA)[57:63]

i have no idea why when i added exactly this a few days ago it is not already committed, duh

+ if Rc = 1 then
+     if step = 0 then c <- 0b001
+     else c <- 0b010
+     CR[32:35] <- c || XER[SO]

this should already be done and does not look correct according to spec

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/caller.py;h=f3d9d8085115bc0c053116707b37a2cba5e40d6b;hb=HEAD#l1665

ah hang on yes check_step_increment is not called on 32bit scalar ops. that may need fixing esp. for "svstep."
203 # sv.andi. is buggy, sorted. if not please do add a unit test and i will take a look.
I got UTF-8 validation to work! I had to do a bunch of instruction substitution to work around limitations/bugs in the instruction simulator, most of the limitations/bugs are documented in the comments on the function that generates the assembly: https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/test/algorithms/svp64_utf_8_validation.py;h=f040f1e6f114927bfcae0714d9d76bb737e0c42c;hb=918f3eadf7118a6ecd0e2eb6caaaed9da6936299#l135 I also implemented a better memory dump for logging, like hexdump -C: https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=661ae80360644dfe3a9e7f3610d534cc3a7e545f I also implemented support for svp64 prefixed instructions that have a libre-soc-custom suffix, e.g. sv.maxu: https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=0e80cab3b809d432354ca05464e95dc53db11b64 I also added support to Expected for when tests don't care what so, ov, and ca get set to, those can just be set to None: https://git.libre-soc.org/?p=openpower-isa.git;a=commit;h=07f5f22461d5eda844141b2ffd33e021d2b43ffb
(In reply to Jacob Lifshay from comment #36) > I got UTF-8 validation to work! frickin-A! > I had to do a bunch of instruction substitution to work around > limitations/bugs in the instruction simulator, most of the limitations/bugs > are documented in the comments on the function that generates the assembly: see comment #35 and you can use "svstep." instead of this: 190 f"sv.addi *{cur_bytes + 1}, *{cur_bytes}, 1", # create indexes and this: 183 # clear cur bytes, so bytes beyond end end up being zeros 184 f"setvl 0, 0, {vec_sz}, 0, 1, 1", # set VL to vec_sz is what data-dependent fail-first is for (although it needs implementing) it will auto-truncate VL at the terminating zero. you need to set the "/vli" option to include the failing-terminating-zero. > > I also implemented a better memory dump for logging, like hexdump -C: > https://git.libre-soc.org/?p=openpower-isa.git;a=commit; > h=661ae80360644dfe3a9e7f3610d534cc3a7e545f brilliant > I also implemented support for svp64 prefixed instructions that have a > libre-soc-custom suffix, e.g. sv.maxu: > https://git.libre-soc.org/?p=openpower-isa.git;a=commit; > h=0e80cab3b809d432354ca05464e95dc53db11b64 mmm... this may have damaged detection of "sv.fmadds." please check that. + if not v30b_op.endswith('.'): + v30b_op += rc # argh, sv.fmadds etc. need to be done manually if v30b_op == 'ffmadds': > I also added support to Expected for when tests don't care what so, ov, and > ca get set to, those can just be set to None: > https://git.libre-soc.org/?p=openpower-isa.git;a=commit; > h=07f5f22461d5eda844141b2ffd33e021d2b43ffb excellent.
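a very loose Python model of what "auto-truncate VL at the terminating zero" means, and of the difference /vli makes. it only illustrates the two VL values described above; the function name and arguments are made up, and the real semantics are whatever the SVP64 spec says.

```python
def ddffirst_vl(elements, max_vl, vli=False):
    """VL after a test-for-zero element loop that stops at the first zero.

    With vli=True the failing (zero) element is included in VL.
    """
    for i, e in enumerate(elements[:max_vl]):
        if e == 0:
            return i + 1 if vli else i
    # no zero found: VL is limited only by the data and max_vl
    return min(len(elements), max_vl)


chunk = b"abc\0xyz"
print(ddffirst_vl(chunk, 8))            # 3
print(ddffirst_vl(chunk, 8, vli=True))  # 4
```

in the validation loop this would stand in for the explicit "clear cur bytes, then set VL" sequence, since the bytes beyond the terminator never become part of the vector.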
(In reply to Luke Kenneth Casson Leighton from comment #37) > (In reply to Jacob Lifshay from comment #36) > > I also implemented support for svp64 prefixed instructions that have a > > libre-soc-custom suffix, e.g. sv.maxu: > > https://git.libre-soc.org/?p=openpower-isa.git;a=commit; > > h=0e80cab3b809d432354ca05464e95dc53db11b64 > > mmm... this may have damaged detection of "sv.fmadds." > please check that. it passed all tests in the openpower-isa repo on my computer, so I assumed that means I didn't break anything. i'll work on moving all those sv.* special cases to CUSTOM_INSNS and add the appropriate sv.*. mnemonics tomorrow, that should be a good cleanup. since fmadds. specifically is a v3.0b op, sending it straight to gas should work fine, no special case should be needed.
(In reply to Jacob Lifshay from comment #38) > i'll work on moving all those sv.* special cases to CUSTOM_INSNS and add the > appropriate sv.*. mnemonics tomorrow, that should be a good cleanup. in the process, do *not* append "." onto v30b_op until *after* all processing has been completed. that is what caused the failure. or, ensure that CUSTOM_INSNS has the required match-patterns: both "ffmadds" *and* "ffmadds.", "maxu" *and* "maxu." > since fmadds. specifically is a v3.0b op, sending it straight to gas should > work fine, no special case should be needed. "ffmadds." not "fmadds."
(In reply to Luke Kenneth Casson Leighton from comment #39) > or, ensure that CUSTOM_INSNS has the required match-patterns: > both "ffmadds" *and* "ffmadds.", "maxu" *and* "maxu." that's what I said I'd do: > and add the > appropriate sv.*. mnemonics tomorrow > > since fmadds. specifically is a v3.0b op, sending it straight to gas should > > work fine, no special case should be needed. > > "ffmadds." not "fmadds." k, i responded about fmadds. since you said fmadds. in comment #37
Pushed the fixed cherry-picked code to master. CI passes: https://salsa.debian.org/Kazan-team/mirrors/openpower-isa/-/commit/7217fe80d54a5dab33566e6d8fff949b84ce433e/pipelines
uuh, actually just utf-8 verification is done, neither utf-8 <-> utf-16 algorithm is done, so payment should be on a subtask rather than this bug directly
(In reply to Jacob Lifshay from comment #42) > uuh, actually just utf-8 verification is done, neither utf-8 <-> utf-16 > algorithm is done, so payment should be on a subtask rather than this bug > directly moving them to a separate bugreport for future work is fine. moving the entire budget to a separate task is not.