for writing programs, even in assembler, SimpleV needs ppc64 compiler and binutils support at a basic level. whilst not doing full optimisation, this milestone allows:

* convenience c/c++ wrapper macros around standard OpenPOWER v3.0B assembler to create SimpleV-Vectorisation context
* basic support in ppc64 binutils for SimpleV
* basic support in gcc for SimpleV by first adding abstracted intrinsics and extended register files and going from there.

the primary objective is to first support writing of assembly and move upwards to "correct" (non-optimised) programs, sufficient to do a much more advanced optimisation phase at a much later date.
in issue #615 i am keeping notes from various conversations with ppc binutils and gcc maintainers, as well as OPF. summary of OPF advice: an architectural fork inside gcc will not be well received due to the implication of ecosystem fragmentation.

one idea came up from David to use the same trick intended for v3.1: there they intend to mark entries in rs6000.md as "v3.1prefixableto64bit", and David said he would have no problem with us doing the same thing: set attribute "svp64vectoriseable". for us this would indicate that, at assembly output, a special 32bit EXT01 assembly instruction is output in front of any instruction marked with the attribute.

Segher then suggested *redefining* the underlying data structure that is used by the macro system for representing registers. this combination effectively empowers all svp64-marked macro patterns to have a massive additional set of matching capabilities. on registers alone this would be:

* RT=s RA=s RB=s
* RT=v RA=s RB=s
* ....
* RT=v RA=v RB=v

when element-width overrides are introduced these permutations multiply by 4 for source elwidth override *and another* four for dest elwidth override. when additional capabilities such as saturation are also added, the thought of creating a macro file (even an autogenerated one) with all these permutations listed explicitly *per macro* is, at best, described as insane and, frankly, stupid.

a little intelligent thought shows that the pattern-matching can be done implicitly (using existing rs6000.md patterns) when marked with an appropriate attribute. this will allow us to do very basic (and i mean very basic) matching between vector patterns and svp64-attribute-marked rs6000.md macros: anything not part of a conditional if/else computation, for example straight unconditional for-loops.

where it gets more complicated is anything computed that is to be used for a branch decision. this requires predication (as used in 32-bit ARM), which is not a "normal" part of ppc except in very special circumstances. avoiding that situation for now and simply doing unconditional for-loop expansion would still be a huge leap forward.
I observe a change with lfs.

 .desc = {
     .in1 = SVP64_IN1_SEL_RA_OR_ZERO,
-    .in2 = SVP64_IN2_SEL_CONST_SVD,
-    .in3 = SVP64_IN3_SEL_RC,
+    .in2 = SVP64_IN2_SEL_CONST_SI,
+    .in3 = SVP64_IN3_SEL_NONE,
     .out = SVP64_OUT_SEL_FRT,
-    .out2 = SVP64_OUT_SEL_NONE,
+    .out2 = SVP64_OUT_SEL_FRT,
     .cr_in = SVP64_CR_IN_SEL_NONE,
     .cr_out = SVP64_CR_OUT_SEL_NONE,
     .sv_ptype = SVP64_PTYPE_P2,
-    .sv_etype = SVP64_ETYPE_EXTRA3,
-    .sv_in1 = SVP64_EXTRA_IDX1,
+    .sv_etype = SVP64_ETYPE_EXTRA2,
+    .sv_in1 = SVP64_EXTRA_NONE,
     .sv_in2 = SVP64_EXTRA_NONE,
     .sv_in3 = SVP64_EXTRA_NONE,
     .sv_out = SVP64_EXTRA_IDX0,

This breaks the remapping algorithm; it was not ready at all for such a change. Apparently I am missing how to remap this stuff. Ideas/suggestions?