Bug 602 - low performance bare minimum functionality SIMD emulator required
Summary: low performance bare minimum functionality SIMD emulator required
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Source Code
Version: unspecified
Hardware: Other Linux
Importance: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks: 241
Reported: 2021-02-17 16:46 GMT by Luke Kenneth Casson Leighton
Modified: 2021-06-07 11:22 BST
CC List: 4 users

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Description Luke Kenneth Casson Leighton 2021-02-17 16:46:26 GMT
CRITICAL NOTE: THERE IS NO INTENTION OF ANY KIND OR IMPLICATION OF ANY KIND TO ADOPT VSX IN LIBRESOC

please do NOT use this bugreport to discuss, promote or advocate adoption of VSX in LibreSOC, in any way shape or form, including but not limited to changes in register file formats, ordering, or hardware changes of any kind that would imply that VSX *might* be adopted at some future point.

see https://www.sigarch.org/simd-instructions-considered-harmful/ and please consider SIMD literally to be harmful to implement.


the purpose of this bugreport is to act as a stop-gap measure for an unforeseen consequence of SIMD being made mandatory in the ELF v2 ABI some years ago.

ELF v1.5 (with the 1.9 addendum for 64-bit) only has big-endian (BE) distro implementations, and major software projects (including golang) are DROPPING BE support entirely.
(correction regarding golang: the decision was delayed https://github.com/golang/go/issues/34850)

any OpenPOWER-compliant systems that choose not to implement the optional SIMD, as permitted by the Linux Compliancy Subset in v3.0C (such as Microwatt, A2O, A2I and LibreSOC), are accidentally and unintentionally excluded from running major modern distros. with ABI changes estimated to take 3 to 5 years to propagate, there are very few options.

* gcc / libc6 triplets require an entire new distro port (or a multilib addition to an existing port); the adoption window for binutils alone is 18 months.

* VSX is a whopping 600 instructions: even at an unrealistically optimistic 1 instruction per day, this is still 2.5 *years* of HDL development and unit tests.
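as a rough check of the 2.5-year figure: it works out if one assumes about 240 working days per year (an assumption, not stated above). counting every calendar day instead gives about 1.6 years.

```latex
\frac{600\ \text{instructions}}{1\ \text{instruction/day} \times 240\ \text{working days/year}} = 2.5\ \text{years}
```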

consequently, although performance would be terrible, an emulator is about the only workable, least disruptive and least costly option.

emulation using qemu or qemu-kvm (which requires hypervisor mode) is not entirely realistic: the project is too large.

an option that cuts significant time off the development cycle is to write a compiler that autogenerates C code, similar to how ISACaller works: the v3.0B pseudocode may be morphed into executable code and then translated into C.
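a toy illustration of the idea (hypothetical, NOT ISACaller or the libre-soc pseudocode compiler): because the pseudocode is already close to expression syntax, rewriting the `<-` assignment operator to `=` lets Python's own `ast` module parse a line, and a small recursive walker can then emit C. the `GPR[...]` naming and the restriction to `+ - & |` are assumptions made purely for this sketch.

```python
import ast
import textwrap

# Hypothetical toy rules, NOT ISACaller or the libre-soc pseudocode compiler:
# only '<-' assignments of +, -, &, | expressions over register fields.
C_OPS = {ast.Add: "+", ast.Sub: "-", ast.BitAnd: "&", ast.BitOr: "|"}


def expr_to_c(node: ast.expr) -> str:
    """Recursively emit a C expression for one parsed expression node."""
    if isinstance(node, ast.Name):
        # assumption: every bare name is a GPR operand field, so RA -> GPR[RA]
        return f"GPR[{node.id}]"
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return str(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in C_OPS:
        # fully parenthesise so that C operator precedence cannot change meaning
        op = C_OPS[type(node.op)]
        return f"({expr_to_c(node.left)} {op} {expr_to_c(node.right)})"
    raise NotImplementedError(ast.dump(node))


def pseudocode_to_c(src: str) -> str:
    """Translate simple '<-' assignments into C statements, one per line."""
    # rewriting '<-' to '=' lets Python's own parser build the syntax tree
    # (naive: it would also mangle a comparison written as 'a<-1')
    tree = ast.parse(textwrap.dedent(src).replace("<-", "="))
    lines = []
    for stmt in tree.body:
        if not (isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name)):
            raise NotImplementedError(ast.dump(stmt))
        lines.append(f"GPR[{stmt.targets[0].id}] = {expr_to_c(stmt.value)};")
    return "\n".join(lines)
```

for example, `pseudocode_to_c("RT <- (RA) + (RB)")` returns `GPR[RT] = (GPR[RA] + GPR[RB]);`. a real translator needs a proper grammar for the full pseudocode language; the point here is only that "parse, then walk the tree and emit C" is a small amount of code per construct.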

therefore the first priority is to extract all relevant pseudocode into the wiki at

http://libre-soc.org/openpower/isa

note the syntax hints on the above page

performance is the absolute lowest priority here.  simplicity and expediency are top priority.
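to make "simplicity over performance" concrete, here is a minimal sketch of how one VMX instruction, vadduwm (Vector Add Unsigned Word Modulo), could be emulated lane by lane. assumptions: a 128-bit vector register is modelled as a Python int, with word 0 as the most-significant 32 bits (the ISA's element numbering). this is an illustration only, not code from ISACaller.

```python
# Minimal sketch. Assumption: a 128-bit VR is a Python int and word 0 is the
# most-significant 32 bits, following the ISA's element numbering.
MASK32 = (1 << 32) - 1


def get_word(vr: int, i: int) -> int:
    """Return word element i (0..3) of a 128-bit vector register value."""
    return (vr >> (32 * (3 - i))) & MASK32


def vadduwm(vra: int, vrb: int) -> int:
    """Vector Add Unsigned Word Modulo: four independent 32-bit adds."""
    vrt = 0
    for i in range(4):
        # carries never cross into the neighbouring lane: wrap modulo 2**32
        total = (get_word(vra, i) + get_word(vrb, i)) & MASK32
        vrt |= total << (32 * (3 - i))
    return vrt
```

a plain loop over lanes is slow but trivially checkable against the pseudocode, which is exactly the trade-off asked for here.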
Comment 1 Luke Kenneth Casson Leighton 2021-02-19 20:07:16 GMT
to be edited: who is doing which pages. page numbers refer to the ISA Manual v3.0B,
obtainable from http://ftp.libre-soc.org

* JT: page 242 through to 246, LD/ST 
* rwilbur: 246 to 252, vector pack/unpack
* vklr:

TODO:

* Vector merge 253 to 255
* Vector splat 256 to 257
* Vector permute and select 258
* vector select 259 
* vector shift 260 to 264
* vector extract 265
* vector insert 266
* vector int add 267 to 272 
* vector int sub 273 to 277
* vector int mul 279 to 282
* vector int madd 283 to 287
* vector int sum across 288 to 290
* vector int neg 291
* vector int exts 292 to 293
* vector int avg 294 to 295
* vector int absdiff 296 to 297
* vector int minmax 298 to 301 
* vector int cmp 303 to 310
* vector logical 311 to 313
* vector rot shift 314 to 319
* vector fp arith 320 to 321
* vector fp minmax 322
* vector fp round/cvt 323 to 326
* vector fp cmp 327 to 329
* vector fp estimate 330 to 331
* vector xorbased 332 to 337
* vector gather 338
* vector countlz 339
* vector counttz 340 to 341
* vector extract 342 to 343
* vector popcount 344
* vector bitperm 345
* vsx load 464 to 501
* vsx store 502 to 515
* vsx fp abs 516 to 517
* vsx fp add 518 to 525
* vsx fp cmp 526 to 536
* vsx fp sgn/cvt 537 to 566
* vsx fp div 567 to 571
* vsx fp iexp 572 to 573
* vsx fp muladd 574 to 583
* vsx fp maxmin 584 to 595
* vsx fp mulsub 596 to 603
* vsx fp mul 604 to 610
* vsx fp neg 611 to 612
* vsx fp negmul 613 to 632
* vsx fp roundint 633 to 637
* vsx fp recipest 638 to 639
* vsx scalar round 640 to 644
* vsx fp rsqrt 645 to 650
* vsx fp sub 651 to 656
* vsx fp test 657 to 661
* vsx extract 662 to 663
* vsx fp vec abs 664
* vsx fp vec add 665 to 670
* vsx fp vec cmp 671 to 676
* vsx fp vec sgn/cvt 677 to 702
* vsx fp vec div 703 to 706
* vsx fp vec expinsert 707
* vsx fp vec muladd 708 to 713
* vsx fp vec minmax 714 to 720
* vsx fp vec mulsub 721 to 727
* vsx fp vec mul 728 to 731
* vsx fp vec neg 732 to 733
* vsx fp vec mulneg 734 to 747
* vsx fp vec round 748 to 750
* vsx fp vec recipest 751 to 752
* vsx fp vec roundint 753 to 756
* vsx fp vec rsqrt/est 757 to 759
* vsx fp vec sub 760 to 763
* vsx fp vec test 764 to 768
* vsx fp vec extract 769 to 770
* vsx fp vec byterev 771 to 772
* vsx int vec extract/insert 773
* vsx vec logicops 774 to 777
* vsx vec merge 778
* vsx vec permute 779 to 780
* vsx vec shiftleft 781
* vsx vec splat 781

the format must be readable by pagereader.py:
https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/pseudo/pagereader.py;hb=HEAD
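to show what "readable" means in practice, here is a hypothetical reader for one instruction entry. it ASSUMES a layout like the existing wiki pages (a `# Name` heading, an `XX-Form` line, `* mnemonic` lines, and a 4-space-indented block after `Pseudo-code:`); pagereader.py itself is the authoritative reader and may accept more (or less) than this sketch.

```python
def parse_page(text: str) -> dict:
    """Split one instruction entry into name, form, mnemonics and pseudocode.

    Hypothetical illustration only: pagereader.py is the authoritative reader.
    """
    entry = {"name": None, "form": None, "mnemonics": [], "pseudocode": []}
    in_code = False
    for line in text.splitlines():
        stripped = line.strip()
        if in_code:
            if line.startswith("    "):
                entry["pseudocode"].append(line[4:])  # drop the 4-space indent
                continue
            if not stripped:
                continue  # blank lines inside the block are skipped
            in_code = False  # first unindented, non-blank line ends the block
        if line.startswith("# "):
            entry["name"] = line[2:].strip()
        elif stripped.endswith("-Form"):
            entry["form"] = stripped
        elif line.startswith("* "):
            entry["mnemonics"].append(line[2:].strip())
        elif stripped == "Pseudo-code:":
            in_code = True
    return entry
```

the practical rule this encodes: pseudocode typed into the wiki must sit in an indented block directly under the `Pseudo-code:` line, otherwise it will not be picked up.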
Comment 3 Luke Kenneth Casson Leighton 2021-02-24 20:28:59 GMT
https://libre-soc.org/openpower/isa/

i have started putting hints on the syntax on the above page.  the syntax is also described in the Ref Manual; there are, however, key differences, because the Ref Manual pseudocode was never intended to be executable (it now is)
Comment 4 Luke Kenneth Casson Leighton 2021-02-25 00:03:31 GMT
see arch/powerpc/lib/sstep.c in the Linux kernel source
Comment 5 Luke Kenneth Casson Leighton 2021-02-25 21:20:20 GMT
alternative uses for the emulator:

* explicit intrinsic replacement for VSX
* dynamically loadable replacement, with autodetection.
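one possible way to do the autodetection in the second bullet, sketched under assumptions: on Linux the kernel reports CPU features in the AT_HWCAP entry of the auxiliary vector, and on powerpc the VSX bit is PPC_FEATURE_HAS_VSX (0x80). the function names and the "native"/"emulated" labels are hypothetical; the raw bytes would come from /proc/self/auxv.

```python
import struct

AT_NULL = 0                       # end-of-vector marker (Linux <elf.h>)
AT_HWCAP = 16                     # hardware-capability key (Linux <elf.h>)
PPC_FEATURE_HAS_VSX = 0x00000080  # powerpc-only HWCAP bit (asm/cputable.h)


def hwcap_from_auxv(auxv: bytes) -> int:
    """Return AT_HWCAP from raw auxv bytes (pairs of native unsigned longs)."""
    for key, value in struct.iter_unpack("@LL", auxv):
        if key == AT_NULL:
            break
        if key == AT_HWCAP:
            return value
    return 0


def choose_backend(auxv: bytes) -> str:
    """Pick native VSX if the kernel advertises it, else the slow emulator."""
    if hwcap_from_auxv(auxv) & PPC_FEATURE_HAS_VSX:
        return "native"
    return "emulated"
```

on a powerpc64 Linux machine this could be called as `choose_backend(open("/proc/self/auxv", "rb").read())`. HWCAP bits are architecture-specific, so on any other architecture the answer is meaningless.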
Comment 7 Luke Kenneth Casson Leighton 2021-02-26 17:06:36 GMT
i've been looking over sstep.c and it already contains most of the VSX/VMX
LOAD/STORE instructions.  for example lxvw4x
https://elixir.bootlin.com/linux/latest/source/arch/powerpc/lib/sstep.c#L2540

it also sets the precedent for what we want to do.
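as a sketch of the semantics such emulation has to reproduce (written from a reading of the v3.0B description of lxvw4x, not taken from sstep.c): the effective address is (RA|0) + (RB), and word i of the target VSR is loaded from EA + 4*i, each word read in the current byte order while the word order itself stays fixed. modelling registers as Python ints and memory as a bytes object is an assumption for illustration.

```python
def effective_address(gpr: list, ra: int, rb: int) -> int:
    """X-form EA = (RA|0) + (RB): RA=0 means the value 0, not GPR0."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + gpr[rb]) & ((1 << 64) - 1)


def lxvw4x(mem: bytes, ea: int, little_endian: bool) -> int:
    """Load four words into a 128-bit VSR value, word 0 most significant."""
    order = "little" if little_endian else "big"
    vsr = 0
    for i in range(4):
        # each word uses the current byte order; word i always lands in slot i
        word = int.from_bytes(mem[ea + 4 * i: ea + 4 * i + 4], order)
        vsr |= word << (32 * (3 - i))
    return vsr
```

note that in little-endian mode the bytes within each word are reversed but the four words are not, which is the kind of detail sstep.c already gets right and that an emulator must copy.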
Comment 8 Richard Wilbur 2021-03-16 20:25:32 GMT
You mentioned ISA v3.0B:

Did you mean "PowerISA_public.v3.0B.pdf"?
Or should we go for what looks like newer documentation in "PowerISA_public.v3.0C.pdf" or "PowerISA_10_public_v3.1.pdf"?  I guess I'm not familiar with the provenance of these documents.
Comment 9 Luke Kenneth Casson Leighton 2021-03-17 10:21:50 GMT
(In reply to Richard Wilbur from comment #8)
> You mentioned ISA v3.0B:
> 
> Did you mean "PowerISA_public.v3.0B.pdf"?
> Or should we go for what looks like newer documentation in
> "PowerISA_public.v3.0C.pdf" or "PowerISA_10_public_v3.1.pdf"?  I guess I'm
> not familiar with the provenance of these documents.

they're both the same, except for a preamble at the front for the Compliance Subsets.
Comment 10 Veera 2021-06-06 18:08:46 BST
(In reply to Luke Kenneth Casson Leighton from comment #1)
> to be edited, who is doing which pages.  these are listed in ISA Manual v3.0B
> obtain from http://ftp.libre-soc.org
> 
> * JT: page 242 through to 246, LD/ST 
> * rwilbur: 246 to 252, vector pack/unpack
> * vklr:
> 
> TODO:
> 
> * Vector merge 253 to 255
> * Vector splat 256 to 257
> * Vector permute and select 258
> * vector select 259 
> * vector shift 260 to 264
> * vector extract 265
> * vector insert 266
> * vector int add 267 to 272 
> * vsx vec merge 778
> * vsx vec permute 779 to 780
> * vsx vec shiftleft 781
> * vsx vec splat 781
> 
> 
> 
> 
> 
> format must be readable by pagereader.py
> https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/decoder/pseudo/
> pagereader.py;hb=HEAD

This link is not existent. In git checkout also I can't find it.
Is this bug work still relevant?
Comment 11 Luke Kenneth Casson Leighton 2021-06-06 19:22:47 BST
(Veera, please trim replies: it was not necessary to quote the 20 lines
of instructions when discussing just the link)

(In reply to vklr@vkten.in from comment #10)

> This link is not existent.


updated.

> In git checkout also I can't find it.
> Is this bug work still relevant?

yes, we are however using OCR being developed by richard before
doing it by hand.  this will save massive amounts of time.
Comment 12 Veera 2021-06-07 08:46:29 BST
(In reply to Luke Kenneth Casson Leighton from comment #11)
> 
> > This link is not existent.
> 
> updated.
> 
> > In git checkout also I can't find it.
> > Is this bug work still relevant?
> 
> yes, we are however using OCR being developed by richard before
> doing it by hand.  this will save massive amounts of time.

Where is the updated link, I can't find?

What is OCR?

Do we have to update this page https://libre-soc.org/openpower/isa/ with pseudo code for vsx instructions or not?
Comment 13 Jacob Lifshay 2021-06-07 08:55:55 BST
(In reply to vklr@vkten.in from comment #12)
> (In reply to Luke Kenneth Casson Leighton from comment #11)
> > yes, we are however using OCR being developed by richard before
> > doing it by hand.  this will save massive amounts of time.
> 
> Where is the updated link, I can't find?
> 
> What is OCR?

https://en.wikipedia.org/wiki/Optical_character_recognition
Comment 14 Jacob Lifshay 2021-06-07 09:12:36 BST
(In reply to Luke Kenneth Casson Leighton from comment #11)
> yes, we are however using OCR being developed by richard before
> doing it by hand.  this will save massive amounts of time.

I ended up looking through Wikipedia's list of OCR programs, and I noticed Tesseract (and several others) supports outputting to hOCR format, an HTML-based format, which seems like it would be way easier to parse than trying to manually roll-your-own text column/row/formatting detector based on Octave and FFTs...

hOCR:
http://kba.cloud/hocr-spec/1.2/
Comment 15 Luke Kenneth Casson Leighton 2021-06-07 10:53:58 BST
(In reply to Jacob Lifshay from comment #14)

> I ended up looking through Wikipedia's list of OCR programs, and I noticed
> Tesseract (and several others) supports outputting to hOCR format, an
> HTML-based format, which seems like it would be way easier to parse than
> trying to manually roll-your-own text column/row/formatting detector based
> on Octave and FFTs

my feeling is it's better to let richard do what he's doing.
also in this particular case we don't need to know the contents of
the formatting: all that is needed is the XY WidthHeight to pass
to the OCR to extract the required text.


(In reply to vklr@vkten.in from comment #12)

> Where is the updated link, I can't find?

comment #1

> Do we have to update this page https://libre-soc.org/openpower/isa/ with
> pseudo code for vsx instructions 

yes.

we are waiting for richard.

when richard has completed his work.

he will provide the text.

it will be extracted by OCR.

the text can then be made into mdwn files.

do not attempt to extract or type out the text from the PDF yourself.
Comment 16 Jacob Lifshay 2021-06-07 11:06:05 BST
(In reply to Luke Kenneth Casson Leighton from comment #15)
> (In reply to Jacob Lifshay from comment #14)
> 
> > I ended up looking through Wikipedia's list of OCR programs, and I noticed
> > Tesseract (and several others) supports outputting to hOCR format, an
> > HTML-based format, which seems like it would be way easier to parse than
> > trying to manually roll-your-own text column/row/formatting detector based
> > on Octave and FFTs
> 
> my feeling is it's better to let richard do what he's doing.
> also in this particular case we don't need to know the contents of
> the formatting: all that is needed is the XY WidthHeight to pass
> to the OCR to extract the required text.

yeah, mostly posting the above for richard's benefit -- if it already extracts the required information into the hOCR, why duplicate the logic when you can just use an XML processor and save tons of effort?
Comment 17 Luke Kenneth Casson Leighton 2021-06-07 11:14:49 BST
(In reply to Jacob Lifshay from comment #16)
>
> yeah, mostly posting the above for richard's benefit -- if it already
> extracts the required information into the hOCR, why duplicate the logic
> when you can just use an XML processor and save tons of effort?

because richard's efforts are only about 1000 lines long.
Comment 18 Jacob Lifshay 2021-06-07 11:22:39 BST
(In reply to Luke Kenneth Casson Leighton from comment #17)
> (In reply to Jacob Lifshay from comment #16)
> >
> > yeah, mostly posting the above for richard's benefit -- if it already
> > extracts the required information into the hOCR, why duplicate the logic
> > when you can just use an XML processor and save tons of effort?
> 
> because richard's efforts are only about 1000 lines long.

well, assuming you can use something like `jq` but for XML, it could be about 3 lines of code:

* use imagemagick or something similar to convert the pdf to a list of png images
* use tesseract or similar to convert the pngs to hOCR
* use a jq-like program to extract the right part

now you have the text for all the sections you care about
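a stdlib-only sketch of the third step, which also fits the point that only a region's X/Y position and width/height are needed: collect every hOCR word with its `bbox` and keep the words that fall entirely inside a given rectangle. assumptions: Tesseract-style `ocrx_word` spans carrying `bbox x0 y0 x1 y1` in their `title` attribute (as in the hOCR spec), pixel coordinates, and no nested `<span>` inside a word. this is not richard's tool.

```python
import re
from html.parser import HTMLParser

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (x0, y0, x1, y1, text) for every element of class ocrx_word."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so '&lt;' arrives as '<'
        self.words = []
        self._bbox = None   # bbox of the word span currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "ocrx_word" in (a.get("class") or "").split():
            m = BBOX_RE.search(a.get("title") or "")
            if m:
                self._bbox = tuple(int(v) for v in m.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # assumption: no nested <span> inside a word, so this </span> closes it
        if tag == "span" and self._bbox is not None:
            self.words.append((*self._bbox, "".join(self._text).strip()))
            self._bbox = None


def text_in_region(hocr: str, x: int, y: int, width: int, height: int) -> str:
    """Join, in document order, the words whose bbox lies inside the region."""
    parser = HocrWords()
    parser.feed(hocr)
    parser.close()
    return " ".join(
        text for x0, y0, x1, y1, text in parser.words
        if x0 >= x and y0 >= y and x1 <= x + width and y1 <= y + height
    )
```

the rectangle for each pseudocode box would still have to come from somewhere (by hand, or from richard's layout detection); the hOCR side then reduces to this filter.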