659 – SVP64 demo / unit test of CRM data-dependent ffirst mode implementing insertion sort

Bug 659 - SVP64 demo / unit test of CRM data-dependent ffirst mode implementing insertion sort

Summary: SVP64 demo / unit test of CRM data-dependent ffirst mode implementing inserti...

Status:	CONFIRMED

Alias:	None

Product:	Libre-SOC's first SoC
Classification:	Unclassified
Component:	Source Code (show other bugs)
Version:	unspecified
Hardware:	Other Linux

Importance:	--- enhancement
Assignee:	Luke Kenneth Casson Leighton

URL:

Depends on:
Blocks:	952 953
	Show dependency tree / graph

Reported:	2021-07-21 11:03 BST by Luke Kenneth Casson Leighton
Modified:	2024-02-25 02:45 GMT (History)
CC List:	1 user (show)

See Also:	672 676 897 222
NLnet milestone:	NLnet.2022-08-051.OPF
total budget (EUR) for completion of task and all subtasks:	2000
budget (EUR) for this task, excluding subtasks' budget:	2000
parent task for budget allocation:	953
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Luke Kenneth Casson Leighton 2021-07-21 11:03:09 BST

it should be possible to do this using 1<<r3 predicate
for selection and insertion, data-dependent failfirst
to do the inner loop (Reverse REMAP schedule) and a compare
xchg instruction.

failfirst will drop out with VL set to the failed element,
getvl can copy this into r3 and use 1<<r3 as the predicate
to insert key_item.

excluding LDs it should be possible to do in registers
in about 10 instructions.

def insertion_sort(array):
    for i in range(1, len(array)):
        key_item = array[i]
        j = i - 1
        while j >= 0 and array[j] > key_item:
            array[j + 1] = array[j]
            j -= 1
        array[j + 1] = key_item
    return array

Comment 1 Luke Kenneth Casson Leighton 2021-07-21 13:15:40 BST

sublists are small enough to do with insertionsort
medians can use a (new) treereduce mode, with a predicate
of 0b00100 or 1<<r3 r3=2
low/high can use Vector cmp and VCOMPRESS with CR.gt/lt
k can be computed from cntlz

sublists once calculated can be part of timsort.

def median_of_medians(A, i):

    sublists = [A[j:j+5] for j in range(0, len(A), 5)]
    medians = [sorted(sublist)[len(sublist)/2] for sublist in sublists]
    if len(medians) <= 5:
        pivot = sorted(medians)[len(medians)/2]
    else:
        pivot = median_of_medians(medians, len(medians)/2)

    low = [j for j in A if j < pivot]
    high = [j for j in A if j > pivot]

    k = len(low)
    if i < k:
        return median_of_medians(low,i)
    elif i > k:
        return median_of_medians(high,i-k-1)
    else: #pivot = k
        return pivot

#Here are some example lists you can use to see how the algorithm works
#A = [1,2,3,4,5,1000,8,9,99]
#B = [1,2,3,4,5,6]
#print median_of_medians(A, 0) #should be 1
#print median_of_medians(A,7) #should be 99
#print median_of_medians(B,4) #should be 5

Comment 2 Luke Kenneth Casson Leighton 2021-07-21 13:20:47 BST

this should be possible to do with vectorised CMP creating
a Vector of CRs.  followed by 3 CR-Predicated VCOMPRESSed
mvs, one with CR.eq, one with CR.gt, the other with CR.lt

from random import randint

def quicksort(array):
    if len(array) < 2:
        return array
    low, same, high = [], [], []
    pivot = array[randint(0, len(array) - 1)]
    for item in array:
        if item < pivot:
            low.append(item)
        elif item == pivot:
            same.append(item)
        elif item > pivot:
            high.append(item)
    return quicksort(low) + same + quicksort(high)

Comment 3 Luke Kenneth Casson Leighton 2021-07-21 16:35:46 BST

actual core of timsort looks very similar to a butterfly
schedule. note that min_run is picked to ensure merges are
as close to equal size power 2 as possible.

def timsort(array):
    min_run = 32
    n = len(array)
    for i in range(0, n, min_run):
        insertion_sort(array, i, min((i + min_run - 1), n - 1))
    size = min_run
    while size < n:
        for start in range(0, n, size * 2):
            midpoint = start + size - 1
            end = min((start + size * 2 - 1), (n-1))
            merged_array = merge(
                left=array[start:midpoint + 1],
                right=array[midpoint + 1:end + 1])
            array[start:start + len(merged_array)] = merged_array
        size *= 2

    return array

Comment 4 Jacob Lifshay 2021-09-23 02:43:38 BST

changing milestone to match parent task for budget allocation

Comment 5 Luke Kenneth Casson Leighton 2022-10-05 20:20:03 BST

inverting the order:

def insertion_sort(array):
    lim = len(array)-1
    for i in range(lim,-1,-1):
        key_item = array[i]
        j = i + 1
        while j <= lim and array[j] > key_item:
            array[j - 1] = array[j]
            j += 1
        array[j - 1] = key_item
    return array

    ctr = alen-1
    li r10, 1 # prepare mask
    sld r10, alen, r10
    addi r10, r10, -1 # all 1s. must be better way
loop:
    setvl r3, ctr
    sv.mv/m=1<<r3 key, *array   # get key item
    sld r10, 1  # shift in another zero MSB
    sv.cmp/ff=GT/m=~r10 *0, *array, key # stop cmp at 1st GT fail
    sv.mv/m=GT *array-1, *array    # after cmp and ffirst
    getvl r3
    sub r3, 1                   # reduce by one
    sv.mv/m=1<<r3 *array, key   # put key into array
    bc 16, loop                 # dec CTR, back around

Comment 6 Luke Kenneth Casson Leighton 2022-10-07 00:24:49 BST

https://github.com/thatsOven/pdqsort-python/blob/main/pdqsort.py

Comment 7 Luke Kenneth Casson Leighton 2024-02-25 02:45:12 GMT

(In reply to Luke Kenneth Casson Leighton from comment #0)
> it should be possible to do this using 1<<r3 predicate
> for selection and insertion, 

or with unary masks like in maxloc. see bug #676