it should be possible to do this using 1<<r3 predicate for selection and insertion, data-dependent failfirst to do the inner loop (Reverse REMAP schedule) and a compare xchg instruction. failfirst will drop out with VL set to the failed element, getvl can copy this into r3 and use 1<<r3 as the predicate to insert key_item. excluding LDs it should be possible to do in registers in about 10 instructions. def insertion_sort(array): for i in range(1, len(array)): key_item = array[i] j = i - 1 while j >= 0 and array[j] > key_item: array[j + 1] = array[j] j -= 1 array[j + 1] = key_item return array
sublists are small enough to do with insertionsort medians can use a (new) treereduce mode, with a predicate of 0b00100 or 1<<r3 r3=2 low/high can use Vector cmp and VCOMPRESS with CR.gt/lt k can be computed from cntlz sublists once calculated can be part of timsort. def median_of_medians(A, i): sublists = [A[j:j+5] for j in range(0, len(A), 5)] medians = [sorted(sublist)[len(sublist)/2] for sublist in sublists] if len(medians) <= 5: pivot = sorted(medians)[len(medians)/2] else: pivot = median_of_medians(medians, len(medians)/2) low = [j for j in A if j < pivot] high = [j for j in A if j > pivot] k = len(low) if i < k: return median_of_medians(low,i) elif i > k: return median_of_medians(high,i-k-1) else: #pivot = k return pivot #Here are some example lists you can use to see how the algorithm works #A = [1,2,3,4,5,1000,8,9,99] #B = [1,2,3,4,5,6] #print median_of_medians(A, 0) #should be 1 #print median_of_medians(A,7) #should be 99 #print median_of_medians(B,4) #should be 5
this should be possible to do with vectorised CMP creating a Vector of CRs. followed by 3 CR-Predicated VCOMPRESSed mvs, one with CR.eq, one with CR.gt, the other with CR.lt from random import randint def quicksort(array): if len(array) < 2: return array low, same, high = [], [], [] pivot = array[randint(0, len(array) - 1)] for item in array: if item < pivot: low.append(item) elif item == pivot: same.append(item) elif item > pivot: high.append(item) return quicksort(low) + same + quicksort(high)
actual core of timsort looks very similar to a butterfly schedule. note that min_run is picked to ensure merges are as close to equal size power 2 as possible. def timsort(array): min_run = 32 n = len(array) for i in range(0, n, min_run): insertion_sort(array, i, min((i + min_run - 1), n - 1)) size = min_run while size < n: for start in range(0, n, size * 2): midpoint = start + size - 1 end = min((start + size * 2 - 1), (n-1)) merged_array = merge( left=array[start:midpoint + 1], right=array[midpoint + 1:end + 1]) array[start:start + len(merged_array)] = merged_array size *= 2 return array
changing milestone to match parent task for budget allocation
inverting the order: def insertion_sort(array): lim = len(array)-1 for i in range(lim,-1,-1): key_item = array[i] j = i + 1 while j <= lim and array[j] > key_item: array[j - 1] = array[j] j += 1 array[j - 1] = key_item return array ctr = alen-1 li r10, 1 # prepare mask sld r10, alen, r10 addi r10, r10, -1 # all 1s. must be better way loop: setvl r3, ctr sv.mv/m=1<<r3 key, *array # get key item sld r10, 1 # shift in another zero MSB sv.cmp/ff=GT/m=~r10 *0, *array, key # stop cmp at 1st GT fail sv.mv/m=GT *array-1, *array # after cmp and ffirst getvl r3 sub r3, 1 # reduce by one sv.mv/m=1<<r3 *array, key # put key into array bc 16, loop # dec CTR, back around
https://github.com/thatsOven/pdqsort-python/blob/main/pdqsort.py
(In reply to Luke Kenneth Casson Leighton from comment #0) > it should be possible to do this using 1<<r3 predicate > for selection and insertion, or with unary masks like in maxloc. see bug #676