Bug 1078 - svp64 counting sort demo
Summary: svp64 counting sort demo
Status: CONFIRMED
Alias: None
Product: Libre-SOC's first SoC
Classification: Unclassified
Component: Documentation (show other bugs)
Version: unspecified
Hardware: Other Linux
: --- enhancement
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks: 953
  Show dependency treegraph
 
Reported: 2023-05-05 11:52 BST by Luke Kenneth Casson Leighton
Modified: 2023-05-05 15:54 BST (History)
1 user (show)

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Luke Kenneth Casson Leighton 2023-05-05 11:52:36 BST
https://en.m.wikipedia.org/wiki/Counting_sort#Pseudocode

function CountingSort(input, k)
    
    count ← array of k + 1 zeros
    output ← array of same length as input
    
    for i = 0 to length(input) - 1 do
        j = key(input[i])
        count[j] = count[j] + 1

    for i = 1 to k do
        count[i] = count[i] + count[i - 1]

    for i = length(input) - 1 down to 0 do
        j = key(input[i])
        count[j] = count[j] - 1
        output[count[j]] = input[i]

    return output
Comment 1 Luke Kenneth Casson Leighton 2023-05-05 12:07:12 BST
(In reply to Luke Kenneth Casson Leighton from comment #0)

>     for i = 0 to length(input) - 1 do
>         j = key(input[i])
>         count[j] = count[j] + 1

    setvl k
    svindex on keys
    sv.addi *count, *count, 1 # Indexed on count

>     for i = 1 to k do
>         count[i] = count[i] + count[i - 1]

    setvl k-1
    sv.add/mr *count+1, *count+1, *count
    (or use parallel prefix sum)

>     for i = length(input) - 1 down to 0 do
>         j = key(input[i])
>         count[j] = count[j] - 1
>         output[count[j]] = input[i]

VF mode, 2 stages:

    setvl len(input)
    svindex on keys
    sv.addi/mrr *count, *count, -1 # Indexed on count
    sv.addi/mrr *countcopy, *count # Indexed only on count

2nd stage (HF mode):

    svindex on countcopy:
    sv.addi *output, *input, 0 # Index countcopy on output

first step is doing a histogram. next is cumulative sum of histogram
counts.  then find unique areas. finally copy which has the fully
sorted order.

this algorithm critically relies on Dependency Hazards within the
hardware, to not mess up for example multiple REMAP-Indexed atomic
increments.
Comment 2 Luke Kenneth Casson Leighton 2023-05-05 15:54:03 BST
# python implementation
from typing import List

def get_max(arr: List[int]) -> int:
	mx = arr[0]
	for i in range(1, len(arr)):
		if arr[i] > mx:
			mx = arr[i]
	return mx

def count_sort(arr: List[int], exp: int):
	output = [0] * len(arr)
	count = [0] * 10

	# Store count of occurrences in count[]
	for i in range(len(arr)):
		count[(arr[i] // exp) % 10] += 1

	# Change count[i] so that count[i] now contains actual
	# position of this digit in output[]
	for i in range(1, 10):
		count[i] += count[i - 1]

	# Build the output array
	for i in range(len(arr) - 1, -1, -1):
		output[count[(arr[i] // exp) % 10] - 1] = arr[i]
		count[(arr[i] // exp) % 10] -= 1

	# Copy the output array to arr[], so that arr[] now
	# contains sorted numbers according to current digit
	for i in range(len(arr)):
		arr[i] = output[i]


def parallel_count_sort(arr: List[int]):
	m = get_max(arr)

	# Do counting sort for every digit. Note that instead
	# of passing digit number, exp is passed. exp is 10 ^ i
	# where i is current digit number
	exp = 1
	while m // exp > 0:
		for i in range(len(arr)):
			count_sort(arr, exp)
		exp *= 10

# Driver program to test above functions
def main():
	arr = [170, 45, 75, 90, 802, 24, 2, 66]
	print("Array before sorting:")
	print(arr)

	parallel_count_sort(arr)

	print("Array after sorting:")
	print(arr)

if __name__ == "__main__":
	main()

# This code is contributed by ksam24000

https://summerofhpc.prace-ri.eu/fastest-sorting-algorithm-for-distributed-systems-parallel-radix-sort-difficulty-medium/

https://dirtyhandscoding.github.io/posts/vectorizing-small-fixed-size-sort.html