https://en.m.wikipedia.org/wiki/Counting_sort#Pseudocode function CountingSort(input, k) count ← array of k + 1 zeros output ← array of same length as input for i = 0 to length(input) - 1 do j = key(input[i]) count[j] = count[j] + 1 for i = 1 to k do count[i] = count[i] + count[i - 1] for i = length(input) - 1 down to 0 do j = key(input[i]) count[j] = count[j] - 1 output[count[j]] = input[i] return output
(In reply to Luke Kenneth Casson Leighton from comment #0) > for i = 0 to length(input) - 1 do > j = key(input[i]) > count[j] = count[j] + 1 setvl k svindex on keys sv.addi *count, *count, 1 # Indexed on count > for i = 1 to k do > count[i] = count[i] + count[i - 1] setvl k-1 sv.add/mr *count+1, *count+1, *count (or use parallel prefix sum) > for i = length(input) - 1 down to 0 do > j = key(input[i]) > count[j] = count[j] - 1 > output[count[j]] = input[i] VF mode, 2 stages: setvl len(input) svindex on keys sv.addi/mrr *count, *count, -1 # Indexed on count sv.addi/mrr *countcopy, *count # Indexed only on count 2nd stage (HF mode): svindex on countcopy: sv.addi *output, *input, 0 # Index countcopy on output first step is doing a histogram. next is cumulative sum of histogram counts. then find unique areas. finally copy which has the fully sorted order. this algorithm critically relies on Dependency Hazards within the hardware, to not mess up for example multiple REMAP-Indexed atomic increments.
# python implementation from typing import List def get_max(arr: List[int]) -> int: mx = arr[0] for i in range(1, len(arr)): if arr[i] > mx: mx = arr[i] return mx def count_sort(arr: List[int], exp: int): output = [0] * len(arr) count = [0] * 10 # Store count of occurrences in count[] for i in range(len(arr)): count[(arr[i] // exp) % 10] += 1 # Change count[i] so that count[i] now contains actual # position of this digit in output[] for i in range(1, 10): count[i] += count[i - 1] # Build the output array for i in range(len(arr) - 1, -1, -1): output[count[(arr[i] // exp) % 10] - 1] = arr[i] count[(arr[i] // exp) % 10] -= 1 # Copy the output array to arr[], so that arr[] now # contains sorted numbers according to current digit for i in range(len(arr)): arr[i] = output[i] def parallel_count_sort(arr: List[int]): m = get_max(arr) # Do counting sort for every digit. Note that instead # of passing digit number, exp is passed. exp is 10 ^ i # where i is current digit number exp = 1 while m // exp > 0: for i in range(len(arr)): count_sort(arr, exp) exp *= 10 # Driver program to test above functions def main(): arr = [170, 45, 75, 90, 802, 24, 2, 66] print("Array before sorting:") print(arr) parallel_count_sort(arr) print("Array after sorting:") print(arr) if __name__ == "__main__": main() # This code is contributed by ksam24000 https://summerofhpc.prace-ri.eu/fastest-sorting-algorithm-for-distributed-systems-parallel-radix-sort-difficulty-medium/ https://dirtyhandscoding.github.io/posts/vectorizing-small-fixed-size-sort.html