While thinking about pop-count, I remembered that, once unnecessary ops are removed, prefix-sum and return the last output is a tree reduction, allowing us to replace the pop-count class with basically just a function call -- so I added tree-reduction and pop-count functions to prefix_sum.py and added a formal proof for pop-count. TODO still: change soc to use the new pop_count function in the implementation of the PopCount class.
renember there is already a treereduce function in nmutil, please do not duplicate it, it will be embarrassing to publicly announce that two identical functions are present in an EU funded project.
https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/util.py;h=5864d5edb92b8db47c752280c1fb3ae83145c698;hb=453f43543348a61fe7c1752e88e71a9ddbbe49a0#l22
(In reply to Luke Kenneth Casson Leighton from comment #2) > renember there is already a treereduce function > in nmutil, please do not duplicate it, it will > be embarrassing to publicly announce that two > identical functions are present in an EU funded > project. I didn't remember that. The version I added (tree_reduction) is still useful because iirc it matches the reduction algorithm used by SVP64 (without predication), whereas treereduce doesn't -- it instead matches the algorithm used by Arm SVE's reduction ops. also, tree_reduction is a very simple wrapper around partial_prefix_sum_ops which is a prefix sum that allows you to choose which outputs you want and eliminates unnecessary operations, leaving just the operations needed for just those outputs. an additional benefit of the version I wrote is you can easily get the full list of operations run by the tree reduction, just: list(tree_reduction_ops(...)) https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/prefix_sum.py;hb=1b02a4882e2eaacd1a2fdb31f9ee302e346b33d1#l267
(In reply to Jacob Lifshay from comment #4) > (In reply to Luke Kenneth Casson Leighton from comment #2) > > renember there is already a treereduce function > > in nmutil, please do not duplicate it, it will > > be embarrassing to publicly announce that two > > identical functions are present in an EU funded > > project. > > I didn't remember that. understandable as it is 7 lines if that, no comments and no docs or unit tests. If you can cross ref some comments,... I will do it. > The version I added (tree_reduction) is still useful > because iirc it matches the reduction algorithm used by SVP64 (without > predication), nice. Compliance with the spec there is going to be interesting given dynamic skipping https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/preduce.py;hb=HEAD > whereas treereduce doesn't -- it instead matches the algorithm > used by Arm SVE's reduction ops. great ill put that in docstrings. > also, tree_reduction is a very simple wrapper around partial_prefix_sum_ops > which is a prefix sum that allows you to choose which outputs you want and > eliminates unnecessary operations, leaving just the operations needed for > just those outputs. doing that dynamically at runtime in HDL is where it gets hair raising > an additional benefit of the version I wrote is you can easily get the full > list of operations run by the tree reduction, just: > list(tree_reduction_ops(...)) nice. if you can fix it by removing data class and keep clear of unnecessary (i.e. all) type-based assertions it'll be ok. I don't mind length based assertions which make sure mistakes that create silent errors. But if someone passes in a string instead of an int and a runtime exception occurs then great that is good enough.
after reading through the Popcount class some more, I'm realizing that it is incredibly PowerISA-specific, since it specifically operates on the packed simd types u8x8, u32x2, or u64x1 (no u16x4), rather than what most pop-count functions do: pop-count only 1 thing. Therefore, I think Popcount shouldn't be moved to nmutil, since it isn't a general utility class. The Popcount class is formally proven correct by the logical pipeline's formal proof. The more-general pop_count function I added to nmutil also has a formal correctness proof in nmutil, therefore I think this task is complete.
(In reply to Jacob Lifshay from comment #6) > after reading through the Popcount class some more, I'm realizing that it is > incredibly PowerISA-specific, since it specifically operates on the packed > simd types u8x8, u32x2, or u64x1 (no u16x4), rather than what most pop-count > functions do: pop-count only 1 thing. hmm hmm yeah i really liked how that python code absorbed the first level. ah well. needs converting at some point to use nmutil popcount but not now. > The more-general pop_count function I added to nmutil also has a formal > correctness proof in nmutil, therefore I think this task is complete. makes perfect sense. i moved pop_count() to popcount.py so that anyone looking at the top-level directory listing will go "oh, there's a popcount.py" otherwise the function pop_count() is buried in a file that people will not at first glance realise is related.