normalize.py already contains a few functions to normalize C/C++ sources before hashing. So far it's very simple:
- remove comments
- remove empty lines
- pipe through clang-format
This would mostly improve detection of embedded libraries in projects that have been run through an auto-formatter. It's not clear whether this is common at all. Before implementing this properly, try it on a large codebase, i.e. an entire distro. If it delivers at least 1 or 2 additional hits, consider adding this feature.
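For illustration, the three normalization steps listed above could look roughly like the sketch below. It is not the actual content of normalize.py: the function names, the regex-based comment stripping, the SHA-256 digest and the LLVM style are all assumptions made for the example. The comment stripper is deliberately simple and does not handle C++ raw string literals or line comments continued with a backslash.

```python
import hashlib
import re
import shutil
import subprocess

# Matches string/char literals (kept as-is) or comments (dropped).
# Literals are matched first so that "//" inside a string is not
# mistaken for a comment.
_TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'   # string literal
    r"|'(?:\\.|[^'\\])*'"  # character literal
    r"|//[^\n]*"           # line comment
    r"|/\*.*?\*/",         # block comment
    re.DOTALL,
)


def strip_comments(source: str) -> str:
    """Replace each comment with a single space; leave literals untouched."""
    def repl(match):
        text = match.group(0)
        # Only comments start with "/"; literals start with a quote.
        return " " if text.startswith("/") else text
    return _TOKEN.sub(repl, source)


def remove_empty_lines(source: str) -> str:
    """Drop blank lines and trailing whitespace on the remaining lines."""
    lines = [line.rstrip() for line in source.splitlines()]
    return "".join(line + "\n" for line in lines if line)


def clang_format(source: str, style: str = "LLVM") -> str:
    """Pipe the source through clang-format via stdin, if it is installed."""
    exe = shutil.which("clang-format")
    if exe is None:
        return source  # assumption: silently skip when the tool is missing
    result = subprocess.run(
        [exe, f"--style={style}", "--assume-filename=input.cpp"],
        input=source, capture_output=True, text=True, check=True,
    )
    return result.stdout


def normalized_digest(source: str, use_clang_format: bool = True) -> str:
    """Hash the normalized source (SHA-256 is an assumption)."""
    text = remove_empty_lines(strip_comments(source))
    if use_clang_format:
        text = clang_format(text)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

With a helper like this, two copies of a file that differ only in comments, blank lines or formatting would produce the same digest, which is what the distro-wide experiment would need to count additional hits.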