Current confusable maps treat all pairs as equally dangerous. In practice, some pairs are indistinguishable across all common fonts (Cyrillic а U+0430 / Latin a U+0061) while others only collide in specific typefaces.
Rendering confusable pairs across standard system fonts and measuring pixel similarity could produce empirically weighted confidence scores that feed into the existing risk scoring pipeline, with no runtime rendering dependency.
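One way to score a pair in a single font, as a minimal sketch: it assumes the two glyphs have already been rasterised offline to same-sized grayscale bitmaps, and it uses a weighted Jaccard overlap as the similarity measure. The `GlyphBitmap` type and `pixelSimilarity` function are hypothetical names for illustration, and the choice of metric is an assumption, not something this note prescribes.

```typescript
// Hypothetical shape for a pre-rendered glyph: row-major grayscale ink values,
// 0 = blank background, 255 = full ink. Not a namespace-guard type.
interface GlyphBitmap {
  width: number;
  height: number;
  pixels: Uint8Array;
}

// Weighted Jaccard similarity in [0, 1]: shared ink divided by combined ink.
// Blank background contributes nothing, so padding does not inflate the score.
function pixelSimilarity(a: GlyphBitmap, b: GlyphBitmap): number {
  if (a.width !== b.width || a.height !== b.height) {
    throw new Error("bitmaps must have identical dimensions");
  }
  const size = a.width * a.height;
  if (a.pixels.length !== size || b.pixels.length !== size) {
    throw new Error("pixel buffer length must equal width * height");
  }
  let shared = 0;
  let combined = 0;
  for (let i = 0; i < size; i++) {
    shared += Math.min(a.pixels[i], b.pixels[i]);
    combined += Math.max(a.pixels[i], b.pixels[i]);
  }
  // Two entirely blank bitmaps are treated as identical.
  return combined === 0 ? 1 : shared / combined;
}
```

An ink-overlap measure is used here instead of a mean pixel difference because glyph bitmaps are mostly background, which would push a plain per-pixel average towards 1 even for clearly different shapes.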
Prior art: GlyphNet demonstrates attention-based CNN detection on 4M rendered domain images. Their approach is domain-specific and image-based. The opportunity here is to distil rendering results into static map weights consumable at runtime.
Phases (if pursued):
- Render confusable pairs across 20-30 common system fonts, measure pixel similarity per pair per font
- Distil into confidence-weighted confusable map (replaces flat weights in confusableDistance)
- Discover novel confusable pairs not yet in confusables.txt
- Export font-stability metadata per pair
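The distillation step could look roughly like the sketch below. It assumes per-font similarity scores in [0, 1] are already available for a pair, and collapses them into a single weight plus font-stability metadata. All names here (`FontScores`, `PairWeight`, `distilPair`), the use of the mean as the weight, and the 0.9 collision threshold are illustrative assumptions, not part of namespace-guard.

```typescript
// Hypothetical input: similarity in [0, 1] for one confusable pair, keyed by font name.
type FontScores = Record<string, number>;

// Hypothetical output record for one pair in the static artifact.
interface PairWeight {
  weight: number;           // mean similarity across all measured fonts
  minSimilarity: number;    // least similar rendering
  maxSimilarity: number;    // most similar rendering
  fontStability: number;    // 1 - (max - min); 1 means the same score in every font
  collidingFonts: string[]; // fonts whose score is at or above the threshold
}

function distilPair(scores: FontScores, threshold: number = 0.9): PairWeight {
  const entries = Object.entries(scores);
  if (entries.length === 0) {
    throw new Error("at least one font score is required");
  }
  const values = entries.map(([, score]) => score);
  const total = values.reduce((sum, score) => sum + score, 0);
  const minSimilarity = Math.min(...values);
  const maxSimilarity = Math.max(...values);
  return {
    weight: total / values.length,
    minSimilarity,
    maxSimilarity,
    fontStability: 1 - (maxSimilarity - minSimilarity),
    collidingFonts: entries
      .filter(([, score]) => score >= threshold)
      .map(([font]) => font),
  };
}

// Made-up scores for illustration only; "FontA" and "FontB" are placeholders.
const example = distilPair({ FontA: 1, FontB: 0.5 });
console.log(JSON.stringify(example));
```

A pair that scores high in every font ends up with both a high weight and high stability, while a pair that collides only in one typeface keeps a lower weight and lists that typeface in `collidingFonts`, which is the distinction the flat map cannot express.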
This would be a separate offline tool that produces artifacts for namespace-guard to import, not a runtime dependency.