Describe the feature
Summary
Vale currently relies on regex and dictionary-based matching, which makes it difficult to handle inflected word forms (e.g., pluralization, conjugation, declension). This can lead to incomplete rule coverage or complex and hard-to-maintain patterns.
Problem
When defining rules, users often need to account for multiple inflected forms of the same word manually. For example, a rule targeting a base form like run will not match inflected forms such as runs or running.
To compensate, users must either:
- enumerate all variants explicitly, or
- write complex regex patterns (e.g., run(s|ning)?)
This approach is error-prone and reduces the readability and maintainability of rules.
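As a quick illustration of how easily variants slip through, the following Go sketch tests the hand-written pattern against a few forms of run using the standard `regexp` package. The `\b` word boundaries are an assumption added here to approximate whole-word matching; they are not necessarily how Vale builds its patterns internally.

```go
package main

import (
	"fmt"
	"regexp"
)

// workaround is the hand-written pattern run(s|ning)?, wrapped in \b word
// boundaries (an assumption for this sketch) so it only matches whole words.
var workaround = regexp.MustCompile(`\brun(s|ning)?\b`)

func main() {
	// Check the pattern against regular and irregular forms of "run".
	for _, word := range []string{"run", "runs", "running", "ran"} {
		fmt.Printf("%-8s matched: %v\n", word, workaround.MatchString(word))
	}
}
```

The irregular past tense ran is not covered, and nothing in the pattern signals that it is missing.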
This limitation becomes significantly more severe in languages with richer morphology, such as German.
The following video gives a short overview of what to expect for most languages other than English: https://youtu.be/ettP9Ayrho8?is=g0hMRMZqVJE74K8p
Minimal example (in German)
Rule:
extends: substitution
message: Consider using '%s' instead of '%s'
level: warning
ignorecase: false
swap:
  gut: hervorragend
Text:
Das ist eine gute Lösung.
Current behavior:
- No match, because “gute” is not recognized as a form of “gut”
Expected behavior:
- Match based on shared lemma (“gut” → “gute”)
For example, the adjective “gut” can appear in many forms depending on case, gender, and number:
- gut
- gute
- guten
- gutem
- guter
Similarly, verbs like “gehen” produce forms such as:
- gehe, gehst, geht
- ging, gegangen
Covering these via regex or explicit lists quickly becomes impractical. As a result:
- rule definitions become bloated
- important variants are easily missed
- false negatives increase significantly
This makes Vale harder to use effectively for non-English content and limits its usefulness in multilingual environments.
Discussion / possible approaches
I understand that Vale is intentionally lightweight and primarily regex/dictionary-based, and that performance and simplicity are key design goals.
With that in mind, a possible direction could be:
- Macro expansion: a dedicated syntax in a rule would trigger expansion of a word into all of its valid forms via a dictionary, so "Word A" in a rule becomes "Word A - Variant 1|Word A - Variant 2|...|Word A - Variant n".
  - All of this happens before the actual linting, so the expansion is done only once per linting run. Vale could potentially also cache the results for performance.
Benefits
- Simpler and more maintainable rules
- Better coverage with fewer false negatives
- Improved support for morphologically rich languages (German, Finnish, Slavic languages, etc.)
- Better usability in multilingual teams
Question
Would morphology-aware matching be considered within Vale’s scope? If we can agree on an approach that would be accepted as a PR, I could also try working on it.