Fix/typo corrections #256
Spelling corrections identified and validated through a multi-stage process:
1. Automated detection using the pyspellchecker library
2. False positive filtering via a fine-tuned LLM classifier (Gemma3:4b via Ollama, GEPA-optimized)
3. Automated fixes applied by Claude Opus 4.5
4. Final human review and approval
Adds a Python-based spellcheck CI that blocks PRs containing typos from a list of patterns considered 100% reliable.

Features:
- Precompiled regex patterns for performance
- Skips code blocks, inline code, and YAML frontmatter
- Directory pruning (os.walk) for efficiency
- Excludes localizedContent (English-only check)
- GitHub Actions annotations for inline PR feedback
- Symlink escape protection
- JSON schema validation

Files:
- scripts/ci_spellcheck.py: Main detection script
- data/common_typos.json: 32 typo patterns
- .github/workflows/spellcheck.yml: CI workflow

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
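To make the feature list above concrete, here is a minimal sketch of how such a check could work. It is not the contents of scripts/ci_spellcheck.py: the two-entry `TYPOS` table, the `SKIP_DIRS` set, and the function names `find_typos`, `markdown_files`, and `annotation` are all assumptions made for illustration. Symlink protection and JSON schema validation are left out.

```python
import os
import re

# Hypothetical typo table; the real data/common_typos.json (32 patterns) is not shown in this PR.
TYPOS = {"suported": "supported", "recieve": "receive"}

# Compile every pattern once at import time instead of once per line.
PATTERNS = [
    (re.compile(rf"\b{re.escape(bad)}\b", re.IGNORECASE), good)
    for bad, good in TYPOS.items()
]
FENCE = re.compile(r"^\s*(```|~~~)")
INLINE_CODE = re.compile(r"`[^`]*`")
SKIP_DIRS = {"localizedContent", ".git", "node_modules"}  # assumed exclusion list


def find_typos(text):
    """Return (line, col, bad, good) tuples, ignoring frontmatter and code."""
    findings = []
    in_fence = False
    in_frontmatter = False
    for line_no, line in enumerate(text.splitlines(), start=1):
        # YAML frontmatter: a '---' on the first line, closed by the next '---'.
        if line_no == 1 and line.strip() == "---":
            in_frontmatter = True
            continue
        if in_frontmatter:
            if line.strip() == "---":
                in_frontmatter = False
            continue
        # Fenced code blocks: toggle on every fence line and skip the contents.
        if FENCE.match(line):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        # Blank out inline code with spaces so reported columns stay correct.
        masked = INLINE_CODE.sub(lambda m: " " * len(m.group()), line)
        for pattern, good in PATTERNS:
            for match in pattern.finditer(masked):
                findings.append((line_no, match.start() + 1, match.group(), good))
    return findings


def markdown_files(root):
    """Yield .md files under root, pruning excluded directories in place."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if name.endswith(".md"):
                yield os.path.join(dirpath, name)


def annotation(path, line_no, col, bad, good):
    """Format a GitHub Actions error annotation (workflow command syntax)."""
    return f"::error file={path},line={line_no},col={col}::Typo '{bad}' -> '{good}'"
```

Printing the result of `annotation(...)` to stdout from a workflow step is what makes GitHub attach the message to the file and line. Masking inline code with spaces, rather than deleting it, keeps the column numbers aligned with the original line.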
Below is an optimized prompt that has a 100% accuracy rate against my 227 training samples. I used Gemma3:4b locally with this prompt to review all docs.

Contextual Spellcheck Prompt (GEPA Optimized - 100% F1)

System Instructions
You are a technical documentation editor specializing in data modeling and business intelligence tools. Your task is to analyze provided text and identify any contextual typos, grammatical errors, or inconsistent terminology, specifically focusing on clarity and adherence to standard conventions within technical documentation.

Important Considerations & Specifics:
Output Format
Return your findings as a JSON array. Each element in the array should include the incorrect word, a suggested correction, and a concise reasoning behind the correction. For example:

Few-Shot Examples
Example 1
Text: Use XMLA, where as REST is slower.
Example 2
Text: The feature is suported in version 3.
Example 3
Text: The OLS (Object Level Security) feature restricts access to objects.
Example 4
Text: Use params to filter the results.
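For anyone consuming the model's output, a small validation step along these lines may help. This is only a sketch: the prompt asks for an incorrect word, a suggested correction, and a reasoning per element, but the exact JSON key names are not shown above, so `incorrect`, `correction`, and `reasoning` are assumed names, and `parse_findings` is a hypothetical helper. The sample response in the usage lines is made up for illustration and is not the prompt's actual expected output.

```python
import json

# Assumed key names; the prompt's real example output is not preserved in this thread.
REQUIRED_KEYS = {"incorrect", "correction", "reasoning"}


def parse_findings(raw):
    """Parse the model's reply and check that it is a JSON array of findings."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of findings")
    findings = []
    for item in data:
        # Reject elements that are not objects or that miss a required field.
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            raise ValueError(f"malformed finding: {item!r}")
        findings.append({key: str(item[key]) for key in sorted(REQUIRED_KEYS)})
    return findings


# Illustrative reply for the text "The feature is suported in version 3."
reply = '[{"incorrect": "suported", "correction": "supported", "reasoning": "Misspelling."}]'
for finding in parse_findings(reply):
    print(f"{finding['incorrect']} -> {finding['correction']}: {finding['reasoning']}")
```

Rejecting malformed replies up front keeps a small local model's occasional non-JSON output from silently turning into "no typos found".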
Hey Eugene
Dangit! I tested it on a repo, but made changes after. I will investigate.
I automatically scanned for typos, trained a local LLM to look for contextual errors, then I had Opus apply the fixes and I manually reviewed each one.
Also added a blocking GitHub Action for common typos. We could add an inline fix button, but it would require write permissions to the PR.