fix(kb): collapse manufacturer fragments in Knowledge rollup (#2263) #2275
Mikecranesync wants to merge 1 commit into
Extend manufacturer-aliases.json with three new canonical entries:
- "rockwell" → "Rockwell Automation" (bare brand in older manuals)
- "automationdirect" / "automation direct" / "automationdirect.com" → "AutomationDirect" (casing/spacing variants)

Apply normalizeManufacturer() in the /api/knowledge rollup after the SQL GROUP BY so that title-cased variants that survived INITCAP still collapse to a single catalog row (e.g. "Rockwell" + "Rockwell Automation" now appear as one entry).

Also sync the Python OCR_VARIANT_ALIASES in manufacturer_normalize.py to keep both sides in lockstep per the existing cross-surface test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
🤖 AI Code Review
Review by: groq (llama-3.3-70b-versatile)

Review of PR #2263

🔴 IMPORTANT: Security Vulnerabilities
No hardcoded secrets, SQL injection, path traversal, or command injection vulnerabilities were found in the diff.

🔴 IMPORTANT: Missing Error Handling
No missing error handling on network/IO operations that could crash in production were found in the diff. However, it's worth noting that the

🟡 WARNING: Logic Bugs or Incorrect Assumptions
The normalization of manufacturer names using
In

🟡 WARNING: Missing Input Validation
The

🔵 SUGGESTION: Code Quality Improvements
The code is generally well-structured and readable. However, some variable names could be more descriptive. For example,
The use of type aliases (e.g.,

✅ GOOD: Noteworthy Good Practices
The use of
The implementation of the

Generated by the MIRA automated code review pipeline (Groq → Cerebras → Gemini cascade)
MIRA staging gate: ✅ PASS
Engine + NeonDB staging branch + Groq cascade against fixed questions, graded on the 5-dimension rubric in
Rubric:
Summary
- Extend manufacturer-aliases.json with three new entries: "rockwell" → "Rockwell Automation", "automationdirect" / "automation direct" / "automationdirect.com" → "AutomationDirect"
- Apply normalizeManufacturer() in /api/knowledge after the SQL GROUP BY to re-aggregate rows that map to the same canonical name (e.g. 34K "Rockwell Automation" + 18 "Rockwell" → single "Rockwell Automation" row)
- Sync OCR_VARIANT_ALIASES in mira-crawler/ingest/manufacturer_normalize.py to stay in lockstep with the alias JSON (existing cross-surface consistency test guards this)

Partial fix for #2263. Remaining work: OCR artifact fix in ingest pipeline, classifier pass on 24K Uncategorized chunks.
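The re-aggregation step described in the Summary can be sketched roughly as follows. This is a minimal TypeScript illustration, not the code in this PR: the row shape `RollupRow`, the `ALIASES` subset, the helper `collapseRollup`, and the body of `normalizeManufacturer` shown here are all assumptions, and the real function in the repo may behave differently. The 34000 count is a placeholder standing in for the "34K" figure.

```typescript
// Hypothetical shape of one row produced by the SQL GROUP BY.
interface RollupRow {
  manufacturer: string;
  chunkCount: number;
}

// Illustrative subset of the alias map (lowercased variant -> canonical name).
const ALIASES: Record<string, string> = {
  "rockwell": "Rockwell Automation",
  "automationdirect": "AutomationDirect",
  "automation direct": "AutomationDirect",
  "automationdirect.com": "AutomationDirect",
};

// Stand-in for the repo's normalizeManufacturer(); an assumption, not its real body.
function normalizeManufacturer(raw: string): string {
  const trimmed = raw.trim();
  const key = trimmed.toLowerCase();
  const canonical = Object.prototype.hasOwnProperty.call(ALIASES, key)
    ? ALIASES[key]
    : undefined;
  return canonical !== undefined ? canonical : trimmed;
}

// Merge rows whose names normalize to the same canonical manufacturer,
// summing their chunk counts and keeping first-seen order.
function collapseRollup(rows: RollupRow[]): RollupRow[] {
  const merged: RollupRow[] = [];
  const byName: Record<string, RollupRow> = Object.create(null);
  for (const row of rows) {
    const name = normalizeManufacturer(row.manufacturer);
    const existing = byName[name];
    if (existing !== undefined) {
      existing.chunkCount += row.chunkCount;
    } else {
      const fresh: RollupRow = { manufacturer: name, chunkCount: row.chunkCount };
      byName[name] = fresh;
      merged.push(fresh);
    }
  }
  return merged;
}

// Usage: the two Rockwell rows collapse into one row with the summed count.
const collapsed = collapseRollup([
  { manufacturer: "Rockwell Automation", chunkCount: 34000 },
  { manufacturer: "Rockwell", chunkCount: 18 },
]);
console.log(collapsed);
```

Doing the merge in application code after the query keeps the SQL unchanged; the trade-off is that counts are summed outside the database.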
Root cause
The Knowledge page was grouped purely by SQL INITCAP(LOWER(TRIM(manufacturer))) with no alias pass, so "Rockwell Automation" and "Rockwell" produced separate catalog rows. The alias map existed but was only applied at upload time and in quickstart, not in the library rollup.
Test plan
- /api/knowledge response: verify "Rockwell Automation" row count now absorbs the former "Rockwell" row (18 extra chunks)

🤖 Generated with Claude Code
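The test-plan check could be automated with a small helper along these lines. This is a hedged sketch only: the response row shape `CatalogRow`, the `KNOWN_ALIASES` subset, and the function `findFragments` are hypothetical names introduced for illustration, and the sample rows use placeholder counts rather than real data from the endpoint.

```typescript
// Assumed shape of one catalog row in the /api/knowledge response.
interface CatalogRow {
  manufacturer: string;
  chunkCount: number;
}

// Illustrative alias subset (lowercased variant -> canonical name).
const KNOWN_ALIASES: Record<string, string> = {
  "rockwell": "Rockwell Automation",
  "automation direct": "AutomationDirect",
};

// Returns manufacturer names that show the rollup is still fragmented:
// rows named with an alias variant instead of its canonical form, and
// any name that appears on more than one row.
function findFragments(rows: CatalogRow[]): string[] {
  const fragments: string[] = [];
  const seen: Record<string, boolean> = Object.create(null);
  for (const row of rows) {
    const name = row.manufacturer;
    const key = name.trim().toLowerCase();
    const canonical = Object.prototype.hasOwnProperty.call(KNOWN_ALIASES, key)
      ? KNOWN_ALIASES[key]
      : undefined;
    const isVariant = canonical !== undefined && canonical !== name;
    if (isVariant || seen[name] === true) {
      fragments.push(name);
    }
    seen[name] = true;
  }
  return fragments;
}

// Placeholder rows modelled on the example in this PR.
const fragmentedRows: CatalogRow[] = [
  { manufacturer: "Rockwell Automation", chunkCount: 34000 },
  { manufacturer: "Rockwell", chunkCount: 18 },
];
const collapsedRows: CatalogRow[] = [
  { manufacturer: "Rockwell Automation", chunkCount: 34018 },
];

console.log(findFragments(fragmentedRows)); // flags the bare "Rockwell" row
console.log(findFragments(collapsedRows)); // nothing flagged
```

An empty result from `findFragments` on the live response would indicate that no known alias variant survived as its own catalog row.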