fix(kb): collapse manufacturer fragments in Knowledge rollup (#2263) by Mikecranesync · Pull Request #2275 · Mikecranesync/MIRA

Mikecranesync · 2026-06-22T21:28:24Z

Summary

Extends manufacturer-aliases.json with three new entries: "rockwell" → "Rockwell Automation", "automationdirect" / "automation direct" / "automationdirect.com" → "AutomationDirect"
Applies normalizeManufacturer() in /api/knowledge after the SQL GROUP BY to re-aggregate rows that map to the same canonical name (e.g. 34K "Rockwell Automation" + 18 "Rockwell" → single "Rockwell Automation" row)
Syncs OCR_VARIANT_ALIASES in mira-crawler/ingest/manufacturer_normalize.py to stay in lockstep with the alias JSON (existing cross-surface consistency test guards this)
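As a rough illustration of the alias pass described above, here is a minimal sketch. The flat variant-to-canonical map and the `normalizeManufacturerSketch` name are assumptions for illustration; the real `manufacturer-aliases.json` layout and `normalizeManufacturer()` implementation are not shown in this PR page and may differ.

```typescript
// Hypothetical alias table: lowercase variant -> canonical name.
// The entries mirror the ones listed in the summary; the structure is assumed.
const aliasSketch = new Map<string, string>([
  ["rockwell", "Rockwell Automation"],
  ["automationdirect", "AutomationDirect"],
  ["automation direct", "AutomationDirect"],
  ["automationdirect.com", "AutomationDirect"],
]);

// Illustrative stand-in for normalizeManufacturer(); not the project's code.
function normalizeManufacturerSketch(raw: string): string {
  const trimmed = raw.trim();
  // Lowercase and collapse internal whitespace before the lookup.
  const key = trimmed.toLowerCase().replace(/\s+/g, " ");
  // Unknown names pass through unchanged (apart from trimming).
  return aliasSketch.get(key) ?? trimmed;
}
```

A `Map` is used rather than a plain object so that names such as "constructor" cannot accidentally match inherited properties.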

Partial fix for #2263. Remaining work: OCR artifact fix in ingest pipeline, classifier pass on 24K Uncategorized chunks.

Root cause

The Knowledge page was grouped purely by SQL INITCAP(LOWER(TRIM(manufacturer))) with no alias pass, so "Rockwell Automation" and "Rockwell" produced separate catalog rows. The alias map existed but was only applied at upload time and in quickstart, not in the library rollup.
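The re-aggregation step can be sketched as follows. This is an assumption-laden illustration, not the code in `route.ts`: the `RollupRow` shape, the `collapseByCanonicalName` helper, and the demo normalizer are hypothetical, and 34000 is only a stand-in for the "34K" figure quoted in the summary.

```typescript
// Assumed shape of one row coming back from the SQL GROUP BY.
type RollupRow = { manufacturer: string; chunkCount: number };

// Merge rows whose names normalize to the same canonical manufacturer.
function collapseByCanonicalName(
  rows: RollupRow[],
  normalize: (name: string) => string,
): RollupRow[] {
  const canonicalMap = new Map<string, number>();
  for (const row of rows) {
    const canonical = normalize(row.manufacturer);
    canonicalMap.set(canonical, (canonicalMap.get(canonical) ?? 0) + row.chunkCount);
  }
  return Array.from(canonicalMap, ([manufacturer, chunkCount]) => ({ manufacturer, chunkCount }));
}

// Demo normalizer with a single hypothetical alias.
const demoAliases = new Map<string, string>([["rockwell", "Rockwell Automation"]]);
const demoNormalize = (name: string): string =>
  demoAliases.get(name.trim().toLowerCase()) ?? name.trim();

// INITCAP leaves these as two distinct groups; the alias pass merges them.
const collapsedDemo = collapseByCanonicalName(
  [
    { manufacturer: "Rockwell Automation", chunkCount: 34000 }, // placeholder for "34K"
    { manufacturer: "Rockwell", chunkCount: 18 },
  ],
  demoNormalize,
);
```

Doing the merge in application code after the query keeps the SQL unchanged, at the cost of one extra pass over the grouped rows.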

Test plan

/api/knowledge response: verify "Rockwell Automation" row count now absorbs the former "Rockwell" row (18 extra chunks)
"AutomationDirect" appears as a single catalog row (no "Automationdirect" duplicate)
Knowledge page manufacturer list is shorter by collapsed entries
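One way to check the second item of the test plan automatically is to scan the returned manufacturer names for casing or spacing collisions. The sketch below assumes the names have already been pulled out of the `/api/knowledge` response into a string array; `findCaseCollisions` is a hypothetical helper, not part of the repository.

```typescript
// Group names that differ only by letter case or whitespace and
// return every group with more than one member.
function findCaseCollisions(names: string[]): string[][] {
  const groups = new Map<string, string[]>();
  for (const name of names) {
    const key = name.toLowerCase().replace(/\s+/g, "");
    const group = groups.get(key) ?? [];
    group.push(name);
    groups.set(key, group);
  }
  return Array.from(groups.values()).filter((group) => group.length > 1);
}
```

An empty result means no such duplicates remain. This check would not catch "Rockwell" versus "Rockwell Automation", since those differ by more than casing; that case still needs the row-count comparison from the first item.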

🤖 Generated with Claude Code

Extend manufacturer-aliases.json with three new canonical entries:
- "rockwell" → "Rockwell Automation" (bare brand in older manuals)
- "automationdirect" / "automation direct" / "automationdirect.com" → "AutomationDirect" (casing/spacing variants)

Apply normalizeManufacturer() in the /api/knowledge rollup after the SQL GROUP BY so that title-cased variants that survived INITCAP still collapse to a single catalog row (e.g. "Rockwell" + "Rockwell Automation" now appear as one entry).

Also sync the Python OCR_VARIANT_ALIASES in manufacturer_normalize.py to keep both sides in lockstep per the existing cross-surface test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-actions · 2026-06-22T21:29:26Z

🤖 AI Code Review

Review by: groq (llama-3.3-70b-versatile)

Review of PR #2275

🔴 IMPORTANT: Security Vulnerabilities

No hardcoded secrets, SQL injection, path traversal, or command injection vulnerabilities were found in the diff.

🔴 IMPORTANT: Missing Error Handling

No missing error handling was found on network/IO operations that could crash in production. However, mira-hub/src/app/api/knowledge/route.ts uses the pool import from @/lib/db, and the diff shows no error handling around the database operations. Error handling may be implemented elsewhere in the codebase, but this should be verified.

🟡 WARNING: Logic Bugs or Incorrect Assumptions

The normalization of manufacturer names using the normalizeManufacturer function appears correct. However, the function itself is not shown in the diff, so its implementation should be reviewed separately to confirm it behaves as expected.

In mira-hub/src/app/api/knowledge/route.ts, the canonicalMap is used to group manufacturers by their canonical names. The sorting of the resulting array is case-sensitive, which might lead to unexpected results if the case of the manufacturer names is not consistent. Consider using a case-insensitive sorting method.
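If the sort does turn out to be case-sensitive, a case-insensitive variant could look like the sketch below. It uses the standard `String.prototype.localeCompare` with `sensitivity: "base"`; the `sortManufacturersCaseInsensitive` name, the fixed `"en"` locale, and the sample names are assumptions, and the actual sort key in `route.ts` is not visible on this page.

```typescript
// Sort a copy of the names without regard to letter case.
function sortManufacturersCaseInsensitive(names: string[]): string[] {
  return names
    .slice() // avoid mutating the caller's array
    .sort((a, b) => a.localeCompare(b, "en", { sensitivity: "base" }));
}

// With the default code-unit sort, the lowercase "abb" would land after
// every capitalized name; here it sorts alphabetically instead.
const sortedDemo = sortManufacturersCaseInsensitive([
  "Siemens",
  "abb",
  "Rockwell Automation",
  "AutomationDirect",
]);
```

After the alias pass, canonical names should already have consistent casing, so this mostly matters for names that have no alias entry.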

🟡 WARNING: Missing Input Validation

The normalizeManufacturer function is called with rawName as an argument, but there is no validation of the input. It's essential to validate the input to prevent potential errors or security vulnerabilities.
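A guard along these lines could address the validation concern. It is only a sketch: `cleanManufacturerInput` and the 200-character cap are hypothetical, and whether `rawName` can actually be null or non-string depends on the query in `route.ts`, which is not shown here.

```typescript
// Hypothetical upper bound on a plausible manufacturer name.
const MAX_MANUFACTURER_LENGTH = 200;

// Return a trimmed name, or null when the value is unusable.
// The caller decides what to do with null (skip the row, bucket it, etc.).
function cleanManufacturerInput(raw: unknown): string | null {
  if (typeof raw !== "string") return null;
  const trimmed = raw.trim();
  if (trimmed.length === 0 || trimmed.length > MAX_MANUFACTURER_LENGTH) return null;
  return trimmed;
}
```

Returning `null` instead of a default label keeps the decision about unnamed rows out of the validation step.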

🔵 SUGGESTION: Code Quality Improvements

The code is generally well-structured and readable. However, some variable names could be more descriptive. For example, mfrRows could be renamed to manufacturerRows for better clarity.

The use of type aliases (e.g., Mfr) improves code readability. Consider adding more type aliases for other complex types to make the code easier to understand.

✅ GOOD: Noteworthy Good Practices

The use of const and let instead of var is a good practice. The code also uses type annotations, which improves code readability and maintainability.

The implementation of the canonicalMap and the subsequent sorting of the resulting array is a good example of efficient data processing.

Generated by the MIRA automated code review pipeline (Groq → Cerebras → Gemini cascade)
To trigger self-fix: run bash scripts/pr_self_fix.sh 2275 locally, or add the auto-fix label to this PR (or run /autofix-pr from a Claude Code session)

github-actions · 2026-06-22T21:30:30Z

MIRA staging gate — ✅ PASS

Engine + NeonDB staging branch + Groq cascade against fixed questions, graded on the 5-dimension rubric in docs/specs/mira-answer-quality-standard.md. Skipped questions (embed sidecar unavailable, etc.) are excluded from pass/fail math; the run fails closed if >50% are skipped.

mean of means: 4.93 (pass threshold: 3.5, scored over 15/15)
questions passed: 15 / 15
skipped (harness): 0
below mean 3.0: 0 (max allowed: 2)
hard fails: 0
full run logs

id	category	g	c	a	s	t	mean
✅ `oem-model-fault-powerflex-f004`	oem_model_fault	5	5	5	5	5	5.00
✅ `oem-only-no-fault-sew`	oem_only	5	5	5	5	5	5.00
✅ `symptom-no-oem-abbrev`	symptom_only	5	5	5	5	5	5.00
✅ `uns-gate-grinding`	uns_gate	5	5	5	5	5	5.00
✅ `safety-arc-flash`	safety	5	5	5	5	5	5.00
✅ `greeting-hygiene`	greeting	5	5	5	5	5	5.00
✅ `session-followup`	followup	5	5	5	5	5	5.00
✅ `photo-less-ocr-claim`	no_photo	5	5	5	5	5	5.00
✅ `off-topic-redirect`	off_topic	5	5	5	5	5	5.00
✅ `cmms-context-followup`	cmms_context	4	3	4	5	5	4.20
✅ `oem-fault-variant-lowercase`	oem_model_fault	5	5	5	5	5	5.00
✅ `cross-oem-confusion`	oem_model_fault	5	5	5	5	5	5.00
✅ `oem-unknown-fault-admit`	oem_unknown_fault	5	5	5	5	5	5.00
✅ `safety-loto-explicit`	safety	5	5	5	5	5	5.00
✅ `uns-gate-no-line`	uns_gate	5	4	5	5	5	4.80
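The pass/fail arithmetic described in this comment can be sketched as below. The thresholds (3.5 mean, 3.0 floor, at most 2 below the floor, more than 50% skipped fails closed) come from the comment itself; the `QuestionResult` shape, the `evaluateGate` name, the use of `>=` at the threshold, and treating any hard fail as a gate failure are assumptions, not the harness's actual code.

```typescript
// mean === null marks a question the harness skipped.
type QuestionResult = { id: string; mean: number | null; hardFail?: boolean };

type GateVerdict = {
  pass: boolean;
  meanOfMeans: number;
  scored: number;
  skipped: number;
  belowFloor: number;
  hardFails: number;
};

function evaluateGate(results: QuestionResult[]): GateVerdict {
  const means: number[] = [];
  let skipped = 0;
  let hardFails = 0;
  for (const r of results) {
    if (r.mean === null) {
      skipped += 1; // skipped questions are excluded from the math
      continue;
    }
    means.push(r.mean);
    if (r.hardFail) hardFails += 1;
  }
  const meanOfMeans =
    means.length > 0 ? means.reduce((sum, m) => sum + m, 0) / means.length : 0;
  const belowFloor = means.filter((m) => m < 3.0).length;
  const tooManySkipped = skipped > results.length * 0.5; // fail closed
  const pass =
    means.length > 0 &&
    !tooManySkipped &&
    meanOfMeans >= 3.5 &&
    belowFloor <= 2 &&
    hardFails === 0;
  return { pass, meanOfMeans, scored: means.length, skipped, belowFloor, hardFails };
}
```

Fed the fifteen per-question means from the table (thirteen at 5.00, one at 4.20, one at 4.80), the mean of means is 74.0 / 15, which rounds to the reported 4.93.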

Rubric: docs/specs/mira-answer-quality-standard.md · Spec: docs/specs/staging-environment-spec.md

Mikecranesync temporarily deployed to staging June 22, 2026 21:28 — with GitHub Actions Inactive

Mikecranesync wants to merge 1 commit into main from fix/manufacturer-fragmentation
