docs: Phase 2 gap analysis - code-specific retrieval (#27) #54

Merged
stevei101 merged 2 commits into main from feature/phase2-gap-code-retrieval on May 15, 2026

@stevei101
Contributor

Summary

Closes the design-spec deliverable for #27. Specifies how OGRE adds code semantics on top of oxidizedRAG without forking it: a CodeRetriever trait, AST-aware indexing via the existing tree-sitter feature flag, a typed dependency graph (calls / imports / references_type) persisted to data-fabric, and an impact-analysis query that joins the structural graph with text-embedding retrieval.

Picks up where Phase 1 left off (assessments/PHASE1_OXIDIZEDRAG_ASSESSMENT.md). Phase 1 confirmed that tree-sitter exists in oxidizedRAG but isn't threaded through retrieval ranking; the Phase 2 spec closes that gap with an integration layer rather than a core fork.
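
As a rough illustration of the layering described above, here is a minimal sketch of a `CodeRetriever` trait whose impact query joins typed graph edges with text retrieval. Only the trait name and the three edge kinds come from this PR; the method names, the `Edge` struct, and `InMemoryRetriever` are hypothetical, and a substring match stands in for embedding retrieval. The spec's actual signatures may differ.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// The three edge kinds named in the summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdgeKind {
    Calls,
    Imports,
    ReferencesType,
}

/// Hypothetical edge record: `from` depends on `to`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Edge {
    pub from: String,
    pub to: String,
    pub kind: EdgeKind,
}

/// Hypothetical trait shape: it wraps a text retriever instead of changing it.
pub trait CodeRetriever {
    /// Text retrieval, delegated to the wrapped engine.
    fn search_text(&self, query: &str, k: usize) -> Vec<String>;

    /// Edges whose target is `symbol`.
    fn dependents(&self, symbol: &str) -> Vec<Edge>;

    /// Joins both layers: structural dependents plus text hits,
    /// deduplicated and returned in sorted order.
    fn impact(&self, symbol: &str, k: usize) -> Vec<String> {
        let mut out: BTreeSet<String> = self
            .dependents(symbol)
            .into_iter()
            .map(|e| e.from)
            .collect();
        out.extend(self.search_text(symbol, k));
        out.remove(symbol); // the changed symbol is not its own impact
        out.into_iter().collect()
    }
}

/// Toy backend for illustration only (assumption: not the data-fabric store).
pub struct InMemoryRetriever {
    pub edges: Vec<Edge>,
    /// symbol name -> associated source text
    pub docs: BTreeMap<String, String>,
}

impl CodeRetriever for InMemoryRetriever {
    fn search_text(&self, query: &str, k: usize) -> Vec<String> {
        // Substring match stands in for embedding similarity.
        self.docs
            .iter()
            .filter(|(_, text)| text.contains(query))
            .map(|(name, _)| name.clone())
            .take(k)
            .collect()
    }

    fn dependents(&self, symbol: &str) -> Vec<Edge> {
        self.edges.iter().filter(|e| e.to == symbol).cloned().collect()
    }
}

/// Convenience constructor for building edges in examples.
pub fn edge(from: &str, to: &str, kind: EdgeKind) -> Edge {
    Edge { from: from.to_string(), to: to.to_string(), kind }
}
```

Putting `impact` in a default method keeps the join logic in the wrapper, so a backend only needs to supply the two lookups.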

Deliverable

What it covers

What it deliberately does NOT do

  • No core changes to oxidizedRAG; the trait wraps it.
  • No prototype implementation in this PR.
  • No coverage-data ingestion design.
  • No commitment on cross-language fq-name canonicalization beyond a lang: namespacing default.
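
To make the `lang:` namespacing default in the last bullet concrete, the sketch below shows one way a language-prefixed fully qualified name could be represented and parsed. The `FqName` type name appears in the commits on this PR; its fields, the lowercase normalization, and the `lang:path` string form are assumptions for illustration, not the canonical form the spec defines.

```rust
use std::fmt;

/// Hypothetical fq-name: a language namespace plus a language-native path.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct FqName {
    lang: String,
    path: String,
}

impl FqName {
    /// Returns None for an empty part or a namespace containing ':'.
    pub fn new(lang: &str, path: &str) -> Option<FqName> {
        if lang.is_empty() || path.is_empty() || lang.contains(':') {
            return None;
        }
        Some(FqName {
            // Assumption: namespaces compare case-insensitively.
            lang: lang.to_ascii_lowercase(),
            path: path.to_string(),
        })
    }

    /// Splits at the first ':' only, so Rust paths containing "::" stay intact.
    pub fn parse(s: &str) -> Option<FqName> {
        let (lang, path) = s.split_once(':')?;
        FqName::new(lang, path)
    }
}

impl fmt::Display for FqName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}", self.lang, self.path)
    }
}
```

With the prefix in place, identical paths from different languages stay distinct keys in the graph even without any deeper cross-language canonicalization.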

Test plan

🤖 Generated with Claude Code

aivcs and others added 2 commits May 14, 2026 10:44
Specifies how OGRE layers code semantics on top of oxidizedRAG without
forking it: a CodeRetriever trait, AST-aware indexing via existing
tree-sitter feature, a typed dependency graph (calls / imports /
references_type) in data-fabric, and an impact-analysis query that joins
both layers.

Closes the design spec deliverable for #27; prototype implementation
tracked in #33 (Phase 3 retrieval layer) and #37 (Phase 4 PR reviewer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Promote fq-name canonicalization from "open question" to a foundational
decision with concrete forms per language. Add type sketches for the
CodeRetriever trait (FqName, SymbolQuery, Resolution, CallerEdge,
ImpactOptions, ImpactSet, CodeRetrieverError). Bound impact() with a
result cap, high-fan-in skip, and explicit truncated flag. Replace the
confidence-score language on unresolved edges with a categorical
Resolution. Add storage rationale for SurrealDB + data-fabric. Add a
failure model section (parse errors, index lag, ambiguous resolution,
partial writes, rename handling). Split test detection into
attribute-based (preferred) and name-based (fallback). Expand the
performance budget with 10K and 1M LOC sanity bounds and cache
assumptions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
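
The bounds on `impact()` described in the second commit (result cap, high-fan-in skip, explicit truncated flag, categorical `Resolution`) can be sketched as a breadth-first walk over caller edges. The type names come from the commit message; every field, the enum variants, and the decision to set `truncated` when a high-fan-in symbol is skipped are assumptions made for this sketch.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Categorical resolution in place of a confidence score (variants assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resolution {
    Resolved,
    Ambiguous,
    Unresolved,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CallerEdge {
    pub caller: String,
    pub resolution: Resolution,
}

pub struct ImpactOptions {
    /// Stop after this many affected callers.
    pub max_results: usize,
    /// Do not expand symbols with more direct callers than this.
    pub max_fan_in: usize,
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct ImpactSet {
    pub affected: Vec<CallerEdge>,
    pub skipped_high_fan_in: Vec<String>,
    /// True whenever the result is known to be incomplete.
    pub truncated: bool,
}

/// Breadth-first walk over `callers` (callee -> edges from its callers).
pub fn impact(
    callers: &BTreeMap<String, Vec<CallerEdge>>,
    root: &str,
    opts: &ImpactOptions,
) -> ImpactSet {
    let mut out = ImpactSet::default();
    let mut seen: BTreeSet<String> = BTreeSet::new();
    seen.insert(root.to_string());
    let mut queue = VecDeque::new();
    queue.push_back(root.to_string());

    while let Some(sym) = queue.pop_front() {
        let edges = match callers.get(&sym) {
            Some(edges) => edges,
            None => continue,
        };
        if edges.len() > opts.max_fan_in {
            // High-fan-in skip: record the symbol instead of expanding it.
            out.skipped_high_fan_in.push(sym);
            out.truncated = true;
            continue;
        }
        for edge in edges {
            if !seen.insert(edge.caller.clone()) {
                continue; // already reported via another path
            }
            if out.affected.len() >= opts.max_results {
                out.truncated = true; // result cap reached
                return out;
            }
            out.affected.push(edge.clone());
            queue.push_back(edge.caller.clone());
        }
    }
    out
}
```

Carrying `Resolution` through on each edge lets a consumer decide how to treat ambiguous or unresolved callers rather than having them filtered out silently.
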
@stevei101 stevei101 merged commit eb563d2 into main May 15, 2026
@stevei101 stevei101 deleted the feature/phase2-gap-code-retrieval branch May 15, 2026 07:34