docs: Phase 2 gap analysis - code-specific retrieval (#27) by stevei101 · Pull Request #54 · stevedores-org/ogre

stevei101 · 2026-05-14T15:58:10Z

Summary

Closes the design-spec deliverable for #27. Specifies how OGRE adds code semantics on top of oxidizedRAG without forking it: a CodeRetriever trait, AST-aware indexing via the existing tree-sitter feature flag, a typed dependency graph (calls / imports / references_type) persisted to data-fabric, and an impact-analysis query that joins the structural graph with text-embedding retrieval.
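A minimal sketch of the typed dependency graph, assuming fq-names are plain strings and the graph is held in memory behind a reverse (incoming-edge) index. The real store is data-fabric, whose API is not assumed here. DepGraph, Edge, and incoming are hypothetical names; only the three edge kinds come from the spec.

```rust
use std::collections::HashMap;

/// The three edge kinds named in the spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EdgeKind {
    Calls,
    Imports,
    ReferencesType,
}

/// A directed edge between two fully qualified symbol names.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Edge {
    pub from: String,
    pub to: String,
    pub kind: EdgeKind,
}

/// Hypothetical in-memory stand-in for the persisted graph. Edges are
/// indexed by target so "who points at X?" is a single map lookup.
#[derive(Default)]
pub struct DepGraph {
    by_target: HashMap<String, Vec<Edge>>,
}

impl DepGraph {
    pub fn add(&mut self, from: &str, to: &str, kind: EdgeKind) {
        let edge = Edge { from: from.to_string(), to: to.to_string(), kind };
        self.by_target.entry(to.to_string()).or_default().push(edge);
    }

    /// Sources of all edges of `kind` that point at `target`, in insertion order.
    pub fn incoming(&self, target: &str, kind: EdgeKind) -> Vec<&str> {
        self.by_target
            .get(target)
            .map(|edges| {
                edges
                    .iter()
                    .filter(|e| e.kind == kind)
                    .map(|e| e.from.as_str())
                    .collect::<Vec<_>>()
            })
            .unwrap_or_default()
    }
}
```

Keeping the kind on every edge lets a later query decide per kind whether to follow it, for example following calls and references_type but not imports.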

Picks up where Phase 1 left off (assessments/PHASE1_OXIDIZEDRAG_ASSESSMENT.md): Phase 1 confirmed that tree-sitter exists in oxidizedRAG but isn't threaded through retrieval ranking; the Phase 2 spec closes that gap with an integration layer, not a core fork.

Deliverable

assessments/PHASE2_GAP_CODE_SPECIFIC_RETRIEVAL.md (197 lines): design spec only. No prototype code; that ships in #33 ([Phase 3] Design OGRE Retrieval - Code-Aware Integration Layer) and #37 ([Phase 4] Implement PR Reviewer Agent Prototype).

What it covers

The 5 capabilities listed in #27 ([Phase 2] Gap: Code-Specific Retrieval): AST-aware chunking, function/module retrieval, dependency tracking, impact analysis, and test linkage.
A proposed CodeRetriever trait with 6 methods: five structural, one delegating to the existing oxidizedRAG semantic search.
Concrete answers to the 4 open questions in #27, each with a default position that reviewers are invited to challenge.
Performance budget (p50/p99) at 100K LOC.
Explicit non-goals (runtime call graphs, coverage ingestion, code-specific embedding model, multi-repo, on-keystroke indexing).
Dependency map onto #32 ([Phase 3] Design OGRE Core - Agent Lifecycle Engine), #33 ([Phase 3] Design OGRE Retrieval - Code-Aware Integration Layer), #28 ([Phase 2] Gap: Safe Action Execution), and #43 ([Meta] OGRE Architecture - Open Questions & Decisions).
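One possible shape for the six-method CodeRetriever trait, sketched under stated assumptions: the method names, the Chunk type, and the SemanticSearch trait (a stand-in for oxidizedRAG's search, whose real API is not reproduced here) are all hypothetical. The toy InMemoryRetriever answers the five structural methods from in-memory tables and forwards semantic to whatever index it wraps, which is the "wrap, don't fork" shape the spec describes.

```rust
/// Hypothetical stand-in for oxidizedRAG's text-embedding search.
pub trait SemanticSearch {
    fn search(&self, query: &str, k: usize) -> Vec<String>;
}

/// An AST-derived chunk: one named item and its line span.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Chunk {
    pub fq_name: String,
    pub start_line: usize,
    pub end_line: usize,
}

/// Six methods: five structural, one delegating to the semantic index.
pub trait CodeRetriever {
    fn chunks(&self, path: &str) -> Vec<Chunk>; // AST-aware chunking
    fn symbol(&self, fq_name: &str) -> Option<Chunk>; // function/module retrieval
    fn callers(&self, fq_name: &str) -> Vec<String>; // dependency tracking
    fn impact(&self, changed: &[&str]) -> Vec<String>; // impact analysis
    fn tests_for(&self, fq_name: &str) -> Vec<String>; // test linkage
    fn semantic(&self, query: &str, k: usize) -> Vec<String>; // delegated
}

/// Toy implementation backed by plain vectors (hypothetical).
pub struct InMemoryRetriever<S> {
    pub index: S,
    pub chunk_table: Vec<(String, Chunk)>, // (file path, chunk)
    pub calls: Vec<(String, String)>,      // (caller, callee)
}

impl<S: SemanticSearch> CodeRetriever for InMemoryRetriever<S> {
    fn chunks(&self, path: &str) -> Vec<Chunk> {
        self.chunk_table.iter().filter(|(p, _)| p == path).map(|(_, c)| c.clone()).collect()
    }

    fn symbol(&self, fq_name: &str) -> Option<Chunk> {
        self.chunk_table.iter().map(|(_, c)| c).find(|c| c.fq_name == fq_name).cloned()
    }

    fn callers(&self, fq_name: &str) -> Vec<String> {
        self.calls
            .iter()
            .filter(|(_, callee)| callee == fq_name)
            .map(|(caller, _)| caller.clone())
            .collect()
    }

    fn impact(&self, changed: &[&str]) -> Vec<String> {
        // One hop only, for brevity; a real query would walk transitively.
        let mut out: Vec<String> = changed.iter().flat_map(|c| self.callers(c)).collect();
        out.sort();
        out.dedup();
        out
    }

    fn tests_for(&self, fq_name: &str) -> Vec<String> {
        // Name-based heuristic only; attribute-based detection is not modeled here.
        self.callers(fq_name).into_iter().filter(|c| c.contains("::tests::")).collect()
    }

    fn semantic(&self, query: &str, k: usize) -> Vec<String> {
        // Pure delegation: nothing in the wrapped index changes.
        self.index.search(query, k)
    }
}
```

Because semantic only forwards, replacing the in-memory tables with a tree-sitter-backed index would leave the oxidizedRAG side untouched.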

What it deliberately does NOT do

No core changes to oxidizedRAG — the trait wraps it.
No prototype implementation in this PR.
No coverage-data ingestion design.
No commitment on cross-language fq-name canonicalization beyond a lang: namespacing default.
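A sketch of the lang: namespacing default, assuming the language tag is a lowercase ASCII identifier followed by the language's own native path syntax (rust:ogre::retrieval::impact, py:pkg.mod.func). This grammar, FqName, and FqNameError are illustrative assumptions, not the spec's per-language forms. Splitting on the first ':' only keeps Rust's :: separators intact.

```rust
use std::fmt;

/// Language-namespaced fully qualified name (illustrative grammar).
#[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct FqName {
    pub lang: String,
    pub path: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FqNameError {
    BadLang, // missing or malformed `lang:` prefix
    BadPath, // empty path, or a path that starts with ':'
}

impl FqName {
    pub fn parse(s: &str) -> Result<Self, FqNameError> {
        // Split on the first ':' only, so `::` inside a Rust path survives.
        let (lang, path) = s.split_once(':').ok_or(FqNameError::BadLang)?;
        if lang.is_empty() || !lang.chars().all(|c| c.is_ascii_lowercase() || c.is_ascii_digit()) {
            return Err(FqNameError::BadLang);
        }
        // "rust::x" would split into ("rust", ":x"); reject that as malformed.
        if path.is_empty() || path.starts_with(':') {
            return Err(FqNameError::BadPath);
        }
        Ok(FqName { lang: lang.to_string(), path: path.to_string() })
    }
}

impl fmt::Display for FqName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}", self.lang, self.path)
    }
}
```

The prefix is what keeps rust:util::parse and py:util.parse from colliding when both languages share one graph.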

Test plan

Markdown renders correctly on GitHub.
Cross-references to #27, #28, #32, #33, #37, #43, and the Phase 1 assessment resolve.
Reviewer pass on the 4 open questions — any "no, default is wrong" responses become tracked items.
Performance targets reviewed against Phase 1's < 500 ms agent-workflow budget for consistency.
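For the performance-target review, a nearest-rank percentile check is one simple way to compare measured retrieval latencies against a budget such as Phase 1's < 500 ms. The percentile method and the percentile and within_budget helpers are assumptions for illustration; the spec's own p50/p99 targets at 100K LOC are not reproduced here.

```rust
/// Nearest-rank percentile over latency samples (milliseconds).
/// Returns None for an empty slice or a `p` outside (0, 100].
pub fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: the ceil(p * n / 100)-th smallest value, 1-indexed.
    // Multiplying before dividing keeps integral p and n exact.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}

/// True when both p50 and p99 are strictly under the budget.
pub fn within_budget(samples: &[f64], budget_ms: f64) -> bool {
    match (percentile(samples, 50.0), percentile(samples, 99.0)) {
        (Some(p50), Some(p99)) => p50 < budget_ms && p99 < budget_ms,
        _ => false,
    }
}
```

Checking p99 and not just the mean matters here because a few slow impact queries on high-fan-in symbols would hide inside an average.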

🤖 Generated with Claude Code

Specifies how OGRE layers code semantics on top of oxidizedRAG without forking it: a CodeRetriever trait, AST-aware indexing via the existing tree-sitter feature, a typed dependency graph (calls / imports / references_type) in data-fabric, and an impact-analysis query that joins both layers. Closes the design spec deliverable for #27; prototype implementation is tracked in #33 (Phase 3 retrieval layer) and #37 (Phase 4 PR reviewer).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
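A minimal sketch of an impact query that joins the two layers, assuming the structural side is a reverse-dependency map (symbol to the symbols that call, import, or reference it) and the semantic side is a list of fq-names already returned by a text-embedding search. impact_join, Source, and the labeling scheme are hypothetical; the point illustrated is that every result records how it was found.

```rust
use std::collections::{BTreeMap, HashMap, HashSet, VecDeque};

/// How a result was reached.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Source {
    Structural { depth: usize },
    Semantic,
    Both { depth: usize },
}

/// Join reverse-dependency reachability (up to `max_depth` hops) with semantic hits.
pub fn impact_join(
    reverse: &HashMap<String, Vec<String>>,
    changed: &[&str],
    max_depth: usize,
    semantic_hits: &[&str],
) -> BTreeMap<String, Source> {
    let mut out: BTreeMap<String, Source> = BTreeMap::new();
    let mut seen: HashSet<String> = changed.iter().map(|s| s.to_string()).collect();
    let mut queue: VecDeque<(String, usize)> = changed.iter().map(|s| (s.to_string(), 0)).collect();

    // Breadth-first, so each symbol is recorded at its shortest distance.
    while let Some((sym, depth)) = queue.pop_front() {
        if depth == max_depth {
            continue;
        }
        for dep in reverse.get(&sym).into_iter().flatten() {
            if seen.insert(dep.clone()) {
                out.insert(dep.clone(), Source::Structural { depth: depth + 1 });
                queue.push_back((dep.clone(), depth + 1));
            }
        }
    }

    // Semantic hits either corroborate a structural result or stand alone.
    for hit in semantic_hits {
        if changed.contains(hit) {
            continue; // the changed symbols themselves are not "impacted"
        }
        let entry = out.entry(hit.to_string()).or_insert(Source::Semantic);
        if let Source::Structural { depth } = *entry {
            *entry = Source::Both { depth };
        }
    }
    out
}
```

A Both result is reachable in the graph and also textually similar, which is a plausible signal to rank first; a Semantic-only hit may point at code the static graph cannot see.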

Promote fq-name canonicalization from "open question" to a foundational decision with concrete forms per language. Add type sketches for the CodeRetriever trait (FqName, SymbolQuery, Resolution, CallerEdge, ImpactOptions, ImpactSet, CodeRetrieverError). Bound impact() with a result cap, high-fan-in skip, and explicit truncated flag. Replace the confidence-score language on unresolved edges with a categorical Resolution. Add storage rationale for SurrealDB + data-fabric. Add a failure model section (parse errors, index lag, ambiguous resolution, partial writes, rename handling). Split test detection into attribute-based (preferred) and name-based (fallback). Expand the performance budget with 10K and 1M LOC sanity bounds and cache assumptions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
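A sketch of a bounded impact() in the spirit of this commit, assuming ImpactOptions carries a result cap, a fan-in limit, and a flag for following ambiguous edges, and that Resolution has three categories. The field and variant names are guesses at the shapes the commit names, not the spec's definitions. Unresolved edges are never followed, hub symbols above the fan-in limit are reported instead of expanded, and hitting the cap sets truncated rather than silently dropping results.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Categorical edge resolution in place of a confidence score (illustrative variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Resolution {
    Exact,      // resolved to a single definition
    Ambiguous,  // several candidate definitions
    Unresolved, // no definition found (external or dynamic)
}

pub struct ImpactOptions {
    pub max_results: usize,
    /// Symbols with more dependents than this are reported but not expanded.
    pub fan_in_limit: usize,
    /// Whether `Ambiguous` edges are followed; `Unresolved` never is.
    pub include_ambiguous: bool,
}

#[derive(Debug, Default, PartialEq)]
pub struct ImpactSet {
    pub symbols: Vec<String>,
    pub skipped_high_fan_in: Vec<String>,
    pub truncated: bool,
}

/// `reverse[x]` lists (dependent, resolution) pairs for edges pointing at `x`.
pub fn impact(
    reverse: &HashMap<String, Vec<(String, Resolution)>>,
    changed: &[&str],
    opts: &ImpactOptions,
) -> ImpactSet {
    let mut set = ImpactSet::default();
    let mut seen: HashSet<&str> = changed.iter().copied().collect();
    let mut queue: VecDeque<&str> = changed.iter().copied().collect();

    while let Some(sym) = queue.pop_front() {
        let deps = match reverse.get(sym) {
            Some(d) => d,
            None => continue,
        };
        if deps.len() > opts.fan_in_limit {
            // Hub symbols (loggers, shared error types) would flood the result.
            set.skipped_high_fan_in.push(sym.to_string());
            continue;
        }
        for (dep, res) in deps {
            let follow = *res == Resolution::Exact
                || (opts.include_ambiguous && *res == Resolution::Ambiguous);
            if !follow || !seen.insert(dep.as_str()) {
                continue;
            }
            if set.symbols.len() == opts.max_results {
                set.truncated = true; // caller learns the answer is partial
                return set;
            }
            set.symbols.push(dep.clone());
            queue.push_back(dep.as_str());
        }
    }
    set
}
```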

aivcs and others added 2 commits May 14, 2026 10:44

stevei101 merged commit eb563d2 into main May 15, 2026

stevei101 deleted the feature/phase2-gap-code-retrieval branch May 15, 2026 07:34
