Develop by Intrinsical-AI · Pull Request #58 · Intrinsical-AI/rag-prototype

Intrinsical-AI · 2026-04-04T23:44:12Z

Anchored 2.0.1 as config breaking change.
Eval
Cache Embeddings
+


## Summary

Combined hit and rank calculation into a single loop, eliminating the redundant `any(eid in relevant for eid in retrieved_ext)` check over `retrieved_ext`.

**Why:** To avoid iterating over the retrieved results twice for every query during evaluation.

**Measured Improvement:** A benchmark running 10,000 queries with 50 retrieved items each showed the core loop's execution time drop from ~0.0751s to ~0.0305s (a ~60% reduction).
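For reviewers, a minimal sketch of the single-pass idea described above. Only `retrieved_ext` and `relevant` come from the description; the function name and the choice to return a reciprocal rank are assumptions for illustration, not the code in this PR.

```python
from typing import Sequence, Set, Tuple


def hit_and_reciprocal_rank(
    retrieved_ext: Sequence[str], relevant: Set[str]
) -> Tuple[bool, float]:
    """Return (hit, reciprocal rank) from one pass over the ranked results."""
    for rank, eid in enumerate(retrieved_ext, start=1):
        if eid in relevant:
            # The first relevant id settles both values, so stop scanning here.
            return True, 1.0 / rank
    # No relevant id was retrieved: a miss with zero reciprocal rank.
    return False, 0.0
```

Because the loop returns at the first relevant id, there is no separate `any(...)` pass over the same list.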

…ync#44 (#45)
* feat(rag): add elasticsearch persistence backend
* feat(rag): wire elasticsearch backend through runtime and api
* test(rag): cover elasticsearch backend wiring and storage semantics
* docs(rag): document elasticsearch backend configuration and behavior
* feat(persistence): add scope and snapshot_id columns to document stores
  Add nullable scope and snapshot_id fields to the SQL Document table and Elasticsearch index mappings, enabling external producers to tag documents with an origin scope and version snapshot.
* feat(domain): promote scope and snapshot_id to first-class mutation fields
  Extend UpsertDocBuilderPort, MutationUpsertInput, and normalization/serialization to carry scope and snapshot_id. Add CanonicalImportSummary result type for reporting insert/update/delete statistics after a canonical import.
* feat(use-cases): pass scope and snapshot_id through atomic and saga executors
  Wire the new fields from MutationUpsertInput into the upsert builder calls in both the atomic and saga mutation execution paths.
* feat(persistence): persist scope/snapshot_id and implement list_external_ids_by_scope()
  Update SQL and Elasticsearch repositories to read/write scope and snapshot_id in upsert, change detection, and domain mapping. Add list_external_ids_by_scope() to both stores to support stale-doc deletion in canonical imports.
* feat(http): add POST /api/docs/import-canonical endpoint
  Introduce CanonicalImportRequest/Response schemas with scope, snapshot_id, replace_scope flag, and duplicate external_id validation. Add the endpoint behind the multi-store write lock. Extend existing mutate endpoint schemas to accept scope/snapshot_id on individual upsert items.
* feat(cli): add rag-import-canonical command for declarative scope sync
  Implement execute_import_canonical_sync() use case: batched upsert (256 docs) with optional replace_scope hard-deletion of stale documents not present in the current snapshot. Wire as rag-import-canonical CLI entry point and register in the CLI group.
* docs: add canonical import examples to README and USAGE guide
  Document the new import-canonical HTTP endpoint and CLI command with curl and CLI invocation examples. Note scope/snapshot synchronization semantics and replace_scope behavior.
* (mutate): commit pending file to last batch - missing
* fix(lint-format): formatted files
* feat(retrieval): add structured retrieval requests and dual-mode planning
* feat(search): add remote search backends for opensearch and solr
* feat(api): expose retrieval filters and dual mode through HTTP and settings
* chore(quality): fix mypy target and vector index fallback
* refactor(test-suite): update, expand, polish. Added tests for critical endpoints (#47)
* refactor(test-suite): update, expand, polish. Better conceptual splitting, polished fixtures/conftests. Added tests for critical endpoints
* [waterfall-pr-chain] feat(filter): metadata based filtering. (#48)
  feat(filter): metadata based filtering. Expander contract to accept metadata.<key> plus legacy fields. Updated Elastic/OpenSearch + Solr to exact resolve filtering
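The canonical import semantics listed above (duplicate `external_id` rejection, batched upsert, optional `replace_scope` cleanup) can be sketched roughly as follows. This is an in-memory illustration under stated assumptions: the dict-backed `store`, the `import_canonical` and `batched` names, and the summary keys are hypothetical, and any existing id counts as an update because no change detection is modelled. It is not the `execute_import_canonical_sync()` implementation.

```python
from typing import Dict, Iterator, List

BATCH_SIZE = 256  # batch size quoted in the commit message


def batched(items: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


def import_canonical(
    store: Dict[str, dict],
    docs: List[dict],
    scope: str,
    snapshot_id: str,
    replace_scope: bool = False,
) -> Dict[str, int]:
    """Upsert docs into a dict-backed store (hypothetical stand-in for a repository)."""
    # Reject duplicate external ids before touching the store.
    seen = set()
    for doc in docs:
        eid = doc["external_id"]
        if eid in seen:
            raise ValueError(f"duplicate external_id: {eid}")
        seen.add(eid)

    summary = {"inserted": 0, "updated": 0, "deleted": 0}
    for batch in batched(docs, BATCH_SIZE):
        for doc in batch:
            eid = doc["external_id"]
            summary["updated" if eid in store else "inserted"] += 1
            store[eid] = {**doc, "scope": scope, "snapshot_id": snapshot_id}

    if replace_scope:
        # Hard-delete documents of this scope that are absent from the snapshot.
        stale = [
            eid
            for eid, stored in store.items()
            if stored.get("scope") == scope and eid not in seen
        ]
        for eid in stale:
            del store[eid]
        summary["deleted"] = len(stale)
    return summary
```

Documents that belong to other scopes are left alone, which is the property that makes `replace_scope` safe to run per producer.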

…ics (#50)
* feat(eval): rewrite rag-eval on ir_measures
  - add ir-measures dependency for standard IR evaluation
  - replace custom hit_rate/mrr scoring with nDCG@k, MAP@k, MRR@k, P@k and Recall@k
  - update EvalResult and rag-eval output formatting
  - document rag-eval as offline IR evaluation
* refactor(eval): harden dataset validation and corpus semantics
  - reject duplicate doc ids, blank ids and empty relevant sets at load time
  - fail fast when relevant ids fall outside the dataset corpus
  - make core evaluation own corpus filtering semantics
  - align app retrieval wiring with the dataset-backed corpus contract
* fix(ci): satisfy ruff on eval changes
  - replace tuple() returns with tuple literals in eval tests
  - apply ruff formatting to eval-related modules and tests
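The load-time checks named in the dataset hardening commit could look roughly like this. The function name, argument shapes, and error messages are assumptions made for illustration; the real loader's data model may differ.

```python
from typing import Mapping, Sequence, Set


def validate_eval_dataset(
    corpus_ids: Sequence[str], relevant_by_query: Mapping[str, Set[str]]
) -> None:
    """Raise ValueError on the dataset problems listed in the commit message."""
    seen = set()
    for doc_id in corpus_ids:
        if not doc_id.strip():
            raise ValueError("blank document id in corpus")
        if doc_id in seen:
            raise ValueError(f"duplicate document id: {doc_id}")
        seen.add(doc_id)

    for query_id, relevant in relevant_by_query.items():
        if not relevant:
            raise ValueError(f"query {query_id} has an empty relevant set")
        # Fail fast when a judgment points at a document the corpus lacks.
        missing = sorted(set(relevant) - seen)
        if missing:
            raise ValueError(f"query {query_id} references unknown ids: {missing}")
```

Validating before any retrieval runs keeps metric values from silently absorbing a malformed dataset.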

- support sparse, dense, dual and hybrid retrieval modes in rag-eval
- build retrievers from the production composition path
- run evaluation against an isolated local runtime and ephemeral storage
- keep IR metrics on top of ir_measures
- add mode-specific eval config and request handling

- support exact metadata filtering against sequence values in local_split
- add maintained e2e coverage for RepoGPT canonical import
- add maintained e2e coverage for tr3v0r canonical import
- verify retrieval through queryable metadata after native import
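To make "exact metadata filtering against sequence values" concrete, here is one possible reading, sketched with plain dicts. The helper names, the `metadata.<key>` lookup, and the combination rule (AND across fields, OR across the allowed values of one field) are assumptions, not confirmed behavior of the project.

```python
from typing import Any, Mapping, Sequence, Set


def field_values(doc: Mapping[str, Any], field: str) -> Set[str]:
    """Collect a field's values as strings; sequences contribute each element."""
    if field.startswith("metadata."):
        value = doc.get("metadata", {}).get(field[len("metadata."):])
    else:
        value = doc.get(field)
    if value is None:
        return set()
    if isinstance(value, (list, tuple, set, frozenset)):
        return {str(item) for item in value}
    return {str(value)}


def matches_filters(
    doc: Mapping[str, Any], filters: Mapping[str, Sequence[str]]
) -> bool:
    """True when every filtered field shares at least one exact value with the doc."""
    return all(
        bool(field_values(doc, field) & {str(v) for v in allowed})
        for field, allowed in filters.items()
    )
```

Under these assumptions a document tagged with several languages matches a filter on any one of them, and an empty filter set matches everything.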

- validate mode-specific rag-eval flags and error paths
- add focused tests for sparse, dense, dual and hybrid evaluation flows
- document isolated eval runtime and official multi-mode usage
- fix e2e import loading so linting passes with dynamic sibling repos

- add baseline-vs-candidate retrieval comparison on top of rag-eval
- expose rag-eval-compare as a CLI command and project script
- compute metric deltas and explicit pass/fail gate results
- cover core, app and CLI comparison flows with targeted tests
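The "metric deltas and explicit pass/fail gate" step can be illustrated with a small sketch. The gate policy shown here (fail when any shared metric drops by more than `max_regression`) and all names are assumptions; the actual rag-eval-compare thresholds and output format are not shown in this PR text.

```python
from typing import Dict, List, Tuple


def compare_metrics(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    max_regression: float = 0.0,
) -> Tuple[Dict[str, float], bool, List[str]]:
    """Return (deltas, passed, failed_metrics) for metrics present in both runs."""
    deltas = {
        name: candidate[name] - baseline[name]
        for name in baseline
        if name in candidate
    }
    # A metric fails the gate when it regresses by more than the tolerance.
    failed = sorted(name for name, delta in deltas.items() if delta < -max_regression)
    return deltas, not failed, failed
```

A CLI wrapper would typically map `passed` to its exit code so the comparison can gate CI.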


- emit RepoGPT code-units from the shared synergy fixture
- import the canonical payload into rag-prototype
- verify schema v3 and replace_scope=true
- assert metadata-driven and textual retrieval paths
- cover idempotent reimport and stale document cleanup via replace_scope


… tests SYNERGY_ROOT was incorrectly aliased to the top-level workspace root (parents[3]). Split into WORKSPACE_ROOT (the monorepo root) and SYNERGY_ROOT (WORKSPACE_ROOT / "synergy") so fixture and script paths resolve correctly under the actual directory layout. Affected: repogpt_import_flow, repogpt_ingest_search_eval, tr3v0r_import_flow, repogpt_fixture.

Adds the vuln-pilot evaluation dataset (CIRCL CVE JSONL, 30 docs) and the corresponding test coverage:
- e2e: ingest → search → eval smoke against vuln_pilot_rag_eval_v1.jsonl
- unit: CLI rag-eval smoke with the vuln pilot dataset

Mirrors the RepoGPT coverage pattern. Cross-repo source lives at ../synergy/vuln_pilot/prepared/pilot_small_v1.jsonl.

- Fix ../scripts/ → ../synergy/scripts/ for repogpt_ingest_demo.sh and repogpt_eval_smoke.sh (mirrors the fix in e2e test path resolution).
- Fix fixture and script note paths to include the synergy/ subdirectory.
- Add vuln pilot eval section: rag-eval / rag-eval-compare examples, cross-repo flow (vulns_batch_triage.py, vulns_ingest_rag.py), and dataset/profile location notes.

…t target
- Makefile: fix test-architecture target path
  tests/unit/http/test_architecture_*.py → tests/architecture/test_*.py
  (previous path matched 0 files; target silently ran no tests)
- mkdocs.yml: fix two broken nav entries
  architecture/app.md → app.md (directory did not exist)
  custom_usage_guide.md → USAGE.md (file did not exist)
  mkdocs build was failing on both entries.
- docs/index.md: fix broken internal link architecture/app.md → app.md
- docs/app.md: correct health endpoint names /health,/ready → /healthz,/readyz and add missing POST /docs/import-canonical to docs.py router listing
- README.md: document performance and performance-cpu optional extras (torch + orjson); both are included in [all] but were undocumented


… + SYNERGY_ROOT = WORKSPACE_ROOT / "synergy". + Adds pytest.skip(allow_module_level=True) when the cross-repo fixture or script does not exist. Converts the AssertionError into a descriptive FileNotFoundError in vuln_pilot_fixture. Fixes CI.
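A rough sketch of the path handling these test fixes describe. `WORKSPACE_ROOT`, `SYNERGY_ROOT` and `parents[3]` come from the commit messages; the helper functions, their names, and the error wording are hypothetical.

```python
from pathlib import Path
from typing import Tuple


def resolve_roots(test_file: Path, depth: int = 3) -> Tuple[Path, Path]:
    """Return (workspace_root, synergy_root) relative to a test module path."""
    workspace_root = test_file.resolve().parents[depth]
    # The synergy checkout is a subdirectory, not the workspace root itself.
    return workspace_root, workspace_root / "synergy"


def require_path(path: Path, description: str) -> Path:
    """Return the path, or raise a descriptive error instead of a bare assert."""
    if not path.exists():
        raise FileNotFoundError(f"{description} not found at {path}")
    return path
```

In a test module, the same existence check can instead call `pytest.skip(reason, allow_module_level=True)` so that a missing sibling checkout skips the module rather than failing collection.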

…gacy metadata fields Promotes normalize_filter_values, document_field_values, document_matches_filters into the domain module (retrieval.py). Adds snapshot_id to TOP_LEVEL_FILTER_FIELDS. Removes LEGACY_METADATA_FILTER_FIELDS (path, language, unit_type). Drops legacy (str, k) overload from RetrieverPort and EvalRetrieverPort — protocol is now retrieve(request: RetrievalRequest) -> RetrievalResult only. Renames list_docs_page → query_docs with filters param in DocsReadPort. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Updates test_dense_edgecases, test_hybrid_weighting, test_retrievers, test_sparse_empty, test_sparse_in_memory_cache, test_sparse_tokenization, test_composition, and the dense/hybrid e2e to use RetrievalRequest instead of the legacy (str, k) overload. Edge-case tests for blank query and top_k=0 now assert ValueError at RetrievalRequest construction.
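The construction-time validation the updated tests rely on could be sketched as below. The class is deliberately named `RetrievalRequestSketch` because its fields, defaults, and messages are assumptions; only the behavior (blank query and `top_k=0` raise `ValueError` at construction) is taken from the commit message.

```python
from dataclasses import dataclass, field
from typing import Mapping, Tuple


@dataclass(frozen=True)
class RetrievalRequestSketch:
    """Hypothetical stand-in for the project's RetrievalRequest."""

    query: str
    top_k: int = 10
    filters: Mapping[str, Tuple[str, ...]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Invalid requests are rejected before any retriever sees them.
        if not self.query.strip():
            raise ValueError("query must not be blank")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")
```

Validating in the request object means each retriever can assume a well-formed request, which fits the single `retrieve(request)` protocol.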


New CLI flag --run-out writes the full ranked retrieval output (one line per query) to a JSONL file. Enables post-hoc multi-run pooling and deeper result inspection without re-running retrieval. These changes reflect the exact working tree state at benchmark execution time (synergy/repogpt-ragp benchmark v1, 2026-03-18). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
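As an illustration of "one line per query" JSONL output, a minimal writer might look like this. The record schema (`query_id`, `results`, `doc_id`, `rank`, `score`) and the function name are assumptions; the actual `--run-out` format is not specified in this commit message.

```python
import json
from pathlib import Path
from typing import Mapping, Sequence, Tuple


def write_run_jsonl(
    path: Path, runs: Mapping[str, Sequence[Tuple[str, float]]]
) -> int:
    """Write one JSON object per query; return the number of lines written."""
    with path.open("w", encoding="utf-8") as handle:
        for query_id, ranked in runs.items():
            record = {
                "query_id": query_id,
                "results": [
                    {"doc_id": doc_id, "rank": rank, "score": score}
                    for rank, (doc_id, score) in enumerate(ranked, start=1)
                ],
            }
            handle.write(json.dumps(record) + "\n")
    return len(runs)
```

Keeping the full ranking per query is what allows later pooling across runs without repeating retrieval.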

dense-st (SentenceTransformers) is required for dense/hybrid retrieval in the local_split profile. Without it, benchmark re-runs fail at eval points 2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 18 with "embeddings backend not configured". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


…ion (incoming benchmark) (#56)
* feat(eval): reuse prepared workspace for batch retrieval evaluation
* feat(cli): add rag-eval-batch command
* fix(cli): lazy-import uvicorn in server command
* docs: align runtime paths, retrieval modes, and eval coverage
* docs: sync .env example with ingestion defaults
* Fix eval batch typing issues
* Type eval retrieval modes explicitly
* Fix eval retrieval mode typing
* fix(ruff): fix ruff linter


* feat(config): add YAML-only runtime config loader
* refactor(runtime): route config consumers through yaml settings
* test(config): update loader and runtime coverage
* docs: switch runtime docs to yaml config
* fix: add pyYAML stubs for mypy hook


* refactor(rag): add backend-aware composition wiring
* refactor(rag): extract eval CLI helpers
* refactor(rag): improve CLI status output and guards
* fix(rag): normalize settings and harden runtime outputs
* docs(rag): update provider and API contract docs
* test(e2e): skip RepoGPT flow when checkout is absent
* fix(ruff): lint


Intrinsical-AI and others added 30 commits March 3, 2026 11:47

chore(architecture): enforce hybrid guardrails with import-linter + A…

2bd4132

…ST (#39)
* chore(architecture): enforce hybrid guardrails with import-linter + AST checks
* update(CHANGELOG)

chore(packaging): remove runtime pre-commit and stale package-data (#40)

0527263

feat: add importlinter for imports/dependency flow checks and gates (#41

c0b76f9

)
* chore(architecture): enforce hybrid guardrails with import-linter + AST checks
* update(CHANGELOG)

breaking: harden bootstrap/eval + health/readiness contract cleanup (#42

68fae74

)
* chore(packaging): remove runtime pre-commit and stale package-data
* breaking: harden bootstrap/eval and move probes to healthz/readyz
* hotfix(ruff-format): uv run ruff format

clean: aux script with typing errors

3c55a58

update(docs)

c7fa77a

fix(synergize)

b6a59ec

docs(eval): document compare flow and add e2e smoke

c938e47

- document rag-eval-compare usage, exit codes and output in USAGE
- add a smoke script that validates both PASS and FAIL gate scenarios
- include the compare command in the quick README usage examples

test(eval): add RepoGPT retrieval dataset and sparse smoke coverage

9e5a2ac

- add a stable RepoGPT-specific rag-eval dataset
- cover sparse rag-eval against the shared RepoGPT fixture semantics
- add a simple rag-eval-compare smoke with sparse baseline/candidate

docs(eval): document RepoGPT integration pack usage

02b12ca

- document the shared RepoGPT fixture and cross-repo demo scripts
- explain how to run rag-eval on repogpt_rag_eval_v1
- document the simple rag-eval-compare smoke path

chore: update LICENSE copyright holder

edca80d

Co-author attribution: MrCabss69 / Intrinsical-AI

chore: remove beta test plan and results artifacts

64bf0a3

Beta cycle closed. Documents are superseded by the test suite and CHANGELOG; no useful content to preserve.

lint(format): ruff format

b50e09a

docs(update)

074e0d9

fix(elastic): harden index bootstrap against create races

d4ddea4

lint(format): ruff format

034010c

Intrinsical-AI and others added 30 commits March 15, 2026 14:22

test: added test_atomic_io, test_id_map_json, test_domain_types, test…

fdaae6b

…_ingestion, test_frontend_path. Edited test_evaluation, test_utils. Edge cases, malicious pickle, prefixes, formats

fix(settings): settings was passed twice via kwargs, leading to errors

5826f19

fix(import-canonical): reject invalid documents before scope sync

83a2d09

fix(health): degrade index drift verification failure to actionable w…

1a27bcc

…arning

docs: add technical debt register and docs nav entry

5777546

docs: remove legacy changelog

ebe4797

fix(dx): harden settings parsing and normalize CLI docs

a8e3f23

feat(embeddings): add persistent cache and harden dense mutation prec…

0b4114c

…ompute

refactor(retrieval): unify normalized scoring across backends

b9aaf9a

hardening + doc updates

e97c925

docs(update): aligned narrative, updated commands, added roadmap base…

2d5d7b2

…d on tech_debt.md

feat(agent-surface): runtime snapshot, shared canonical import, and M…

0ed0771

…CP hardening

docs(config): clarify config.yaml as runtime source of truth

355a441

docs: refresh roadmap and debt register status dates

6a994aa

docs(up-to-date)

484742c

settings: allow perf metrics path override from environment

8a28e74

eval: persist deterministic offline workspaces

62a60d6

vector: batch dense eval rebuilds to reduce memory spikes

6ba3fdd

docs: document offline eval workspace reuse

e958375

test: align eval workspace coverage with real vector storage

d57a544

docs: refresh runtime docs and ignore generated eval workspaces

4f2d7bf

chore: align Docker healthcheck and legacy-route regressions

fccd0d2

chore: stabilize Python support (#61)

908d6fe

* chore: stabilize Python support
* refactor: clean up evaluation surface (#62)
* Refactor/eval surface cleanup (#63)
* refactor: clean up evaluation surface
* ruff/format

docs/runtime: make production runtime self-contained and align releas…

0019f38

…e docs (#64)
* fix(docker): make production image self-contained
* docs(runtime): align config and ingestion guide
* chore(dev): keep dense-st sync opt-in
* docs(release): add release hygiene checklist

Intrinsical-AI wants to merge 64 commits into master from develop

Intrinsical-AI commented Apr 4, 2026
