Develop #58

Open
Intrinsical-AI wants to merge 64 commits into master from develop
@Intrinsical-AI
Owner

Anchored 2.0.1 as config breaking change.
Eval
Cache Embeddings
+

Intrinsical-AI and others added 30 commits March 3, 2026 11:47
…ST (#39)

* chore(architecture): enforce hybrid guardrails with import-linter + AST checks

* update(CHANGELOG)
)

* chore(packaging): remove runtime pre-commit and stale package-data

* breaking: harden bootstrap/eval and move probes to healthz/readyz

* hotfix(ruff-format): uv run ruff format
## Summary
Combined the hit and rank calculations into a single loop, eliminating the redundant `any(eid in relevant for eid in retrieved_ext)` check over `retrieved_ext`.

**Why:** To avoid iterating over the retrieved results twice for every query during evaluation.

**Measured Improvement:** A benchmark running 10,000 queries with 50 retrieved items each showed the execution time of the core loop logic drop from ~0.0751s to ~0.0305s (a roughly 60% reduction).
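
A minimal sketch of the single-loop idea, for illustration only. The function name `hit_and_reciprocal_rank` is hypothetical; only `retrieved_ext` and `relevant` come from the summary above, and the project's actual code may be organized differently.

```python
from typing import Sequence


def hit_and_reciprocal_rank(
    retrieved_ext: Sequence[str], relevant: set[str]
) -> tuple[bool, float]:
    """Return (hit, reciprocal rank) from one pass over the ranked ids."""
    for rank, eid in enumerate(retrieved_ext, start=1):
        if eid in relevant:
            # The first relevant id settles both values, so stop here.
            return True, 1.0 / rank
    # No relevant id was retrieved: no hit, and the reciprocal rank is zero.
    return False, 0.0


result = hit_and_reciprocal_rank(["d3", "d7", "d1"], {"d7", "d9"})
```

Because the loop returns at the first relevant id, a separate `any(...)` pass is not needed to know whether the query was a hit.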
…ync#44 (#45)

* feat(rag): add elasticsearch persistence backend

* feat(rag): wire elasticsearch backend through runtime and api

* test(rag): cover elasticsearch backend wiring and storage semantics

* docs(rag): document elasticsearch backend configuration and behavior

* feat(persistence): add scope and snapshot_id columns to document stores

Add nullable scope and snapshot_id fields to the SQL Document table and
Elasticsearch index mappings, enabling external producers to tag documents
with an origin scope and version snapshot.

* feat(domain): promote scope and snapshot_id to first-class mutation fields

Extend UpsertDocBuilderPort, MutationUpsertInput, and normalization/serialization
to carry scope and snapshot_id. Add CanonicalImportSummary result type for
reporting insert/update/delete statistics after a canonical import.

* feat(use-cases): pass scope and snapshot_id through atomic and saga executors

Wire the new fields from MutationUpsertInput into the upsert builder calls
in both the atomic and saga mutation execution paths.

* feat(persistence): persist scope/snapshot_id and implement list_external_ids_by_scope()

Update SQL and Elasticsearch repositories to read/write scope and snapshot_id
in upsert, change detection, and domain mapping. Add list_external_ids_by_scope()
to both stores to support stale-doc deletion in canonical imports.

* feat(http): add POST /api/docs/import-canonical endpoint

Introduce CanonicalImportRequest/Response schemas with scope, snapshot_id,
replace_scope flag, and duplicate external_id validation. Add the endpoint
behind the multi-store write lock. Extend existing mutate endpoint schemas
to accept scope/snapshot_id on individual upsert items.
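
The duplicate `external_id` validation mentioned above could look roughly like the sketch below. It uses a plain dataclass so it stays self-contained; the class name `CanonicalImportRequestSketch` and the `documents` field are assumptions, and the real schema may be built on a different validation library.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CanonicalImportRequestSketch:
    """Illustrative stand-in; the field layout is assumed, not taken from the repo."""

    scope: str
    snapshot_id: str
    documents: list[dict] = field(default_factory=list)
    replace_scope: bool = False

    def __post_init__(self) -> None:
        # Count every external_id once and reject the payload if any repeats.
        counts = Counter(doc["external_id"] for doc in self.documents)
        duplicates = sorted(eid for eid, n in counts.items() if n > 1)
        if duplicates:
            raise ValueError(f"duplicate external_id values: {duplicates}")
```

Rejecting duplicates at construction time keeps an ambiguous payload from reaching the write lock at all.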

* feat(cli): add rag-import-canonical command for declarative scope sync

Implement execute_import_canonical_sync() use case: batched upsert (256 docs)
with optional replace_scope hard-deletion of stale documents not present in
the current snapshot. Wire as rag-import-canonical CLI entry point and register
in the CLI group.
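
The batching and `replace_scope` semantics described above can be sketched against an in-memory store. This is an assumption-laden illustration: `import_canonical`, `batched`, the dict-based store, and the summary keys are hypothetical, and only the batch size of 256 and the stale-document deletion rule come from the commit message.

```python
from itertools import islice
from typing import Iterator

BATCH_SIZE = 256  # batch size quoted in the commit message


def batched(items: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive chunks of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def import_canonical(
    store: dict[str, dict],
    docs: list[dict],
    scope: str,
    snapshot_id: str,
    replace_scope: bool = False,
) -> dict[str, int]:
    """Upsert `docs` in batches, then optionally delete stale docs in `scope`."""
    summary = {"inserted": 0, "updated": 0, "deleted": 0}
    seen: set[str] = set()
    for batch in batched(docs, BATCH_SIZE):
        for doc in batch:
            ext_id = doc["external_id"]
            record = {"text": doc["text"], "scope": scope, "snapshot_id": snapshot_id}
            if ext_id not in store:
                summary["inserted"] += 1
            elif store[ext_id] != record:
                summary["updated"] += 1
            store[ext_id] = record
            seen.add(ext_id)
    if replace_scope:
        # Stale = stored under the same scope but absent from this snapshot.
        stale = [
            eid for eid, rec in store.items()
            if rec["scope"] == scope and eid not in seen
        ]
        for eid in stale:
            del store[eid]
        summary["deleted"] = len(stale)
    return summary
```

Documents under other scopes are never touched, which is what makes the operation a per-scope sync rather than a full replace.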

* docs: add canonical import examples to README and USAGE guide

Document the new import-canonical HTTP endpoint and CLI command with curl
and CLI invocation examples. Note scope/snapshot synchronization semantics
and replace_scope behavior.

* (mutate): commit pending file to last batch - missing

* fix(lint-format): formatted files

* feat(retrieval): add structured retrieval requests and dual-mode planning

* feat(search): add remote search backends for opensearch and solr

* feat(api): expose retrieval filters and dual mode through HTTP and settings

* chore(quality): fix mypy target and vector index fallback

* refactor(test-suite): update, expand, polish. Added tests for critical endpoints (#47)

* refactor(test-suite): update, expand, polish. Better conceptual splitting, polished fixtures/conftests. Added tests for critical endpoints

* [waterfall-pr-chain] feat(filter): metadata based filtering.  (#48)

feat(filter): metadata-based filtering. Expanded the filter contract to accept metadata.<key> plus legacy fields. Updated Elastic/OpenSearch + Solr to resolve filters by exact match.
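
As a rough illustration of exact-match filtering on `metadata.<key>`, the sketch below resolves a filter key to a set of values and intersects it with the allowed values. The helper names, the `TOP_LEVEL_FIELDS` set, and the AND-across-fields / OR-within-field semantics are assumptions made for the example, not the repository's confirmed behavior.

```python
from typing import Any, Mapping

# Hypothetical set of top-level fields; the real list lives in the repo.
TOP_LEVEL_FIELDS = {"scope", "snapshot_id"}


def field_values(doc: Mapping[str, Any], key: str) -> set[str]:
    """Resolve `metadata.<key>` or a top-level field to a set of string values."""
    if key.startswith("metadata."):
        raw = doc.get("metadata", {}).get(key[len("metadata."):])
    elif key in TOP_LEVEL_FIELDS:
        raw = doc.get(key)
    else:
        raise ValueError(f"unsupported filter field: {key}")
    if raw is None:
        return set()
    if isinstance(raw, (list, tuple, set)):
        # Sequence-valued metadata matches if any element matches.
        return {str(v) for v in raw}
    return {str(raw)}


def matches(doc: Mapping[str, Any], filters: Mapping[str, list[str]]) -> bool:
    """AND across fields, OR within one field's allowed values; exact match only."""
    return all(field_values(doc, key) & set(allowed) for key, allowed in filters.items())
```

Exact matching means `"py"` does not match a stored value of `"python"`; no substring or analyzer logic is involved.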
…ics (#50)

* feat(eval): rewrite rag-eval on ir_measures

- add ir-measures dependency for standard IR evaluation
- replace custom hit_rate/mrr scoring with nDCG@k, MAP@k, MRR@k, P@k and Recall@k
- update EvalResult and rag-eval output formatting
- document rag-eval as offline IR evaluation
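
For readers unfamiliar with the metrics listed above, here is a plain-Python sketch of P@k, Recall@k, RR@k and nDCG@k for a single query with binary relevance. It deliberately does not call `ir_measures`; the function names are hypothetical and the library's edge-case handling may differ.

```python
import math
from typing import Sequence


def precision_at_k(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents found in the top-k."""
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def rr_at_k(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    """Reciprocal rank of the first relevant result within the top-k."""
    for rank, d in enumerate(ranked[:k], start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: set[str], k: int) -> float:
    """DCG with binary gains, normalized by the best achievable DCG at k."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, d in enumerate(ranked[:k], start=1)
        if d in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0
```

Averaging each function over all queries gives the corresponding aggregate score.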

* refactor(eval): harden dataset validation and corpus semantics

- reject duplicate doc ids, blank ids and empty relevant sets at load time
- fail fast when relevant ids fall outside the dataset corpus
- make core evaluation own corpus filtering semantics
- align app retrieval wiring with the dataset-backed corpus contract
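
The load-time checks listed above might be sketched as follows. The function name `validate_eval_dataset` and the input shapes (a list of corpus ids and a mapping from query id to relevant ids) are assumptions for illustration; the real loader works on the project's own dataset format.

```python
def validate_eval_dataset(corpus_ids: list[str], queries: dict[str, set[str]]) -> None:
    """Raise ValueError on the first dataset problem found."""
    seen: set[str] = set()
    for doc_id in corpus_ids:
        if not doc_id.strip():
            raise ValueError("blank document id in corpus")
        if doc_id in seen:
            raise ValueError(f"duplicate document id: {doc_id!r}")
        seen.add(doc_id)
    for query_id, relevant in queries.items():
        if not relevant:
            raise ValueError(f"query {query_id!r} has an empty relevant set")
        # Relevant ids must exist in the corpus, otherwise recall is unreachable.
        missing = sorted(relevant - seen)
        if missing:
            raise ValueError(f"query {query_id!r} references ids outside the corpus: {missing}")
```

Failing fast here avoids reporting metrics that silently treat unreachable documents as misses.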

* fix(ci): satisfy ruff on eval changes

- replace tuple() returns with tuple literals in eval tests
- apply ruff formatting to eval-related modules and tests
- support sparse, dense, dual and hybrid retrieval modes in rag-eval
- build retrievers from the production composition path
- run evaluation against an isolated local runtime and ephemeral storage
- keep IR metrics on top of ir_measures
- add mode-specific eval config and request handling
- support exact metadata filtering against sequence values in local_split
- add maintained e2e coverage for RepoGPT canonical import
- add maintained e2e coverage for tr3v0r canonical import
- verify retrieval through queryable metadata after native import
- validate mode-specific rag-eval flags and error paths
- add focused tests for sparse, dense, dual and hybrid evaluation flows
- document isolated eval runtime and official multi-mode usage
- fix e2e import loading so linting passes with dynamic sibling repos
- add baseline-vs-candidate retrieval comparison on top of rag-eval
- expose rag-eval-compare as a CLI command and project script
- compute metric deltas and explicit pass/fail gate results
- cover core, app and CLI comparison flows with targeted tests
- document rag-eval-compare usage, exit codes and output in USAGE
- add a smoke script that validates both PASS and FAIL gate scenarios
- include the compare command in the quick README usage examples
- emit RepoGPT code-units from the shared synergy fixture
- import the canonical payload into rag-prototype
- verify schema v3 and replace_scope=true
- assert metadata-driven and textual retrieval paths
- cover idempotent reimport and stale document cleanup via replace_scope
- add a stable RepoGPT-specific rag-eval dataset
- cover sparse rag-eval against the shared RepoGPT fixture semantics
- add a simple rag-eval-compare smoke with sparse baseline/candidate
- document the shared RepoGPT fixture and cross-repo demo scripts
- explain how to run rag-eval on repogpt_rag_eval_v1
- document the simple rag-eval-compare smoke path
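
The baseline-vs-candidate gate that `rag-eval-compare` adds (metric deltas plus an explicit pass/fail result) can be pictured with the sketch below. The function name, the `max_regression` parameter, and the rule "no metric may drop by more than the tolerance" are hypothetical; the actual gate criteria are documented in USAGE and may differ.

```python
def compare_metrics(
    baseline: dict[str, float],
    candidate: dict[str, float],
    max_regression: float = 0.0,
) -> tuple[dict[str, float], bool]:
    """Return per-metric deltas (candidate - baseline) and a pass/fail flag."""
    deltas = {name: candidate[name] - baseline[name] for name in baseline}
    # Assumed gate: pass only if no metric regresses beyond the tolerance.
    passed = all(delta >= -max_regression for delta in deltas.values())
    return deltas, passed


deltas, passed = compare_metrics(
    {"nDCG@10": 0.5, "RR@10": 0.75},
    {"nDCG@10": 0.75, "RR@10": 0.5},
)
```

In this example the candidate improves one metric and regresses another, so a zero-tolerance gate fails.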
… tests

SYNERGY_ROOT was incorrectly aliased to the top-level workspace root
(parents[3]). Split into WORKSPACE_ROOT (the monorepo root) and
SYNERGY_ROOT (WORKSPACE_ROOT / "synergy") so fixture and script paths
resolve correctly under the actual directory layout.

Affected: repogpt_import_flow, repogpt_ingest_search_eval, tr3v0r_import_flow,
repogpt_fixture.
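
A small sketch of the root split described above. It assumes the test file sits three directories below the workspace root (for example `<workspace>/rag-prototype/tests/e2e/test_x.py`); the helper name `resolve_roots` and that layout are illustrative, not taken from the repository.

```python
from pathlib import PurePath


def resolve_roots(test_file: PurePath) -> tuple[PurePath, PurePath]:
    """Derive (WORKSPACE_ROOT, SYNERGY_ROOT) from a test file location."""
    # parents[3] is the top-level workspace, not the synergy directory itself.
    workspace_root = test_file.parents[3]
    synergy_root = workspace_root / "synergy"
    return workspace_root, synergy_root
```

Keeping the two names separate makes it explicit which paths hang off the workspace and which hang off `synergy/`.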
Adds the vuln-pilot evaluation dataset (CIRCL CVE JSONL, 30 docs) and
the corresponding test coverage:
- e2e: ingest → search → eval smoke against vuln_pilot_rag_eval_v1.jsonl
- unit: CLI rag-eval smoke with the vuln pilot dataset

Mirrors the RepoGPT coverage pattern. Cross-repo source lives at
../synergy/vuln_pilot/prepared/pilot_small_v1.jsonl.
- Fix ../scripts/ → ../synergy/scripts/ for repogpt_ingest_demo.sh and
  repogpt_eval_smoke.sh (mirrors the fix in e2e test path resolution).
- Fix fixture and script note paths to include the synergy/ subdirectory.
- Add vuln pilot eval section: rag-eval / rag-eval-compare examples,
  cross-repo flow (vulns_batch_triage.py, vulns_ingest_rag.py), and
  dataset/profile location notes.
…t target

- Makefile: fix test-architecture target path
  tests/unit/http/test_architecture_*.py → tests/architecture/test_*.py
  (previous path matched 0 files; target silently ran no tests)
- mkdocs.yml: fix two broken nav entries
  architecture/app.md → app.md (directory did not exist)
  custom_usage_guide.md → USAGE.md (file did not exist)
  mkdocs build was failing on both entries.
- docs/index.md: fix broken internal link architecture/app.md → app.md
- docs/app.md: correct health endpoint names /health,/ready → /healthz,/readyz
  and add missing POST /docs/import-canonical to docs.py router listing
- README.md: document performance and performance-cpu optional extras
  (torch + orjson); both are included in [all] but were undocumented
Co-author attribution: MrCabss69 / Intrinsical-AI
Beta cycle closed. Documents are superseded by the test suite and
CHANGELOG; no useful content to preserve.
… + SYNERGY_ROOT = WORKSPACE_ROOT / "synergy". + Adds pytest.skip(allow_module_level=True) when the cross-repo fixture or script does not exist. Converts the AssertionError in vuln_pilot_fixture into a descriptive FileNotFoundError. Fix CI
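
One way to produce the descriptive error mentioned above is a small guard like the sketch below. The helper name `require_file` and the message wording are hypothetical; the commented lines show where a module-level `pytest.skip` would typically go.

```python
from pathlib import Path


def require_file(path: Path, what: str) -> Path:
    """Return `path` if it is a file, else raise a descriptive FileNotFoundError."""
    if not path.is_file():
        raise FileNotFoundError(
            f"{what} not found at {path}; is the sibling checkout present?"
        )
    return path


# In a test module, the same existence check can skip instead of failing:
#     if not FIXTURE_PATH.is_file():
#         pytest.skip("cross-repo fixture missing", allow_module_level=True)
```

A `FileNotFoundError` that names the path is easier to act on in CI logs than a bare `AssertionError`.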
…gacy metadata fields

Promotes normalize_filter_values, document_field_values, document_matches_filters
into the domain module (retrieval.py). Adds snapshot_id to TOP_LEVEL_FILTER_FIELDS.
Removes LEGACY_METADATA_FILTER_FIELDS (path, language, unit_type). Drops legacy
(str, k) overload from RetrieverPort and EvalRetrieverPort — protocol is now
retrieve(request: RetrievalRequest) -> RetrievalResult only. Renames list_docs_page
→ query_docs with filters param in DocsReadPort.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Intrinsical-AI and others added 30 commits March 15, 2026 14:22
Updates test_dense_edgecases, test_hybrid_weighting, test_retrievers,
test_sparse_empty, test_sparse_in_memory_cache, test_sparse_tokenization,
test_composition, and the dense/hybrid e2e to use RetrievalRequest instead
of the legacy (str, k) overload. Edge-case tests for blank query and
top_k=0 now assert ValueError at RetrievalRequest construction.
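
A sketch of construction-time validation in the spirit of the tests described above. The type names `RetrievalRequest`, `RetrievalResult`, and `RetrieverPort` appear in the commit messages, but every field, default, and error message here is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class RetrievalRequest:
    """Assumed shape: a query, a result count, and optional exact-match filters."""

    query: str
    top_k: int = 5
    filters: dict[str, list[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Invalid requests fail here, before any retriever is called.
        if not self.query.strip():
            raise ValueError("query must not be blank")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")


@dataclass(frozen=True)
class RetrievalResult:
    doc_ids: list[str]
    scores: list[float]


class RetrieverPort(Protocol):
    """Single-method protocol replacing the legacy (str, k) overload."""

    def retrieve(self, request: RetrievalRequest) -> RetrievalResult: ...
```

Validating in the request object means each retriever implementation no longer needs its own blank-query or `top_k` checks.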
…_ingestion, test_frontend_path. Edited test_evaluation, test_utils. Edge cases, malicious pickle, prefixes, formats
New CLI flag --run-out writes the full ranked retrieval output (one line
per query) to a JSONL file. Enables post-hoc multi-run pooling and
deeper result inspection without re-running retrieval.
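
To illustrate the one-line-per-query JSONL layout, here is a minimal writer. The record fields (`query_id`, `results`, `doc_id`, `score`, `rank`) and the function names are assumptions; the actual `--run-out` schema is defined by the CLI and may differ.

```python
import json
from pathlib import Path
from typing import Iterator


def run_to_jsonl_lines(run: dict[str, list[tuple[str, float]]]) -> Iterator[str]:
    """Yield one JSON line per query with its full ranked result list."""
    for query_id, ranked in run.items():
        record = {
            "query_id": query_id,
            "results": [
                {"doc_id": doc_id, "score": score, "rank": rank}
                for rank, (doc_id, score) in enumerate(ranked, start=1)
            ],
        }
        yield json.dumps(record)


def write_run(path: Path, run: dict[str, list[tuple[str, float]]]) -> None:
    """Write the run to `path`, one query per line."""
    with path.open("w", encoding="utf-8") as fh:
        for line in run_to_jsonl_lines(run):
            fh.write(line + "\n")
```

Keeping each query on its own line lets several run files be pooled or diffed with ordinary line-oriented tools.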

These changes reflect the exact working tree state at benchmark execution
time (synergy/repogpt-ragp benchmark v1, 2026-03-18).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dense-st (SentenceTransformers) is required for dense/hybrid retrieval
in the local_split profile. Without it, benchmark re-runs fail at eval
points 2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 18 with "embeddings
backend not configured".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ion (incoming benchmark) (#56)

* feat(eval): reuse prepared workspace for batch retrieval evaluation

* feat(cli): add rag-eval-batch command

* fix(cli): lazy-import uvicorn in server command

* docs: align runtime paths, retrieval modes, and eval coverage

* docs: sync .env example with ingestion defaults

* Fix eval batch typing issues

* Type eval retrieval modes explicitly

* Fix eval retrieval mode typing

* fix(ruff): fix ruff linter
* feat(config): add YAML-only runtime config loader

* refactor(runtime): route config consumers through yaml settings

* test(config): update loader and runtime coverage

* docs: switch runtime docs to yaml config

* fix: add pyYAML stubs for mypy hook
* refactor(rag): add backend-aware composition wiring

* refactor(rag): extract eval CLI helpers

* refactor(rag): improve CLI status output and guards

* fix(rag): normalize settings and harden runtime outputs

* docs(rag): update provider and API contract docs

* test(e2e): skip RepoGPT flow when checkout is absent

* fix(ruff): lint
* chore: stabilize Python support

* refactor: clean up evaluation surface (#62)

* Refactor/eval surface cleanup (#63)

* refactor: clean up evaluation surface

* ruff/format
…e docs (#64)

* fix(docker): make production image self-contained

* docs(runtime): align config and ingestion guide

* chore(dev): keep dense-st sync opt-in

* docs(release): add release hygiene checklist