Develop#58
Open
Intrinsical-AI wants to merge 64 commits into
…ST (#39)

* chore(architecture): enforce hybrid guardrails with import-linter + AST checks
* update(CHANGELOG)
## Summary

Combined the hit and rank calculation into a single loop, eliminating the redundant `any(eid in relevant for eid in retrieved_ext)` check over `retrieved_ext`.

**Why:** To avoid iterating over the retrieved results twice for every query during evaluation.

**Measured Improvement:** A benchmark running 10,000 queries with 50 retrieved items each showed execution time for the core loop logic drop from ~0.0751s to ~0.0305s (a ~60% improvement).
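A minimal sketch of the single-pass idea, assuming `retrieved_ext` is an ordered sequence of external ids and `relevant` is a set. The function names `hit_and_rank` and `hit_rate_and_mrr` are hypothetical and are not taken from the repository.

```python
from typing import Iterable, Sequence, Set, Tuple


def hit_and_rank(retrieved_ext: Sequence[str], relevant: Set[str]) -> Tuple[bool, int]:
    """Return (hit, rank) in one pass; rank is 1-based, 0 means no relevant id was found."""
    for position, eid in enumerate(retrieved_ext, start=1):
        if eid in relevant:
            # The first relevant id gives both the hit flag and the rank,
            # so no separate any(...) scan is needed.
            return True, position
    return False, 0


def hit_rate_and_mrr(
    runs: Iterable[Tuple[Sequence[str], Set[str]]],
) -> Tuple[float, float]:
    """Aggregate hit rate and mean reciprocal rank over (retrieved, relevant) pairs."""
    hits = 0
    reciprocal_rank_sum = 0.0
    total = 0
    for retrieved_ext, relevant in runs:
        total += 1
        hit, rank = hit_and_rank(retrieved_ext, relevant)
        if hit:
            hits += 1
            reciprocal_rank_sum += 1.0 / rank
    if total == 0:
        return 0.0, 0.0
    return hits / total, reciprocal_rank_sum / total
```

Returning on the first relevant id also stops the scan early, which likely contributes to the measured speed-up when relevant items rank near the top.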
…ync#44 (#45)

* feat(rag): add elasticsearch persistence backend
* feat(rag): wire elasticsearch backend through runtime and api
* test(rag): cover elasticsearch backend wiring and storage semantics
* docs(rag): document elasticsearch backend configuration and behavior
* feat(persistence): add scope and snapshot_id columns to document stores

  Add nullable scope and snapshot_id fields to the SQL Document table and Elasticsearch index mappings, enabling external producers to tag documents with an origin scope and version snapshot.
* feat(domain): promote scope and snapshot_id to first-class mutation fields

  Extend UpsertDocBuilderPort, MutationUpsertInput, and normalization/serialization to carry scope and snapshot_id. Add CanonicalImportSummary result type for reporting insert/update/delete statistics after a canonical import.
* feat(use-cases): pass scope and snapshot_id through atomic and saga executors

  Wire the new fields from MutationUpsertInput into the upsert builder calls in both the atomic and saga mutation execution paths.
* feat(persistence): persist scope/snapshot_id and implement list_external_ids_by_scope()

  Update SQL and Elasticsearch repositories to read/write scope and snapshot_id in upsert, change detection, and domain mapping. Add list_external_ids_by_scope() to both stores to support stale-doc deletion in canonical imports.
* feat(http): add POST /api/docs/import-canonical endpoint

  Introduce CanonicalImportRequest/Response schemas with scope, snapshot_id, replace_scope flag, and duplicate external_id validation. Add the endpoint behind the multi-store write lock. Extend existing mutate endpoint schemas to accept scope/snapshot_id on individual upsert items.
* feat(cli): add rag-import-canonical command for declarative scope sync

  Implement execute_import_canonical_sync() use case: batched upsert (256 docs) with optional replace_scope hard-deletion of stale documents not present in the current snapshot. Wire as rag-import-canonical CLI entry point and register in the CLI group.
* docs: add canonical import examples to README and USAGE guide

  Document the new import-canonical HTTP endpoint and CLI command with curl and CLI invocation examples. Note scope/snapshot synchronization semantics and replace_scope behavior.
* (mutate): commit pending file to last batch - missing
* fix(lint-format): formatted files
* feat(retrieval): add structured retrieval requests and dual-mode planning
* feat(search): add remote search backends for opensearch and solr
* feat(api): expose retrieval filters and dual mode through HTTP and settings
* chore(quality): fix mypy target and vector index fallback
* refactor(test-suite): update, expand, polish. Added tests for critical endpoints (#47)
* refactor(test-suite): update, expand, polish. Better conceptual splitting, polished fixtures/conftests. Added tests for critical endpoints
* [waterfall-pr-chain] feat(filter): metadata based filtering. (#48)

  feat(filter): metadata based filtering. Expanded the contract to accept metadata.<key> plus legacy fields. Updated Elastic/OpenSearch + Solr to resolve filters by exact match.
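The canonical import semantics described in these commits (duplicate external_id validation, batched upsert of 256 docs, optional replace_scope deletion of stale documents) can be sketched roughly as follows. This is an illustration against an in-memory dict, not the repository's `execute_import_canonical_sync()`; the `Doc`, `ImportSummary`, and `import_canonical` names and their fields are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List

BATCH_SIZE = 256  # batch size mentioned in the commit message


@dataclass(frozen=True)
class Doc:  # hypothetical document shape
    external_id: str
    text: str
    scope: str
    snapshot_id: str


@dataclass
class ImportSummary:  # hypothetical stand-in for CanonicalImportSummary
    inserted: int = 0
    updated: int = 0
    deleted: int = 0


def batched(items: List[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def import_canonical(
    store: Dict[str, Doc], docs: List[Doc], scope: str, replace_scope: bool
) -> ImportSummary:
    """Upsert docs in batches; optionally delete same-scope docs absent from the payload."""
    ids = [doc.external_id for doc in docs]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate external_id in import payload")

    summary = ImportSummary()
    for batch in batched(docs, BATCH_SIZE):
        for doc in batch:
            existing = store.get(doc.external_id)
            if existing is None:
                summary.inserted += 1
            elif existing != doc:
                summary.updated += 1
            store[doc.external_id] = doc

    if replace_scope:
        incoming = set(ids)
        # Stale = stored under the same scope but missing from the current snapshot.
        stale = [eid for eid, doc in store.items() if doc.scope == scope and eid not in incoming]
        for eid in stale:
            del store[eid]
        summary.deleted = len(stale)
    return summary
```

Documents stored under a different scope are never touched by the deletion step, which is what makes the sync declarative per scope in this sketch.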
…ics (#50)

* feat(eval): rewrite rag-eval on ir_measures
  - add ir-measures dependency for standard IR evaluation
  - replace custom hit_rate/mrr scoring with nDCG@k, MAP@k, MRR@k, P@k and Recall@k
  - update EvalResult and rag-eval output formatting
  - document rag-eval as offline IR evaluation
* refactor(eval): harden dataset validation and corpus semantics
  - reject duplicate doc ids, blank ids and empty relevant sets at load time
  - fail fast when relevant ids fall outside the dataset corpus
  - make core evaluation own corpus filtering semantics
  - align app retrieval wiring with the dataset-backed corpus contract
* fix(ci): satisfy ruff on eval changes
  - replace tuple() returns with tuple literals in eval tests
  - apply ruff formatting to eval-related modules and tests
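The load-time dataset checks listed above (duplicate doc ids, blank ids, empty relevant sets, relevant ids outside the corpus) could look roughly like this. The `DatasetError` and `validate_dataset` names and the input shapes are assumptions for illustration; the repository's loader may be structured differently.

```python
from typing import Dict, List, Set


class DatasetError(ValueError):
    """Raised when an evaluation dataset violates a load-time invariant (hypothetical)."""


def validate_dataset(doc_ids: List[str], relevant_by_query: Dict[str, Set[str]]) -> None:
    """Fail fast on the first violated invariant; return None when the dataset is valid."""
    seen: Set[str] = set()
    for doc_id in doc_ids:
        if not doc_id.strip():
            raise DatasetError("blank doc id")
        if doc_id in seen:
            raise DatasetError(f"duplicate doc id: {doc_id!r}")
        seen.add(doc_id)

    for query_id, relevant in relevant_by_query.items():
        if not relevant:
            raise DatasetError(f"query {query_id!r} has an empty relevant set")
        missing = relevant - seen
        if missing:
            raise DatasetError(
                f"query {query_id!r} references ids outside the corpus: {sorted(missing)}"
            )
```

Validating before any retrieval runs keeps a malformed dataset from silently skewing the reported metrics.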
- support sparse, dense, dual and hybrid retrieval modes in rag-eval
- build retrievers from the production composition path
- run evaluation against an isolated local runtime and ephemeral storage
- keep IR metrics on top of ir_measures
- add mode-specific eval config and request handling
- support exact metadata filtering against sequence values in local_split
- add maintained e2e coverage for RepoGPT canonical import
- add maintained e2e coverage for tr3v0r canonical import
- verify retrieval through queryable metadata after native import
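One way to read "exact metadata filtering against sequence values" is sketched below. The semantics are assumptions, not confirmed by the source: every filter must match (AND), a filter matches when any wanted value equals any document value (OR within a field), and keys of the form `metadata.<key>` look inside the document's metadata mapping. The `matches_filters` helper is hypothetical.

```python
from typing import Any, List, Mapping


def _as_values(value: Any) -> List[Any]:
    """Normalize a scalar, sequence, or None into a list of candidate values."""
    if value is None:
        return []
    if isinstance(value, (list, tuple, set, frozenset)):
        return list(value)
    return [value]


def matches_filters(doc: Mapping[str, Any], filters: Mapping[str, Any]) -> bool:
    """Return True when the document satisfies every filter by exact equality."""
    for field, wanted in filters.items():
        if field.startswith("metadata."):
            metadata = doc.get("metadata") or {}
            actual = metadata.get(field[len("metadata."):])
        else:
            actual = doc.get(field)  # top-level fields such as scope or snapshot_id
        actual_values = _as_values(actual)
        if not any(w in actual_values for w in _as_values(wanted)):
            return False
    return True


doc = {"scope": "repo", "metadata": {"tags": ["python", "cli"], "language": "python"}}
print(matches_filters(doc, {"metadata.tags": "cli"}))   # exact element match
print(matches_filters(doc, {"metadata.tags": "cl"}))    # substring only, so no match
```

Because values are compared element by element, a list-valued metadata field matches on any exact member rather than on substrings.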
- validate mode-specific rag-eval flags and error paths
- add focused tests for sparse, dense, dual and hybrid evaluation flows
- document isolated eval runtime and official multi-mode usage
- fix e2e import loading so linting passes with dynamic sibling repos

- add baseline-vs-candidate retrieval comparison on top of rag-eval
- expose rag-eval-compare as a CLI command and project script
- compute metric deltas and explicit pass/fail gate results
- cover core, app and CLI comparison flows with targeted tests
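A rough sketch of a baseline-vs-candidate comparison with metric deltas and a pass/fail gate. The gate rule used here (no metric may regress by more than `max_regression`) is an assumption; the commits do not state the actual rule, and `GateResult` and `compare_runs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class GateResult:
    deltas: Dict[str, float]  # candidate minus baseline, per metric
    passed: bool


def compare_runs(
    baseline: Mapping[str, float],
    candidate: Mapping[str, float],
    max_regression: float = 0.0,
) -> GateResult:
    """Compute per-metric deltas over shared metrics and apply a regression gate."""
    deltas = {
        name: candidate[name] - baseline[name]
        for name in baseline
        if name in candidate
    }
    # Assumed rule: pass only if no shared metric drops by more than max_regression.
    passed = all(delta >= -max_regression for delta in deltas.values())
    return GateResult(deltas=deltas, passed=passed)


result = compare_runs(
    {"nDCG@10": 0.50, "MRR@10": 0.40},
    {"nDCG@10": 0.55, "MRR@10": 0.35},
)
print(result.passed)  # False under the strict default, since MRR@10 regressed
```

A CLI wrapper would typically map `passed` to a process exit code so that CI can gate on the comparison.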
- document rag-eval-compare usage, exit codes and output in USAGE
- add a smoke script that validates both PASS and FAIL gate scenarios
- include the compare command in the quick README usage examples

- emit RepoGPT code-units from the shared synergy fixture
- import the canonical payload into rag-prototype
- verify schema v3 and replace_scope=true
- assert metadata-driven and textual retrieval paths
- cover idempotent reimport and stale document cleanup via replace_scope

- add a stable RepoGPT-specific rag-eval dataset
- cover sparse rag-eval against the shared RepoGPT fixture semantics
- add a simple rag-eval-compare smoke with sparse baseline/candidate

- document the shared RepoGPT fixture and cross-repo demo scripts
- explain how to run rag-eval on repogpt_rag_eval_v1
- document the simple rag-eval-compare smoke path
… tests SYNERGY_ROOT was incorrectly aliased to the top-level workspace root (parents[3]). Split into WORKSPACE_ROOT (the monorepo root) and SYNERGY_ROOT (WORKSPACE_ROOT / "synergy") so fixture and script paths resolve correctly under the actual directory layout. Affected: repogpt_import_flow, repogpt_ingest_search_eval, tr3v0r_import_flow, repogpt_fixture.
Adds the vuln-pilot evaluation dataset (CIRCL CVE JSONL, 30 docs) and the corresponding test coverage:

- e2e: ingest → search → eval smoke against vuln_pilot_rag_eval_v1.jsonl
- unit: CLI rag-eval smoke with the vuln pilot dataset

Mirrors the RepoGPT coverage pattern. Cross-repo source lives at ../synergy/vuln_pilot/prepared/pilot_small_v1.jsonl.
- Fix ../scripts/ → ../synergy/scripts/ for repogpt_ingest_demo.sh and repogpt_eval_smoke.sh (mirrors the fix in e2e test path resolution).
- Fix fixture and script note paths to include the synergy/ subdirectory.
- Add vuln pilot eval section: rag-eval / rag-eval-compare examples, cross-repo flow (vulns_batch_triage.py, vulns_ingest_rag.py), and dataset/profile location notes.
…t target

- Makefile: fix test-architecture target path
  tests/unit/http/test_architecture_*.py → tests/architecture/test_*.py
  (previous path matched 0 files; target silently ran no tests)
- mkdocs.yml: fix two broken nav entries
  architecture/app.md → app.md (directory did not exist)
  custom_usage_guide.md → USAGE.md (file did not exist)
  mkdocs build was failing on both entries.
- docs/index.md: fix broken internal link architecture/app.md → app.md
- docs/app.md: correct health endpoint names /health,/ready → /healthz,/readyz and add missing POST /docs/import-canonical to docs.py router listing
- README.md: document performance and performance-cpu optional extras (torch + orjson); both are included in [all] but were undocumented
Co-author attribution: MrCabss69 / Intrinsical-AI
Beta cycle closed. Documents are superseded by the test suite and CHANGELOG; no useful content to preserve.
… + SYNERGY_ROOT = WORKSPACE_ROOT / "synergy". + Adds pytest.skip(allow_module_level=True) when the cross-repo fixture or script does not exist. Converts AssertionError into a descriptive FileNotFoundError in vuln_pilot_fixture. Fix CI
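A small sketch of the path handling this commit describes, assuming the workspace root is passed in explicitly. The helpers `synergy_root` and `require_file` are hypothetical. In a test module, the same existence check would typically guard a call to `pytest.skip(..., allow_module_level=True)` instead of raising.

```python
from pathlib import Path


def synergy_root(workspace_root: Path) -> Path:
    """Derive the synergy directory from the workspace (monorepo) root."""
    return workspace_root / "synergy"


def require_file(path: Path, description: str) -> Path:
    """Return the path if it is an existing file, else raise a descriptive FileNotFoundError."""
    if not path.is_file():
        # A descriptive error is easier to act on than a bare AssertionError.
        raise FileNotFoundError(
            f"{description} not found at {path}; is the cross-repo checkout present?"
        )
    return path
```

Keeping the two roots separate means fixture and script paths are always built from `synergy_root(...)` rather than from the workspace root directly.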
…gacy metadata fields Promotes normalize_filter_values, document_field_values, document_matches_filters into the domain module (retrieval.py). Adds snapshot_id to TOP_LEVEL_FILTER_FIELDS. Removes LEGACY_METADATA_FILTER_FIELDS (path, language, unit_type). Drops legacy (str, k) overload from RetrieverPort and EvalRetrieverPort — protocol is now retrieve(request: RetrievalRequest) -> RetrievalResult only. Renames list_docs_page → query_docs with filters param in DocsReadPort. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Updates test_dense_edgecases, test_hybrid_weighting, test_retrievers, test_sparse_empty, test_sparse_in_memory_cache, test_sparse_tokenization, test_composition, and the dense/hybrid e2e to use RetrievalRequest instead of the legacy (str, k) overload. Edge-case tests for blank query and top_k=0 now assert ValueError at RetrievalRequest construction.
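The request-object protocol described in these two commits can be illustrated as follows. Only the `retrieve(request: RetrievalRequest) -> RetrievalResult` shape and the ValueError on blank query or top_k=0 come from the commit messages; the field names, defaults, and the `StaticRetriever` class are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class RetrievalRequest:
    query: str
    top_k: int = 5  # assumed default
    filters: Mapping[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Invalid requests are rejected at construction, before any retriever runs.
        if not self.query.strip():
            raise ValueError("query must not be blank")
        if self.top_k <= 0:
            raise ValueError("top_k must be a positive integer")


@dataclass(frozen=True)
class RetrievalResult:
    doc_ids: Tuple[str, ...]  # assumed result shape


class RetrieverPort(Protocol):
    def retrieve(self, request: RetrievalRequest) -> RetrievalResult: ...


class StaticRetriever:
    """Toy retriever returning a fixed ranking, truncated to top_k."""

    def __init__(self, doc_ids: Sequence[str]) -> None:
        self._doc_ids = tuple(doc_ids)

    def retrieve(self, request: RetrievalRequest) -> RetrievalResult:
        return RetrievalResult(doc_ids=self._doc_ids[: request.top_k])


retriever: RetrieverPort = StaticRetriever(["a", "b", "c"])
print(retriever.retrieve(RetrievalRequest("hello", top_k=2)).doc_ids)
```

Validating in the request constructor means each retriever implementation no longer needs its own blank-query or top_k guard.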
…_ingestion, test_frontend_path. Edited test_evaluation, test_utils. Edge cases, malicious pickle, prefixes, formats
New CLI flag --run-out writes the full ranked retrieval output (one line per query) to a JSONL file. Enables post-hoc multi-run pooling and deeper result inspection without re-running retrieval. These changes reflect the exact working tree state at benchmark execution time (synergy/repogpt-ragp benchmark v1, 2026-03-18). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
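A sketch of writing ranked retrieval output as JSONL with one line per query. The record layout (`query_id`, `results` with `doc_id`, `rank`, `score`) is an assumption; the commit only states that the file holds the full ranked output, one line per query. The `write_run` helper is hypothetical.

```python
import io
import json
from typing import Iterable, List, TextIO, Tuple

RankedList = List[Tuple[str, float]]  # (doc_id, score) pairs, best first


def write_run(out: TextIO, run: Iterable[Tuple[str, RankedList]]) -> int:
    """Write one JSON object per query to `out`; return the number of lines written."""
    count = 0
    for query_id, ranked in run:
        record = {
            "query_id": query_id,
            "results": [
                {"doc_id": doc_id, "rank": rank, "score": score}
                for rank, (doc_id, score) in enumerate(ranked, start=1)
            ],
        }
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


buffer = io.StringIO()
write_run(buffer, [("q1", [("d1", 0.9), ("d2", 0.5)]), ("q2", [])])
print(buffer.getvalue(), end="")
```

Since each line is an independent JSON object, several run files can be pooled afterwards by reading them line by line without re-running retrieval.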
dense-st (SentenceTransformers) is required for dense/hybrid retrieval in the local_split profile. Without it, benchmark re-runs fail at eval points 2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 18 with "embeddings backend not configured". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ion (incoming benchmark) (#56)

* feat(eval): reuse prepared workspace for batch retrieval evaluation
* feat(cli): add rag-eval-batch command
* fix(cli): lazy-import uvicorn in server command
* docs: align runtime paths, retrieval modes, and eval coverage
* docs: sync .env example with ingestion defaults
* Fix eval batch typing issues
* Type eval retrieval modes explicitly
* Fix eval retrieval mode typing
* fix(ruff): fix ruff linter
* feat(config): add YAML-only runtime config loader
* refactor(runtime): route config consumers through yaml settings
* test(config): update loader and runtime coverage
* docs: switch runtime docs to yaml config
* fix: add pyYAML stubs for mypy hook
…d on tech_debt.md
* refactor(rag): add backend-aware composition wiring
* refactor(rag): extract eval CLI helpers
* refactor(rag): improve CLI status output and guards
* fix(rag): normalize settings and harden runtime outputs
* docs(rag): update provider and API contract docs
* test(e2e): skip RepoGPT flow when checkout is absent
* fix(ruff): lint
…e docs (#64)

* fix(docker): make production image self-contained
* docs(runtime): align config and ingestion guide
* chore(dev): keep dense-st sync opt-in
* docs(release): add release hygiene checklist
Anchored 2.0.1 as config breaking change.
Eval
Cache Embeddings