
Conversation

@avirajsingh7
Collaborator

@avirajsingh7 avirajsingh7 commented Dec 10, 2025

Summary

This change refactors the evaluation run process to use a stored configuration reference instead of an inline configuration dictionary. It introduces config_id, config_version, and model fields in the evaluation run table, streamlining the evaluation process and improving data integrity.
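
Illustratively, the stored fields change roughly as follows (a simplified sketch only; the class name EvaluationRunSketch and the defaults are placeholders, not the real SQLModel definition in backend/app/models/evaluation.py):

from uuid import UUID
from sqlmodel import Field as SQLField, SQLModel


class EvaluationRunSketch(SQLModel):
    # Previously: config: dict[str, Any] stored the full configuration inline (JSONB column)
    # Now the run references a stored config and snapshots the resolved model name
    config_id: UUID | None = SQLField(default=None, foreign_key="config.id", nullable=True)
    config_version: int | None = SQLField(default=None, nullable=True, ge=1)
    model: str | None = SQLField(default=None, nullable=True)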

Checklist

Before submitting a pull request, please ensure that you have completed these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested the changes.
  • If you've fixed a bug or added new code, ensured it is covered by test cases.

Summary by CodeRabbit

  • New Features

    • Evaluation runs now track and store the LLM model used for each evaluation.
    • Configuration is now referenced by ID and version instead of storing complete configurations inline, improving efficiency and maintainability.
  • Bug Fixes

    • Enhanced validation and error handling for missing or invalid configurations during evaluation setup.
  • Chores

    • Database schema updated to support configuration references.


@coderabbitai

coderabbitai bot commented Dec 10, 2025

📝 Walkthrough

Walkthrough

The PR refactors evaluation configuration handling to use stored configuration references instead of inline config objects. The evaluate endpoint now accepts config_id and config_version parameters, resolving the stored configuration via ConfigVersionCrud and resolve_config_blob. The EvaluationRun model is updated to store references and a resolved model field. Database schema is migrated with appropriate foreign keys. Tests are updated to use the new configuration reference pattern.
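
In rough outline, the resolution step now looks something like the following (a condensed sketch assembled from the code snippets quoted later in this thread, not the actual route implementation; the helper name _resolve_eval_config, the import paths, the untyped project_id, and the exact status codes are assumptions):

from uuid import UUID

from fastapi import HTTPException
from sqlmodel import Session

# Import paths inferred from the file listing in this review; treat them as assumptions.
from app.crud.config.version import ConfigVersionCrud
from app.models.llm.request import LLMCallConfig
from app.services.llm.jobs import resolve_config_blob


def _resolve_eval_config(session: Session, project_id, config_id: UUID, config_version: int):
    """Illustrative sketch: resolve a stored config and extract the model name."""
    config_crud = ConfigVersionCrud(
        session=session, config_id=config_id, project_id=project_id
    )
    config, error = resolve_config_blob(
        config_crud=config_crud,
        config=LLMCallConfig(id=config_id, version=config_version),
    )
    if error or config is None:
        raise HTTPException(status_code=400, detail=f"Config resolution failed: {error}")
    if config.completion.provider != "openai":
        raise HTTPException(
            status_code=422,
            detail="Only 'openai' provider is supported for evaluation configs",
        )
    # The resolved model name is snapshotted onto the EvaluationRun for cost tracking.
    model = config.completion.params.get("model")
    return config, model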

Changes

• Evaluation API & Route Handler — backend/app/api/routes/evaluation.py
  Refactored evaluate() method to accept config_id: UUID and config_version: int instead of config: dict and assistant_id. Added config resolution via ConfigVersionCrud, provider validation (OPENAI only), and HTTP error handling for missing/invalid configs. Updated batch evaluation to use resolved config parameters.
• Core CRUD Operations — backend/app/crud/evaluations/core.py
  Updated create_evaluation_run() signature to accept config_id and config_version instead of config dict. Added new resolve_model_from_config() function to extract model name from stored configuration. Updated logging and docstrings. Added imports for UUID, ConfigVersionCrud, LLMCallConfig, and resolve_config_blob.
• Data Model Definitions — backend/app/models/evaluation.py
  Replaced config: dict[str, Any] with config_id: UUID | None, config_version: int | None, and new model: str | None field in both EvaluationRun and EvaluationRunPublic to reflect stored config references and resolved model.
• Processing & Embeddings — backend/app/crud/evaluations/processing.py, backend/app/crud/evaluations/embeddings.py
  Updated model resolution in processing flow to use new resolve_model_from_config(). Hard-coded embedding model to "text-embedding-3-large" in embeddings batch handler, removing dynamic retrieval.
• Module Exports — backend/app/crud/evaluations/__init__.py
  Added resolve_model_from_config to public API exports via __all__ list.
• Database Migration — backend/app/alembic/versions/041_add_config_in_evals_run_table.py
  New migration adds config_id (UUID, foreign key to config table) and config_version (Integer) columns to evaluation_run table; removes legacy config JSONB column. Includes downgrade path.
• Test Suite — backend/app/tests/api/routes/test_evaluation.py
  Updated test cases to create test configs via create_test_config() and reference via config_id/config_version instead of embedding full config objects. Added uuid4 usage for negative test scenarios. Updated error message assertions for config-not-found handling.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • Prajna1999
  • kartpop

Poem

🐰 Hops with glee
Config stored, no more to carry,
Just an ID, oh how merry!
Versions tracked with careful care,
References bloom through the air! ✨
Resolution flows so clean and bright,
Configurations bundled just right! 🌟

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 warning
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 68.75%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title accurately describes the main refactoring: evaluation runs now use stored configuration management (config_id/config_version) instead of inline config dicts.



@avirajsingh7 avirajsingh7 linked an issue Dec 10, 2025 that may be closed by this pull request
@avirajsingh7 avirajsingh7 self-assigned this Dec 10, 2025
@avirajsingh7 avirajsingh7 added the enhancement (New feature or request) and ready-for-review labels Dec 10, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (4)
backend/app/crud/evaluations/embeddings.py (1)

366-367: Misleading comment - update to reflect actual behavior.

The comment says "Get embedding model from config" but the code hardcodes the value. Update the comment to accurately describe the implementation.

-        # Get embedding model from config (default: text-embedding-3-large)
-        embedding_model = "text-embedding-3-large"
+        # Use fixed embedding model (text-embedding-3-large)
+        embedding_model = "text-embedding-3-large"
backend/app/tests/api/routes/test_evaluation.py (1)

524-545: Consider renaming function to match its new purpose.

The function test_start_batch_evaluation_missing_model was repurposed to test invalid config_id scenarios. The docstring was updated but the function name still references "missing_model". Consider renaming for clarity.

-    def test_start_batch_evaluation_missing_model(self, client, user_api_key_header):
-        """Test batch evaluation fails with invalid config_id."""
+    def test_start_batch_evaluation_invalid_config_id(self, client, user_api_key_header):
+        """Test batch evaluation fails with invalid config_id."""
backend/app/api/routes/evaluation.py (1)

499-510: Consider validating that model is present in config params.

The model is extracted with .get("model") which returns None if not present. Since model is critical for cost tracking (used in create_langfuse_dataset_run), consider validating its presence and returning an error if missing.

     # Extract model from config for storage
     model = config.completion.params.get("model")
+    if not model:
+        raise HTTPException(
+            status_code=400,
+            detail="Config must specify a 'model' in completion params for evaluation",
+        )

     # Create EvaluationRun record with config references
backend/app/crud/evaluations/core.py (1)

15-69: Config-based create_evaluation_run refactor is correctly implemented; consider logging model for improved traceability.

The refactor from inline config dict to config_id: UUID and config_version: int is properly implemented throughout:

  • The sole call site in backend/app/api/routes/evaluation.py:503 correctly passes all new parameters with the right types (config_id as UUID, config_version as int, model extracted from config).
  • The EvaluationRun model in backend/app/models/evaluation.py correctly defines all three fields with appropriate types and descriptions.
  • All type hints align with Python 3.11+ guidelines.

One suggested improvement for debugging:

Include model in the creation log for better traceability when correlating evaluation runs with model versions:

logger.info(
    f"Created EvaluationRun record: id={eval_run.id}, run_name={run_name}, "
-   f"config_id={config_id}, config_version={config_version}"
+   f"config_id={config_id}, config_version={config_version}, model={model}"
)

Since the model is already extracted at the call site and passed to the function, including it in the log will provide fuller context for operational debugging without any additional cost.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 30ef268 and d5f9d4d.

📒 Files selected for processing (7)
  • backend/app/alembic/versions/7b48f23ebfdd_add_config_id_and_version_in_evals_run_.py (1 hunks)
  • backend/app/api/routes/evaluation.py (5 hunks)
  • backend/app/crud/evaluations/core.py (5 hunks)
  • backend/app/crud/evaluations/embeddings.py (1 hunks)
  • backend/app/crud/evaluations/processing.py (1 hunks)
  • backend/app/models/evaluation.py (3 hunks)
  • backend/app/tests/api/routes/test_evaluation.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/api/routes/evaluation.py
  • backend/app/models/evaluation.py
  • backend/app/crud/evaluations/embeddings.py
  • backend/app/tests/api/routes/test_evaluation.py
  • backend/app/crud/evaluations/processing.py
  • backend/app/crud/evaluations/core.py
  • backend/app/alembic/versions/7b48f23ebfdd_add_config_id_and_version_in_evals_run_.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Expose FastAPI REST endpoints under backend/app/api/ organized by domain

Files:

  • backend/app/api/routes/evaluation.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/evaluation.py
backend/app/crud/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Implement database access operations in backend/app/crud/

Files:

  • backend/app/crud/evaluations/embeddings.py
  • backend/app/crud/evaluations/processing.py
  • backend/app/crud/evaluations/core.py
🧬 Code graph analysis (2)
backend/app/tests/api/routes/test_evaluation.py (2)
backend/app/crud/evaluations/batch.py (1)
  • build_evaluation_jsonl (62-115)
backend/app/models/evaluation.py (2)
  • EvaluationDataset (74-130)
  • EvaluationRun (133-248)
backend/app/crud/evaluations/processing.py (1)
backend/app/crud/evaluations/langfuse.py (1)
  • create_langfuse_dataset_run (20-163)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/crud/evaluations/processing.py (1)

257-264: LGTM! Clean refactor to use stored model field.

The change correctly retrieves the model from eval_run.model instead of extracting it from config. This aligns with the new data model where the model is snapshotted at evaluation creation time.

backend/app/models/evaluation.py (1)

148-157: LGTM! Well-structured config reference fields.

The new config_id and config_version fields properly establish the relationship to stored configs with appropriate constraints (ge=1 for version). The nullable design allows backward compatibility with existing data.

backend/app/api/routes/evaluation.py (1)

478-495: LGTM! Robust config resolution with provider validation.

The config resolution flow properly validates that the stored config exists and uses the OPENAI provider. Error handling returns appropriate HTTP 400 responses with descriptive messages.

@codecov

codecov bot commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 56.09756% with 18 lines in your changes missing coverage. Please review.

Files with missing lines (patch coverage):
  • backend/app/crud/evaluations/core.py — 35.71%, 9 lines missing ⚠️
  • backend/app/api/routes/evaluation.py — 46.15%, 7 lines missing ⚠️
  • backend/app/crud/evaluations/embeddings.py — 0.00%, 1 line missing ⚠️
  • backend/app/crud/evaluations/processing.py — 50.00%, 1 line missing ⚠️



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
backend/app/models/evaluation.py (1)

148-158: Align EvaluationRun type hints with nullable DB columns for config fields

config_id and config_version are nullable in the schema but annotated as non-optional types. This can mislead callers and type checkers into assuming they’re always present, even for legacy runs or transitional data.

Consider updating the annotations to reflect nullability:

-    config_id: UUID = SQLField(
+    config_id: UUID | None = SQLField(
         foreign_key="config.id",
         nullable=True,
         description="Reference to the stored config used for this evaluation",
     )
-    config_version: int = SQLField(
+    config_version: int | None = SQLField(
         nullable=True,
         ge=1,
         description="Version of the config used for this evaluation",
     )

This keeps the schema the same while making runtime and type expectations clearer.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5f9d4d and eda7762.

📒 Files selected for processing (1)
  • backend/app/models/evaluation.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use type hints in Python code (Python 3.11+ project)

Files:

  • backend/app/models/evaluation.py
backend/app/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define SQLModel entities (database tables and domain objects) in backend/app/models/

Files:

  • backend/app/models/evaluation.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/models/evaluation.py (1)

271-273: Public model nullability now matches the schema

Making config_id, config_version, and model nullable in EvaluationRunPublic correctly reflects the DB fields and avoids validation issues for existing rows. This resolves the earlier mismatch between the table and the public model.

Collaborator

@Prajna1999 Prajna1999 left a comment


lgtm

@avirajsingh7
Collaborator Author

Hold merge until the frontend is ready.

@Prajna1999
Collaborator

good to go. Can be merged


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/api/routes/evaluation.py (1)

8-16: Critical: Missing Depends import.

The pipeline failure indicates NameError: name 'Depends' is not defined at line 130. The import statement for Depends from fastapi is missing.

🐛 Add missing import
 from fastapi import (
     APIRouter,
     Body,
+    Depends,
     File,
     Form,
     HTTPException,
     Query,
     UploadFile,
 )
🧹 Nitpick comments (1)
backend/app/alembic/versions/041_add_config_in_evals_run_table.py (1)

1-60: Consider a multi-step migration strategy for safer deployment.

Given the destructive nature of this schema change (dropping the config column) and the PR status ("hold merge - until frontend is ready"), consider deploying this as a multi-phase migration (a hedged phase 1 sketch follows the list below):

Phase 1: Add new columns without dropping old ones

  • Add config_id and config_version (nullable)
  • Add foreign key constraint
  • Deploy application code that writes to both old and new columns

Phase 2: Backfill existing data

  • Create a data migration script to populate config_id/config_version from existing config JSONB
  • Validate data integrity

Phase 3: Cut over

  • Deploy application code that only uses new columns
  • Monitor for issues

Phase 4: Cleanup

  • Drop the old config column in a subsequent migration

This approach provides:

  • Zero downtime deployment
  • Easy rollback at each phase
  • Data preservation and validation
  • Safer production deployment
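
As a rough illustration of that strategy, a phase 1 migration could look like the following (a minimal sketch only: the revision identifiers, constraint name, and comments are placeholders, and the legacy config column is deliberately left in place for a later phase):

import sqlalchemy as sa
from alembic import op

# Placeholder revision identifiers for illustration only
revision = "041"
down_revision = "040"
branch_labels = None
depends_on = None


def upgrade() -> None:
    # Phase 1: add nullable reference columns alongside the legacy config JSONB column
    op.add_column("evaluation_run", sa.Column("config_id", sa.Uuid(), nullable=True))
    op.add_column("evaluation_run", sa.Column("config_version", sa.Integer(), nullable=True))
    op.create_foreign_key(
        "fk_evaluation_run_config_id", "evaluation_run", "config", ["config_id"], ["id"]
    )
    # The legacy "config" column is intentionally NOT dropped here; that happens in a later phase.


def downgrade() -> None:
    op.drop_constraint("fk_evaluation_run_config_id", "evaluation_run", type_="foreignkey")
    op.drop_column("evaluation_run", "config_version")
    op.drop_column("evaluation_run", "config_id")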
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eda7762 and 31d9523.

📒 Files selected for processing (8)
  • backend/app/alembic/versions/041_add_config_in_evals_run_table.py
  • backend/app/api/routes/evaluation.py
  • backend/app/crud/evaluations/__init__.py
  • backend/app/crud/evaluations/core.py
  • backend/app/crud/evaluations/embeddings.py
  • backend/app/crud/evaluations/processing.py
  • backend/app/models/evaluation.py
  • backend/app/tests/api/routes/test_evaluation.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/app/crud/evaluations/embeddings.py
  • backend/app/models/evaluation.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets: logger.info(f"[function_name] Message {mask_string(sensitive_value)}")
Use Python 3.11+ with type hints throughout the codebase

Files:

  • backend/app/tests/api/routes/test_evaluation.py
  • backend/app/alembic/versions/041_add_config_in_evals_run_table.py
  • backend/app/api/routes/evaluation.py
  • backend/app/crud/evaluations/__init__.py
  • backend/app/crud/evaluations/processing.py
  • backend/app/crud/evaluations/core.py
backend/app/tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use factory pattern for test fixtures in backend/app/tests/

Files:

  • backend/app/tests/api/routes/test_evaluation.py
backend/app/alembic/versions/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Generate database migrations using alembic revision --autogenerate -m "Description" --rev-id <number> where rev-id is the latest existing revision ID + 1

Files:

  • backend/app/alembic/versions/041_add_config_in_evals_run_table.py
backend/app/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

backend/app/api/**/*.py: Define FastAPI REST endpoints in backend/app/api/ organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings using load_description("domain/action.md")

Files:

  • backend/app/api/routes/evaluation.py
🧠 Learnings (2)
📚 Learning: 2025-12-17T15:39:30.469Z
Learnt from: CR
Repo: ProjectTech4DevAI/kaapi-backend PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T15:39:30.469Z
Learning: Applies to backend/app/alembic/versions/*.py : Generate database migrations using `alembic revision --autogenerate -m "Description" --rev-id <number>` where rev-id is the latest existing revision ID + 1

Applied to files:

  • backend/app/alembic/versions/041_add_config_in_evals_run_table.py
📚 Learning: 2025-12-17T15:39:30.469Z
Learnt from: CR
Repo: ProjectTech4DevAI/kaapi-backend PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T15:39:30.469Z
Learning: Organize backend code in `backend/app/` following the layered architecture: Models, CRUD, Routes, Core, Services, and Celery directories

Applied to files:

  • backend/app/api/routes/evaluation.py
🧬 Code graph analysis (5)
backend/app/tests/api/routes/test_evaluation.py (3)
backend/app/crud/evaluations/batch.py (1)
  • build_evaluation_jsonl (62-115)
backend/app/models/evaluation.py (2)
  • EvaluationDataset (74-168)
  • EvaluationRun (171-322)
backend/app/tests/utils/test_data.py (1)
  • create_test_config (239-302)
backend/app/api/routes/evaluation.py (6)
backend/app/crud/config/version.py (1)
  • ConfigVersionCrud (15-142)
backend/app/models/llm/request.py (1)
  • LLMCallConfig (132-188)
backend/app/services/llm/jobs.py (1)
  • resolve_config_blob (84-116)
backend/app/services/llm/providers/registry.py (1)
  • LLMProvider (14-41)
backend/app/utils.py (4)
  • APIResponse (33-57)
  • get_langfuse_client (212-248)
  • get_openai_client (179-209)
  • load_description (393-398)
backend/app/crud/evaluations/core.py (1)
  • create_evaluation_run (18-71)
backend/app/crud/evaluations/__init__.py (1)
backend/app/crud/evaluations/core.py (1)
  • resolve_model_from_config (308-349)
backend/app/crud/evaluations/processing.py (2)
backend/app/crud/evaluations/core.py (2)
  • update_evaluation_run (154-206)
  • resolve_model_from_config (308-349)
backend/app/crud/evaluations/langfuse.py (1)
  • create_langfuse_dataset_run (21-164)
backend/app/crud/evaluations/core.py (3)
backend/app/crud/config/version.py (1)
  • ConfigVersionCrud (15-142)
backend/app/models/llm/request.py (1)
  • LLMCallConfig (132-188)
backend/app/services/llm/jobs.py (1)
  • resolve_config_blob (84-116)
🪛 GitHub Actions: Kaapi CI
backend/app/api/routes/evaluation.py

[error] 130-130: NameError: name 'Depends' is not defined.

🔇 Additional comments (4)
backend/app/crud/evaluations/__init__.py (1)

8-8: LGTM!

The new resolve_model_from_config function is correctly imported and exported for public use.

Also applies to: 43-43

backend/app/tests/api/routes/test_evaluation.py (1)

3-3: LGTM!

The test updates correctly reflect the shift from inline config dictionaries to stored config references. The use of create_test_config factory function aligns with the coding guidelines for test fixtures, and the error scenarios properly test config-not-found cases.

Also applies to: 10-10, 499-545, 728-803

backend/app/api/routes/evaluation.py (1)

492-509: Verify config resolution error handling covers all failure modes.

The config resolution logic handles errors from resolve_config_blob and validates the provider, but ensure that:

  1. Config version not found scenarios are properly handled
  2. Invalid/corrupted config blobs are caught
  3. The provider validation matches actual config schemas used in production
backend/app/crud/evaluations/core.py (1)

66-69: LGTM!

The logging statement correctly follows the coding guideline format with function context and includes the new config_id and config_version fields.

depends_on = None


def upgrade():

⚠️ Potential issue | 🟡 Minor

Add return type hints to migration functions.

Both upgrade() and downgrade() functions are missing return type hints.

As per coding guidelines, all functions should have type hints.

📝 Proposed fix
-def upgrade():
+def upgrade() -> None:
-def downgrade():
+def downgrade() -> None:

Also applies to: 45-45

🤖 Prompt for AI Agents
In @backend/app/alembic/versions/041_add_config_in_evals_run_table.py at line
20, The migration functions upgrade() and downgrade() lack return type hints;
update both function definitions (upgrade and downgrade) to include explicit
return types (e.g., change "def upgrade():" and "def downgrade():" to "def
upgrade() -> None:" and "def downgrade() -> None:") so they conform to the
project's typing guidelines.

Comment on lines +22 to +41
    op.add_column(
        "evaluation_run",
        sa.Column(
            "config_id",
            sa.Uuid(),
            nullable=True,
            comment="Reference to the stored config used",
        ),
    )
    op.add_column(
        "evaluation_run",
        sa.Column(
            "config_version",
            sa.Integer(),
            nullable=True,
            comment="Version of the config used",
        ),
    )
    op.create_foreign_key(None, "evaluation_run", "config", ["config_id"], ["id"])
    op.drop_column("evaluation_run", "config")

⚠️ Potential issue | 🔴 Critical

Critical: Data loss and foreign key constraint naming issues.

This migration has several critical problems:

  1. Data loss: Line 41 drops the config column without migrating existing data to the new config_id/config_version columns. Any existing evaluation runs will lose their configuration data permanently.

  2. Foreign key constraint naming: Line 40 creates a foreign key with None as the constraint name, causing Alembic to auto-generate a name. However, the downgrade function (Line 57) also uses None to drop the constraint, which won't match the auto-generated name and will fail.

Required actions:

  1. Add a data migration step before dropping the config column. You'll need to:

    • Parse each existing config JSONB object
    • Look up or create corresponding config records with appropriate versions
    • Update config_id and config_version for each evaluation_run
    • Or, if data migration isn't feasible, add a comment explaining why data loss is acceptable
  2. Specify an explicit constraint name instead of None:

🔧 Proposed fix for FK constraint naming
-    op.create_foreign_key(None, "evaluation_run", "config", ["config_id"], ["id"])
+    op.create_foreign_key(
+        "fk_evaluation_run_config_id", 
+        "evaluation_run", 
+        "config", 
+        ["config_id"], 
+        ["id"]
+    )

And update the downgrade:

-    op.drop_constraint(None, "evaluation_run", type_="foreignkey")
+    op.drop_constraint("fk_evaluation_run_config_id", "evaluation_run", type_="foreignkey")

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +47 to +56
    op.add_column(
        "evaluation_run",
        sa.Column(
            "config",
            postgresql.JSONB(astext_type=sa.Text()),
            autoincrement=False,
            nullable=False,
            comment="Evaluation configuration (model, instructions, etc.)",
        ),
    )

⚠️ Potential issue | 🔴 Critical

Critical: Downgrade will fail with existing data.

The downgrade re-adds the config column with nullable=False (Line 53). If the evaluation_run table contains any records when downgrading, this operation will fail because PostgreSQL cannot add a non-nullable column to a table with existing rows without specifying a default value.

Either:

  1. Make the column nullable during downgrade: nullable=True
  2. Provide a server default value
  3. Add a data migration to populate the column before setting it non-nullable
🔧 Proposed fix (Option 1: Make nullable)
     op.add_column(
         "evaluation_run",
         sa.Column(
             "config",
             postgresql.JSONB(astext_type=sa.Text()),
             autoincrement=False,
-            nullable=False,
+            nullable=True,
             comment="Evaluation configuration (model, instructions, etc.)",
         ),
     )
🤖 Prompt for AI Agents
In @backend/app/alembic/versions/041_add_config_in_evals_run_table.py around
lines 47 - 56, The downgrade currently re-adds the "config" column on the
"evaluation_run" table using op.add_column with sa.Column(..., nullable=False)
which will fail if rows exist; update that op.add_column call in the downgrade
to use nullable=True (or alternatively add a server_default or a prior data
migration to populate values before setting non-nullable), ensuring the column
is created nullable during downgrade to avoid PostgreSQL errors.

Comment on lines +505 to 509
    elif config.completion.provider != LLMProvider.OPENAI:
        raise HTTPException(
            status_code=422,
            detail="Only 'openai' provider is supported for evaluation configs",
        )
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Invalid constant reference - LLMProvider.OPENAI does not exist.

The code references LLMProvider.OPENAI but the LLMProvider registry (backend/app/services/llm/providers/registry.py) only defines OPENAI_NATIVE = "openai-native". However, the error message and test configs use "openai" as the provider string.

This mismatch will cause an AttributeError at runtime.

🔍 Verify the correct provider constant
#!/bin/bash
# Check what constants are defined in LLMProvider
ast-grep --pattern 'class LLMProvider:
  $$$
'

# Check what provider values are used in evaluation configs
rg -n --type=py "provider.*=.*[\"']openai[\"']" backend/app/

Based on the error message expecting "openai" and test data using provider="openai", you likely need either:

  1. Add OPENAI = "openai" constant to LLMProvider, or
  2. Change the validation logic to check the string directly: != "openai"
🤖 Prompt for AI Agents
In @backend/app/api/routes/evaluation.py around lines 505 - 509, The code
references a non-existent constant LLMProvider.OPENAI in the evaluation config
validation, causing AttributeError; update the check in evaluation.py (the block
that raises HTTPException) to compare against the actual provider string
"openai" (i.e., use config.completion.provider != "openai") or alternatively add
a new constant OPENAI = "openai" to the LLMProvider class in
backend/app/services/llm/providers/registry.py so the symbol exists and matches
tests; pick one approach and ensure the error message and tests remain
consistent with the chosen value.

Comment on lines +308 to +349
def resolve_model_from_config(
    session: Session,
    eval_run: EvaluationRun,
) -> str:
    """
    Resolve the model name from the evaluation run's config.

    Args:
        session: Database session
        eval_run: EvaluationRun instance

    Returns:
        Model name from config

    Raises:
        ValueError: If config is missing, invalid, or has no model
    """
    if not eval_run.config_id or not eval_run.config_version:
        raise ValueError(
            f"Evaluation run {eval_run.id} has no config reference "
            f"(config_id={eval_run.config_id}, config_version={eval_run.config_version})"
        )

    config_version_crud = ConfigVersionCrud(
        session=session,
        config_id=eval_run.config_id,
        project_id=eval_run.project_id,
    )

    config, error = resolve_config_blob(
        config_crud=config_version_crud,
        config=LLMCallConfig(id=eval_run.config_id, version=eval_run.config_version),
    )

    if error or config is None:
        raise ValueError(
            f"Config resolution failed for evaluation {eval_run.id} "
            f"(config_id={eval_run.config_id}, version={eval_run.config_version}): {error}"
        )

    model = config.completion.params.get("model")
    return model

⚠️ Potential issue | 🔴 Critical

Fix type mismatch: model extraction can return None.

The function's return type is str, but line 348 uses config.completion.params.get("model") which can return None if the "model" key is missing. This violates the type contract and could cause issues when the model is passed to downstream functions expecting a string.

✅ Validate that model exists
     model = config.completion.params.get("model")
+    if not model:
+        raise ValueError(
+            f"Config for evaluation {eval_run.id} does not specify a model "
+            f"(config_id={eval_run.config_id}, version={eval_run.config_version})"
+        )
     return model
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (full function with the validation added):

def resolve_model_from_config(
    session: Session,
    eval_run: EvaluationRun,
) -> str:
    """
    Resolve the model name from the evaluation run's config.

    Args:
        session: Database session
        eval_run: EvaluationRun instance

    Returns:
        Model name from config

    Raises:
        ValueError: If config is missing, invalid, or has no model
    """
    if not eval_run.config_id or not eval_run.config_version:
        raise ValueError(
            f"Evaluation run {eval_run.id} has no config reference "
            f"(config_id={eval_run.config_id}, config_version={eval_run.config_version})"
        )

    config_version_crud = ConfigVersionCrud(
        session=session,
        config_id=eval_run.config_id,
        project_id=eval_run.project_id,
    )

    config, error = resolve_config_blob(
        config_crud=config_version_crud,
        config=LLMCallConfig(id=eval_run.config_id, version=eval_run.config_version),
    )

    if error or config is None:
        raise ValueError(
            f"Config resolution failed for evaluation {eval_run.id} "
            f"(config_id={eval_run.config_id}, version={eval_run.config_version}): {error}"
        )

    model = config.completion.params.get("model")
    if not model:
        raise ValueError(
            f"Config for evaluation {eval_run.id} does not specify a model "
            f"(config_id={eval_run.config_id}, version={eval_run.config_version})"
        )
    return model
🤖 Prompt for AI Agents
In @backend/app/crud/evaluations/core.py around lines 308 - 349,
resolve_model_from_config currently declares returning str but assigns model =
config.completion.params.get("model") which may be None; update
resolve_model_from_config to validate that model is present and a str (e.g., if
not model: raise ValueError(...) with context including eval_run.id, config_id,
config_version) before returning, or coerce/choose a safe default only if
intended; reference the resolve_model_from_config function and the model
variable from config.completion.params.get("model") when implementing the check.

Comment on lines +257 to +263
    # Use model stored at creation time for cost tracking
    model = resolve_model_from_config(session=session, eval_run=eval_run)

    trace_id_mapping = create_langfuse_dataset_run(
        langfuse=langfuse,
        dataset_name=eval_run.dataset_name,
        model=model,

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, locate the resolve_model_from_config function
rg -n "def resolve_model_from_config" --type=py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 146


🏁 Script executed:

# Check the imports and context around line 257-263 in processing.py
head -n 35 backend/app/crud/evaluations/processing.py | tail -n 10

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 446


🏁 Script executed:

# Look at the core.py file around line 348
sed -n '340,360p' backend/app/crud/evaluations/core.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 379


🏁 Script executed:

# Get the full resolve_model_from_config function with type hints
sed -n '308,330p' backend/app/crud/evaluations/core.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 712


🏁 Script executed:

# Check the create_langfuse_dataset_run function signature
rg -n "def create_langfuse_dataset_run" -A 5 --type=py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 502


🏁 Script executed:

# Check if there's any validation before resolve_model_from_config is called
sed -n '250,270p' backend/app/crud/evaluations/processing.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 816


🏁 Script executed:

# Get the complete resolve_model_from_config function
sed -n '308,365p' backend/app/crud/evaluations/core.py

Repository: ProjectTech4DevAI/kaapi-backend

Length of output: 1340


Add validation to ensure model key exists in config params.

The resolve_model_from_config function's return type is annotated as str, and its docstring promises to raise ValueError if the config has no model. However, the implementation at line 350 uses .get("model") without validation, allowing it to return None and violating the type contract.

Add a validation check after retrieving the model to ensure the key exists, or update the return type annotation to str | None if None is acceptable. This fixes the contract mismatch between the type hint and actual implementation.

🤖 Prompt for AI Agents
In @backend/app/crud/evaluations/processing.py around lines 257 - 263,
resolve_model_from_config currently uses config.get("model") which can return
None despite its str return annotation and docstring promise; modify
resolve_model_from_config to validate the retrieved value and raise ValueError
if missing (or alternatively change the function signature to return str | None
and update callers), e.g., after fetching model = config.get("model") check if
model is truthy and raise ValueError("missing model in config") to enforce the
contract so callers like resolve_model_from_config(session=session,
eval_run=eval_run) always receive a str or an explicit None-aware type is used
consistently.


Labels

enhancement (New feature or request), ready-for-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add config management in Evals

5 participants