Evaluation to Use Config Management #477
base: main
Conversation
📝 Walkthrough

The PR refactors evaluation configuration handling to use stored configuration references instead of inline config objects. The evaluate endpoint now accepts a stored config reference (config_id and config_version) rather than an inline config dictionary.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
🧹 Nitpick comments (4)
backend/app/crud/evaluations/embeddings.py (1)
366-367: Misleading comment - update to reflect actual behavior.

The comment says "Get embedding model from config" but the code hardcodes the value. Update the comment to accurately describe the implementation.

```diff
- # Get embedding model from config (default: text-embedding-3-large)
- embedding_model = "text-embedding-3-large"
+ # Use fixed embedding model (text-embedding-3-large)
+ embedding_model = "text-embedding-3-large"
```

backend/app/tests/api/routes/test_evaluation.py (1)
524-545: Consider renaming function to match its new purpose.

The function `test_start_batch_evaluation_missing_model` was repurposed to test invalid `config_id` scenarios. The docstring was updated but the function name still references "missing_model". Consider renaming for clarity.

```diff
- def test_start_batch_evaluation_missing_model(self, client, user_api_key_header):
-     """Test batch evaluation fails with invalid config_id."""
+ def test_start_batch_evaluation_invalid_config_id(self, client, user_api_key_header):
+     """Test batch evaluation fails with invalid config_id."""
```

backend/app/api/routes/evaluation.py (1)
499-510: Consider validating that `model` is present in config params.

The model is extracted with `.get("model")`, which returns `None` if not present. Since `model` is critical for cost tracking (used in `create_langfuse_dataset_run`), consider validating its presence and returning an error if missing.

```diff
  # Extract model from config for storage
  model = config.completion.params.get("model")
+ if not model:
+     raise HTTPException(
+         status_code=400,
+         detail="Config must specify a 'model' in completion params for evaluation",
+     )

  # Create EvaluationRun record with config references
```

backend/app/crud/evaluations/core.py (1)
15-69: Config-based `create_evaluation_run` refactor is correctly implemented; consider logging `model` for improved traceability.

The refactor from an inline config dict to `config_id: UUID` and `config_version: int` is properly implemented throughout:

- The sole call site in `backend/app/api/routes/evaluation.py:503` correctly passes all new parameters with the right types (`config_id` as UUID, `config_version` as int, `model` extracted from config).
- The `EvaluationRun` model in `backend/app/models/evaluation.py` correctly defines all three fields with appropriate types and descriptions.
- All type hints align with Python 3.11+ guidelines.

One suggested improvement for debugging: include `model` in the creation log for better traceability when correlating evaluation runs with model versions:

```diff
  logger.info(
      f"Created EvaluationRun record: id={eval_run.id}, run_name={run_name}, "
-     f"config_id={config_id}, config_version={config_version}"
+     f"config_id={config_id}, config_version={config_version}, model={model}"
  )
```

Since the model is already extracted at the call site and passed to the function, including it in the log will provide fuller context for operational debugging without any additional cost.
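For orientation, a minimal sketch of what the refactored signature described above could look like. The exact parameter list, module paths, and body are assumptions inferred from this review, not the actual implementation:

```python
import logging
from uuid import UUID

from sqlmodel import Session

from app.models.evaluation import EvaluationRun  # module path as referenced in this review

logger = logging.getLogger(__name__)


def create_evaluation_run(
    session: Session,
    run_name: str,
    dataset_name: str,
    config_id: UUID,
    config_version: int,
    model: str,
    project_id: int,
) -> EvaluationRun:
    """Persist an EvaluationRun that references a stored config instead of an inline dict."""
    eval_run = EvaluationRun(
        run_name=run_name,
        dataset_name=dataset_name,
        config_id=config_id,      # reference to the stored config
        config_version=config_version,
        model=model,              # snapshotted at creation time for cost tracking
        project_id=project_id,
    )
    session.add(eval_run)
    session.commit()
    session.refresh(eval_run)
    logger.info(
        f"[create_evaluation_run] Created EvaluationRun record: id={eval_run.id}, "
        f"run_name={run_name}, config_id={config_id}, config_version={config_version}, model={model}"
    )
    return eval_run
```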
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- backend/app/alembic/versions/7b48f23ebfdd_add_config_id_and_version_in_evals_run_.py (1 hunks)
- backend/app/api/routes/evaluation.py (5 hunks)
- backend/app/crud/evaluations/core.py (5 hunks)
- backend/app/crud/evaluations/embeddings.py (1 hunks)
- backend/app/crud/evaluations/processing.py (1 hunks)
- backend/app/models/evaluation.py (3 hunks)
- backend/app/tests/api/routes/test_evaluation.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
- backend/app/api/routes/evaluation.py
- backend/app/models/evaluation.py
- backend/app/crud/evaluations/embeddings.py
- backend/app/tests/api/routes/test_evaluation.py
- backend/app/crud/evaluations/processing.py
- backend/app/crud/evaluations/core.py
- backend/app/alembic/versions/7b48f23ebfdd_add_config_id_and_version_in_evals_run_.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Expose FastAPI REST endpoints under backend/app/api/ organized by domain
Files:
backend/app/api/routes/evaluation.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/evaluation.py
backend/app/crud/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Implement database access operations in backend/app/crud/
Files:
- backend/app/crud/evaluations/embeddings.py
- backend/app/crud/evaluations/processing.py
- backend/app/crud/evaluations/core.py
🧬 Code graph analysis (2)
backend/app/tests/api/routes/test_evaluation.py (2)
backend/app/crud/evaluations/batch.py (1)
- build_evaluation_jsonl (62-115)

backend/app/models/evaluation.py (2)
- EvaluationDataset (74-130)
- EvaluationRun (133-248)
backend/app/crud/evaluations/processing.py (1)
backend/app/crud/evaluations/langfuse.py (1)
- create_langfuse_dataset_run (20-163)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (3)
backend/app/crud/evaluations/processing.py (1)
257-264: LGTM! Clean refactor to use stored model field.

The change correctly retrieves the model from `eval_run.model` instead of extracting it from config. This aligns with the new data model where the model is snapshotted at evaluation creation time.

backend/app/models/evaluation.py (1)
148-157: LGTM! Well-structured config reference fields.

The new `config_id` and `config_version` fields properly establish the relationship to stored configs with appropriate constraints (`ge=1` for version). The nullable design allows backward compatibility with existing data.

backend/app/api/routes/evaluation.py (1)
478-495: LGTM! Robust config resolution with provider validation.

The config resolution flow properly validates that the stored config exists and uses the OPENAI provider. Error handling returns appropriate HTTP 400 responses with descriptive messages.
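For orientation, a minimal sketch of the resolution-and-validation flow described above, assuming the `resolve_config_blob` helper returns a `(config, error)` tuple as shown later in this review. The wrapper name `resolve_evaluation_config`, the status codes, and the plain-string provider check are illustrative assumptions, not the endpoint's exact code:

```python
from uuid import UUID

from fastapi import HTTPException
from sqlmodel import Session

from app.crud.config.version import ConfigVersionCrud
from app.models.llm.request import LLMCallConfig
from app.services.llm.jobs import resolve_config_blob


def resolve_evaluation_config(
    session: Session, config_id: UUID, config_version: int, project_id: int
):
    """Resolve a stored config version and ensure it targets the OpenAI provider."""
    config_crud = ConfigVersionCrud(
        session=session, config_id=config_id, project_id=project_id
    )
    config, error = resolve_config_blob(
        config_crud=config_crud,
        config=LLMCallConfig(id=config_id, version=config_version),
    )
    if error or config is None:
        # Stored config missing or unresolvable: reject the request.
        raise HTTPException(status_code=400, detail=f"Config resolution failed: {error}")
    if config.completion.provider != "openai":
        # Only OpenAI-backed configs are supported for evaluation runs.
        raise HTTPException(
            status_code=400,
            detail="Only 'openai' provider is supported for evaluation configs",
        )
    return config
```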
Actionable comments posted: 0
🧹 Nitpick comments (1)
backend/app/models/evaluation.py (1)
148-158: Align `EvaluationRun` type hints with nullable DB columns for config fields

`config_id` and `config_version` are nullable in the schema but annotated as non-optional types. This can mislead callers and type checkers into assuming they're always present, even for legacy runs or transitional data. Consider updating the annotations to reflect nullability:

```diff
- config_id: UUID = SQLField(
+ config_id: UUID | None = SQLField(
      foreign_key="config.id",
      nullable=True,
      description="Reference to the stored config used for this evaluation",
  )
- config_version: int = SQLField(
+ config_version: int | None = SQLField(
      nullable=True,
      ge=1,
      description="Version of the config used for this evaluation",
  )
```

This keeps the schema the same while making runtime and type expectations clearer.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- backend/app/models/evaluation.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use type hints in Python code (Python 3.11+ project)
Files:
backend/app/models/evaluation.py
backend/app/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define SQLModel entities (database tables and domain objects) in backend/app/models/
Files:
backend/app/models/evaluation.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: checks (3.11.7, 6)
🔇 Additional comments (1)
backend/app/models/evaluation.py (1)
271-273: Public model nullability now matches the schema

Making `config_id`, `config_version`, and `model` nullable in `EvaluationRunPublic` correctly reflects the DB fields and avoids validation issues for existing rows. This resolves the earlier mismatch between the table and the public model.
Prajna1999 left a comment:
lgtm
Hold merge until frontend is ready.
good to go. Can be merged |
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/api/routes/evaluation.py (1)
8-16: Critical: Missing `Depends` import.

The pipeline failure indicates `NameError: name 'Depends' is not defined` at line 130. The import statement for `Depends` from `fastapi` is missing.

🐛 Add missing import

```diff
  from fastapi import (
      APIRouter,
      Body,
+     Depends,
      File,
      Form,
      HTTPException,
      Query,
      UploadFile,
  )
```
🤖 Fix all issues with AI agents
In @backend/app/alembic/versions/041_add_config_in_evals_run_table.py:
- Line 20: The migration functions upgrade() and downgrade() lack return type
hints; update both function definitions (upgrade and downgrade) to include
explicit return types (e.g., change "def upgrade():" and "def downgrade():" to
"def upgrade() -> None:" and "def downgrade() -> None:") so they conform to the
project's typing guidelines.
- Around line 47-56: The downgrade currently re-adds the "config" column on the
"evaluation_run" table using op.add_column with sa.Column(..., nullable=False)
which will fail if rows exist; update that op.add_column call in the downgrade
to use nullable=True (or alternatively add a server_default or a prior data
migration to populate values before setting non-nullable), ensuring the column
is created nullable during downgrade to avoid PostgreSQL errors.
In @backend/app/api/routes/evaluation.py:
- Around line 505-509: The code references a non-existent constant
LLMProvider.OPENAI in the evaluation config validation, causing AttributeError;
update the check in evaluation.py (the block that raises HTTPException) to
compare against the actual provider string "openai" (i.e., use
config.completion.provider != "openai") or alternatively add a new constant
OPENAI = "openai" to the LLMProvider class in
backend/app/services/llm/providers/registry.py so the symbol exists and matches
tests; pick one approach and ensure the error message and tests remain
consistent with the chosen value.
In @backend/app/crud/evaluations/core.py:
- Around line 308-349: resolve_model_from_config currently declares returning
str but assigns model = config.completion.params.get("model") which may be None;
update resolve_model_from_config to validate that model is present and a str
(e.g., if not model: raise ValueError(...) with context including eval_run.id,
config_id, config_version) before returning, or coerce/choose a safe default
only if intended; reference the resolve_model_from_config function and the model
variable from config.completion.params.get("model") when implementing the check.
In @backend/app/crud/evaluations/processing.py:
- Around line 257-263: resolve_model_from_config currently uses
config.get("model") which can return None despite its str return annotation and
docstring promise; modify resolve_model_from_config to validate the retrieved
value and raise ValueError if missing (or alternatively change the function
signature to return str | None and update callers), e.g., after fetching model =
config.get("model") check if model is truthy and raise ValueError("missing model
in config") to enforce the contract so callers like
resolve_model_from_config(session=session, eval_run=eval_run) always receive a
str or an explicit None-aware type is used consistently.
🧹 Nitpick comments (1)
backend/app/alembic/versions/041_add_config_in_evals_run_table.py (1)
1-60: Consider a multi-step migration strategy for safer deployment.

Given the destructive nature of this schema change (dropping the `config` column) and the PR status ("hold merge - until frontend is ready"), consider deploying this as a multi-phase migration (a sketch of the Phase 2 backfill follows this list):

Phase 1: Add new columns without dropping old ones
- Add `config_id` and `config_version` (nullable)
- Add foreign key constraint
- Deploy application code that writes to both old and new columns

Phase 2: Backfill existing data
- Create a data migration script to populate `config_id`/`config_version` from existing `config` JSONB
- Validate data integrity

Phase 3: Cut over
- Deploy application code that only uses new columns
- Monitor for issues

Phase 4: Cleanup
- Drop the old `config` column in a subsequent migration

This approach provides:
- Zero downtime deployment
- Easy rollback at each phase
- Data preservation and validation
- Safer production deployment
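For illustration only, a minimal sketch of what the Phase 2 backfill referenced above could look like, assuming the legacy `config` JSONB carries identifiers that can be matched to rows in the `config` table. The lookup keys are assumptions; a real backfill would need the project's actual mapping logic:

```python
import sqlalchemy as sa
from alembic import op


def upgrade() -> None:
    """Backfill config_id/config_version on evaluation_run from the legacy config JSONB."""
    conn = op.get_bind()
    rows = conn.execute(
        sa.text("SELECT id, config FROM evaluation_run WHERE config_id IS NULL")
    ).mappings()
    for row in rows:
        legacy = row["config"] or {}
        # Assumed: the legacy blob stores enough to identify the stored config.
        config_id = legacy.get("config_id")
        config_version = legacy.get("config_version", 1)
        if config_id is None:
            continue  # leave rows we cannot map; validate these separately
        conn.execute(
            sa.text(
                "UPDATE evaluation_run "
                "SET config_id = :config_id, config_version = :config_version "
                "WHERE id = :run_id"
            ),
            {"config_id": config_id, "config_version": config_version, "run_id": row["id"]},
        )


def downgrade() -> None:
    """No-op: the legacy config column is still present during this phase."""
    pass
```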
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- backend/app/alembic/versions/041_add_config_in_evals_run_table.py
- backend/app/api/routes/evaluation.py
- backend/app/crud/evaluations/__init__.py
- backend/app/crud/evaluations/core.py
- backend/app/crud/evaluations/embeddings.py
- backend/app/crud/evaluations/processing.py
- backend/app/models/evaluation.py
- backend/app/tests/api/routes/test_evaluation.py
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/app/crud/evaluations/embeddings.py
- backend/app/models/evaluation.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always add type hints to all function parameters and return values in Python code
Prefix all log messages with the function name in square brackets: `logger.info(f"[function_name] Message {mask_string(sensitive_value)}")`
Use Python 3.11+ with type hints throughout the codebase
Files:
- backend/app/tests/api/routes/test_evaluation.py
- backend/app/alembic/versions/041_add_config_in_evals_run_table.py
- backend/app/api/routes/evaluation.py
- backend/app/crud/evaluations/__init__.py
- backend/app/crud/evaluations/processing.py
- backend/app/crud/evaluations/core.py
backend/app/tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use factory pattern for test fixtures in
backend/app/tests/
Files:
backend/app/tests/api/routes/test_evaluation.py
backend/app/alembic/versions/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Generate database migrations using
alembic revision --autogenerate -m "Description" --rev-id <number>where rev-id is the latest existing revision ID + 1
Files:
backend/app/alembic/versions/041_add_config_in_evals_run_table.py
backend/app/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
backend/app/api/**/*.py: Define FastAPI REST endpoints in `backend/app/api/` organized by domain
Load Swagger endpoint descriptions from external markdown files instead of inline strings using `load_description("domain/action.md")`
Files:
backend/app/api/routes/evaluation.py
🧠 Learnings (2)
📚 Learning: 2025-12-17T15:39:30.469Z
Learnt from: CR
Repo: ProjectTech4DevAI/kaapi-backend PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T15:39:30.469Z
Learning: Applies to backend/app/alembic/versions/*.py : Generate database migrations using `alembic revision --autogenerate -m "Description" --rev-id <number>` where rev-id is the latest existing revision ID + 1
Applied to files:
backend/app/alembic/versions/041_add_config_in_evals_run_table.py
📚 Learning: 2025-12-17T15:39:30.469Z
Learnt from: CR
Repo: ProjectTech4DevAI/kaapi-backend PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-17T15:39:30.469Z
Learning: Organize backend code in `backend/app/` following the layered architecture: Models, CRUD, Routes, Core, Services, and Celery directories
Applied to files:
backend/app/api/routes/evaluation.py
🧬 Code graph analysis (5)
backend/app/tests/api/routes/test_evaluation.py (3)
backend/app/crud/evaluations/batch.py (1)
- build_evaluation_jsonl (62-115)

backend/app/models/evaluation.py (2)
- EvaluationDataset (74-168)
- EvaluationRun (171-322)

backend/app/tests/utils/test_data.py (1)
- create_test_config (239-302)
backend/app/api/routes/evaluation.py (6)
backend/app/crud/config/version.py (1)
- ConfigVersionCrud (15-142)

backend/app/models/llm/request.py (1)
- LLMCallConfig (132-188)

backend/app/services/llm/jobs.py (1)
- resolve_config_blob (84-116)

backend/app/services/llm/providers/registry.py (1)
- LLMProvider (14-41)

backend/app/utils.py (4)
- APIResponse (33-57)
- get_langfuse_client (212-248)
- get_openai_client (179-209)
- load_description (393-398)

backend/app/crud/evaluations/core.py (1)
- create_evaluation_run (18-71)
backend/app/crud/evaluations/__init__.py (1)
backend/app/crud/evaluations/core.py (1)
- resolve_model_from_config (308-349)
backend/app/crud/evaluations/processing.py (2)
backend/app/crud/evaluations/core.py (2)
- update_evaluation_run (154-206)
- resolve_model_from_config (308-349)

backend/app/crud/evaluations/langfuse.py (1)
- create_langfuse_dataset_run (21-164)
backend/app/crud/evaluations/core.py (3)
backend/app/crud/config/version.py (1)
- ConfigVersionCrud (15-142)

backend/app/models/llm/request.py (1)
- LLMCallConfig (132-188)

backend/app/services/llm/jobs.py (1)
- resolve_config_blob (84-116)
🪛 GitHub Actions: Kaapi CI
backend/app/api/routes/evaluation.py
[error] 130-130: NameError: name 'Depends' is not defined.
🔇 Additional comments (4)
backend/app/crud/evaluations/__init__.py (1)
8-8: LGTM!

The new `resolve_model_from_config` function is correctly imported and exported for public use.

Also applies to: 43-43
backend/app/tests/api/routes/test_evaluation.py (1)
3-3: LGTM!

The test updates correctly reflect the shift from inline config dictionaries to stored config references. The use of the `create_test_config` factory function aligns with the coding guidelines for test fixtures, and the error scenarios properly test config-not-found cases.

Also applies to: 10-10, 499-545, 728-803
backend/app/api/routes/evaluation.py (1)
492-509: Verify config resolution error handling covers all failure modes.

The config resolution logic handles errors from `resolve_config_blob` and validates the provider, but ensure that:
- Config version not found scenarios are properly handled
- Invalid/corrupted config blobs are caught
- The provider validation matches actual config schemas used in production
backend/app/crud/evaluations/core.py (1)
66-69: LGTM!

The logging statement correctly follows the coding guideline format with function context and includes the new config_id and config_version fields.
```python
depends_on = None


def upgrade():
```
Add return type hints to migration functions.
Both upgrade() and downgrade() functions are missing return type hints.
As per coding guidelines, all functions should have type hints.
📝 Proposed fix
```diff
-def upgrade():
+def upgrade() -> None:

-def downgrade():
+def downgrade() -> None:
```

Also applies to: 45-45
🤖 Prompt for AI Agents
In @backend/app/alembic/versions/041_add_config_in_evals_run_table.py at line
20, The migration functions upgrade() and downgrade() lack return type hints;
update both function definitions (upgrade and downgrade) to include explicit
return types (e.g., change "def upgrade():" and "def downgrade():" to "def
upgrade() -> None:" and "def downgrade() -> None:") so they conform to the
project's typing guidelines.
```python
    op.add_column(
        "evaluation_run",
        sa.Column(
            "config_id",
            sa.Uuid(),
            nullable=True,
            comment="Reference to the stored config used",
        ),
    )
    op.add_column(
        "evaluation_run",
        sa.Column(
            "config_version",
            sa.Integer(),
            nullable=True,
            comment="Version of the config used",
        ),
    )
    op.create_foreign_key(None, "evaluation_run", "config", ["config_id"], ["id"])
    op.drop_column("evaluation_run", "config")
```
Critical: Data loss and foreign key constraint naming issues.

This migration has several critical problems:

- Data loss: Line 41 drops the `config` column without migrating existing data to the new `config_id`/`config_version` columns. Any existing evaluation runs will lose their configuration data permanently.
- Foreign key constraint naming: Line 40 creates a foreign key with `None` as the constraint name, causing Alembic to auto-generate a name. However, the downgrade function (Line 57) also uses `None` to drop the constraint, which won't match the auto-generated name and will fail.

Required actions:

1. Add a data migration step before dropping the `config` column. You'll need to:
   - Parse each existing `config` JSONB object
   - Look up or create corresponding `config` records with appropriate versions
   - Update `config_id` and `config_version` for each evaluation_run
   - Or, if data migration isn't feasible, add a comment explaining why data loss is acceptable
2. Specify an explicit constraint name instead of `None`:

🔧 Proposed fix for FK constraint naming

```diff
- op.create_foreign_key(None, "evaluation_run", "config", ["config_id"], ["id"])
+ op.create_foreign_key(
+     "fk_evaluation_run_config_id",
+     "evaluation_run",
+     "config",
+     ["config_id"],
+     ["id"]
+ )
```

And update the downgrade:

```diff
- op.drop_constraint(None, "evaluation_run", type_="foreignkey")
+ op.drop_constraint("fk_evaluation_run_config_id", "evaluation_run", type_="foreignkey")
```

Committable suggestion skipped: line range outside the PR's diff.
```python
    op.add_column(
        "evaluation_run",
        sa.Column(
            "config",
            postgresql.JSONB(astext_type=sa.Text()),
            autoincrement=False,
            nullable=False,
            comment="Evaluation configuration (model, instructions, etc.)",
        ),
    )
```
Critical: Downgrade will fail with existing data.
The downgrade re-adds the config column with nullable=False (Line 53). If the evaluation_run table contains any records when downgrading, this operation will fail because PostgreSQL cannot add a non-nullable column to a table with existing rows without specifying a default value.
Either:
- Make the column nullable during downgrade:
nullable=True - Provide a server default value
- Add a data migration to populate the column before setting it non-nullable
🔧 Proposed fix (Option 1: Make nullable)
op.add_column(
"evaluation_run",
sa.Column(
"config",
postgresql.JSONB(astext_type=sa.Text()),
autoincrement=False,
- nullable=False,
+ nullable=True,
comment="Evaluation configuration (model, instructions, etc.)",
),
)🤖 Prompt for AI Agents
In @backend/app/alembic/versions/041_add_config_in_evals_run_table.py around
lines 47 - 56, The downgrade currently re-adds the "config" column on the
"evaluation_run" table using op.add_column with sa.Column(..., nullable=False)
which will fail if rows exist; update that op.add_column call in the downgrade
to use nullable=True (or alternatively add a server_default or a prior data
migration to populate values before setting non-nullable), ensuring the column
is created nullable during downgrade to avoid PostgreSQL errors.
```python
    elif config.completion.provider != LLMProvider.OPENAI:
        raise HTTPException(
            status_code=422,
            detail="Only 'openai' provider is supported for evaluation configs",
        )
```
Critical: Invalid constant reference - LLMProvider.OPENAI does not exist.
The code references LLMProvider.OPENAI but the LLMProvider registry (backend/app/services/llm/providers/registry.py) only defines OPENAI_NATIVE = "openai-native". However, the error message and test configs use "openai" as the provider string.
This mismatch will cause an AttributeError at runtime.
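As a sketch of the first remedy suggested below (adding the constant), assuming `LLMProvider` is a string-valued enum; apart from `OPENAI_NATIVE = "openai-native"`, which this review quotes, the class shape shown here is an assumption:

```python
from enum import StrEnum


class LLMProvider(StrEnum):
    """Sketch of the provider registry; only OPENAI_NATIVE is confirmed by this review."""

    OPENAI_NATIVE = "openai-native"  # existing member referenced in the review
    OPENAI = "openai"                # proposed member so LLMProvider.OPENAI resolves


# With the new member, the endpoint check works and still compares equal to the raw string:
assert LLMProvider.OPENAI == "openai"
```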
🔍 Verify the correct provider constant

```bash
#!/bin/bash
# Check what constants are defined in LLMProvider
ast-grep --pattern 'class LLMProvider:
$$$
'
# Check what provider values are used in evaluation configs
rg -n --type=py 'provider.*=.*["\']openai["\']' backend/app/
```

Based on the error message expecting "openai" and test data using provider="openai", you likely need either:

- Add an `OPENAI = "openai"` constant to `LLMProvider`, or
- Change the validation logic to check the string directly: `!= "openai"`
🤖 Prompt for AI Agents
In @backend/app/api/routes/evaluation.py around lines 505 - 509, The code
references a non-existent constant LLMProvider.OPENAI in the evaluation config
validation, causing AttributeError; update the check in evaluation.py (the block
that raises HTTPException) to compare against the actual provider string
"openai" (i.e., use config.completion.provider != "openai") or alternatively add
a new constant OPENAI = "openai" to the LLMProvider class in
backend/app/services/llm/providers/registry.py so the symbol exists and matches
tests; pick one approach and ensure the error message and tests remain
consistent with the chosen value.
```python
def resolve_model_from_config(
    session: Session,
    eval_run: EvaluationRun,
) -> str:
    """
    Resolve the model name from the evaluation run's config.

    Args:
        session: Database session
        eval_run: EvaluationRun instance

    Returns:
        Model name from config

    Raises:
        ValueError: If config is missing, invalid, or has no model
    """
    if not eval_run.config_id or not eval_run.config_version:
        raise ValueError(
            f"Evaluation run {eval_run.id} has no config reference "
            f"(config_id={eval_run.config_id}, config_version={eval_run.config_version})"
        )

    config_version_crud = ConfigVersionCrud(
        session=session,
        config_id=eval_run.config_id,
        project_id=eval_run.project_id,
    )

    config, error = resolve_config_blob(
        config_crud=config_version_crud,
        config=LLMCallConfig(id=eval_run.config_id, version=eval_run.config_version),
    )

    if error or config is None:
        raise ValueError(
            f"Config resolution failed for evaluation {eval_run.id} "
            f"(config_id={eval_run.config_id}, version={eval_run.config_version}): {error}"
        )

    model = config.completion.params.get("model")
    return model
```
Fix type mismatch: model extraction can return None.
The function's return type is str, but line 348 uses config.completion.params.get("model") which can return None if the "model" key is missing. This violates the type contract and could cause issues when the model is passed to downstream functions expecting a string.
✅ Validate that model exists

```diff
  model = config.completion.params.get("model")
+ if not model:
+     raise ValueError(
+         f"Config for evaluation {eval_run.id} does not specify a model "
+         f"(config_id={eval_run.config_id}, version={eval_run.config_version})"
+     )
  return model
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
The full function with the suggested validation applied:

```python
def resolve_model_from_config(
    session: Session,
    eval_run: EvaluationRun,
) -> str:
    """
    Resolve the model name from the evaluation run's config.

    Args:
        session: Database session
        eval_run: EvaluationRun instance

    Returns:
        Model name from config

    Raises:
        ValueError: If config is missing, invalid, or has no model
    """
    if not eval_run.config_id or not eval_run.config_version:
        raise ValueError(
            f"Evaluation run {eval_run.id} has no config reference "
            f"(config_id={eval_run.config_id}, config_version={eval_run.config_version})"
        )

    config_version_crud = ConfigVersionCrud(
        session=session,
        config_id=eval_run.config_id,
        project_id=eval_run.project_id,
    )

    config, error = resolve_config_blob(
        config_crud=config_version_crud,
        config=LLMCallConfig(id=eval_run.config_id, version=eval_run.config_version),
    )

    if error or config is None:
        raise ValueError(
            f"Config resolution failed for evaluation {eval_run.id} "
            f"(config_id={eval_run.config_id}, version={eval_run.config_version}): {error}"
        )

    model = config.completion.params.get("model")
    if not model:
        raise ValueError(
            f"Config for evaluation {eval_run.id} does not specify a model "
            f"(config_id={eval_run.config_id}, version={eval_run.config_version})"
        )
    return model
```
🤖 Prompt for AI Agents
In @backend/app/crud/evaluations/core.py around lines 308 - 349,
resolve_model_from_config currently declares returning str but assigns model =
config.completion.params.get("model") which may be None; update
resolve_model_from_config to validate that model is present and a str (e.g., if
not model: raise ValueError(...) with context including eval_run.id, config_id,
config_version) before returning, or coerce/choose a safe default only if
intended; reference the resolve_model_from_config function and the model
variable from config.completion.params.get("model") when implementing the check.
```python
    # Use model stored at creation time for cost tracking
    model = resolve_model_from_config(session=session, eval_run=eval_run)

    trace_id_mapping = create_langfuse_dataset_run(
        langfuse=langfuse,
        dataset_name=eval_run.dataset_name,
        model=model,
```
🧩 Analysis chain

Scripts executed against ProjectTech4DevAI/kaapi-backend to verify the finding:

```bash
# First, locate the resolve_model_from_config function
rg -n "def resolve_model_from_config" --type=py

# Check the imports and context around line 257-263 in processing.py
head -n 35 backend/app/crud/evaluations/processing.py | tail -n 10

# Look at the core.py file around line 348
sed -n '340,360p' backend/app/crud/evaluations/core.py

# Get the full resolve_model_from_config function with type hints
sed -n '308,330p' backend/app/crud/evaluations/core.py

# Check the create_langfuse_dataset_run function signature
rg -n "def create_langfuse_dataset_run" -A 5 --type=py

# Check if there's any validation before resolve_model_from_config is called
sed -n '250,270p' backend/app/crud/evaluations/processing.py

# Get the complete resolve_model_from_config function
sed -n '308,365p' backend/app/crud/evaluations/core.py
```
Add validation to ensure model key exists in config params.
The resolve_model_from_config function's return type is annotated as str, and its docstring promises to raise ValueError if the config has no model. However, the implementation at line 350 uses .get("model") without validation, allowing it to return None and violating the type contract.
Add a validation check after retrieving the model to ensure the key exists, or update the return type annotation to str | None if None is acceptable. This fixes the contract mismatch between the type hint and actual implementation.
🤖 Prompt for AI Agents
In @backend/app/crud/evaluations/processing.py around lines 257 - 263,
resolve_model_from_config currently uses config.get("model") which can return
None despite its str return annotation and docstring promise; modify
resolve_model_from_config to validate the retrieved value and raise ValueError
if missing (or alternatively change the function signature to return str | None
and update callers), e.g., after fetching model = config.get("model") check if
model is truthy and raise ValueError("missing model in config") to enforce the
contract so callers like resolve_model_from_config(session=session,
eval_run=eval_run) always receive a str or an explicit None-aware type is used
consistently.
Summary
This change refactors the evaluation run process to utilize a stored configuration instead of a configuration dictionary. It introduces fields for `config_id`, `config_version`, and `model` in the evaluation run table, streamlining the evaluation process and improving data integrity.

Checklist
Before submitting a pull request, please ensure that you mark these tasks.
- Run `fastapi run --reload app/main.py` or `docker compose up` in the repository root and test.

Summary by CodeRabbit
New Features
Bug Fixes
Chores