Customer support agent end to end example #1

Open
mramanindia wants to merge 4 commits into main from customer_support_agent_end_to_end_example

Conversation

@mramanindia mramanindia commented Oct 23, 2025

Collaborator
  • End-to-end customer support agent for Noveum, with traces and evals integrated

Summary by CodeRabbit

Release Notes

  • New Features

    • Complete dataset management workflow including creation, versioning, publishing, and uploads via API.
    • Trace fetching and data processing pipeline with automatic span filtering and enrichment.
    • Agent evaluation system with scoring and performance analysis powered by Gemini.
    • Batch score upload capability with dry-run validation mode.
    • Comprehensive demo notebook for end-to-end evaluation workflows.
  • Documentation

    • Added workflow guide detailing dataset setup and upload procedures.

@coderabbitai

coderabbitai Bot commented Oct 23, 2025


Walkthrough

The pull request introduces a comprehensive workflow system for Noveum platform dataset management, including dataset creation, versioning, item upload, publishing, evaluation, and scoring. It adds utilities for data preprocessing, trace fetching, agent evaluation with Gemini, and API-based score uploads, alongside documentation and dependencies specifications.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation & Configuration: README_workflow.md, noveum_agent_requirements.txt, api_data.json | Adds workflow documentation with step-by-step commands, environment variables, and troubleshooting guidance; consolidates Python package dependencies; provides sample API data with item keys and IDs. |
| Dataset Creation & Versioning: create_dataset.py, create_dataset_version.py | Implements API-based dataset creation and version management with environment validation, beta URL routing, and structured CLI argument parsing. |
| Dataset Operations: fetch_dataset_items.py, upload_dataset.py, publish_dataset_version.py | Fetches dataset items from API, uploads items with schema normalization and metadata handling, and publishes dataset versions to Noveum endpoints. |
| Data Preprocessing Pipeline: preprocess_filter.py, preprocess_map.py, preprocess_split_data.py, fix_api_data_v2.py | Filters spans by exclusion rules, normalizes tool output formatting, enriches spans with standardized agent/tool fields, splits datasets by span name, and populates item_key fields. |
| Trace Management: traces/fetch_traces_api.py, traces/combine_spans_api_compat.py | Fetches traces in paginated batches from Noveum API, manages directory cleanup, and merges trace data with spans into unified dataset format. |
| Evaluation & Analysis Utilities: demo_utils.py, novapilot_utils.py | Provides end-to-end evaluation orchestration using Gemini (dataset loading, span-to-agent conversion, scoring, statistics), and comprehensive agent performance analysis with recommendations. |
| Evaluation Notebook & Scoring: final_agent_evaluation_demo.ipynb, upload_scores.py | Jupyter workflow demonstrating preprocessing and evaluation pipeline; uploads scorer results with item-key-to-item-id mapping, batch processing, and dry-run support. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant API
    participant FileSystem
    
    User->>CLI: Run create_dataset.py
    CLI->>FileSystem: Load .env
    CLI->>CLI: Validate NOVEUM_* env vars
    CLI->>API: POST /create dataset
    API->>API: Create dataset
    API-->>CLI: Return dataset_slug
    CLI->>FileSystem: Save response, output slug
    CLI-->>User: Display slug for versioning
    
    User->>CLI: Run create_dataset_version.py
    CLI->>FileSystem: Load .env
    CLI->>CLI: Validate env (slug, version)
    CLI->>API: POST /versions with version
    API-->>CLI: Return version_id
    CLI->>FileSystem: Save response
    CLI-->>User: Display version_id
    
    User->>CLI: Run upload_dataset.py
    CLI->>FileSystem: Load dataset JSON
    CLI->>CLI: Transform items (schema, metadata)
    CLI->>API: POST /items with transformed data
    API-->>CLI: Confirm upload
    CLI-->>User: Report success
    
    User->>CLI: Run publish_dataset_version.py
    CLI->>API: POST /publish
    API-->>CLI: Confirm publication
    CLI-->>User: Publish complete
    
    User->>CLI: Run upload_scores.py
    CLI->>FileSystem: Read api_data.json
    CLI->>FileSystem: Read scores CSV
    CLI->>CLI: Build batch payload (key→id mapping)
    CLI->>API: POST /scores (batched)
    API-->>CLI: Batch results
    CLI-->>User: Upload summary
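
The final steps of this diagram (key→id mapping, then batched upload) can be sketched roughly as follows. This is a minimal illustration, not the code in upload_scores.py: the field names `item_key` and `item_id`, the helper names, and the batch size are all assumptions.

```python
from typing import Dict, Iterator, List


def build_key_to_id(items: List[Dict[str, str]]) -> Dict[str, str]:
    """Map each item's key to its id (field names are assumed, not confirmed)."""
    return {item["item_key"]: item["item_id"] for item in items if item.get("item_key")}


def attach_item_ids(rows: List[Dict], key_to_id: Dict[str, str]) -> List[Dict]:
    """Keep only score rows whose item_key resolves to a known item id."""
    resolved = []
    for row in rows:
        item_id = key_to_id.get(row["item_key"])
        if item_id is not None:
            resolved.append({**row, "item_id": item_id})
    return resolved


def batched(rows: List, size: int) -> Iterator[List]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]
```

Rows whose key has no matching id are dropped here; a real script would probably count and report them instead of skipping silently.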
sequenceDiagram
    participant Traces
    participant Fetch as fetch_traces_api.py
    participant Combine as combine_spans_api_compat.py
    participant Process as preprocess_*.py
    participant Eval as demo_utils.py
    participant Output
    
    Traces->>Fetch: API endpoint
    Fetch->>Fetch: Paginate batches (max 100)
    Fetch->>Traces: Save traces_batch_*.json
    Fetch->>Combine: Pass traces_dir
    
    Combine->>Traces: Read traces_batch_*.json
    Combine->>Combine: Merge spans + trace metadata
    Combine->>Traces: Write dataset.json
    Combine->>Process: Pass dataset.json
    
    Process->>Process: Filter (exclude_spans)
    Process->>Process: Map (enrich with agent/tool fields)
    Process->>Process: Split by span name
    Process->>Eval: Pass split_datasets/
    
    Eval->>Eval: Load & analyze each file
    Eval->>Eval: Convert spans → AgentData
    Eval->>Eval: Setup Gemini model
    Eval->>Eval: Run evaluation (sampled)
    Eval->>Output: Export results + dataset CSV
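
The "Paginate batches (max 100)" step amounts to splitting the requested trace count into fixed-size chunks. A minimal sketch, assuming an offset/size style of pagination (the diagram does not say which style the API actually uses, and `plan_batches` is a hypothetical name):

```python
from typing import List, Tuple


def plan_batches(total: int, max_batch: int = 100) -> List[Tuple[int, int]]:
    """Return (offset, size) pairs covering `total` records in order."""
    if total < 0 or max_batch <= 0:
        raise ValueError("total must be >= 0 and max_batch must be > 0")
    return [
        (offset, min(max_batch, total - offset))
        for offset in range(0, total, max_batch)
    ]


print(plan_batches(250))  # [(0, 100), (100, 100), (200, 50)]
```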

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

The diff introduces 15+ new Python modules and a Jupyter notebook with diverse, domain-specific logic: API integration with environment-driven URL routing, multi-stage data transformation pipelines, CSV processing with metadata enrichment, batch pagination, Gemini LLM orchestration, and complex span-to-agent-data mapping. While individual scripts follow consistent patterns (validate → process → output), the heterogeneous nature of functionality across files, nested data transformations, and the breadth of public APIs (35+ new functions) demand careful review of correctness, error handling, and integration points.

Poem

🐰 A fuzzy tale of spans and scores,
Through datasets vast and pipeline doors,
From traces merged to gems refined,
And Noveum's wisdom intertwined,
Uploads dance, evaluations gleam—
The workflow hops, a coder's dream!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The pull request title "Customer support agent end to end example" accurately reflects the main objective of the changeset, which is to introduce a complete customer support agent infrastructure for Noveum. The title clearly indicates both the primary component (customer support agent) and its scope (end-to-end functionality). While the title doesn't enumerate individual components like dataset management, score uploading, or evaluation utilities, it appropriately summarizes the overall purpose at a high level. The title is specific enough that a teammate scanning the commit history would understand this adds customer support agent infrastructure to the codebase. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 86.49%, which is sufficient. The required threshold is 80.00%. |

Comment @coderabbitai help to get the list of available commands and usage tips.

…bilities; remove requirements.txt; update README with additional instructions.
…ent performance metrics, including CSV and JSON formats across various evaluation types.

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 12

🧹 Nitpick comments (35)
noveum_customer_support_bt/traces/combine_spans_api_compat.py (1)

97-97: Remove unnecessary f-string prefix.

The f-string on line 97 contains no placeholders.

Apply this diff:

-        print(f"\nSpan types distribution:")
+        print("\nSpan types distribution:")
noveum_customer_support_bt/publish_dataset_version.py (1)

24-40: Consider extracting shared validation logic.

The validate_environment function is duplicated across multiple files (create_dataset.py, create_dataset_version.py, upload_dataset.py, and this file). Consider extracting it to a shared utility module to reduce duplication.

Create a shared noveum_utils.py or similar:

# noveum_utils.py
from typing import Dict, Optional


def validate_environment(required_vars: Dict[str, Optional[str]]) -> bool:
    """Validate that all required environment variables are set."""
    # A variable counts as missing when its value is None or an empty string.
    missing_vars = [var for var, value in required_vars.items() if not value]

    if missing_vars:
        print(f"Error: Missing required environment variables: {', '.join(missing_vars)}")
        print("Please set these variables in your .env file or environment")
        return False

    return True

Then import and use it in each script.

noveum_customer_support_bt/traces/fetch_traces_api.py (2)

84-94: Consider refactoring global variable usage.

Using global project to modify module-level state makes the code harder to test and reason about. Consider passing project as a parameter through the call chain.

def main():
    parser = argparse.ArgumentParser(...)
    parser.add_argument('count', type=int, help='Number of traces to fetch')
    parser.add_argument('--project', type=str, default='noveum-ai-agent-rag-websearch', 
                       help='Project name')
    
    args = parser.parse_args()
    
    # Pass project as parameter instead of using global
    fetch_traces(args.count, args.project)

150-150: Remove unnecessary f-string prefix.

The f-string contains no placeholders.

Apply this diff:

-    print(f"\n=== Summary ===")
+    print("\n=== Summary ===")
noveum_customer_support_bt/fix_api_data_v2.py (2)

42-44: Consider handling missing turn_id more explicitly.

Setting item_key to an empty string when turn_id is not found may cause issues downstream. Consider logging a warning or using a more explicit placeholder.

         # Add item_key field
-        fixed_item['item_key'] = turn_id or ''
+        if not turn_id:
+            print("Warning: Could not find turn_id in item, using empty string")
+        fixed_item['item_key'] = turn_id or ''

54-54: Remove unnecessary f-string prefix.

Apply this diff:

-    print(f"Added item_key field for each item using turn_id found in the data")
+    print("Added item_key field for each item using turn_id found in the data")
noveum_customer_support_bt/final_agent_evaluation_demo.ipynb (3)

102-119: Remove redundant import after module reload.

The run_complete_agent_evaluation function is imported in this cell and then imported again in the cell starting at line 122. The second import at line 127 makes the import at line 116 redundant.

The cell at lines 122-130 already imports and uses the function, so the import at line 116 could be part of a broader import statement or the cells could be consolidated.


140-158: Rename unused loop variable.

The loop variable idx from iterrows() is not used in the loop body. Convention is to rename it to _idx or _.

Apply this pattern to all three analysis cells (lines 154, 171, 188):

-for idx, row in task_progression.iterrows():
+for _idx, row in task_progression.iterrows():

194-228: Remove or document empty cells.

The notebook contains multiple empty code cells at the end. Consider removing them or adding comments explaining their purpose if they're meant as scratch space.

noveum_customer_support_bt/preprocess_filter.py (4)

36-59: Use json.dumps and remove no-op replace; avoid lossy dict-to-str conversion.

str(dict) yields Python repr and the .replace("'", "'") does nothing. Use JSON serialization for stable, portable strings. Also handle nested lists/dicts uniformly.

Apply:

 def convert_tool_output_to_string(span: Dict[str, Any]) -> Dict[str, Any]:
@@
-    if tool_output and isinstance(tool_output, list):
-        # Convert list of objects to concatenated string format
-        output_strings = []
-        for item in tool_output:
-            if isinstance(item, dict):
-                # Convert dict to string format like "{'url': '...', 'content': '...'}"
-                item_str = str(item).replace("'", "'")  # Ensure single quotes
-                output_strings.append(item_str)
-            else:
-                output_strings.append(str(item))
-        
-        # Join all items with space
-        attributes['tool.output.output'] = ' '.join(output_strings)
+    if isinstance(tool_output, list) and tool_output:
+        import json as _json
+        attributes['tool.output.output'] = ' '.join(
+            _json.dumps(x, ensure_ascii=False) if isinstance(x, (dict, list)) else str(x)
+            for x in tool_output
+        )
         span['attributes'] = attributes

61-79: Be defensive if the input JSON isn’t a list of spans.

If the file is a dict (e.g., {"spans":[...]}), the current code iterates keys and miscounts. Guard and normalize.

-    with open(input_file, 'r') as f:
+    with open(input_file, 'r', encoding='utf-8') as f:
         data = json.load(f)
 
-    print(f"Original dataset: {len(data)} records")
+    if isinstance(data, dict):
+        data = data.get('spans') or data.get('data') or []
+    if not isinstance(data, list):
+        raise ValueError(f"Expected a list of spans, got {type(data).__name__}")
+    print(f"Original dataset: {len(data)} records")

Also applies to: 83-87, 91-97
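
Since the same guard applies at several call sites, it could be factored into one loader. A minimal sketch, assuming the wrapper keys `spans` and `data` suggested in the diff above; the name `load_records` is hypothetical and does not exist in the PR:

```python
import json
from typing import Any, List, Sequence


def load_records(path: str, keys: Sequence[str] = ("spans", "data")) -> List[Any]:
    """Load JSON and return a list, unwrapping a dict root if needed."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        # Take the first wrapper key that holds a list; fall back to empty.
        data = next((data[k] for k in keys if isinstance(data.get(k), list)), [])
    if not isinstance(data, list):
        raise ValueError(f"Expected a list of records, got {type(data).__name__}")
    return data
```

Unlike the `or`-chained version in the diff, this only unwraps a key whose value is actually a list, so a dict root with a non-list `spans` value yields an empty list rather than a confusing downstream error.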


100-117: Narrow the broad exception; print clearer failures.

Catching Exception hides actionable errors and triggers Ruff BLE001. Catch expected I/O/JSON errors.

     try:
         output_file = filter_dataset(input_file)
         print(f"\nSuccess! Created {output_file}")
-    except Exception as e:
+    except (OSError, IOError, json.JSONDecodeError, ValueError, KeyError) as e:
         print(f"Error: {e}")
         sys.exit(1)

1-1: Shebang vs. execution bit.

Either make the file executable in git or drop the shebang to satisfy EXE001. Up to you.

noveum_customer_support_bt/create_dataset.py (2)

45-50: Nit: remove superfluous f-string.

No placeholders in the beta URL string. Silence F541.

-    if beta_env:
-        api_url = f"https://noveum.ai/api/v1/datasets"
+    if beta_env:
+        api_url = "https://noveum.ai/api/v1/datasets"

114-121: CLI UX: fail fast on missing name/slug mismatch.

After fixing envs, add a brief echo clarifying which slug will be used/omitted to match the payload behavior.

noveum_customer_support_bt/fetch_dataset_items.py (3)

81-86: Remove unnecessary f-string; keep output identical.

Silence F541.

-        print(f"\nDataset items saved to: api_data.json")
+        print("\nDataset items saved to: api_data.json")

23-31: Optional: add basic pagination support (future-proofing).

If the API paginates items, loop until exhaustion; otherwise large datasets may truncate.

Happy to draft a fetch_all_dataset_items() helper if the API exposes next/cursor.

Also applies to: 43-52
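
A rough sketch of what such a `fetch_all_dataset_items()` helper might look like. Whether the Noveum API paginates at all, and the response keys `items` and `next_cursor`, are assumptions here; the page fetcher is passed in as a callable so the loop does not depend on any particular HTTP client.

```python
from typing import Callable, Dict, List, Optional


def fetch_all_dataset_items(
    fetch_page: Callable[[Optional[str]], Dict],
    max_pages: int = 1000,
) -> List[Dict]:
    """Collect items across pages until no cursor is returned."""
    items: List[Dict] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        items.extend(page.get("items", []))
        cursor = page.get("next_cursor")  # assumed key name
        if not cursor:
            return items
    raise RuntimeError(f"Stopped after {max_pages} pages without exhausting the cursor")
```

The `max_pages` cap is a safety net against a server that keeps returning a cursor; it can be dropped if the API guarantees termination.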


60-74: Print remediation for missing envs.

Mirror the guidance style used in upload_scores.py by printing the variable names with placeholders.
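
One way to produce that kind of message is sketched below. The helper names and the exact wording are hypothetical, and the format used by upload_scores.py is not shown in this review, so treat the output as illustrative only.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in `env`."""
    return [name for name in names if not env.get(name)]


def remediation_message(missing: List[str]) -> str:
    """Build a copy-pasteable hint listing each missing variable."""
    lines = [
        "Error: missing required environment variables.",
        "Add these to your .env file or export them:",
    ]
    lines += [f"  {name}=<your-value>" for name in missing]
    return "\n".join(lines)
```

A script would call `missing_env_vars([...])` with its own required names, print `remediation_message(...)` when the result is non-empty, and exit with a non-zero status.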

noveum_customer_support_bt/preprocess_split_data.py (3)

31-35: Stronger filename sanitization with regex.

The chained replace can miss other invalid chars and is harder to maintain.

+import re
@@
 def sanitize_filename(name):
@@
-    safe_name = name.replace(':', '_').replace('/', '_').replace('\\', '_').replace('*', '_').replace('?', '_').replace('"', '_').replace('<', '_').replace('>', '_').replace('|', '_')
+    safe_name = re.sub(r'[^A-Za-z0-9._-]+', '_', name)
     return f"{safe_name}_dataset.json"
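
To show what the regex in the diff above does in practice, here is a small standalone version with two sample inputs. The sample span names are made up for illustration.

```python
import re


def sanitize_filename(name: str) -> str:
    """Replace every run of characters outside [A-Za-z0-9._-] with one underscore."""
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", name)
    return f"{safe_name}_dataset.json"


print(sanitize_filename("tool:web/search"))  # tool_web_search_dataset.json
print(sanitize_filename("llm call <v2>"))    # llm_call_v2__dataset.json
```

Note that this is not a drop-in equivalent of the chained `replace` calls: the regex also replaces spaces and collapses consecutive invalid characters into a single underscore, so some output filenames may differ from those produced by the current code.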

50-55: Defensive load: ensure list.

Gracefully handle unexpected root JSON shapes.

-    with open(input_file, 'r') as f:
+    with open(input_file, 'r', encoding='utf-8') as f:
         data = json.load(f)
-    
-    print(f"Loaded {len(data)} objects")
+    if isinstance(data, dict):
+        data = data.get('data') or data.get('spans') or []
+    if not isinstance(data, list):
+        raise ValueError(f"Expected a list, got {type(data).__name__}")
+    print(f"Loaded {len(data)} objects")

1-1: Shebang vs. execution bit.

Same EXE001 note as other scripts: either make executable or drop the shebang.

noveum_customer_support_bt/upload_dataset.py (1)

141-158: Payload size and duplication caution.

Surfacing many schema keys plus embedding full raw can bloat requests. If the API has payload limits, consider sending only the normalized fields and storing raw client-side.

Also applies to: 165-179
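
A minimal sketch of one way to act on this: measure the serialized size of each item and drop the embedded raw copy only when it exceeds a budget. The key name `raw`, the helper names, and any specific byte limit are assumptions; the API's actual limits are not stated in this review.

```python
import json
from typing import Any, Dict


def payload_bytes(item: Dict[str, Any]) -> int:
    """Size of the item once serialized as UTF-8 JSON."""
    return len(json.dumps(item, ensure_ascii=False).encode("utf-8"))


def trim_item(item: Dict[str, Any], max_bytes: int, raw_key: str = "raw") -> Dict[str, Any]:
    """Drop the raw copy only when the item exceeds the size budget."""
    if payload_bytes(item) <= max_bytes or raw_key not in item:
        return item
    return {k: v for k, v in item.items() if k != raw_key}
```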

noveum_customer_support_bt/create_dataset_version.py (2)

68-77: Minor: prefer try/else for success path (style).

Not required, but moving the success return data to an else: avoids returning from within try (Ruff TRY300).
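
For reference, the `try`/`else` shape that TRY300 asks for looks like this. The sketch uses JSON parsing rather than an API call so that it runs on its own; `parse_response` is a hypothetical name, not a function from the PR.

```python
import json
from typing import Any, Optional


def parse_response(body: str) -> Optional[Any]:
    """Parse a JSON body, returning None on malformed input."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as e:
        print(f"Error: invalid JSON response: {e}")
        return None
    else:
        # Runs only when the try block raised nothing.
        return data
```

Keeping the success return in `else` means the `try` block covers only the statement that is expected to fail, so an unrelated error in the success path is not caught by accident.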


1-1: Shebang vs. execution bit.

Same EXE001 suggestion as others.

noveum_customer_support_bt/upload_scores.py (2)

1-1: Shebang without exec bit

Either make the file executable (chmod +x) or drop the shebang and run via python upload_scores.py.

-#!/usr/bin/env python3
+#!/usr/bin/env python3

Note: keep if you plan to chmod +x noveum_customer_support_bt/upload_scores.py. Otherwise remove it.


95-151: Make pass threshold configurable and use integer ms

Avoid hardcoding 0.5 and prefer inclusive check. Also store executionTimeMs as int.

 def create_batch_payload(
     csv_data: List[Dict],
     key_to_id: Dict[str, str],
     org_slug: str,
     project: str,
     environment: str,
     dataset_slug: str,
     dataset_version: str,
     scorer_id: str = "custom_scorer",
-    scorer_version: str = "1.0.0"
+    scorer_version: str = "1.0.0",
+    pass_threshold: float = 0.5,
 ) -> List[Dict]:
@@
-            "score": row['score'],
-            "passed": row['score'] > 0.5,  # Default threshold, can be adjusted
+            "score": row['score'],
+            "passed": row['score'] >= pass_threshold,
@@
-            "executionTimeMs": 0.0
+            "executionTimeMs": 0

Remember to plumb pass_threshold from CLI or env if needed.
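
A possible way to plumb the threshold through, sketched under assumptions: the flag name `--pass-threshold` and the environment variable `PASS_THRESHOLD` are hypothetical, and the CLI flag takes precedence over the environment, which takes precedence over the 0.5 default.

```python
import argparse
import os
from typing import List, Optional


def parse_pass_threshold(argv: Optional[List[str]] = None) -> float:
    """Resolve the threshold: CLI flag first, then env var, then 0.5."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument(
        "--pass-threshold",
        type=float,
        # The env var is read at call time, so tests can override it.
        default=float(os.environ.get("PASS_THRESHOLD", "0.5")),
    )
    # parse_known_args ignores the script's other flags in this sketch.
    args, _unknown = parser.parse_known_args(argv)
    return args.pass_threshold
```

The resolved value would then be passed as `pass_threshold=` when calling `create_batch_payload`.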

noveum_customer_support_bt/preprocess_map.py (3)

1-1: Shebang without exec bit

As with other scripts, either mark executable or remove the shebang.


175-196: Add encoding and preserve unicode in JSON output

Explicit encoding for reads/writes and ensure_ascii=False for readable UTF‑8.

-    with open(input_file, 'r') as f:
+    with open(input_file, 'r', encoding='utf-8') as f:
         data = json.load(f)
@@
-    with open(output_file, 'w') as f:
-        json.dump(mapped_data, f, indent=2)
+    with open(output_file, 'w', encoding='utf-8') as f:
+        json.dump(mapped_data, f, indent=2, ensure_ascii=False)
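
The effect of `ensure_ascii=False` is easiest to see on a small record. The sample text below is made up for illustration.

```python
import json

record = {"reply": "Café ouvert à 9h ✓"}

escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keeps characters as-is

print(escaped)
print(readable)

# Both forms decode to the same object; only the on-disk text differs.
assert json.loads(escaped) == json.loads(readable) == record
```

Because the readable form contains non-ASCII characters, the file must be opened with an explicit `encoding='utf-8'`, which is why the two changes in the diff belong together.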

210-215: Avoid blind except Exception; narrow error handling and re-raise

Catch specific errors and preserve traceback.

-    except Exception as e:
-        print(f"Error: {e}")
-        sys.exit(1)
+    except (OSError, json.JSONDecodeError, ValueError) as e:
+        print(f"Error: {e!s}")
+        sys.exit(1)
noveum_customer_support_bt/novapilot_utils.py (3)

159-175: CSV parsing is brittle; validate structure and handle NaNs

Don’t assume 4 ID columns or perfect pairs; add checks and safe conversions.

-        # Get column names
-        columns = df.columns.tolist()
-        
-        # Skip first 4 columns (IDs)
-        # The remaining columns are split into score columns and reasoning columns
-        remaining_columns = columns[4:]
-        n_scorers = len(remaining_columns) // 2  # Each scorer has score + reasoning column
-        
-        score_columns = remaining_columns[:n_scorers]  # First half are score columns
-        reasoning_columns = remaining_columns[n_scorers:]  # Second half are reasoning columns
+        # Identify scorer columns heuristically:
+        # keep columns ending with "_reasoning" separate from score columns
+        columns = df.columns.tolist()
+        reasoning_columns = [c for c in columns if c.endswith("_reasoning")]
+        # Pair each reasoning column with the score column sharing its prefix
+        score_columns = [c[: -len("_reasoning")] for c in reasoning_columns]
+        if not reasoning_columns or any(c not in columns for c in score_columns):
+            raise ValueError("CSV does not contain matched score/_reasoning column pairs")

And coerce NaNs/None when building strings:

-                score = row[scorer_name]
-                reasoning = row[reasoning_col]
+                score = row.get(scorer_name, "")
+                reasoning = row.get(reasoning_col, "")
+                if pd.isna(score): score = ""
+                if pd.isna(reasoning): reasoning = ""

222-225: Preserve original traceback: use raise, not raise e

Replace raise e with bare raise in exception blocks.

-            raise e
+            raise
@@
-            raise e
+            raise
@@
-            raise e
+            raise

Also applies to: 257-259, 389-391


129-147: Potential PII in logs

Full model inputs/outputs are logged to disk. Consider a redact option or a flag to disable logging sensitive fields.
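
A redact option could be as small as the sketch below, applied to each record just before it is written to the log. The function name, the placeholder string, and whichever key names are passed in are hypothetical; the fields that novapilot_utils.py actually logs are not shown in this review.

```python
from typing import Any, Dict, Iterable

REDACTED = "[REDACTED]"


def redact(record: Dict[str, Any], sensitive_keys: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of `record` with sensitive values masked, recursing into dicts."""
    sensitive = set(sensitive_keys)
    cleaned: Dict[str, Any] = {}
    for key, value in record.items():
        if key in sensitive:
            cleaned[key] = REDACTED
        elif isinstance(value, dict):
            cleaned[key] = redact(value, sensitive)
        else:
            cleaned[key] = value
    return cleaned
```

This masks by key name only and does not descend into lists, so it would not catch PII embedded in free text or in list-valued fields.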

noveum_customer_support_bt/demo_utils.py (3)

191-193: Avoid dumping full span on unknown type (noise/PII)

Log minimal identifiers instead of the entire span.

-    print('returning unknown type for span')
-    print(span)
+    print('returning unknown type for span:', span.get('name'), span.get('span_id'))

912-924: Tidy unused vars and minor string nit

Store computed values to avoid lints and aid debugging; remove stray f.

         spans_data, span_types = load_and_analyze_dataset(selected_file)
+        results['span_types'] = span_types
@@
         agent_data_list, conversion_errors, dataset = convert_spans_to_agent_dataset(spans_data)
+        results['conversion_errors'] = conversion_errors[:5]
@@
         stats = analyze_dataset_statistics(dataset)
         behavior_analysis = analyze_agent_behavior_patterns(dataset)
+        results['stats'] = stats
+        results['behavior_analysis'] = behavior_analysis
@@
-        print(f"📊 Final Results:")
+        print("📊 Final Results:")

Also applies to: 938-939, 1000-1000


49-53: Narrow overly broad exception handling where feasible

Consider catching specific exceptions (e.g., FileNotFoundError, JSONDecodeError, ValueError, OSError) instead of blanket Exception to reduce false positives and preserve signal. Keep broad catch only at top-level pipeline boundary.

Would you like me to open a follow-up PR to systematically narrow these across this module?

Also applies to: 84-95, 340-347, 409-418, 514-564, 561-564, 642-647, 747-760, 1017-1023

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0fc05a and 17aceda.

⛔ Files ignored due to path filters (1)
  • noveum_customer_support_bt/support_agent.png is excluded by !**/*.png
📒 Files selected for processing (18)
  • noveum_customer_support_bt/README_workflow.md (1 hunks)
  • noveum_customer_support_bt/api_data.json (1 hunks)
  • noveum_customer_support_bt/create_dataset.py (1 hunks)
  • noveum_customer_support_bt/create_dataset_version.py (1 hunks)
  • noveum_customer_support_bt/demo_utils.py (1 hunks)
  • noveum_customer_support_bt/fetch_dataset_items.py (1 hunks)
  • noveum_customer_support_bt/final_agent_evaluation_demo.ipynb (1 hunks)
  • noveum_customer_support_bt/fix_api_data_v2.py (1 hunks)
  • noveum_customer_support_bt/novapilot_utils.py (1 hunks)
  • noveum_customer_support_bt/noveum_agent_requirements.txt (1 hunks)
  • noveum_customer_support_bt/preprocess_filter.py (1 hunks)
  • noveum_customer_support_bt/preprocess_map.py (1 hunks)
  • noveum_customer_support_bt/preprocess_split_data.py (1 hunks)
  • noveum_customer_support_bt/publish_dataset_version.py (1 hunks)
  • noveum_customer_support_bt/traces/combine_spans_api_compat.py (1 hunks)
  • noveum_customer_support_bt/traces/fetch_traces_api.py (1 hunks)
  • noveum_customer_support_bt/upload_dataset.py (1 hunks)
  • noveum_customer_support_bt/upload_scores.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (13)
noveum_customer_support_bt/upload_scores.py (2)
noveum_customer_support_bt/fetch_dataset_items.py (1)
  • main (60-91)
noveum_customer_support_bt/upload_dataset.py (1)
  • main (188-210)
noveum_customer_support_bt/preprocess_split_data.py (2)
noveum_customer_support_bt/preprocess_filter.py (1)
  • main (100-116)
noveum_customer_support_bt/preprocess_map.py (1)
  • main (199-215)
noveum_customer_support_bt/create_dataset_version.py (3)
noveum_customer_support_bt/create_dataset.py (2)
  • validate_environment (24-40)
  • main (93-158)
noveum_customer_support_bt/publish_dataset_version.py (2)
  • validate_environment (24-40)
  • main (79-114)
noveum_customer_support_bt/upload_dataset.py (2)
  • validate_environment (27-43)
  • main (188-210)
noveum_customer_support_bt/publish_dataset_version.py (1)
noveum_customer_support_bt/create_dataset_version.py (2)
  • validate_environment (24-40)
  • main (85-120)
noveum_customer_support_bt/create_dataset.py (3)
noveum_customer_support_bt/create_dataset_version.py (2)
  • validate_environment (24-40)
  • main (85-120)
noveum_customer_support_bt/upload_dataset.py (2)
  • validate_environment (27-43)
  • main (188-210)
noveum_customer_support_bt/upload_scores.py (1)
  • main (214-332)
noveum_customer_support_bt/preprocess_map.py (2)
noveum_customer_support_bt/preprocess_filter.py (1)
  • main (100-116)
noveum_customer_support_bt/preprocess_split_data.py (1)
  • main (88-115)
noveum_customer_support_bt/upload_dataset.py (3)
noveum_customer_support_bt/create_dataset_version.py (2)
  • validate_environment (24-40)
  • main (85-120)
noveum_customer_support_bt/fetch_dataset_items.py (1)
  • main (60-91)
noveum_customer_support_bt/upload_scores.py (1)
  • main (214-332)
noveum_customer_support_bt/preprocess_filter.py (2)
noveum_customer_support_bt/preprocess_map.py (1)
  • main (199-215)
noveum_customer_support_bt/preprocess_split_data.py (1)
  • main (88-115)
noveum_customer_support_bt/fetch_dataset_items.py (1)
noveum_customer_support_bt/upload_scores.py (1)
  • main (214-332)
noveum_customer_support_bt/traces/combine_spans_api_compat.py (1)
noveum_customer_support_bt/traces/fetch_traces_api.py (1)
  • main (84-158)
noveum_customer_support_bt/novapilot_utils.py (1)
noveum_customer_support_bt/demo_utils.py (1)
  • setup_logging (769-794)
noveum_customer_support_bt/demo_utils.py (4)
noveum_customer_support_bt/create_dataset.py (1)
  • validate_environment (24-40)
noveum_customer_support_bt/create_dataset_version.py (1)
  • validate_environment (24-40)
noveum_customer_support_bt/publish_dataset_version.py (1)
  • validate_environment (24-40)
noveum_customer_support_bt/upload_dataset.py (1)
  • validate_environment (27-43)
noveum_customer_support_bt/traces/fetch_traces_api.py (1)
noveum_customer_support_bt/traces/combine_spans_api_compat.py (1)
  • main (66-99)
🪛 Ruff (0.14.1)
noveum_customer_support_bt/upload_scores.py

1-1: Shebang is present but file is not executable

(EXE001)


70-70: Avoid specifying long messages outside the exception class

(TRY003)


72-72: Avoid specifying long messages outside the exception class

(TRY003)


74-74: Avoid specifying long messages outside the exception class

(TRY003)


208-208: Do not catch blind exception: Exception

(BLE001)


209-209: Use explicit conversion flag

Replace with conversion flag

(RUF010)

noveum_customer_support_bt/fix_api_data_v2.py

1-1: Shebang is present but file is not executable

(EXE001)


31-31: Loop control variable key not used within loop body

Rename unused key to _key

(B007)


39-39: Do not use bare except

(E722)


39-40: try-except-pass detected, consider logging the exception

(S110)


54-54: f-string without any placeholders

Remove extraneous f prefix

(F541)

noveum_customer_support_bt/preprocess_split_data.py

1-1: Shebang is present but file is not executable

(EXE001)

noveum_customer_support_bt/create_dataset_version.py

1-1: Shebang is present but file is not executable

(EXE001)


76-76: Consider moving this statement to an else block

(TRY300)

noveum_customer_support_bt/publish_dataset_version.py

1-1: Shebang is present but file is not executable

(EXE001)


70-70: Consider moving this statement to an else block

(TRY300)

noveum_customer_support_bt/create_dataset.py

1-1: Shebang is present but file is not executable

(EXE001)


47-47: f-string without any placeholders

Remove extraneous f prefix

(F541)


84-84: Consider moving this statement to an else block

(TRY300)


152-152: f-string without any placeholders

Remove extraneous f prefix

(F541)


155-155: f-string without any placeholders

Remove extraneous f prefix

(F541)

noveum_customer_support_bt/preprocess_map.py

1-1: Shebang is present but file is not executable

(EXE001)


213-213: Do not catch blind exception: Exception

(BLE001)

noveum_customer_support_bt/upload_dataset.py

1-1: Shebang is present but file is not executable

(EXE001)


56-56: Consider moving this statement to an else block

(TRY300)


179-179: Consider moving this statement to an else block

(TRY300)

noveum_customer_support_bt/preprocess_filter.py

1-1: Shebang is present but file is not executable

(EXE001)


114-114: Do not catch blind exception: Exception

(BLE001)

noveum_customer_support_bt/fetch_dataset_items.py

1-1: Shebang is present but file is not executable

(EXE001)


51-51: Consider moving this statement to an else block

(TRY300)


85-85: f-string without any placeholders

Remove extraneous f prefix

(F541)

noveum_customer_support_bt/traces/combine_spans_api_compat.py

1-1: Shebang is present but file is not executable

(EXE001)


59-59: Do not catch blind exception: Exception

(BLE001)


97-97: f-string without any placeholders

Remove extraneous f prefix

(F541)

noveum_customer_support_bt/final_agent_evaluation_demo.ipynb

42-42: Redefinition of unused run_complete_agent_evaluation from line 36

Remove definition: run_complete_agent_evaluation

(F811)


56-56: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)


67-67: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)


78-78: Loop control variable idx not used within loop body

Rename unused idx to _idx

(B007)

noveum_customer_support_bt/novapilot_utils.py

92-92: Avoid specifying long messages outside the exception class

(TRY003)


138-138: Avoid specifying long messages outside the exception class

(TRY003)


220-220: Consider moving this statement to an else block

(TRY300)


223-223: Use explicit conversion flag

Replace with conversion flag

(RUF010)


225-225: Use raise without specifying exception name

Remove exception name

(TRY201)


254-254: Consider moving this statement to an else block

(TRY300)


257-257: Use explicit conversion flag

Replace with conversion flag

(RUF010)


259-259: Use raise without specifying exception name

Remove exception name

(TRY201)


300-300: Do not catch blind exception: Exception

(BLE001)


302-302: Use explicit conversion flag

Replace with conversion flag

(RUF010)


311-311: Consider moving this statement to an else block

(TRY300)


312-312: Do not catch blind exception: Exception

(BLE001)


314-314: Use explicit conversion flag

Replace with conversion flag

(RUF010)


365-365: Avoid specifying long messages outside the exception class

(TRY003)


386-386: Consider moving this statement to an else block

(TRY300)


389-389: Use explicit conversion flag

Replace with conversion flag

(RUF010)


391-391: Use raise without specifying exception name

Remove exception name

(TRY201)

noveum_customer_support_bt/demo_utils.py

49-49: Consider moving this statement to an else block

(TRY300)


50-50: Do not catch blind exception: Exception

(BLE001)


84-84: Consider moving this statement to an else block

(TRY300)


92-92: Do not catch blind exception: Exception

(BLE001)


340-340: Do not catch blind exception: Exception

(BLE001)


409-409: Do not catch blind exception: Exception

(BLE001)


410-410: Use explicit conversion flag

Replace with conversion flag

(RUF010)


514-514: Consider moving this statement to an else block

(TRY300)


515-515: Do not catch blind exception: Exception

(BLE001)


560-560: Consider moving this statement to an else block

(TRY300)


561-561: Do not catch blind exception: Exception

(BLE001)


642-642: Do not catch blind exception: Exception

(BLE001)


747-747: Do not catch blind exception: Exception

(BLE001)


757-757: Do not catch blind exception: Exception

(BLE001)


912-912: Unpacked variable span_types is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


924-924: Unpacked variable agent_data_list is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


938-938: Local variable stats is assigned to but never used

Remove assignment to unused variable stats

(F841)


939-939: Local variable behavior_analysis is assigned to but never used

Remove assignment to unused variable behavior_analysis

(F841)


1000-1000: f-string without any placeholders

Remove extraneous f prefix

(F541)


1015-1015: Consider moving this statement to an else block

(TRY300)


1017-1017: Do not catch blind exception: Exception

(BLE001)


1018-1018: Use explicit conversion flag

Replace with conversion flag

(RUF010)

noveum_customer_support_bt/traces/fetch_traces_api.py

59-59: Probable use of requests call without timeout

(S113)


64-64: Consider moving this statement to an else block

(TRY300)


150-150: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (8)
noveum_customer_support_bt/traces/fetch_traces_api.py (1)

59-64: LGTM - Timeout is properly configured.

The request includes timeout=30, which is appropriate for API calls. The static analysis hint about missing timeout is a false positive.

noveum_customer_support_bt/api_data.json (1)

1-84: LGTM - Clean JSON structure.

The JSON file has a consistent structure with proper UUID formatting for item_key fields. The data appears to be well-formed for use as a mapping between item keys and IDs.

noveum_customer_support_bt/noveum_agent_requirements.txt (1)

8-11: Update LangChain packages to latest stable versions after compatibility testing.

As of October 2025, the latest stable versions are: langchain 1.0.0, langchain-core 1.0.0, langchain-community 0.4, and langchain-openai 1.0.0. The pinned versions (0.3.x) are significantly outdated and likely miss critical updates. Update dependencies in requirements.txt to the latest versions and verify that your code is compatible with the new major releases, as breaking changes are probable.

noveum_customer_support_bt/publish_dataset_version.py (1)

52-55: Review Cookie header usage for redundancy and security.

The codebase shows inconsistent header patterns: five dataset-related scripts include both Authorization and Cookie headers with the same API key, while upload_scores.py and fetch_traces_api.py use only Authorization. Without confirmed API requirements, verify whether Cookie: apiKeyCookie={api_key} is actually required or redundant. If redundant, remove it—passing credentials in both Authorization and Cookie increases exposure in logs and traces.

noveum_customer_support_bt/create_dataset.py (1)

51-57: Auth redundancy: confirm cookie requirement.

You already send Bearer auth; the cookie mirrors the key and may be unnecessary. Fewer auth vectors reduce risk. Keep only if API mandates both.

Would you confirm whether apiKeyCookie is required? If not, we should drop it for all scripts for consistency and security.

noveum_customer_support_bt/upload_dataset.py (1)

147-153: Auth redundancy: confirm cookie requirement across scripts.

Same note as in create_dataset.py; keep both only if mandated.

If not required, drop 'Cookie': f'apiKeyCookie={api_key}' for consistency.

noveum_customer_support_bt/create_dataset_version.py (1)

24-41: Env validation consistency looks good.

Matches other scripts’ pattern. LGTM.

Consider centralizing validate_environment() in a shared module to DRY; want me to draft it?
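
A shared helper along the lines suggested above could look like the following. This is a minimal sketch, not the PR's code: the function name matches the existing `validate_environment()`, but its signature, the `env` parameter (added so it can be tested without touching the real environment), and the choice of `RuntimeError` are assumptions.

```python
import os
from typing import Dict, List, Mapping, Optional


def validate_environment(
    required: List[str], env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return the required variables, or raise naming every missing one."""
    # Hypothetical signature: `env` defaults to the process environment.
    source = os.environ if env is None else env
    values = {name: source.get(name, "") for name in required}
    missing = [name for name, value in values.items() if not value.strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return values
```

Each script could then call `validate_environment(["NOVEUM_API_KEY", "NOVEUM_ORG_SLUG"])` once at startup instead of repeating the check.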

noveum_customer_support_bt/upload_scores.py (1)

270-288: The general Noveum.ai documentation does not specify the expected score range for the batch scorer results endpoint. NovaEval examples reference a "semantic_similarity" scorer with "threshold: 0.8", which suggests a normalized [0, 1] range, but this does not confirm the upload endpoint's actual requirements.

Verify score range expectations with Noveum API documentation or testing

The code assumes scores are in [0, 1] range (line 146: hardcoded threshold of 0.5) but lacks:

  • Input validation or range checking
  • Score normalization logic
  • Documented API requirements confirmation

Confirm the expected score range with official Noveum API docs or test with sample data before uploading production scores. If the CSV uses a different scale (0-100, 1-5, etc.), add normalization accordingly.
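
If the CSV does turn out to use another scale, a range check plus rescaling could be sketched as below. The target range [0, 1] is itself unconfirmed (see above), and `normalize_score` with its `low`/`high` parameters is a hypothetical helper, not part of upload_scores.py.

```python
import math


def normalize_score(raw: float, low: float = 0.0, high: float = 1.0) -> float:
    """Rescale a score from [low, high] to [0, 1], rejecting out-of-range input."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not math.isfinite(raw):
        raise ValueError(f"score is not a finite number: {raw!r}")
    if not low <= raw <= high:
        raise ValueError(f"score {raw} is outside the expected range [{low}, {high}]")
    return (raw - low) / (high - low)
```

For a 0-100 CSV the call would be `normalize_score(value, 0, 100)`; with the defaults the function only validates that the value already lies in [0, 1].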

Comment on lines +17 to +23
# Get API credentials from environment
api_key = os.getenv('NOVEUM_API_KEY')
org_slug = os.getenv('NOVEUM_ORG_SLUG')
dataset_slug = os.getenv('NOVEUM_DATASET_SLUG')
dataset_name = os.getenv('NOVEUM_DATASET_SLUG')
beta_env = os.getenv('BETA', 'false').lower() == 'true'

⚠️ Potential issue | 🔴 Critical

Blocker: dataset_name is read from the slug env; request uses slug incorrectly.

  • dataset_name = os.getenv('NOVEUM_DATASET_SLUG') should read NOVEUM_DATASET_NAME.
  • Validation lists NOVEUM_DATASET_NAME but the value comes from the slug, so missing-name won’t be detected.
  • Payload sets "slug": dataset_name; should be dataset_slug or omit to let API auto-generate (as comment suggests).
- dataset_slug = os.getenv('NOVEUM_DATASET_SLUG')
- dataset_name = os.getenv('NOVEUM_DATASET_SLUG')
+ dataset_slug = os.getenv('NOVEUM_DATASET_SLUG')
+ dataset_name = os.getenv('NOVEUM_DATASET_NAME')
@@
-    required_vars = {
+    required_vars = {
         'NOVEUM_API_KEY': api_key,
         'NOVEUM_ORG_SLUG': org_slug,
         'NOVEUM_DATASET_SLUG': dataset_slug,
         'NOVEUM_DATASET_NAME': dataset_name
     }
@@
-    request_data = {
-        "name": dataset_name,
-        "slug": dataset_name,  # Will be auto-generated by the API
+    request_data = {
+        "name": dataset_name,
+        # Include slug only if provided; otherwise let API generate it.
+        **({"slug": dataset_slug} if dataset_slug else {}),
         "description": description,
         "visibility": visibility,
         "dataset_type": dataset_type,
         "environment": environment
     }

Also applies to: 24-33, 58-66
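
The name/slug separation described above can be sketched as a small payload builder. This is an illustration under assumptions: `build_dataset_payload` and its `extra` parameter are hypothetical, and the slug is treated as optional here (the diff above still lists it among the required variables).

```python
from typing import Dict, Mapping, Optional


def build_dataset_payload(
    env: Mapping[str, str], extra: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Build the create-dataset request body from separate name and slug variables."""
    name = env.get("NOVEUM_DATASET_NAME", "").strip()
    if not name:
        raise ValueError("NOVEUM_DATASET_NAME is required")
    payload = {"name": name}
    slug = env.get("NOVEUM_DATASET_SLUG", "").strip()
    if slug:
        # Send the slug only when provided; otherwise leave it out of the request.
        payload["slug"] = slug
    payload.update(extra or {})
    return payload
```

Passing `os.environ` as `env` reproduces the script's behaviour, while a plain dict keeps the function easy to test.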

Comment on lines +259 to +267
# Agent response
finish_values = attributes.get('agent.output.finish.return_values', {})
if isinstance(finish_values, dict) and 'output' in finish_values:
    data['agent_response'] = finish_values['output']
elif attributes.get("agent_response"):
    data['agent_response'] = attributes.get("agent_response")
else:
    print("agent_response is not available " + span['span_id'])
# Tool calls from agent actions - handle different span types

⚠️ Potential issue | 🔴 Critical

KeyError risk on span['span_id']; use safe access (also for call_id)

Direct indexing can crash when span_id is absent. Use .get() fallbacks for prints and for ToolCall/ToolResult IDs.

-                print("agent_response is not available  " + span['span_id'])
+                print("agent_response is not available  " + str(span.get('span_id', 'unknown')))
@@
-                tool_call = ToolCall(
+                tool_call = ToolCall(
                     tool_name=tool_name,
                     parameters={'input': tool_input} if tool_input else {},
-                    call_id=span['span_id']
+                    call_id=str(span.get('span_id', 'unknown'))
                 )
@@
-            print("tool_output is not available " + span['span_id'])
+            print("tool_output is not available " + str(span.get('span_id', 'unknown')))
@@
-            tool_result = ToolResult(
-                call_id=span['span_id'],
+            tool_result = ToolResult(
+                call_id=str(span.get('span_id', 'unknown')),
                 result=tool_output,
                 success=span.get('status') == 'ok',
                 error_message=None if span.get('status') == 'ok' else 'Tool execution failed'
             )

Also applies to: 304-312, 355-365

🤖 Prompt for AI Agents
noveum_customer_support_bt/demo_utils.py around lines 259-267 (also apply same
pattern to 304-312 and 355-365): the code accesses span['span_id'] (and other
dict keys like call_id) via direct indexing which can raise KeyError; change
these to use safe .get() with sensible fallbacks (e.g., span.get('span_id',
'<unknown-span>') and call.get('call_id', '<unknown-call>')) when building
log/print messages and when reading ToolCall/ToolResult IDs so missing keys
don't crash execution; update all print/log and id lookups in the cited ranges
to use .get(...) and handle None/empty cases consistently.
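
One way to apply the same fallback everywhere is a tiny helper, sketched here. `span_label` is a hypothetical name; unlike a bare `str(span.get('span_id', 'unknown'))`, it also covers a key that is present but set to `None` or an empty string.

```python
from typing import Any, Mapping


def span_label(span: Mapping[str, Any], default: str = "unknown") -> str:
    """Return a printable span ID without raising when the key is absent or empty."""
    value = span.get("span_id")
    if value is None or value == "":
        return default
    return str(value)
```

The same call can feed both the log messages and the `call_id` arguments, so missing IDs are reported consistently.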

Comment on lines +31 to +40
for key, value in item.items():
    if isinstance(value, str) and 'turn_id' in value:
        try:
            # Try to parse as JSON and extract turn_id
            parsed = json.loads(value)
            if isinstance(parsed, dict) and 'turn_id' in parsed:
                turn_id = parsed['turn_id']
                break
        except:
            pass

⚠️ Potential issue | 🟠 Major

Improve exception handling and address unused variable.

The code has several issues:

  1. Loop variable key is unused (should be _key)
  2. Bare except catches all exceptions including system exits
  3. Silent failure in try-except-pass hides parsing errors

Apply this diff:

             # Search through all string values for turn_id pattern
-            for key, value in item.items():
+            for _key, value in item.items():
                 if isinstance(value, str) and 'turn_id' in value:
                     try:
                         # Try to parse as JSON and extract turn_id
                         parsed = json.loads(value)
                         if isinstance(parsed, dict) and 'turn_id' in parsed:
                             turn_id = parsed['turn_id']
                             break
-                    except:
-                        pass
+                    except json.JSONDecodeError:
+                        # Not a JSON string, continue searching
+                        continue
🧰 Tools
🪛 Ruff (0.14.1)

31-31: Loop control variable key not used within loop body

Rename unused key to _key

(B007)


39-39: Do not use bare except

(E722)


39-40: try-except-pass detected, consider logging the exception

(S110)
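
The search loop with the narrower exception handling can be sketched as a standalone function. `extract_turn_id` is a hypothetical name; the surrounding code in the PR keeps the loop inline.

```python
import json
from typing import Any, Mapping, Optional


def extract_turn_id(item: Mapping[str, Any]) -> Optional[Any]:
    """Return the first turn_id found in a JSON-encoded string value, else None."""
    for value in item.values():
        if not (isinstance(value, str) and "turn_id" in value):
            continue
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            # The string mentions turn_id but is not JSON; keep searching.
            continue
        if isinstance(parsed, dict) and "turn_id" in parsed:
            return parsed["turn_id"]
    return None
```

Iterating over `item.values()` removes the unused loop variable altogether, and catching only `json.JSONDecodeError` lets unrelated errors surface.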

Comment on lines +350 to +366
def create_final_analysis(self, dataset_summaries: List[str], agent_doc: Optional[str] = None) -> str:
    """
    Create final comprehensive analysis from all dataset summaries.

    Args:
        dataset_summaries: List of dataset summary strings.
        agent_doc: Agent documentation string. If None, uses loaded documentation.

    Returns:
        Final analysis response from Gemini AI.
    """
    if agent_doc is None:
        agent_doc = self.reddit_agent_doc

    if not agent_doc:
        raise ValueError("No agent documentation provided. Load it first or pass as parameter.")

⚠️ Potential issue | 🟡 Minor

Guard against empty dataset_summaries

If none were produced, return a clear message instead of prompting the model with empty content.

-        # Combine all dataset summaries
-        combined_summaries = "\n\n".join(dataset_summaries)
+        if not dataset_summaries:
+            return "No dataset summaries available to analyze."
+        combined_summaries = "\n\n".join(dataset_summaries)

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.1)

365-365: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In noveum_customer_support_bt/novapilot_utils.py around lines 350 to 366, the
method create_final_analysis does not guard against an empty or all-empty
dataset_summaries list; update the function to detect if dataset_summaries is
empty or contains only blank strings and immediately return a clear, user-facing
message (e.g., "No dataset summaries were produced; cannot generate final
analysis.") instead of calling the model with empty content; keep existing
agent_doc handling and error for missing agent documentation intact.
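
A guard that also covers all-blank summaries might be sketched as follows. `combine_summaries` is a hypothetical helper; returning `None` and leaving the user-facing message to the caller is a design assumption, not what the PR does.

```python
from typing import List, Optional


def combine_summaries(dataset_summaries: List[str]) -> Optional[str]:
    """Join non-blank summaries, or return None when there is nothing to analyze."""
    cleaned = [text.strip() for text in dataset_summaries if text and text.strip()]
    if not cleaned:
        return None
    return "\n\n".join(cleaned)
```

`create_final_analysis` could return its "no summaries" message whenever this yields `None`, and only call the model otherwise.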

@@ -0,0 +1,33 @@
# Noveum AI Agent with RAG + Web Search - Additional Requirements
# Core dependencies (already in main requirements.txt)
requests==2.32.3

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the requests library version for security advisories.

Since this is a security-sensitive dependency, ensure version 2.32.3 is current and free from known vulnerabilities.


🏁 Script executed:

#!/bin/bash
# Check for security advisories for requests library

# Check PyPI for latest version
curl -s https://pypi.org/pypi/requests/json | jq '.info.version'

# Check for security advisories
gh api graphql -f query='
{
  securityVulnerabilities(first: 10, ecosystem: PIP, package: "requests") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'

Length of output: 1688


Update requests library to 2.32.4 or later to fix a known security vulnerability.

Version 2.32.3 is vulnerable to a credentials leak via .netrc files (MODERATE severity, CVE published June 2025). The vulnerability affects versions < 2.32.4 and is patched in 2.32.4. The latest available version is 2.32.5—upgrade to this or at minimum 2.32.4.

🤖 Prompt for AI Agents
In noveum_customer_support_bt/noveum_agent_requirements.txt around line 3, the
pinned requests version is 2.32.3 which is vulnerable; update the requirement to
at least 2.32.4 (preferably 2.32.5) to address the CVE and re-run dependency
install/lockfile generation (pip-compile or pip freeze) and CI dependency checks
to ensure the updated version is used across environments.
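
A CI check for the pinned version could compare versions numerically, as in this sketch. It assumes plain dotted numeric versions such as `2.32.3`; `parse_version` and `is_patched` are hypothetical helpers, and a real project would more likely rely on a dependency scanner.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.32.3' into (2, 32, 3); assumes purely numeric components."""
    return tuple(int(part) for part in text.strip().split("."))


def is_patched(installed: str, first_patched: str = "2.32.4") -> bool:
    """True when the installed version is at or above the first patched release."""
    return parse_version(installed) >= parse_version(first_patched)
```

Comparing integer tuples rather than strings matters here: as strings, `"2.4.0"` would sort after `"2.32.4"`.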

Comment on lines +66 to +69
def main():
    # Get the directory where this script is located
    script_dir = Path(__file__).parent
    traces_dir = str(script_dir / "traces")

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify the traces directory path resolution.

The path construction script_dir / "traces" may create a nested path traces/traces/ if the script is already in a traces directory. Review the expected directory structure.


🏁 Script executed:

#!/bin/bash
# Check the directory structure around combine_spans_api_compat.py
fd -t d traces | head -20
fd combine_spans_api_compat.py

Length of output: 96



🏁 Script executed:

#!/bin/bash
# Check if traces/traces directory exists and explore the actual directory structure
find noveum_customer_support_bt -type d -name traces
# Also check what files are in the traces directory
ls -la noveum_customer_support_bt/traces/ | head -30
# Check the content of combine_spans_api_compat.py around the main() function
cat noveum_customer_support_bt/traces/combine_spans_api_compat.py | head -100

Length of output: 3751


Fix incorrect traces directory path.

The code appends "traces" to the script's own directory (Path(__file__).parent), producing the non-existent nested path noveum_customer_support_bt/traces/traces. Because the script already resides in the traces directory and the trace files are at noveum_customer_support_bt/traces, the os.listdir() call cannot find the trace files.

Change line 69 from:

traces_dir = str(script_dir / "traces")

to:

traces_dir = str(script_dir)
🤖 Prompt for AI Agents
In noveum_customer_support_bt/traces/combine_spans_api_compat.py around lines 66
to 69, the script builds an incorrect path by appending "traces" to the script's
parent directory, resulting in a non-existent nested traces/traces directory;
change the assignment so traces_dir references the script directory itself
(i.e., use the script_dir path directly) so os.listdir() points to the actual
traces folder where the trace files reside.
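
The path arithmetic behind this finding can be reproduced without touching the file system. The sketch uses `PurePosixPath` with the repository-relative script path reported above in place of `Path(__file__)`, purely for illustration.

```python
from pathlib import PurePosixPath

# Stand-in for Path(__file__): the script's location inside the repository.
script = PurePosixPath("noveum_customer_support_bt/traces/combine_spans_api_compat.py")

script_dir = script.parent          # the directory that holds the script
nested_dir = script_dir / "traces"  # what the current code builds

print(script_dir)   # the existing traces directory
print(nested_dir)   # the nested traces/traces path
```

Since `script_dir` already is the traces directory, using it directly is enough.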

Comment on lines +92 to +101
for item in items:
    # Create a copy of the item to avoid modifying the original
    item['item_type'] = item_type
    item_copy = item.copy()

    # Start with base structure
    transformed_item = {
        "item_key": item.get("turn_id", ""),
        "item_type": item_type,  # Use the provided item_type
        "metadata": {}  # Empty metadata as specified

⚠️ Potential issue | 🔴 Critical

Blocker: mutating source items and overwriting normalized content.

  • You set item['item_type'] = item_type before copying, mutating callers’ data.
  • You parse/normalize content earlier, then overwrite it with the entire original item at Line 138, losing the normalization.

Apply:

+import copy
@@
-    for item in items:
-        # Create a copy of the item to avoid modifying the original
-        item['item_type'] = item_type
-        item_copy = item.copy()
+    for item in items:
+        original = copy.deepcopy(item)
+        # Work on a copy; do not mutate callers' data
+        item_copy = item.copy()
@@
-        transformed_item = {
-            "item_key": item.get("turn_id", ""),
+        transformed_item = {
+            # item_key fallback is set after schema surfacing
             "item_type": item_type,  # Use the provided item_type
             "metadata": {}  # Empty metadata as specified
         }
@@
-            if key in item_copy:
+            if key in item_copy:
                 value = item_copy.pop(key)
@@
-                elif key == "content":
+                elif key == "content":
                     # Handle content field - always ensure it's an object
                     if isinstance(value, str) and value.strip():
                         try:
                             transformed_item[key] = json.loads(value)
                         except json.JSONDecodeError:
                             transformed_item[key] = {}
                     elif isinstance(value, dict):
                         transformed_item[key] = value
                     else:
                         transformed_item[key] = {}
                 else:
                     transformed_item[key] = value
@@
-        # Always put the entire original item in content field
-        transformed_item["content"] = item
+        # Preserve original item separately without clobbering normalized 'content'
+        transformed_item["raw"] = original
+
+        # Ensure item_type present if source provided one
+        if item_copy.get("item_type"):
+            transformed_item["item_type"] = item_copy["item_type"] or transformed_item["item_type"]
+
+        # Finalize item_key fallback if absent
+        if not transformed_item.get("item_key"):
+            transformed_item["item_key"] = (
+                item_copy.get("item_key")
+                or item_copy.get("turn_id")
+                or item_copy.get("id")
+                or ""
+            )

Also applies to: 104-139
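
The two points above (no mutation of the caller's data, normalized content kept) can be illustrated with a reduced transform. This is a simplified sketch rather than the full function from upload_dataset.py: `transform_item` is a hypothetical name, only the `content` field is normalized, and the `raw` key follows the suggestion in the diff.

```python
import copy
import json
from typing import Any, Dict


def transform_item(item: Dict[str, Any], item_type: str) -> Dict[str, Any]:
    """Build the upload structure from a deep copy, leaving `item` untouched."""
    original = copy.deepcopy(item)

    # Normalize content so that it is always a dict.
    content = original.get("content")
    if isinstance(content, str) and content.strip():
        try:
            content = json.loads(content)
        except json.JSONDecodeError:
            content = {}
    if not isinstance(content, dict):
        content = {}

    item_key = original.get("item_key") or original.get("turn_id") or original.get("id") or ""
    return {
        "item_key": item_key,
        "item_type": item_type,
        "metadata": {},
        "content": content,  # normalized value, not overwritten afterwards
        "raw": original,     # original item preserved separately
    }
```

Because everything is derived from the deep copy, the caller's list can be reused for other purposes after the transform.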

Comment on lines +26 to +41
print(f"Loading API data from {api_data_path}...")
with open(api_data_path, 'r') as f:
    data = json.load(f)

# Create mapping from item_key to item_id
key_to_id = {}
items = data.get('items', [])

for item in items:
    item_key = item.get('item_key')
    item_id = item.get('item_id')
    if item_key and item_id:
        key_to_id[item_key] = item_id

print(f"Loaded {len(key_to_id)} item mappings")
return key_to_id

⚠️ Potential issue | 🟠 Major

Harden api_data.json loading (encoding, errors, malformed data)

Add encoding, handle I/O/JSON errors, and validate expected structure.

-    with open(api_data_path, 'r') as f:
-        data = json.load(f)
+    try:
+        with open(api_data_path, 'r', encoding='utf-8') as f:
+            data = json.load(f)
+    except (OSError, json.JSONDecodeError) as e:
+        raise RuntimeError(f"Failed to read/parse API data at {api_data_path}: {e!s}") from e
@@
-    items = data.get('items', [])
+    items = data.get('items', [])
+    if not isinstance(items, list):
+        raise ValueError("api_data.json malformed: 'items' must be a list")
@@
-        if item_key and item_id:
+        if item_key and item_id:
             key_to_id[item_key] = item_id
🤖 Prompt for AI Agents
In noveum_customer_support_bt/upload_scores.py around lines 26 to 41, the
api_data.json loading is unprotected and assumes well-formed UTF-8 JSON; wrap
the file open and json.load in a try/except that opens the file with
encoding='utf-8' (or use pathlib.read_text('utf-8')), catch
FileNotFoundError/IOError and json.JSONDecodeError, log or print a clear error
and return an empty mapping (or re-raise if preferred), validate that the loaded
object is a dict and that data.get('items') is a list before iterating, and when
building key_to_id ensure item_key and item_id are the expected types (e.g.,
str/int) before inserting so malformed entries are skipped with a debug message.
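
Put together, a hardened loader might look like the sketch below. `load_key_to_id` is a hypothetical name, and re-raising as `RuntimeError` (rather than returning an empty mapping) is one of the two options mentioned above, chosen here as an assumption.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def load_key_to_id(path: Union[str, Path]) -> Dict[str, Any]:
    """Load api_data.json and map item_key to item_id, skipping malformed entries."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        raise RuntimeError(f"Failed to read/parse API data at {path}: {exc}") from exc

    items = data.get("items") if isinstance(data, dict) else None
    if not isinstance(items, list):
        raise ValueError("api_data.json malformed: 'items' must be a list")

    key_to_id = {}
    for item in items:
        if not isinstance(item, dict):
            continue  # malformed entry, skip it
        item_key, item_id = item.get("item_key"), item.get("item_id")
        if item_key and item_id:
            key_to_id[item_key] = item_id
    return key_to_id
```

Failing loudly on an unreadable file avoids silently uploading zero scores, which an empty mapping would otherwise cause.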

Comment on lines +65 to +89
with open(csv_path, 'r', encoding='utf-8') as f:
    reader = csv.DictReader(f)

    # Verify columns exist
    if item_key_col not in reader.fieldnames:
        raise ValueError(f"Column '{item_key_col}' not found in CSV. Available columns: {reader.fieldnames}")
    if score_col not in reader.fieldnames:
        raise ValueError(f"Column '{score_col}' not found in CSV. Available columns: {reader.fieldnames}")
    if reasoning_col not in reader.fieldnames:
        raise ValueError(f"Column '{reasoning_col}' not found in CSV. Available columns: {reader.fieldnames}")

    for row in reader:
        item_key = row[item_key_col]
        score = row[score_col]
        reasoning = row[reasoning_col]

        # Skip empty rows
        if not item_key or not score:
            continue

        results.append({
            'item_key': item_key,
            'score': float(score),
            'reasoning': reasoning
        })

⚠️ Potential issue | 🟠 Major

CSV robustness: empty files, trimming, numeric conversion, and row skipping

Guard against empty CSVs (fieldnames is None), trim values, and skip non‑numeric scores instead of crashing.

-    with open(csv_path, 'r', encoding='utf-8') as f:
+    with open(csv_path, 'r', encoding='utf-8') as f:
         reader = csv.DictReader(f)
-        
-        # Verify columns exist
+        # Verify columns exist
+        if reader.fieldnames is None:
+            raise ValueError("CSV has no header row or is empty")
         if item_key_col not in reader.fieldnames:
             raise ValueError(f"Column '{item_key_col}' not found in CSV. Available columns: {reader.fieldnames}")
         if score_col not in reader.fieldnames:
             raise ValueError(f"Column '{score_col}' not found in CSV. Available columns: {reader.fieldnames}")
         if reasoning_col not in reader.fieldnames:
             raise ValueError(f"Column '{reasoning_col}' not found in CSV. Available columns: {reader.fieldnames}")
-        
-        for row in reader:
-            item_key = row[item_key_col]
-            score = row[score_col]
-            reasoning = row[reasoning_col]
-            
-            # Skip empty rows
-            if not item_key or not score:
-                continue
-            
-            results.append({
-                'item_key': item_key,
-                'score': float(score),
-                'reasoning': reasoning
-            })
+        for row in reader:
+            item_key = (row.get(item_key_col) or "").strip()
+            score_raw = (row.get(score_col) or "").strip()
+            reasoning = (row.get(reasoning_col) or "").strip()
+            if not item_key or not score_raw:
+                continue
+            try:
+                score_val = float(score_raw)
+            except ValueError:
+                print(f"Warning: skipping non-numeric score for key '{item_key}': {score_raw!r}")
+                continue
+            results.append({'item_key': item_key, 'score': score_val, 'reasoning': reasoning})
🧰 Tools
🪛 Ruff (0.14.1)

70-70: Avoid specifying long messages outside the exception class

(TRY003)


72-72: Avoid specifying long messages outside the exception class

(TRY003)


74-74: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In noveum_customer_support_bt/upload_scores.py around lines 65 to 89, the CSV
parsing code must handle an empty CSV (reader.fieldnames can be None), trim
whitespace from cell values, and avoid crashing on non‑numeric scores; first
check if reader.fieldnames is None and raise a clear ValueError about an empty
or malformed CSV, then for each row strip() the item_key, score, and reasoning
values, skip rows where stripped item_key or score are empty, attempt to convert
score to float inside a try/except and skip (or log) rows where conversion fails
instead of raising, and ensure reasoning defaults to an empty string if None
after stripping.
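
The behaviour described in this prompt can be sketched as a function that reads from any text handle, which makes it testable with `io.StringIO`. The name `read_scores`, the default column names, and the returned skip counter are assumptions for illustration.

```python
import csv
from typing import Dict, List, TextIO, Tuple, Union


def read_scores(
    handle: TextIO,
    item_key_col: str = "item_key",
    score_col: str = "score",
    reasoning_col: str = "reasoning",
) -> Tuple[List[Dict[str, Union[str, float]]], int]:
    """Parse score rows, returning (results, number of non-numeric rows skipped)."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None:
        raise ValueError("CSV has no header row or is empty")
    missing = [c for c in (item_key_col, score_col, reasoning_col) if c not in reader.fieldnames]
    if missing:
        raise ValueError(f"Columns {missing} not found in CSV. Available: {reader.fieldnames}")

    results, skipped = [], 0
    for row in reader:
        item_key = (row.get(item_key_col) or "").strip()
        score_raw = (row.get(score_col) or "").strip()
        reasoning = (row.get(reasoning_col) or "").strip()
        if not item_key or not score_raw:
            continue  # empty row
        try:
            score = float(score_raw)
        except ValueError:
            skipped += 1  # non-numeric score: skip instead of crashing
            continue
        results.append({"item_key": item_key, "score": score, "reasoning": reasoning})
    return results, skipped
```

In the script this would be called inside `with open(csv_path, 'r', encoding='utf-8', newline='') as f:`, and the skip count could be printed as a warning.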

Comment on lines +178 to +210
api_url = f"https://beta.noveum.ai/api/v1/scorers/results/batch?organizationSlug={org_slug}"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

# Split results into batches
total = len(results)
batches = [results[i:i + batch_size] for i in range(0, total, batch_size)]

print(f"\nUploading {total} results in {len(batches)} batches...")

for i, batch in enumerate(batches, 1):
    payload = {"results": batch}

    try:
        response = requests.post(
            api_url,
            headers=headers,
            json=payload,
            timeout=60
        )

        if response.status_code == 200:
            print(f"✓ Batch {i}/{len(batches)} uploaded successfully ({len(batch)} results)")
        else:
            print(f"✗ Batch {i}/{len(batches)} failed: {response.status_code}")
            print(f"  Response: {response.text}")

    except Exception as e:
        print(f"✗ Batch {i}/{len(batches)} error: {str(e)}")

⚠️ Potential issue | 🟠 Major

API upload resilience: status handling, retries, and exception scope

Use a Session with retries, check response.ok/raise_for_status(), and catch requests-specific exceptions.

-    for i, batch in enumerate(batches, 1):
-        payload = {"results": batch}
-        
-        try:
-            response = requests.post(
-                api_url,
-                headers=headers,
-                json=payload,
-                timeout=60
-            )
-            
-            if response.status_code == 200:
-                print(f"✓ Batch {i}/{len(batches)} uploaded successfully ({len(batch)} results)")
-            else:
-                print(f"✗ Batch {i}/{len(batches)} failed: {response.status_code}")
-                print(f"  Response: {response.text}")
-        
-        except Exception as e:
-            print(f"✗ Batch {i}/{len(batches)} error: {str(e)}")
+    from requests.adapters import HTTPAdapter
+    from urllib3.util.retry import Retry
+    session = requests.Session()
+    session.headers.update(headers)
+    retry = Retry(total=3, backoff_factor=0.5, status_forcelist=(429, 500, 502, 503, 504))
+    session.mount("https://", HTTPAdapter(max_retries=retry))
+
+    for i, batch in enumerate(batches, 1):
+        payload = {"results": batch}
+        try:
+            response = session.post(api_url, json=payload, timeout=60)
+            if response.ok:
+                print(f"✓ Batch {i}/{len(batches)} uploaded successfully ({len(batch)} results)")
+            else:
+                print(f"✗ Batch {i}/{len(batches)} failed: {response.status_code}")
+                print(f"  Response: {response.text}")
+                # Optional: response.raise_for_status()
+        except (requests.Timeout, requests.ConnectionError, requests.RequestException) as e:
+            print(f"✗ Batch {i}/{len(batches)} error: {e!s}")

Additionally, accept 2xx codes, not only 200.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
api_url = f"https://beta.noveum.ai/api/v1/scorers/results/batch?organizationSlug={org_slug}"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}
# Split results into batches
total = len(results)
batches = [results[i:i + batch_size] for i in range(0, total, batch_size)]
print(f"\nUploading {total} results in {len(batches)} batches...")
for i, batch in enumerate(batches, 1):
    payload = {"results": batch}
    try:
        response = requests.post(
            api_url,
            headers=headers,
            json=payload,
            timeout=60
        )
        if response.status_code == 200:
            print(f"✓ Batch {i}/{len(batches)} uploaded successfully ({len(batch)} results)")
        else:
            print(f"✗ Batch {i}/{len(batches)} failed: {response.status_code}")
            print(f"  Response: {response.text}")
    except Exception as e:
        print(f"✗ Batch {i}/{len(batches)} error: {str(e)}")

api_url = f"https://beta.noveum.ai/api/v1/scorers/results/batch?organizationSlug={org_slug}"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}
# Split results into batches
total = len(results)
batches = [results[i:i + batch_size] for i in range(0, total, batch_size)]
print(f"\nUploading {total} results in {len(batches)} batches...")
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
session = requests.Session()
session.headers.update(headers)
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=(429, 500, 502, 503, 504))
session.mount("https://", HTTPAdapter(max_retries=retry))
for i, batch in enumerate(batches, 1):
    payload = {"results": batch}
    try:
        response = session.post(api_url, json=payload, timeout=60)
        if response.ok:
            print(f"✓ Batch {i}/{len(batches)} uploaded successfully ({len(batch)} results)")
        else:
            print(f"✗ Batch {i}/{len(batches)} failed: {response.status_code}")
            print(f"  Response: {response.text}")
            # Optional: response.raise_for_status()
    except (requests.Timeout, requests.ConnectionError, requests.RequestException) as e:
        print(f"✗ Batch {i}/{len(batches)} error: {e!s}")
🧰 Tools
🪛 Ruff (0.14.1)

208-208: Do not catch blind exception: Exception

(BLE001)


209-209: Use explicit conversion flag

Replace with conversion flag

(RUF010)
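
The batching, retry, and 2xx handling discussed in this comment can be sketched independently of the HTTP library. Here `send` is a hypothetical callable that posts one payload and returns the HTTP status code (in the script it would wrap `session.post(...)`), and `chunked`, `post_with_retries`, and `upload_in_batches` are illustrative names. Retrying explicitly may also be worth considering because urllib3's `Retry` applies `status_forcelist` only to methods in its `allowed_methods` set, which by default does not include POST.

```python
import time
from typing import Any, Callable, Dict, Iterator, List, Sequence

RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def chunked(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def post_with_retries(
    send: Callable[[Dict[str, Any]], int],
    payload: Dict[str, Any],
    attempts: int = 3,
    backoff: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    status = 0
    for attempt in range(attempts):
        status = send(payload)
        if status not in RETRYABLE_STATUSES:
            return status
        if attempt < attempts - 1:
            sleep(backoff * (2 ** attempt))  # exponential backoff between tries
    return status


def upload_in_batches(
    results: Sequence[Any],
    batch_size: int,
    send: Callable[[Dict[str, Any]], int],
    **retry_options: Any,
) -> List[int]:
    """Upload all batches and return the 1-based indexes of those that failed."""
    failed = []
    for index, batch in enumerate(chunked(results, batch_size), 1):
        status = post_with_retries(send, {"results": list(batch)}, **retry_options)
        if not 200 <= status < 300:  # accept any 2xx, not only 200
            failed.append(index)
    return failed
```

Returning the failed batch indexes lets the caller exit with a non-zero status or re-run only those batches, instead of relying on printed output alone.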
