Add NovaBot and NovaBot Simulator agents by mramanindia · Pull Request #2 · Noveum/agents-library

mramanindia · 2025-12-28T14:07:37Z

NovaBot and NovaBot Simulator agents - chatbot based on api.

Summary by CodeRabbit

New Features
- NovaBot agent for document-backed chat with a dedicated /novabot/chat endpoint
- NovaBot Simulator for automated end-to-end testing
- Added convenience agent methods (chat/process) for API compatibility
Documentation
- NovaBot README, get-started guide, simulator README, and dataset README added
API Improvements
- Enhanced agent health checks and async instance management; agent requests accept optional metadata
Chores
- Local docs index and metadata, scraping/build scripts, and agent-specific requirements added

…oints, and documentation. Introduced metadata and vector files for enhanced functionality. Updated shared components for improved modularity and tracing integration.

coderabbitai · 2025-12-28T14:07:46Z

📝 Walkthrough

Walkthrough

Adds a RAG-enabled NovaBot agent and simulator, local scraping and embedding-index build scripts, config/dataclasses and requirements, API/registry enhancements (metadata forwarding, /novabot/chat), shared export adjustments, and prebuilt Noveum docs index artifacts.

Changes

Cohort / File(s)	Summary
Index & Prebuilt Data `NoveumDocsData/index/metadata.json`, `NoveumDocsData/index/.gitkeep`, `NoveumDocsData/README.md`	Adds index metadata JSON (embedding model, normalized flag, count, items mapping doc/chunk/url), preserves index dir, and dataset README.
NovaBot Config & Metadata `src/agents/business/nova_bot/config.py`, `src/agents/business/nova_bot/agent_info.yaml`, `src/agents/business/nova_bot/requirements.txt`	New `NovaBotConfig` dataclass (from_env), agent metadata YAML, and NovaBot-specific requirements.
NovaBot Core `src/agents/business/nova_bot/main.py`, `src/agents/business/nova_bot/README.md`, `src/agents/business/nova_bot/getstarted.md`	Adds NovaBotAgent, RetrievedChunk, NumpyVectorRetriever, embeddings/LLM factories, session state, RAG flow, docs loader, system prompt and main entry.
Index Build & Scraping Scripts `scripts/novabot_build_index.py`, `scripts/novabot_scrape_docs.py`	New scraper producing chunked docs JSON and index builder that computes normalized embeddings, saves `vectors.npy` and `metadata.json`.
NovaBot Simulator `src/agents/business/nova_bot_simulator/...` `main.py`, `config.py`, `agent_info.yaml`, `README.md`, `requirements.txt`	New simulator, `SimulatorConfig`, `NovaBotSimulatorAgent`, Interaction/Session/Simulation dataclasses, question pools (normal/adversarial), run_simulation, aggregation, reporting and persistence.
Agent Pattern Updates `src/agents/basic/simple-chat-agent/main.py`, `src/agents/business/helpdesk-agent/main.py`, `src/agents/business/helpdesk-agent/config.py`	Add dynamic local config loader, update imports to `src.shared.*`, and expose endpoint-friendly wrapper methods (`process`, `chat` aliases) preserving behavior.
API & Registry `src/api/main.py`, `src/api/agent_registry.py`, `src/api/requirements.txt`	Forwards optional `metadata` to agents (with fallback), adds `/novabot/chat` route, sanitizes dynamic imports, makes health checks endpoint-driven, and adds async registry methods `check_agent_health` and `get_agent_instance`.
Shared Module & Tracing `src/shared/__init__.py`, `src/shared/noveum_tracer.py`	Adjusts public exports (`ConversationMemory` replaces `Memory`, adds `ChatAgentConfig`), and refactors tracer to use context-manager-based SDK v1.x tracing (internal changes only).
Misc Agents Docs/Config `src/agents/business/helpdesk-agent/`, `src/agents/basic/simple-chat-agent/`	Documentation additions and small refactors; added `process` aliases and import/path adjustments.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant API as "FastAPI /novabot/chat"
    participant Registry as "AgentRegistry"
    participant Agent as "NovaBotAgent"
    participant Retriever as "NumpyVectorRetriever"
    participant LLM as "LLM (OpenAI/Gemini)"
    participant Memory as "SessionMemory"

    User->>API: POST /novabot/chat (message, user_id, metadata)
    API->>Registry: get_agent_instance("business.nova_bot")
    Registry-->>API: NovaBotAgent
    API->>Agent: chat(message, user_id, metadata)
    activate Agent

    Agent->>Memory: _resolve_session_id(user_id, metadata)
    Memory-->>Agent: session_id
    Agent->>Memory: _get_chat_history_messages(session_id)
    Memory-->>Agent: history

    Agent->>Agent: _needs_rag(message)
    alt needs RAG
        Agent->>Retriever: aget_top_k(query)
        activate Retriever
        Retriever->>Retriever: embed_query & search vectors.npy
        Retriever-->>Agent: RetrievedChunk[]
        deactivate Retriever
        Agent->>Agent: _build_chain(with retrieved context)
    else no RAG
        Agent->>Agent: _build_chain(system + history)
    end

    Agent->>LLM: invoke(prompt)
    activate LLM
    LLM-->>Agent: response
    deactivate LLM

    Agent->>Agent: format_response(with rag meta)
    Agent-->>API: response_str + metadata
    deactivate Agent
    API-->>User: 200 OK (response)

sequenceDiagram
    actor Tester
    participant Simulator as "NovaBotSimulator"
    participant Driver as "SessionRunner"
    participant Agent as "NovaBotAgent"
    participant Aggregator as "MetricsAggregator"

    Tester->>Simulator: run_simulation()
    Simulator->>Simulator: load_config() & load_novabot_agent()
    Simulator->>Simulator: generate_question_pools()

    loop per session
        Simulator->>Driver: run_session(session_id, is_adversarial)
        activate Driver
        loop per question
            Driver->>Agent: chat(question)
            Agent-->>Driver: response + meta
            Driver->>Driver: record interaction (latency, rag, chunks)
        end
        Driver-->>Simulator: SessionResult
        deactivate Driver
    end

    Simulator->>Aggregator: aggregate_results(sessions)
    Aggregator-->>Simulator: SimulationResults
    Simulator->>Simulator: save_results(JSON) & print_summary()
    Simulator-->>Tester: SimulationResults

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Poem

🐇 I hopped through docs and vectors bright,

I stitched each chunk with embeddings light.
Sessions chatter, simulators hum,
NovaBot answers, seekers come.
Hooray — the rabbit cheers: “Well done!”

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 47.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'Add NovaBot and NovaBot Simulator agents' directly and clearly describes the main changes, accurately reflecting the primary scope of adding two new chatbot agents.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch demo_dataset_for_noveum

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 8

🧹 Nitpick comments (13)

src/api/main.py (1)
226-232: Broaden TypeError handling could mask other issues.

The try/except block catches TypeError to handle agents that don't support the metadata parameter. However, this could also catch other TypeErrors (e.g., wrong argument types, other signature mismatches), potentially masking real bugs.

Consider using introspection to check the signature instead:
🔎 Proposed refactor using signature inspection
+import inspect
+
 # Invoke agent
-# Backwards compatible: only pass metadata if the agent accepts it.
-try:
-    response = await agent_instance.chat(
-        request.message, request.user_id, request.metadata or {}
-    )
-except TypeError:
-    response = await agent_instance.chat(request.message, request.user_id)
+# Backwards compatible: only pass metadata if the agent accepts it.
+sig = inspect.signature(agent_instance.chat)
+if 'metadata' in sig.parameters:
+    response = await agent_instance.chat(
+        request.message, request.user_id, request.metadata or {}
+    )
+else:
+    response = await agent_instance.chat(request.message, request.user_id)
This approach explicitly checks for metadata parameter support without catching other potential TypeErrors.
src/agents/basic/simple-chat-agent/main.py (1)
21-33: Shadowed sys import inside function.

Line 23 imports sys locally, but sys is already imported at module level (line 17). The local import is technically redundant since the module-level import is already available.
🔎 Suggested simplification
 def _load_local_config():
     import importlib.util
-    import sys
     from pathlib import Path
 
     config_path = Path(__file__).resolve().with_name("config.py")
src/agents/business/nova_bot_simulator/README.md (2)
110-134: Add language specifier to fenced code block.

The code block showing console output should have a language specifier (e.g., text or console) for consistent markdown linting.
🔎 Suggested fix
-```
+```text
 🤖 NovaBotSimulator initialized
62-75: Consider clarifying the duplicate OPENAI_API_KEY reference.

The same OPENAI_API_KEY variable appears twice in the example (lines 64 and 69). While this is technically correct (same key used for both LLM and embeddings), consider adding a brief note that a single key serves both purposes to avoid confusion.
src/agents/business/nova_bot/config.py (1)
104-106: Consider defensive parsing for numeric environment variables.

int() and float() will raise ValueError if the environment variable is set to a non-numeric value. Consider adding try/except with fallback to defaults for more graceful handling.
🔎 Example defensive parsing
def _safe_int(value: str, default: int) -> int:
    try:
        return int(value)
    except (ValueError, TypeError):
        return default

# Usage:
top_k=_safe_int(os.getenv("NOVABOT_TOP_K"), 5),
src/agents/business/helpdesk-agent/main.py (1)
30-42: Same shadowed sys import as in simple-chat-agent.

The local sys import on line 32 shadows the module-level import on line 24. Consider removing the redundant local import.
🔎 Suggested fix
 def _load_local_config():
     import importlib.util
-    import sys
     from pathlib import Path
src/api/agent_registry.py (1)
471-493: Add explicit Optional type hint for config_overrides parameter.

The static analysis correctly flags that PEP 484 prohibits implicit Optional. The parameter config_overrides: Dict[str, Any] = None should be explicitly typed as Optional[Dict[str, Any]].

Also, line 488 has a redundant or {} since config_overrides is already truthy-checked on line 478.
🔎 Proposed fix
-    async def get_agent_instance(self, agent_id: str, config_overrides: Dict[str, Any] = None):
+    async def get_agent_instance(self, agent_id: str, config_overrides: Optional[Dict[str, Any]] = None):
         """
         Async wrapper used by the FastAPI app.

         If config_overrides are provided, a fresh instance is created (not cached)
         to avoid cross-request config contamination.
         """
         if config_overrides:
             module = self.load_agent_module(agent_id)
             agent_info = self.get_agent_info(agent_id)
             if not module or not agent_info:
                 return None

             agent_class = getattr(module, agent_info.class_name)
             config_class = getattr(module, agent_info.config_class)

             config = config_class.from_env()
-            for key, value in (config_overrides or {}).items():
+            for key, value in config_overrides.items():
                 if hasattr(config, key):
                     setattr(config, key, value)
             return agent_class(config)

         return self.create_agent_instance(agent_id, config_overrides=None)
src/agents/business/nova_bot_simulator/config.py (1)

35-37: Consider extracting _repo_root to a shared utility.

This helper is duplicated in src/agents/business/nova_bot/config.py (lines 14-16). While not critical, extracting it to a shared module would reduce duplication and ensure consistency.
src/agents/business/nova_bot/main.py (2)
250-263: Silently swallowing tracing initialization errors may hide configuration issues.

While the comment explains that tracing is optional, completely silencing errors makes debugging tracing issues difficult. Consider logging the exception at debug/warning level.
🔎 Proposed improvement
         try:
             import noveum_trace
             from noveum_trace import NoveumTraceCallbackHandler

             noveum_trace.init(
                 api_key=self.config.noveum_api_key,
                 project=self.config.noveum_project,
                 environment=self.config.noveum_environment,
             )

             self._noveum_callback = NoveumTraceCallbackHandler()
-        except Exception:
+        except Exception as e:
             # Tracing is optional; never break the bot if tracing deps are missing.
+            print(f"⚠️  Noveum tracing initialization skipped: {e}")
             self._noveum_callback = None
449-456: Broad exception catch in RAG retrieval could mask issues.

The except Exception block sets empty context and continues, which is reasonable for resilience, but logging the error would help with debugging retrieval failures.
🔎 Proposed improvement
             if needs_rag and self._embeddings is not None:
                 try:
                     tool_out = await rag_tool.ainvoke(q, config=config)
                     context = str(tool_out.get("context", ""))
                     sources = list(tool_out.get("sources", [])) or []
                     chunks_retrieved = int(tool_out.get("chunks_retrieved", 0))
-                except Exception:
+                except Exception as e:
+                    print(f"⚠️  RAG retrieval failed: {e}")
                     context, sources, chunks_retrieved = "", [], 0
src/agents/business/nova_bot_simulator/main.py (3)
115-147: URL parsing logic is repetitive.

The topic extraction duplicates the same pattern for each URL path segment. Consider refactoring to reduce repetition.
🔎 Proposed refactor
 def _extract_topics_from_metadata(metadata: Dict[str, Any]) -> List[str]:
     """Extract topic keywords from metadata URLs."""
     topics = set()
     items = metadata.get("items", [])
+    path_segments = [
+        "/concepts/", "/best-practices/", "/platform/",
+        "/getting-started/", "/evaluation/", "/integration-examples/"
+    ]
     for item in items:
         url = item.get("url", "")
-        # Extract meaningful parts from URLs
-        if "/concepts/" in url:
-            topic = url.split("/concepts/")[-1].split("#")[0].split("/")[0]
-            if topic:
-                topics.add(topic)
-        elif "/best-practices/" in url:
-            # ... repeated pattern
+        for segment in path_segments:
+            if segment in url:
+                topic = url.split(segment)[-1].split("#")[0].split("/")[0]
+                if topic:
+                    topics.add(topic)
+                break
     return sorted(list(topics))
357-366: Question sampling may produce duplicates when pool is exhausted.

When messages_per_session exceeds the question pool size, the while loop appends random samples that could include duplicates within the same session. Consider using random.choices for explicit repetition or shuffling the extended pool.
🔎 Proposed improvement
         # Generate questions for this session
-        session_questions = random.sample(
-            question_pool, min(self.config.messages_per_session, len(question_pool))
-        )
-        
-        # If not enough questions, repeat with shuffling
-        while len(session_questions) < self.config.messages_per_session:
-            session_questions.extend(
-                random.sample(question_pool, min(len(question_pool), self.config.messages_per_session - len(session_questions)))
-            )
-        session_questions = session_questions[:self.config.messages_per_session]
+        # Allow repetition if pool is smaller than requested count
+        session_questions = random.choices(question_pool, k=self.config.messages_per_session)
428-434: Remove extraneous f-string prefix.

Line 430 has an f-string without any placeholders.
🔎 Proposed fix
-        print(f"\n🚀 Starting simulation...")
+        print("\n🚀 Starting simulation...")

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0fc05a and f516261.

📒 Files selected for processing (22)

NoveumDocsData/index/metadata.json
NoveumDocsData/index/vectors.npy
NoveumDocsData/processed/docs.json
src/agents/basic/simple-chat-agent/main.py
src/agents/business/helpdesk-agent/config.py
src/agents/business/helpdesk-agent/main.py
src/agents/business/nova_bot/README.md
src/agents/business/nova_bot/agent_info.yaml
src/agents/business/nova_bot/config.py
src/agents/business/nova_bot/getstarted.md
src/agents/business/nova_bot/main.py
src/agents/business/nova_bot/requirements.txt
src/agents/business/nova_bot_simulator/README.md
src/agents/business/nova_bot_simulator/agent_info.yaml
src/agents/business/nova_bot_simulator/config.py
src/agents/business/nova_bot_simulator/main.py
src/agents/business/nova_bot_simulator/requirements.txt
src/api/agent_registry.py
src/api/main.py
src/api/requirements.txt
src/shared/__init__.py
src/shared/noveum_tracer.py

🧰 Additional context used

🧬 Code graph analysis (8)

src/agents/business/nova_bot/config.py (2)

src/agents/business/nova_bot_simulator/config.py (2)

_repo_root (35-37)

from_env (56-75)

src/agents/business/helpdesk-agent/config.py (1)

from_env (45-100)

src/api/agent_registry.py (2)

src/api/main.py (1)

get_agent_info (168-176)

src/api/client.py (2)

get_agent_info (97-113)

get_agent_info (349-361)

src/agents/basic/simple-chat-agent/main.py (2)

src/agents/business/helpdesk-agent/main.py (1)

_load_local_config (30-42)

src/agents/business/nova_bot/main.py (1)

_load_local_config (25-38)

src/agents/business/nova_bot/main.py (1)

src/agents/business/nova_bot/config.py (2)

NovaBotConfig (21-107)

from_env (78-107)

src/agents/business/helpdesk-agent/config.py (1)

src/shared/config.py (3)

LLMConfig (15-23)

MemoryConfig (27-33)

NoveumConfig (37-43)

src/agents/business/nova_bot_simulator/config.py (1)

src/agents/business/nova_bot/config.py (3)

NovaBotConfig (21-107)

_repo_root (15-17)

from_env (78-107)

src/agents/business/nova_bot_simulator/main.py (1)

src/agents/business/nova_bot_simulator/config.py (3)

SimulatorConfig (41-79)

get_novabot_config (77-79)

from_env (56-75)

src/shared/__init__.py (4)

src/shared/llm_client.py (1)

LLMClient (184-284)

src/shared/memory.py (1)

ConversationMemory (236-282)

src/shared/noveum_tracer.py (1)

NoveumTracer (16-269)

src/shared/config.py (4)

LLMConfig (15-23)

MemoryConfig (27-33)

NoveumConfig (37-43)

ChatAgentConfig (47-220)

🪛 markdownlint-cli2 (0.18.1)

src/agents/business/nova_bot_simulator/README.md

110-110: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.10)

src/api/agent_registry.py

471-471: PEP 484 prohibits implicit Optional

Convert to Optional[T]

(RUF013)

src/agents/basic/simple-chat-agent/main.py

29-29: Avoid specifying long messages outside the exception class

(TRY003)

src/agents/business/nova_bot/main.py

33-33: Avoid specifying long messages outside the exception class

(TRY003)

261-261: Do not catch blind exception: Exception

(BLE001)

286-288: Avoid specifying long messages outside the exception class

(TRY003)

372-372: Do not catch blind exception: Exception

(BLE001)

455-455: Do not catch blind exception: Exception

(BLE001)

530-530: Prefer TypeError exception for invalid type

(TRY004)

530-530: Avoid specifying long messages outside the exception class

(TRY003)

src/agents/business/nova_bot_simulator/config.py

24-24: Avoid specifying long messages outside the exception class

(TRY003)

src/agents/business/helpdesk-agent/main.py

38-38: Avoid specifying long messages outside the exception class

(TRY003)

src/shared/noveum_tracer.py

237-237: Do not catch blind exception: Exception

(BLE001)

240-240: Unused method argument: trace_id

(ARG002)

240-240: Unused method argument: result_data

(ARG002)

247-247: Do not catch blind exception: Exception

(BLE001)

250-250: Unused method argument: event_data

(ARG002)

src/agents/business/nova_bot_simulator/main.py

32-32: Avoid specifying long messages outside the exception class

(TRY003)

51-51: Avoid specifying long messages outside the exception class

(TRY003)

399-399: Do not catch blind exception: Exception

(BLE001)

430-430: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (28)

src/agents/business/nova_bot/README.md (1)

1-32: LGTM! Clear and comprehensive documentation.

The README provides good coverage of NovaBot's purpose, setup, API routes, and configuration. The environment variables are well-documented with sensible defaults.

src/agents/business/helpdesk-agent/config.py (1)

10-10: LGTM! Import standardization improves maintainability.

The change from dynamic sys.path manipulation to direct absolute import from src.shared.config is a good practice that improves code clarity and maintainability. This aligns with the PR's objective to standardize configuration loading patterns.

src/api/main.py (2)

40-40: LGTM! Metadata field addition.

The optional metadata field allows passing additional context to agents while maintaining backwards compatibility through the default None value.

264-272: LGTM! Clean convenience endpoint for NovaBot.

The dedicated /novabot/chat endpoint provides a cleaner API surface for NovaBot users while properly delegating to the existing chat_with_agent handler. This maintains consistency with the generic agent routing pattern.

src/api/requirements.txt (1)

24-24: Note: noveum-trace is now a required dependency.

The noveum-trace>=0.1.0 dependency is now uncommented/required, making Noveum tracing a mandatory dependency for the API. Ensure this is intentional and won't break deployments where Noveum tracing isn't needed.

src/agents/business/nova_bot/agent_info.yaml (1)

1-32: LGTM! Well-structured agent metadata.

The agent metadata is comprehensive and follows a clear schema. The input/output contracts are well-defined, and the configuration references align with the NovaBot implementation files mentioned in the AI summary.

src/agents/business/nova_bot_simulator/requirements.txt (1)

1-17: LGTM! Dependencies correctly mirror NovaBot requirements.

Since the simulator directly imports NovaBotAgent, it appropriately includes the same runtime dependencies as NovaBot. The comment clarifying that standard library modules are built-in is helpful.

NoveumDocsData/index/metadata.json (1)

1-1017: URL patterns are intentional; no changes needed.

The metadata contains both /en/ localized and non-localized URL patterns because the documentation is indexed under both locale versions (e.g., https://noveum.ai/en/docs/getting-started/sdk-integration and https://noveum.ai/docs/getting-started/sdk-integration represent the same page). The retrieval system correctly extracts and returns URLs as-is from the docs index, treating both patterns as separate indexed documents. This is working as designed.

src/agents/basic/simple-chat-agent/main.py (1)

108-115: LGTM!

The process() method provides a clean endpoint-compatible alias, consistent with the pattern used in other agents (e.g., helpdesk-agent).

src/agents/business/nova_bot/getstarted.md (1)

1-99: Documentation structure looks good.

The guide covers setup, configuration, running the server, and troubleshooting comprehensively.

src/agents/business/helpdesk-agent/main.py (1)

226-240: LGTM!

The chat() method properly extracts the response text from the structured ticket data, with safe fallback handling. The process() alias maintains consistency with the API surface pattern.

src/shared/__init__.py (2)

12-20: LGTM!

The __all__ exports are consistent with the imports, and the addition of ChatAgentConfig provides a clean public API for agent configuration.

7-10: The review comment cannot be verified. A search of the codebase reveals:

No Memory class is currently exported from src/shared

No downstream code imports Memory from src.shared

The current __init__.py exports only ConversationMemory, not a Memory class

The breaking change claim cannot be substantiated based on the current state of the code.

Likely an incorrect or invalid review comment.

src/agents/business/nova_bot_simulator/agent_info.yaml (1)

1-36: LGTM!

The agent metadata is well-structured with clear input/output schema definitions. The configuration aligns with the simulator implementation described in the README.

src/shared/noveum_tracer.py (1)

39-46: SDK initialization correctly follows noveum-trace v1.x pattern.

The change to use noveum_trace.init() with the specified parameters and storing the module reference is consistent with the official v1.x documentation. The defensive hasattr check before calling start_trace() appropriately handles potential method availability.

src/api/agent_registry.py (2)

311-336: Good improvement on sys.path cleanup.

The try/finally pattern for temporarily adding and removing the agent directory from sys.path prevents cross-agent import collisions. This is a solid defensive coding practice.

395-423: Endpoint validation logic is well-structured.

The fallback logic correctly handles cases where process can fall back to chat, and support can fall back to handle_support_request or chat. This aligns with the API layer's fallback behavior mentioned in the comments.

src/agents/business/nova_bot_simulator/config.py (2)

16-28: Dynamic config loading pattern is consistent.

The importlib-based loading approach matches the pattern used elsewhere in the codebase and handles Python 3.12+ dataclass requirements correctly.

40-79: Config structure and from_env implementation look good.

The dataclass correctly separates simulator-specific settings from the underlying NovaBot configuration. The environment variable naming convention (SIMULATOR_*) is clear and consistent.

src/agents/business/nova_bot/main.py (4)

25-42: Dynamic config loading pattern is appropriate for AgentRegistry integration.

The comment on lines 23-24 explains the necessity of this approach for importlib-based loading. The pattern correctly handles Python 3.12+ dataclass requirements.

285-288: Good validation for required LLM API key.

The agent correctly raises a clear error when neither OpenAI nor Gemini API keys are configured, preventing silent failures during chat operations.

302-343: Chat method implementation is well-structured.

The session management with async lock, metadata handling, and response formatting are implemented correctly. The chain invocation properly passes callbacks for tracing.

521-532: Good defensive handling for missing docs file.

Returning an empty list when the docs file doesn't exist allows the agent to operate (with degraded RAG) rather than failing entirely.

src/agents/business/nova_bot_simulator/main.py (5)

24-59: Dynamic loading pattern is consistent with other agent modules.

The importlib-based loading for both SimulatorConfig and NovaBotAgent follows the established pattern, ensuring compatibility with the AgentRegistry's dynamic module loading.

62-104: Data models are well-structured for simulation metrics.

The dataclass hierarchy (InteractionResult → SessionResult → SimulationResults) provides clear separation of concerns for per-message, per-session, and aggregate metrics.

246-298: Comprehensive adversarial test cases.

The adversarial questions include system prompt extraction, security probing, confusion attacks, and edge cases (empty string, XSS, SQL injection attempts). This is valuable for testing bot robustness.

449-458: Parallel execution is correctly implemented.

Using asyncio.gather for parallel execution and sequential loop for non-parallel mode is the appropriate pattern. Session state isolation is handled through unique session IDs.

491-502: Results persistence handles directory creation correctly.

The mkdir(parents=True, exist_ok=True) ensures the output directory exists before writing, preventing FileNotFoundError.

coderabbitai · 2025-12-28T14:12:50Z

+    # Gemini (fallback if OpenAI not set)
+    gemini_api_key: Optional[str] = None
+    gemini_model: str = "gemini-1.5-flash"


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n src/agents/business/nova_bot/config.py | sed -n '36,90p'

Repository: Noveum/agents-library

Length of output: 2399

🏁 Script executed:

rg "gemini-1\.5-flash|gemini-2\.5-flash" --context 3

Repository: Noveum/agents-library

Length of output: 2255

Inconsistent default gemini_model values between dataclass and from_env() method.

Line 38 defaults to "gemini-1.5-flash" but line 88 defaults to "gemini-2.5-flash". This inconsistency causes different behavior depending on instantiation method. Documentation (README.md and getstarted.md) confirms "gemini-1.5-flash" as the intended default, so line 88 should be updated:

- gemini_model=os.getenv("GEMINI_MODEL", "gemini-2.5-flash"), + gemini_model=os.getenv("GEMINI_MODEL", "gemini-1.5-flash"),

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents

In src/agents/business/nova_bot/config.py lines 36-38 and around line 88, the dataclass default for gemini_model is "gemini-1.5-flash" but the from_env() method sets "gemini-2.5-flash"; update the default in from_env() (around line 88) to "gemini-1.5-flash" so both instantiation paths use the documented intended default.

… README for dataset structure and update requirements for dependencies. Implement NovaBot index building functionality with environment variable checks.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (5)

src/agents/business/nova_bot/README.md (1)

3-3: Minor URL inconsistency with the scraper script.

The README references https://noveum.ai/docs, but the scraper script (scripts/novabot_scrape_docs.py) defaults to https://noveum.ai/en/docs. Consider updating the README to match the actual default start URL for consistency.
scripts/novabot_build_index.py (2)
105-106: Remove redundant file creation before np.save.

The write_bytes(b"") call on line 105 is unnecessary. np.save will create or overwrite the file directly. This empty write adds no value and could be misleading.
🔧 Suggested fix
-    (out_dir / "vectors.npy").write_bytes(b"")  # ensure writable
     np.save(out_dir / "vectors.npy", mat)
31-32: Consider using TypeError for type validation.

Per Python conventions (and Ruff TRY004), TypeError is more appropriate than ValueError when validating that data is a specific type (list).
🔧 Suggested fix
     if not isinstance(data, list):
-        raise ValueError("docs.json must be a list")
+        raise TypeError("docs.json must be a list")
scripts/novabot_scrape_docs.py (2)
156-157: Use collections.deque for efficient queue operations.

queue.pop(0) on a list is O(n). For up to 500 pages, this becomes O(n²) overall. Using deque.popleft() provides O(1) performance.
🔧 Suggested fix
+from collections import deque
 # ...
 
-    queue: List[str] = [start_url]
+    queue = deque([start_url])
     # ...
     while queue and len(pages) < args.max_pages:
-        url = queue.pop(0)
+        url = queue.popleft()
162-167: Consider logging exceptions instead of silently swallowing them.

Catching bare Exception and continuing silently can mask important errors (DNS failures, SSL issues, timeouts). Per Ruff S112/BLE001, at least log the exception for debugging.
🔧 Suggested fix
+import logging
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+
         try:
             resp = session.get(url, timeout=args.timeout)
-        except Exception:
+        except Exception as e:
+            logger.warning("Failed to fetch %s: %s", url, e)
             continue

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f516261 and 9ce212f.

📒 Files selected for processing (6)

NoveumDocsData/README.md
NoveumDocsData/index/.gitkeep
scripts/novabot_build_index.py
scripts/novabot_scrape_docs.py
src/agents/business/nova_bot/README.md
src/agents/business/nova_bot/requirements.txt

✅ Files skipped from review due to trivial changes (2)

NoveumDocsData/index/.gitkeep
NoveumDocsData/README.md

🧰 Additional context used

🧬 Code graph analysis (1)

scripts/novabot_scrape_docs.py (1)

scripts/novabot_build_index.py (2)

repo_root (24-26)

main (36-118)

🪛 Ruff (0.14.10)

scripts/novabot_build_index.py

32-32: Prefer TypeError exception for invalid type

(TRY004)

32-32: Avoid specifying long messages outside the exception class

(TRY003)

63-63: Avoid specifying long messages outside the exception class

(TRY003)

scripts/novabot_scrape_docs.py

164-165: try-except-continue detected, consider logging the exception

(S112)

164-164: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (5)

src/agents/business/nova_bot/README.md (1)

1-30: Documentation is clear and comprehensive.

The README effectively documents the NovaBot's architecture, API routes, and environment variables. Good coverage of required vs. optional configuration.

scripts/novabot_build_index.py (1)

36-118: Overall implementation is well-structured.

The script properly validates the API key, handles batch processing, normalizes vectors for cosine similarity, and writes both vectors and metadata. The lazy imports are acceptable for a standalone script.

scripts/novabot_scrape_docs.py (2)

99-123: Chunking logic is correctly implemented.

The overlap handling properly advances through the text while ensuring the loop terminates when end reaches the text length. The logic correctly handles edge cases where text is shorter than max_chars.

126-189: Main function is well-structured with sensible defaults.

The crawler correctly handles URL canonicalization, respects allowed prefixes, implements polite crawling with sleep delays, and outputs structured JSON for downstream processing.

src/agents/business/nova_bot/requirements.txt (1)

1-14: All LangChain and related package versions exist on PyPI and are available for installation.

…ge project name to "NovaBot" and update Gemini model version. Enhance content hashing method in scraping script and add logging for index bounds checking in vector retrieval.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

scripts/novabot_scrape_docs.py (1)
163-168: Log exceptions for better debugging.

The broad exception handling ensures the scraper continues on transient failures, but silent exceptions make debugging difficult when legitimate issues occur (e.g., network configuration, DNS failures, authentication issues).
🔎 Add logging for visibility
+        import logging
+        logger = logging.getLogger(__name__)
+
         try:
             resp = session.get(url, timeout=args.timeout)
-        except Exception:
+        except Exception as e:
+            logger.debug(f"Failed to fetch {url}: {e}")
             continue
Alternatively, if you prefer to keep imports at the top:

Add import logging at line 17, then:
+    logger = logging.getLogger(__name__)
+    
     while queue and len(pages) < args.max_pages:
         url = queue.pop(0)
         if url in seen:
             continue
         seen.add(url)
 
         try:
             resp = session.get(url, timeout=args.timeout)
-        except Exception:
+        except Exception as e:
+            logger.debug(f"Failed to fetch {url}: {e}")
             continue
src/agents/business/nova_bot/main.py (1)
531-542: Consider TypeError for type validation.

Line 540 raises ValueError when docs.json is not a list. Since this is a type validation error rather than a value error, TypeError would be more semantically appropriate.
🔎 Use TypeError for type mismatch
         if not isinstance(data, list):
-            raise ValueError("docs.json must be a list of chunk objects")
+            raise TypeError("docs.json must be a list of chunk objects")

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ce212f and bbf05ad.

📒 Files selected for processing (5)

scripts/novabot_scrape_docs.py
src/agents/business/nova_bot/config.py
src/agents/business/nova_bot/main.py
src/agents/business/nova_bot/requirements.txt
src/api/requirements.txt

🚧 Files skipped from review as they are similar to previous changes (2)

src/api/requirements.txt
src/agents/business/nova_bot/requirements.txt

🧰 Additional context used

🧬 Code graph analysis (3)

scripts/novabot_scrape_docs.py (1)

scripts/novabot_build_index.py (1)

repo_root (24-26)

src/agents/business/nova_bot/main.py (1)

src/agents/business/nova_bot/config.py (2)

NovaBotConfig (21-107)

from_env (78-107)

src/agents/business/nova_bot/config.py (2)

src/agents/business/nova_bot_simulator/config.py (2)

_repo_root (35-37)

from_env (56-75)

src/agents/business/helpdesk-agent/config.py (1)

from_env (45-100)

🪛 Ruff (0.14.10)

scripts/novabot_scrape_docs.py

115-115: Probable use of insecure hash functions in hashlib: md5

(S324)

165-166: try-except-continue detected, consider logging the exception

(S112)

165-165: Do not catch blind exception: Exception

(BLE001)

src/agents/business/nova_bot/main.py

34-34: Avoid specifying long messages outside the exception class

(TRY003)

271-271: Do not catch blind exception: Exception

(BLE001)

296-298: Avoid specifying long messages outside the exception class

(TRY003)

382-382: Do not catch blind exception: Exception

(BLE001)

465-465: Do not catch blind exception: Exception

(BLE001)

540-540: Prefer TypeError exception for invalid type

(TRY004)

540-540: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (3)

scripts/novabot_scrape_docs.py (1)

100-124: LGTM! Content hashing fixed as recommended.

The implementation now uses hashlib.md5 for stable, reproducible content hashing across script runs, which addresses the previous concern about Python's non-deterministic hash(). The MD5 usage here is appropriate since it's for content fingerprinting rather than cryptographic security.

The chunking logic correctly handles overlap and prevents infinite loops with proper boundary checks.

src/agents/business/nova_bot/main.py (1)

160-208: LGTM! Bounds checking properly implemented.

The retriever now includes defensive bounds checking (lines 189-195) with logging when the vector index and docs list are mismatched, which addresses the previously reported IndexError concern. This ensures robustness against stale or corrupted index data.

src/agents/business/nova_bot/config.py (1)

38-38: Gemini model default is valid and consistent. The dataclass default (line 38) and from_env default (line 88) both use "gemini-2.5-flash", which is a valid, officially supported Google Gemini model available on both the Gemini API and Vertex AI. No changes needed.

…ment vectors in index building, and improve system instruction generation to accept dynamic agent names. Update NoveumTracer to support concurrent tracing with context management and enhanced event logging capabilities.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

src/shared/noveum_tracer.py (1)
248-256: Consider removing unused result_data parameter.

The result_data parameter is no longer used in the new context-manager-based implementation. Since this is an internal method, removing it would simplify the signature, though it's a minor code cleanliness issue.
🔎 Optional cleanup
-    async def _end_trace(self, trace_id: str, result_data: Dict[str, Any]):
+    async def _end_trace(self, trace_id: str):
         """End a trace with results."""
         try:
             # Pop the context manager from the dictionary to support concurrent traces
             active = self._active_traces.pop(trace_id, None)
             if active is not None:
                 active.__exit__(None, None, None)
         except Exception as e:
             print(f"⚠️  Failed to end trace: {e}")
Update call sites (lines 105, 119) accordingly:
-            await self._end_trace(trace_id, {
-                "response": response,
-                "duration_seconds": duration,
-                "status": "success",
-                "response_length": len(response) if response else 0
-            })
+            await self._end_trace(trace_id)
scripts/novabot_build_index.py (1)
29-33: Consider using TypeError for type validation.

Using TypeError instead of ValueError would be more semantically correct when validating that the data is a list, as this is a type constraint rather than a value constraint.
🔎 Suggested change
 def load_docs(docs_path: Path) -> List[Dict[str, Any]]:
     data = json.loads(docs_path.read_text(encoding="utf-8"))
     if not isinstance(data, list):
-        raise ValueError("docs.json must be a list")
+        raise TypeError("docs.json must be a list")
     return data

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bbf05ad and 906d9b2.

📒 Files selected for processing (3)

scripts/novabot_build_index.py
src/agents/business/nova_bot/main.py
src/shared/noveum_tracer.py

🧰 Additional context used

🧬 Code graph analysis (1)

src/agents/business/nova_bot/main.py (1)

src/agents/business/nova_bot/config.py (2)

NovaBotConfig (21-107)

from_env (78-107)

🪛 Ruff (0.14.10)

scripts/novabot_build_index.py

32-32: Prefer TypeError exception for invalid type

(TRY004)

32-32: Avoid specifying long messages outside the exception class

(TRY003)

63-63: Avoid specifying long messages outside the exception class

(TRY003)

src/agents/business/nova_bot/main.py

34-34: Avoid specifying long messages outside the exception class

(TRY003)

271-271: Do not catch blind exception: Exception

(BLE001)

296-298: Avoid specifying long messages outside the exception class

(TRY003)

382-382: Do not catch blind exception: Exception

(BLE001)

465-465: Do not catch blind exception: Exception

(BLE001)

540-540: Prefer TypeError exception for invalid type

(TRY004)

540-540: Avoid specifying long messages outside the exception class

(TRY003)

src/shared/noveum_tracer.py

245-245: Do not catch blind exception: Exception

(BLE001)

248-248: Unused method argument: result_data

(ARG002)

255-255: Do not catch blind exception: Exception

(BLE001)

270-270: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (11)

src/shared/noveum_tracer.py (3)

29-29: Good addition for concurrent trace support.

The _active_traces dictionary properly enables multiple concurrent traces by keying on trace_id, resolving the race condition identified in previous reviews.

258-280: Event logging now properly implemented.

The _log_event method is now fully functional, addressing the previous review concern. The implementation correctly:

Checks if tracing is enabled

Calls the SDK's log_event method when available

Falls back to trace_event for older SDK versions

Handles both sync and async SDK methods

Uses defensive exception handling to prevent tracing issues from breaking the application

39-47: The noveum-trace SDK v1.x initialization pattern is correct. The code properly calls noveum_trace.init() with the appropriate parameters (api_key, project, environment) as documented in the official SDK docs, and storing the module as self._client to access module-level functions like start_trace(), log_event(), and trace_event() matches the standard API pattern. No changes needed.

scripts/novabot_build_index.py (2)

24-26: LGTM!

The repo_root() helper correctly computes the repository root by navigating up one level from the scripts directory.

100-102: Empty documents case properly handled.

The guard at lines 100-102 correctly addresses the previous review concern, preventing np.vstack(vectors) from failing when no documents are processed. The early exit with status code 1 is appropriate.

src/agents/business/nova_bot/main.py (6)

26-43: Well-designed dynamic config loading.

The dynamic import approach correctly avoids relative import issues when the agent is loaded by AgentRegistry. The comment about sys.modules for Python 3.12+ dataclasses is helpful context. The ImportError on line 34 appropriately fails fast if the config module is missing.

64-110: Agent name mismatch resolved.

The system instruction now correctly accepts agent_name as a parameter and uses it throughout the prompt (lines 66, 69), addressing the previous review concern. The call site at line 408 properly passes self.agent_name from the configuration.

187-207: IndexError risk properly mitigated.

The bounds checking at lines 188-195 correctly addresses the previous review concern. The implementation:

Validates indices before access

Logs warnings with helpful debugging context

Continues processing remaining results

Returns fewer than k results if needed (correct behavior for data integrity)

This defensive approach prevents crashes while providing visibility into index/docs mismatches.

213-246: LGTM! Well-structured initialization.

The initialization sequence is logical and properly ordered:

Configuration and session state

Optional tracing setup

Data loading

LangChain components

Chain assembly

The dynamic provider indication in the success message (line 246) is helpful for debugging.

312-353: LGTM! Robust chat implementation.

The chat method is well-designed with:

Proper input validation

Thread-safe session management with async locks

Insightful comment about not setting global run_name (line 334)

History trimming to prevent unbounded growth

Useful debugging metadata in responses

The implementation correctly handles session lifecycle and concurrent access.

395-506: Excellent chain architecture with good observability.

The chain implementation demonstrates strong design:

Tool wrapper for retrieval (lines 442-447) ensures proper trace spans

Graceful degradation on retrieval errors (lines 465-466)

Clean separation of concerns: prepare → generate → finalize

Insightful comments explaining why automatic span naming is preferred (lines 492-505)

The defensive exception handling during retrieval is appropriate - the bot can still attempt to answer without context.

coderabbitai · 2025-12-30T07:31:57Z

+        if self._client and hasattr(self._client, "start_trace"):
            try:
-                await self._client.start_trace(trace_data)
+                trace_id = trace_data.get("trace_id")
+                if not trace_id:
+                    print("⚠️  No trace_id in trace_data, cannot start trace")
+                    return
+
+                # start_trace returns a ContextualTrace context manager; keep it for later end.
+                ctx = self._client.start_trace(
+                    name=trace_data.get("agent_name", "agent_interaction"),
+                    metadata=trace_data,
+                )
+                # Enter it to set context
+                ctx.__enter__()
+                # Store the context manager keyed by trace_id for concurrent trace support
+                self._active_traces[trace_id] = ctx
            except Exception as e:
                print(f"⚠️  Failed to start trace: {e}")


⚠️ Potential issue | 🟡 Minor

Potential resource leak if exception occurs between enter and dict storage.

If an exception occurs after ctx.__enter__() (line 242) but before storing in self._active_traces[trace_id] (line 244), the context will be entered but never properly exited. While the window is small, consider wrapping the storage in the same try block or using a more robust pattern.

🔎 Suggested defensive approach

async def _start_trace(self, trace_data: Dict[str, Any]): """Start a new trace.""" if self._client and hasattr(self._client, "start_trace"): try: trace_id = trace_data.get("trace_id") if not trace_id: print("⚠️ No trace_id in trace_data, cannot start trace") return # start_trace returns a ContextualTrace context manager; keep it for later end. ctx = self._client.start_trace( name=trace_data.get("agent_name", "agent_interaction"), metadata=trace_data, ) - # Enter it to set context - ctx.__enter__() - # Store the context manager keyed by trace_id for concurrent trace support - self._active_traces[trace_id] = ctx + # Enter and store atomically to prevent leaks + ctx.__enter__() + try: + self._active_traces[trace_id] = ctx + except Exception: + # If storing fails, exit the context to avoid leaks + ctx.__exit__(None, None, None) + raise except Exception as e: print(f"⚠️ Failed to start trace: {e}")

🧰 Tools

🪛 Ruff (0.14.10)

245-245: Do not catch blind exception: Exception

(BLE001)

…ariable dependencies, replacing them with hardcoded defaults. Update NovaBot to support custom Noveum endpoint configuration and enhance session trace management. Modify question generation logic in the simulator for improved topic extraction and question phrasing.

coderabbitai

Actionable comments posted: 8

🤖 Fix all issues with AI agents

In `@src/agents/business/nova_bot_simulator/config.py`:
- Around line 53-56: The from_env classmethod on SimulatorConfig is misleading
because its name implies loading from environment variables while the docstring
and implementation just return defaults; update either the method name or
implementation: either rename from_env to from_defaults (or similar) and adjust
the docstring accordingly, or implement actual environment loading inside
SimulatorConfig.from_env by reading expected env vars and constructing cls(...)
with those values; reference SimulatorConfig.from_env and the SimulatorConfig
class when making the change so callers remain consistent.

In `@src/agents/business/nova_bot_simulator/main.py`:
- Line 345: The print statement in main.py uses an unnecessary f-string: remove
the leading "f" from the print call (the line containing print(f"\n🚀 Starting
simulation...")) since there are no placeholders; change it to a plain string in
the same location (the print statement that starts the simulation).
- Around line 223-237: The __init__ currently hardcodes
self.messages_per_session = 3 instead of using the SimulatorConfig value; update
NovaBotSimulator.__init__ to assign self.messages_per_session =
config.messages_per_session so the SimulatorConfig.messages_per_session field is
honored (ensure SimulatorConfig is the same type expected and handle
missing/None by falling back to a default if needed).

In `@src/agents/business/nova_bot/config.py`:
- Around line 77-79: The from_env method on NovaBotConfig currently just returns
cls() but should either be renamed or actually populate fields from environment
variables; update NovaBotConfig.from_env to mirror the pattern used in
helpdesk-agent config by importing os (or using the existing env helpers) and
reading the expected env keys into the class constructor (e.g., map variables
like NOVA_BOT_API_KEY, NOVA_BOT_MODEL, etc., to the NovaBotConfig fields) and
return cls(...) with those values, or if you choose to rename, change the method
to a clearer name like default() and leave from_env unimplemented/removed to
avoid confusion. Ensure you reference the NovaBotConfig.from_env method and any
field names in the class when implementing the env reads so tests and call sites
continue to work.
- Around line 27-38: The config file currently hardcodes API keys
(noveum_api_key, openai_api_key, gemini_api_key) in NovaBotConfig; change
NovaBotConfig (and its from_env classmethod) to read these values from
environment variables (e.g., NOVEUM_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY)
with empty-string defaults or optional validation, and remove the literal
secret-like defaults from the class attributes so secrets are not stored in
source; optionally make required keys raise a clear error in from_env when
missing.

In `@src/agents/business/nova_bot/main.py`:
- Around line 307-317: The _create_embeddings method currently only checks if
self.config.openai_api_key is truthy, which misses placeholder strings; update
_create_embeddings to treat placeholder keys as invalid by verifying the key is
non-empty and looks like a real OpenAI secret (e.g., startswith "sk-") and/or
does not contain placeholder substrings like "OPENAI" or "YOUR_OPENAI"; only
then instantiate OpenAIEmbeddings with model=self.config.openai_embedding_model
and api_key=self.config.openai_api_key.
- Around line 246-247: The provider-detection uses a truthy check on
self.config.openai_api_key which misidentifies placeholder keys like "sk-***" as
valid; update the logic in the initialization (the llm_provider assignment and
the print that uses self.agent_name) to verify the API key is a real-looking key
(e.g., non-empty and not a placeholder—check it does not contain '*' and
optionally startswith("sk-")) or change the config default to an empty string so
the current truthy check works; adjust the conditional that sets llm_provider
accordingly (refer to self.config.openai_api_key and the llm_provider variable).

🧹 Nitpick comments (4)

src/agents/business/nova_bot_simulator/config.py (2)
14-30: Consider using standard imports instead of dynamic importlib loading.

The dynamic import via importlib.util is unusual when a standard relative import would work. This pattern adds complexity, makes IDE navigation harder, and bypasses Python's normal module resolution.

A simpler approach:
from ..nova_bot.config import NovaBotConfig
If this is intentional to avoid circular imports or handle dynamic loading scenarios (e.g., the AgentRegistry), please add a comment explaining why.

33-35: _repo_root() is duplicated from nova_bot/config.py.

This helper function appears identically in both config files. Consider extracting it to a shared utility module to reduce duplication and ensure consistency.
src/agents/business/nova_bot_simulator/main.py (1)
314-328: Consider logging the exception details for debugging.

The broad except Exception catch is reasonable for simulation resilience, but silently storing only str(e) loses the traceback. Consider logging the full exception for debugging purposes.
🔍 Proposed improvement
+            import logging
+            logger = logging.getLogger(__name__)
             except Exception as e:
+                logger.exception(f"Error in session {session_id} message {msg_idx}")
                 error_count += 1
                 interactions.append(
                     InteractionResult(
src/agents/business/nova_bot/main.py (1)
337-342: Consider logging silenced tracing exceptions.

Multiple try-except-pass blocks (lines 340, 352, 380) silently swallow exceptions. While tracing is optional and shouldn't break the bot, logging these exceptions at DEBUG level would help troubleshoot tracing issues.
🔍 Proposed improvement
+logger = logging.getLogger(__name__)
+
 if session_id and reset:
     async with self._sessions_lock:
         self._sessions.pop(session_id, None)
     if self._noveum_callback and session_id in self._active_traces:
         try:
             self._noveum_callback.end_trace()
         except Exception:
-            pass
+            logger.debug("Failed to end trace on session reset", exc_info=True)
         self._active_traces.pop(session_id, None)
Also applies to: 349-353, 376-382

coderabbitai · 2026-02-03T06:18:19Z

+    @classmethod
+    def from_env(cls) -> "SimulatorConfig":
+        """Load configuration from local defaults."""
+        return cls()


⚠️ Potential issue | 🟡 Minor

from_env() docstring is misleading—it loads defaults, not environment variables.

Same issue as in nova_bot/config.py: the method name and docstring ("Load configuration from local defaults") don't align. The name suggests environment variable loading, but it returns the default instance.

🤖 Prompt for AI Agents

In `@src/agents/business/nova_bot_simulator/config.py` around lines 53 - 56, The from_env classmethod on SimulatorConfig is misleading because its name implies loading from environment variables while the docstring and implementation just return defaults; update either the method name or implementation: either rename from_env to from_defaults (or similar) and adjust the docstring accordingly, or implement actual environment loading inside SimulatorConfig.from_env by reading expected env vars and constructing cls(...) with those values; reference SimulatorConfig.from_env and the SimulatorConfig class when making the change so callers remain consistent.

coderabbitai · 2026-02-03T06:18:20Z

+    def __init__(self, config: SimulatorConfig):
+        self.config = config
+        self.agent_name = "NovaBotSimulator"
+        self.messages_per_session = 3
+
+        # Load docs metadata for question generation
+        self.metadata = _load_docs_metadata(config.metadata_json_path)
+        self.topics = _extract_topics_from_metadata(self.metadata)
+        self.normal_questions = _generate_normal_questions(self.topics)
+        self.adversarial_questions = _generate_adversarial_questions()
+
+        print(f"🤖 {self.agent_name} initialized")
+        print(f"   Normal questions: {len(self.normal_questions)}")
+        print(f"   Adversarial questions: {len(self.adversarial_questions)}")
+        print(f"   Sessions: {config.num_sessions}, Messages per session: {self.messages_per_session}")


⚠️ Potential issue | 🟡 Minor

messages_per_session is hardcoded, ignoring the config value.

Line 226 sets self.messages_per_session = 3 but SimulatorConfig has a field messages_per_session (default 3). The config value is never used, making the config field misleading.

🛠️ Proposed fix

def __init__(self, config: SimulatorConfig): self.config = config self.agent_name = "NovaBotSimulator" - self.messages_per_session = 3 + self.messages_per_session = config.messages_per_session

🤖 Prompt for AI Agents

In `@src/agents/business/nova_bot_simulator/main.py` around lines 223 - 237, The __init__ currently hardcodes self.messages_per_session = 3 instead of using the SimulatorConfig value; update NovaBotSimulator.__init__ to assign self.messages_per_session = config.messages_per_session so the SimulatorConfig.messages_per_session field is honored (ensure SimulatorConfig is the same type expected and handle missing/None by falling back to a default if needed).

coderabbitai · 2026-02-03T06:18:20Z

+
+    async def run_simulation(self) -> SimulationResults:
+        """Run the complete simulation with all sessions."""
+        print(f"\n🚀 Starting simulation...")


⚠️ Potential issue | 🟡 Minor

Remove extraneous f prefix from string without placeholders.

This f-string has no placeholders, making the f prefix unnecessary.

🛠️ Proposed fix

- print(f"\n🚀 Starting simulation...") + print("\n🚀 Starting simulation...")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

print(f"\n🚀 Starting simulation...")

print("\n🚀 Starting simulation...")

🧰 Tools

🪛 Ruff (0.14.14)

[error] 345-345: f-string without any placeholders

Remove extraneous f prefix

(F541)

🤖 Prompt for AI Agents

In `@src/agents/business/nova_bot_simulator/main.py` at line 345, The print statement in main.py uses an unnecessary f-string: remove the leading "f" from the print call (the line containing print(f"\n🚀 Starting simulation...")) since there are no placeholders; change it to a plain string in the same location (the print statement that starts the simulation).

coderabbitai · 2026-02-03T06:18:20Z

+    noveum_api_key: str = "*****-xC"
+    noveum_project: str = "NovaBot-demo"
+    noveum_environment: str = "dev-Novabot"
+    noveum_endpoint: str = ""
+
+    # LLM provider (answer generation) - supports OpenAI or Gemini
+    openai_api_key: str = "sk-***"
+    openai_model: str = "gpt-4o-mini"
+
+    # Gemini (fallback if OpenAI not set)
+    gemini_api_key: str = "******-"
+    gemini_model: str = "gemini-2.5-flash"


⚠️ Potential issue | 🟠 Major

Hardcoded API keys in source code pose a security risk.

Lines 27, 33, and 37 contain hardcoded API key values. Even if these are placeholder/redacted values, committing secrets (or patterns resembling secrets) to source control is a security anti-pattern:

They may be accidentally replaced with real keys

They establish a pattern that encourages storing secrets in code

Static analysis tools and secret scanners will flag these

Consider using environment variables with empty string defaults, or raise an error prompting the user to set required keys.

🔒 Proposed fix to use environment variables

+import os + `@dataclass` class NovaBotConfig: """Runtime configuration for NovaBot.""" agent_name: str = "NovaBot" # Noveum tracing (LangChain integration) noveum_enabled: bool = True - noveum_api_key: str = "*****-xC" + noveum_api_key: str = "" noveum_project: str = "NovaBot-demo" noveum_environment: str = "dev-Novabot" noveum_endpoint: str = "" # LLM provider (answer generation) - supports OpenAI or Gemini - openai_api_key: str = "sk-***" + openai_api_key: str = "" openai_model: str = "gpt-4o-mini" # Gemini (fallback if OpenAI not set) - gemini_api_key: str = "******-" + gemini_api_key: str = "" gemini_model: str = "gemini-2.5-flash"

Then in from_env():

`@classmethod` def from_env(cls) -> "NovaBotConfig": return cls( noveum_api_key=os.getenv("NOVEUM_API_KEY", ""), openai_api_key=os.getenv("OPENAI_API_KEY", ""), gemini_api_key=os.getenv("GEMINI_API_KEY", ""), # ... other env-driven fields )

🤖 Prompt for AI Agents

In `@src/agents/business/nova_bot/config.py` around lines 27 - 38, The config file currently hardcodes API keys (noveum_api_key, openai_api_key, gemini_api_key) in NovaBotConfig; change NovaBotConfig (and its from_env classmethod) to read these values from environment variables (e.g., NOVEUM_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY) with empty-string defaults or optional validation, and remove the literal secret-like defaults from the class attributes so secrets are not stored in source; optionally make required keys raise a clear error in from_env when missing.

coderabbitai · 2026-02-03T06:18:20Z

+    @classmethod
+    def from_env(cls) -> "NovaBotConfig":
+        return cls()


⚠️ Potential issue | 🟡 Minor

from_env() method is misleading—it doesn't read environment variables.

The method name and the pattern from other config files (e.g., src/agents/business/helpdesk-agent/config.py) suggests this should load configuration from environment variables. Currently it just returns the default instance, making from_env() functionally identical to cls().

Either rename the method to clarify its behavior, or implement actual environment variable loading to match the established pattern in this codebase.

🤖 Prompt for AI Agents

In `@src/agents/business/nova_bot/config.py` around lines 77 - 79, The from_env method on NovaBotConfig currently just returns cls() but should either be renamed or actually populate fields from environment variables; update NovaBotConfig.from_env to mirror the pattern used in helpdesk-agent config by importing os (or using the existing env helpers) and reading the expected env keys into the class constructor (e.g., map variables like NOVA_BOT_API_KEY, NOVA_BOT_MODEL, etc., to the NovaBotConfig fields) and return cls(...) with those values, or if you choose to rename, change the method to a clearer name like default() and leave from_env unimplemented/removed to avoid confusion. Ensure you reference the NovaBotConfig.from_env method and any field names in the class when implementing the env reads so tests and call sites continue to work.

coderabbitai · 2026-02-03T06:18:20Z

+        llm_provider = "OpenAI" if self.config.openai_api_key else "Gemini"
+        print(f"🛰️  {self.agent_name} initialized (LangChain-only {llm_provider} + OpenAIEmbeddings RAG)")


⚠️ Potential issue | 🟠 Major

Provider detection will incorrectly select OpenAI when using placeholder keys.

Line 246 checks if self.config.openai_api_key to determine the LLM provider. However, the config defaults include placeholder values like "sk-***" which are truthy, causing OpenAI to always be selected even when the user intends to use Gemini.

This ties into the earlier security concern—if you use empty strings as defaults for API keys, this logic would work correctly.

🛠️ Proposed fix

- llm_provider = "OpenAI" if self.config.openai_api_key else "Gemini" + # Check for actual API keys, not placeholder values + has_openai = self.config.openai_api_key and not self.config.openai_api_key.startswith("sk-***") + llm_provider = "OpenAI" if has_openai else "Gemini"

Or better, use empty string defaults in config and this check works as-is.

🤖 Prompt for AI Agents

In `@src/agents/business/nova_bot/main.py` around lines 246 - 247, The provider-detection uses a truthy check on self.config.openai_api_key which misidentifies placeholder keys like "sk-***" as valid; update the logic in the initialization (the llm_provider assignment and the print that uses self.agent_name) to verify the API key is a real-looking key (e.g., non-empty and not a placeholder—check it does not contain '*' and optionally startswith("sk-")) or change the config default to an empty string so the current truthy check works; adjust the conditional that sets llm_provider accordingly (refer to self.config.openai_api_key and the llm_provider variable).

coderabbitai · 2026-02-03T06:18:20Z

+    def _create_llm(self):
+        # Prefer OpenAI if available, otherwise fall back to Gemini
+        if self.config.openai_api_key:
+            from langchain_openai import ChatOpenAI
+
+            return ChatOpenAI(
+                model=self.config.openai_model,
+                api_key=self.config.openai_api_key,
+                temperature=self.config.temperature,
+                max_tokens=self.config.max_output_tokens,
+            )
+        elif self.config.gemini_api_key:
+            from langchain_google_genai import ChatGoogleGenerativeAI
+
+            return ChatGoogleGenerativeAI(
+                model=self.config.gemini_model,
+                google_api_key=self.config.gemini_api_key,
+                temperature=self.config.temperature,
+                max_output_tokens=self.config.max_output_tokens,
+            )
+        else:
+            raise ValueError(
+                "Missing LLM API key. Set either OPENAI_API_KEY or GEMINI_API_KEY."
+            )


⚠️ Potential issue | 🟠 Major

Same provider detection issue in _create_llm().

The truthy check on self.config.openai_api_key (line 284) will always pass with placeholder values, causing Gemini to never be used as fallback. The error on line 303-305 will also never trigger since the placeholder key is truthy.

🧰 Tools

🪛 Ruff (0.14.14)

[warning] 303-305: Avoid specifying long messages outside the exception class

(TRY003)

coderabbitai · 2026-02-03T06:18:20Z

+    def _create_embeddings(self):
+        if not self.config.openai_api_key:
+            # Allow running without retrieval (RAG will return empty context)
+            return None
+
+        from langchain_openai import OpenAIEmbeddings
+
+        return OpenAIEmbeddings(
+            model=self.config.openai_embedding_model,
+            api_key=self.config.openai_api_key,
+        )


⚠️ Potential issue | 🟠 Major

Embeddings creation also affected by placeholder key issue.

Line 308's check if not self.config.openai_api_key will never be true with placeholder values, so embeddings will always be created even when the key is invalid.

🤖 Prompt for AI Agents

In `@src/agents/business/nova_bot/main.py` around lines 307 - 317, The _create_embeddings method currently only checks if self.config.openai_api_key is truthy, which misses placeholder strings; update _create_embeddings to treat placeholder keys as invalid by verifying the key is non-empty and looks like a real OpenAI secret (e.g., startswith "sk-") and/or does not contain placeholder substrings like "OPENAI" or "YOUR_OPENAI"; only then instantiate OpenAIEmbeddings with model=self.config.openai_embedding_model and api_key=self.config.openai_api_key.

Add NovaBot and NovaBot Simulator agents with configuration, API endp…

f516261

…oints, and documentation. Introduced metadata and vector files for enhanced functionality. Updated shared components for improved modularity and tracing integration.

mramanindia requested a review from imshashank December 28, 2025 14:07

coderabbitai Bot reviewed Dec 28, 2025

View reviewed changes

Add scripts for scraping and indexing Noveum documentation. Introduce…

9ce212f

… README for dataset structure and update requirements for dependencies. Implement NovaBot index building functionality with environment variable checks.

coderabbitai Bot reviewed Dec 29, 2025

View reviewed changes

Comment thread scripts/novabot_build_index.py

Comment thread scripts/novabot_scrape_docs.py Outdated

Comment thread src/agents/business/nova_bot/requirements.txt Outdated

Update NovaBot configuration and improve scraping functionality. Chan…

bbf05ad

…ge project name to "NovaBot" and update Gemini model version. Enhance content hashing method in scraping script and add logging for index bounds checking in vector retrieval.

coderabbitai Bot reviewed Dec 30, 2025

View reviewed changes

Comment thread src/agents/business/nova_bot/main.py Outdated

coderabbitai Bot reviewed Dec 30, 2025

View reviewed changes

coderabbitai Bot reviewed Feb 3, 2026

View reviewed changes

	print(f"\n🚀 Starting simulation...")
	print("\n🚀 Starting simulation...")

		llm_provider = "OpenAI" if self.config.openai_api_key else "Gemini"
		print(f"🛰️ {self.agent_name} initialized (LangChain-only {llm_provider} + OpenAIEmbeddings RAG)")

Conversation

mramanindia commented Dec 28, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Dec 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Dec 28, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Dec 30, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mramanindia commented Dec 28, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Dec 28, 2025 •

edited

Loading