Feat/ai chatbot integration#112
Conversation
|
@pranavshankar1221 is attempting to deploy a commit to the karan3431's projects Team on Vercel. A member of the Team first needs to authorize it. |
|
Warning
|
| Layer / File(s) | Summary |
|---|---|
TF-IDF RAG retriever backend/rag_retriever.py |
Adds STOP_WORDS, tokenize(), RAGChunk, and RAGRetriever that load README.md/DOCUMENTATION.md, split them into header- and paragraph-based chunks, compute per-chunk TF and corpus IDF, score queries via IDF-weighted dot product with token-overlap fallback, and expose a lazy singleton get_retriever(). |
Multi-provider LLM abstraction backend/llm_provider.py |
Adds LLMProvider base class and GeminiProvider, OpenAIProvider, ClaudeProvider, OllamaProvider, and MockProvider implementations, each issuing httpx requests with history-aware payloads and mapping errors to RuntimeError. get_llm_provider() selects from LLM_PROVIDER env, falling back to MockProvider. |
SQLite chat log persistence backend/chat_logger.py |
Adds init_db(), log_chat_message(), and update_chat_feedback() backed by a local SQLite file in data/, with try/commit/finally connection handling for all write operations. |
Chat router and main.py wiring backend/chat_router.py, backend/main.py |
Adds /api/v1/chat/message (RAG retrieval, prompt construction, LLM call, UUID response, logging) and /api/v1/chat/feedback (feedback validation, 400/500 error mapping). Wires chat_router into main.py alongside SlowAPIMiddleware, built-in 429 handler, removal of the custom RateLimitExceeded handler, and dotenv override=True. |
ChatAssistant HUD, API client, and Layout mount src/components/ChatAssistant.tsx, src/lib/api.ts, src/components/Layout.tsx |
Adds the floating ChatAssistant component with message state, page-context derivation, formatText() markdown renderer, feedback controls, and suggested prompts. Extends api.ts with chatMessage and submitChatFeedback. Mounts ChatAssistant in Layout for all pages. |
Config and housekeeping
| Layer / File(s) | Summary |
|---|---|
Env example and TODO backend/.env.example, TODO.md |
Adds LLM_PROVIDER, GEMINI_API_KEY, OPENAI_API_KEY, CLAUDE_API_KEY, and OLLAMA_BASE_URL placeholders to .env.example (note: CORS_ALLOW_ALL appears twice). Adds a backend error resolution plan to TODO.md. |
Sequence Diagram
sequenceDiagram
participant User
participant ChatAssistant
participant ChatRouter
participant RAGRetriever
participant LLMProvider
participant ChatLogger
User->>ChatAssistant: submit question
ChatAssistant->>ChatRouter: POST /api/v1/chat/message {question, page, feature, history}
ChatRouter->>RAGRetriever: retrieve_relevant_context(question)
RAGRetriever-->>ChatRouter: top-N context chunks
ChatRouter->>LLMProvider: generate_response(system_prompt, user_prompt, history)
LLMProvider-->>ChatRouter: assistant text
ChatRouter->>ChatLogger: log_chat_message(msg_id, question, response)
ChatLogger-->>ChatRouter: ok
ChatRouter-->>ChatAssistant: {message_id, response}
ChatAssistant-->>User: render assistant message + feedback controls
User->>ChatAssistant: click thumbs up/down
ChatAssistant->>ChatRouter: POST /api/v1/chat/feedback {message_id, feedback}
ChatRouter->>ChatLogger: update_chat_feedback(msg_id, feedback)
ChatRouter-->>ChatAssistant: {success: true}
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
- jpdevhub/FreshScanAi#81: Touches the same Turnstile-enabled
/api/v1/auth/login/googleendpoint andsrc/lib/api.tsloginUrlmethod that this PR modifies in the rate-limiting and API client refactor.
Suggested reviewers
- jpdevhub
Poem
🐇 Hippity hoppity, questions arise,
A chat box now floats before your eyes!
TF-IDF sniffs the docs with care,
Gemini, Claude, or Ollama — pick your hare.
Feedback buttons: thumbs up or down,
The rabbit builds your AI town! 🏙️
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | Docstring coverage is 34.78% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Feat/ai chatbot integration' clearly and directly summarizes the main change: adding AI chatbot functionality to the application. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
- Create PR with unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.
There was a problem hiding this comment.
Actionable comments posted: 11
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/greetings.yml:
- Line 1: The workflow file has a leading space before the `name:` key
declaration on line 1, which causes YAML parsing to fail and prevents actionlint
from recognizing the workflow structure. Remove the leading whitespace so that
`name:` starts at column 0 (no indentation). This will ensure the YAML is
properly parsed and the workflow is recognized with all required sections
intact.
In `@backend/.env.example`:
- Line 33: Remove the stray merge artifact (`=========`) from line 33 in the
backend/.env.example file. This string is not valid dotenv syntax and will cause
parsing errors when the template is copied to a .env file. Simply delete the
entire line containing only the equals signs.
In `@backend/chat_logger.py`:
- Around line 42-54: The exception handler in the database insert operation for
chat_logs is catching exceptions but only printing them without re-raising,
which prevents upstream code from detecting write failures. In the except block
that catches Exception during the INSERT statement execution, after logging the
error with print, you must re-raise the exception using a bare raise statement
so that upstream handlers (such as feedback and message APIs) can properly
detect and handle the persistence failure instead of incorrectly reporting
success.
- Around line 63-71: The UPDATE statement for the chat_logs table in the
feedback update section does not validate whether the update actually affected
any rows. After executing the UPDATE query with cursor.execute using the msg_id
parameter, add a check for cursor.rowcount to verify that at least one row was
updated. If cursor.rowcount is 0, it indicates the message ID does not exist and
should be treated as a failed update by raising an appropriate exception (such
as a custom feedback update error or ValueError) before calling conn.commit() to
prevent silent failures when given nonexistent message IDs.
In `@backend/llm_provider.py`:
- Around line 33-37: The synchronous httpx.post calls in the provider
implementations block the async event loop when invoked from the async endpoint
in chat_router.py. Additionally, GeminiProvider exposes credentials in exception
messages by embedding the API key in the query string. To fix this: (1) Convert
the generate_response method definition and all provider-specific method
implementations to async by adding the async keyword, (2) Replace all
synchronous httpx.post calls with await httpx.AsyncClient().post calls at the
affected locations (lines 72, 115, 164, 207 and corresponding sibling sites),
(3) In GeminiProvider, move the API key from the query string parameter to the
Authorization header, and (4) Update the call site in chat_router.py:117 to
await the provider.generate_response call.
- Around line 38-42: Remove sensitive credential exposure from exception logging
across all LLM provider classes. At the exception handling blocks in
GeminiProvider (lines 77-79), OpenAIProvider (lines 122-124), ClaudeProvider
(lines 171-173), and OllamaProvider (lines 214-216), replace the logging of raw
exception objects with sanitized error messages. Avoid logging the complete
exception object or passing the exception directly to logger calls, as this may
expose the API key embedded in request URLs or headers. Instead, extract and log
only the error message string while ensuring the API key and other sensitive
data are not included in the logged output.
In `@backend/main.py`:
- Around line 10-21: The Turnstile protection is vulnerable because the import
of TURNSTILE_SECRET_KEY from the turnstile module happens before the .env file
is loaded, causing the constant to be imported with a falsy value in .env-driven
setups. Move the load_dotenv call (currently inside the try block) to execute
before the import statement for TURNSTILE_SECRET_KEY so that the environment
variable is available when the constant is imported. Additionally, review the
verification logic around lines 350-353 to ensure it fails closed (rejects
requests) rather than failing open (skips verification) when
TURNSTILE_SECRET_KEY is missing or falsy.
In `@backend/test_auth.py`:
- Around line 231-237: The Turnstile and rate-limit tests are non-assertive and
only log unexpected behavior instead of failing. In the validation response
check (lines 231-237), replace the else clause that logs unexpected status codes
with an assertion that fails the test if the status is not 400 or 502. In the
rate-limit test sections (lines 297-306), add explicit assertions to verify that
a 429 status code is actually returned when expected, and ensure the test fails
if rate limiting doesn't occur as required by the 5/minute policy. Replace
logging-only patterns with proper test assertions so that regressions in
Turnstile token validation and rate limit behavior are caught.
In `@README.md`:
- Around line 117-120: The README.md file documents required Turnstile
environment variables (VITE_TURNSTILE_SITE_KEY for the frontend and
TURNSTILE_SECRET_KEY for the backend) in the note section, but these variables
are missing from the production environment variable configuration tables that
appear later in the document. Add VITE_TURNSTILE_SITE_KEY to the frontend/Vercel
environment variables table and TURNSTILE_SECRET_KEY to the backend secrets
table to ensure consistency and prevent deployment issues.
In `@src/components/ChatAssistant.tsx`:
- Around line 187-192: Icon-only controls and chat input elements lack explicit
accessible names for screen readers. Add aria-label attributes to provide
descriptive names that convey the purpose of each element. In
src/components/ChatAssistant.tsx at line 187-192, add an aria-label to the close
button with the X icon describing its function (e.g., "Close chat"). Similarly,
apply the same fix to the icon controls and input fields at lines 255-278 and
lines 317-324, ensuring each interactive element or input has a descriptive
aria-label that explains its purpose to screen reader users.
In `@src/lib/useTurnstile.ts`:
- Around line 124-128: The execute() method call in the Promise handler can
leave previous promises unresolved when called concurrently. Before overwriting
pendingRef.current with new resolve and reject callbacks, check if there is
already a pending operation stored in pendingRef.current and handle it
appropriately (either reject it or prevent the duplicate call). Additionally,
the optional chaining on window.turnstile?.execute() can silently fail without
throwing an exception, leaving the promise hanging indefinitely. After the try
block, verify that window.turnstile is actually defined and that execute() was
successfully called; if either condition fails, reject the promise immediately
within the catch block or with an explicit check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: f01bae95-b3c0-459d-94d7-8a30cf8991e7
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (19)
.env.example.github/workflows/greetings.yml.gitignoreREADME.mdTODO.mdbackend/.env.examplebackend/chat_logger.pybackend/chat_router.pybackend/llm_provider.pybackend/main.pybackend/rag_retriever.pybackend/requirements-base.txtbackend/test_auth.pybackend/turnstile.pysrc/components/ChatAssistant.tsxsrc/components/Layout.tsxsrc/lib/api.tssrc/lib/useTurnstile.tssrc/pages/AuthPage.tsx
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 11
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/greetings.yml:
- Line 1: The workflow file has a leading space before the `name:` key
declaration on line 1, which causes YAML parsing to fail and prevents actionlint
from recognizing the workflow structure. Remove the leading whitespace so that
`name:` starts at column 0 (no indentation). This will ensure the YAML is
properly parsed and the workflow is recognized with all required sections
intact.
In `@backend/.env.example`:
- Line 33: Remove the stray merge artifact (`=========`) from line 33 in the
backend/.env.example file. This string is not valid dotenv syntax and will cause
parsing errors when the template is copied to a .env file. Simply delete the
entire line containing only the equals signs.
In `@backend/chat_logger.py`:
- Around line 42-54: The exception handler in the database insert operation for
chat_logs is catching exceptions but only printing them without re-raising,
which prevents upstream code from detecting write failures. In the except block
that catches Exception during the INSERT statement execution, after logging the
error with print, you must re-raise the exception using a bare raise statement
so that upstream handlers (such as feedback and message APIs) can properly
detect and handle the persistence failure instead of incorrectly reporting
success.
- Around line 63-71: The UPDATE statement for the chat_logs table in the
feedback update section does not validate whether the update actually affected
any rows. After executing the UPDATE query with cursor.execute using the msg_id
parameter, add a check for cursor.rowcount to verify that at least one row was
updated. If cursor.rowcount is 0, it indicates the message ID does not exist and
should be treated as a failed update by raising an appropriate exception (such
as a custom feedback update error or ValueError) before calling conn.commit() to
prevent silent failures when given nonexistent message IDs.
In `@backend/llm_provider.py`:
- Around line 33-37: The synchronous httpx.post calls in the provider
implementations block the async event loop when invoked from the async endpoint
in chat_router.py. Additionally, GeminiProvider exposes credentials in exception
messages by embedding the API key in the query string. To fix this: (1) Convert
the generate_response method definition and all provider-specific method
implementations to async by adding the async keyword, (2) Replace all
synchronous httpx.post calls with await httpx.AsyncClient().post calls at the
affected locations (lines 72, 115, 164, 207 and corresponding sibling sites),
(3) In GeminiProvider, move the API key from the query string parameter to the
Authorization header, and (4) Update the call site in chat_router.py:117 to
await the provider.generate_response call.
- Around line 38-42: Remove sensitive credential exposure from exception logging
across all LLM provider classes. At the exception handling blocks in
GeminiProvider (lines 77-79), OpenAIProvider (lines 122-124), ClaudeProvider
(lines 171-173), and OllamaProvider (lines 214-216), replace the logging of raw
exception objects with sanitized error messages. Avoid logging the complete
exception object or passing the exception directly to logger calls, as this may
expose the API key embedded in request URLs or headers. Instead, extract and log
only the error message string while ensuring the API key and other sensitive
data are not included in the logged output.
In `@backend/main.py`:
- Around line 10-21: The Turnstile protection is vulnerable because the import
of TURNSTILE_SECRET_KEY from the turnstile module happens before the .env file
is loaded, causing the constant to be imported with a falsy value in .env-driven
setups. Move the load_dotenv call (currently inside the try block) to execute
before the import statement for TURNSTILE_SECRET_KEY so that the environment
variable is available when the constant is imported. Additionally, review the
verification logic around lines 350-353 to ensure it fails closed (rejects
requests) rather than failing open (skips verification) when
TURNSTILE_SECRET_KEY is missing or falsy.
In `@backend/test_auth.py`:
- Around line 231-237: The Turnstile and rate-limit tests are non-assertive and
only log unexpected behavior instead of failing. In the validation response
check (lines 231-237), replace the else clause that logs unexpected status codes
with an assertion that fails the test if the status is not 400 or 502. In the
rate-limit test sections (lines 297-306), add explicit assertions to verify that
a 429 status code is actually returned when expected, and ensure the test fails
if rate limiting doesn't occur as required by the 5/minute policy. Replace
logging-only patterns with proper test assertions so that regressions in
Turnstile token validation and rate limit behavior are caught.
In `@README.md`:
- Around line 117-120: The README.md file documents required Turnstile
environment variables (VITE_TURNSTILE_SITE_KEY for the frontend and
TURNSTILE_SECRET_KEY for the backend) in the note section, but these variables
are missing from the production environment variable configuration tables that
appear later in the document. Add VITE_TURNSTILE_SITE_KEY to the frontend/Vercel
environment variables table and TURNSTILE_SECRET_KEY to the backend secrets
table to ensure consistency and prevent deployment issues.
In `@src/components/ChatAssistant.tsx`:
- Around line 187-192: Icon-only controls and chat input elements lack explicit
accessible names for screen readers. Add aria-label attributes to provide
descriptive names that convey the purpose of each element. In
src/components/ChatAssistant.tsx at line 187-192, add an aria-label to the close
button with the X icon describing its function (e.g., "Close chat"). Similarly,
apply the same fix to the icon controls and input fields at lines 255-278 and
lines 317-324, ensuring each interactive element or input has a descriptive
aria-label that explains its purpose to screen reader users.
In `@src/lib/useTurnstile.ts`:
- Around line 124-128: The execute() method call in the Promise handler can
leave previous promises unresolved when called concurrently. Before overwriting
pendingRef.current with new resolve and reject callbacks, check if there is
already a pending operation stored in pendingRef.current and handle it
appropriately (either reject it or prevent the duplicate call). Additionally,
the optional chaining on window.turnstile?.execute() can silently fail without
throwing an exception, leaving the promise hanging indefinitely. After the try
block, verify that window.turnstile is actually defined and that execute() was
successfully called; if either condition fails, reject the promise immediately
within the catch block or with an explicit check.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: f01bae95-b3c0-459d-94d7-8a30cf8991e7
⛔ Files ignored due to path filters (1)
package-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (19)
.env.example.github/workflows/greetings.yml.gitignoreREADME.mdTODO.mdbackend/.env.examplebackend/chat_logger.pybackend/chat_router.pybackend/llm_provider.pybackend/main.pybackend/rag_retriever.pybackend/requirements-base.txtbackend/test_auth.pybackend/turnstile.pysrc/components/ChatAssistant.tsxsrc/components/Layout.tsxsrc/lib/api.tssrc/lib/useTurnstile.tssrc/pages/AuthPage.tsx
🛑 Comments failed to post (11)
.github/workflows/greetings.yml (1)
1-1:
⚠️ Potential issue | 🟠 Major🧩 Analysis chain
🏁 Script executed:
#!/bin/bash set -euo pipefail actionlint .github/workflows/greetings.ymlRepository: jpdevhub/FreshScanAi
Length of output: 314
Remove the leading space before
name:on line 1.The workflow header is indented, causing actionlint to report missing
onandjobssections. This breaks YAML parsing and prevents the workflow from being recognized. Movename:back to column 0.🧰 Tools
🪛 actionlint (1.7.12)
[error] 1-1: "jobs" section is missing in workflow
(syntax-check)
[error] 1-1: "on" section is missing in workflow
(syntax-check)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/greetings.yml at line 1, The workflow file has a leading space before the `name:` key declaration on line 1, which causes YAML parsing to fail and prevents actionlint from recognizing the workflow structure. Remove the leading whitespace so that `name:` starts at column 0 (no indentation). This will ensure the YAML is properly parsed and the workflow is recognized with all required sections intact.Source: Linters/SAST tools
backend/.env.example (1)
33-33:
⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove the stray merge artifact from the env template.
Line 33 (
=========) is invalid in dotenv syntax and can break parsing when this file is copied to.env.Suggested fix
-=========📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.🧰 Tools
🪛 dotenv-linter (4.0.0)
[warning] 33-33: [LeadingCharacter] Invalid leading character detected
(LeadingCharacter)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/.env.example` at line 33, Remove the stray merge artifact (`=========`) from line 33 in the backend/.env.example file. This string is not valid dotenv syntax and will cause parsing errors when the template is copied to a .env file. Simply delete the entire line containing only the equals signs.Source: Linters/SAST tools
backend/chat_logger.py (2)
42-54:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winRe-raise DB write failures instead of swallowing them.
Line 52 and Line 73 print errors and return normally, so upstream handlers can’t detect write failure (downstream effect: feedback/message APIs can report success when persistence failed).
Proposed fix
+import logging import sqlite3 from pathlib import Path from datetime import datetime, timezone +logger = logging.getLogger("freshscan.chat_logger") ... - except Exception as e: - print(f"ChatLogger Error logging message: {e}") + except Exception: + logger.exception("ChatLogger failed to insert chat message") + raise ... - except Exception as e: - print(f"ChatLogger Error updating feedback: {e}") + except Exception: + logger.exception("ChatLogger failed to update chat feedback") + raiseAlso applies to: 62-75
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/chat_logger.py` around lines 42 - 54, The exception handler in the database insert operation for chat_logs is catching exceptions but only printing them without re-raising, which prevents upstream code from detecting write failures. In the except block that catches Exception during the INSERT statement execution, after logging the error with print, you must re-raise the exception using a bare raise statement so that upstream handlers (such as feedback and message APIs) can properly detect and handle the persistence failure instead of incorrectly reporting success.
63-71:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winTreat unknown message IDs as a failed feedback update.
Line 63-71 does not validate
cursor.rowcount, so a nonexistentidis silently treated as success.Proposed fix
cursor.execute( """ UPDATE chat_logs SET feedback = ? WHERE id = ? """, (feedback, msg_id) ) + if cursor.rowcount == 0: + raise LookupError(f"chat_logs row not found for id={msg_id}") conn.commit()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.cursor.execute( """ UPDATE chat_logs SET feedback = ? WHERE id = ? """, (feedback, msg_id) ) if cursor.rowcount == 0: raise LookupError(f"chat_logs row not found for id={msg_id}") conn.commit()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/chat_logger.py` around lines 63 - 71, The UPDATE statement for the chat_logs table in the feedback update section does not validate whether the update actually affected any rows. After executing the UPDATE query with cursor.execute using the msg_id parameter, add a check for cursor.rowcount to verify that at least one row was updated. If cursor.rowcount is 0, it indicates the message ID does not exist and should be treated as a failed update by raising an appropriate exception (such as a custom feedback update error or ValueError) before calling conn.commit() to prevent silent failures when given nonexistent message IDs.backend/llm_provider.py (2)
33-37:
⚠️ Potential issue | 🔴 Critical🧩 Analysis chain
🏁 Script executed:
find . -name "llm_provider.py" -o -name "chat_router.py"Repository: jpdevhub/FreshScanAi
Length of output: 115
🏁 Script executed:
# Check line count and content of llm_provider.py wc -l backend/llm_provider.py head -250 backend/llm_provider.pyRepository: jpdevhub/FreshScanAi
Length of output: 7753
🏁 Script executed:
# Check chat_router.py wc -l backend/chat_router.py cat -n backend/chat_router.py | head -150Repository: jpdevhub/FreshScanAi
Length of output: 7335
Convert provider HTTP methods to async to prevent blocking the async request event loop.
Lines 72, 115, 164, and 207 use synchronous
httpx.post(...)calls in methods invoked from the async endpoint atchat_router.py:117, which blocks concurrent request handling under load.Additionally,
GeminiProviderat line 36–39 embeds the API key as a query string parameter. If an exception occurs during the request (line 81–82), the full exception object—including the URL with the embedded key—is logged and propagated, risking credential exposure.Suggested direction
class LLMProvider: - def generate_response( + async def generate_response( self, system_prompt: str, prompt: str, history: List[Dict[str, str]] = None, - ) -> str: + ) -> str: raise NotImplementedError(...)- import httpx - ... - response = httpx.post(url, json=payload, timeout=30.0) + import httpx + ... + async with httpx.AsyncClient(timeout=30.0) as client: + response = await client.post(url, json=payload)Update
chat_router.py:117toawait provider.generate_response(...).For Gemini, pass the API key in the
Authorizationheader instead of the query string to avoid exposure in exception messages.Also applies to: 71–73, 87–92, 114–117, 134–139, 163–166, 181–186, 206–209
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/llm_provider.py` around lines 33 - 37, The synchronous httpx.post calls in the provider implementations block the async event loop when invoked from the async endpoint in chat_router.py. Additionally, GeminiProvider exposes credentials in exception messages by embedding the API key in the query string. To fix this: (1) Convert the generate_response method definition and all provider-specific method implementations to async by adding the async keyword, (2) Replace all synchronous httpx.post calls with await httpx.AsyncClient().post calls at the affected locations (lines 72, 115, 164, 207 and corresponding sibling sites), (3) In GeminiProvider, move the API key from the query string parameter to the Authorization header, and (4) Update the call site in chat_router.py:117 to await the provider.generate_response call.
38-42:
⚠️ Potential issue | 🟠 Major🧩 Analysis chain
🏁 Script executed:
find . -name "llm_provider.py" -type fRepository: jpdevhub/FreshScanAi
Length of output: 90
🏁 Script executed:
cat -n backend/llm_provider.pyRepository: jpdevhub/FreshScanAi
Length of output: 13208
🌐 Web query:
httpx exception messages include URL logging security💡 Result:
The HTTPX library has historically faced security concerns regarding the logging of full URLs, which can include sensitive credentials like usernames, passwords, or API tokens when they are embedded directly in the URL string [1][2][3]. While HTTPX has taken steps to sanitize sensitive information in its internal representation (such as obfuscating credentials in the repr method of URL and Header classes [4]), users have reported that these credentials can still leak into application logs and exception messages [1][2][3][5]. Key aspects of this issue include: 1. Logging Behavior: By default, HTTPX may log the full URL of HTTP requests at the INFO level [1][3]. If a URL contains embedded credentials (e.g., https://username:password@example.com), those credentials appear in plain text in the logs [1][2][3]. 2. Exception Messages: Exception objects, particularly HTTPStatusError, often include the request URL in their string representation [6][7]. If an application logs the full exception message (e.g., using logger.exception or printing the exception object), sensitive information contained in the URL may be exposed [2][5]. 3. Mitigation Strategies: Because there is no universal built-in redaction for all exception messages and logs in the library, developers are encouraged to implement manual mitigations [2][3][5]: - Adjust Log Levels: You can limit exposure by increasing the log level for the httpx logger (e.g., to WARNING) to suppress INFO-level URL logging [2][3][8]. - Manual Redaction: Developers often implement wrapper functions or regex-based sanitizers to strip or redact sensitive patterns from URL strings before they are logged or returned in error messages [1][5]. - Structured Logging: When logging exceptions, avoid passing the exception object directly into logger methods that might auto-serialize the message including its sensitive parts. Instead, consider using custom logic to extract and redact necessary diagnostic information [5]. The community has discussed and proposed various patches to improve default behavior, such as modifying how URLs are converted to strings, though users should continue to exercise caution when handling URLs containing secrets [1][2].
Citations:
- 1: encode/httpx#2765
- 2: encode/httpx#3513
- 3: crate/cratedb-mcp#63
- 4: encode/httpx#222
- 5: https://github.com/Aureliolo/synthorg/blob/main/src/synthorg/observability/redaction.py
- 6: https://www.python-httpx.org/exceptions/
- 7: https://github.com/encode/httpx/blob/master/httpx/_exceptions.py
- 8: crate/cratedb-mcp@d2c2311
🌐 Web query:
Python httpx library exception string representation includes request URL💡 Result:
In the Python httpx library, the string representation (str) of exception objects does not automatically include the request URL [1][2]. Instead, httpx exceptions such as RequestError and HTTPStatusError provide a.request attribute that contains the request object, which allows you to programmatically access the URL for logging or debugging purposes [1][3][2]. When handling exceptions, the recommended approach is to access the URL explicitly from the exception's.request attribute [2]. For example: try: response = httpx.get("https://www.example.com") response.raise_for_status except httpx.RequestError as exc: print(f"An error occurred while requesting {exc.request.url}.") except httpx.HTTPStatusError as exc: print(f"Error {exc.response.status_code} while requesting {exc.request.url}.") While the exception's str method provides the error message passed during initialization, the URL is not typically part of that message unless specifically included by the code that raised the exception [1][4]. Additionally, be aware that sensitive information like user credentials in URLs is handled by the library to prevent accidental leakage [5].
Citations:
- 1: https://github.com/encode/httpx/blob/master/httpx/_exceptions.py
- 2: https://www.python-httpx.org/quickstart/
- 3: https://deepwiki.com/encode/httpx/7.2-handling-network-errors
- 4: https://www.python-httpx.org/exceptions/
- 5: encode/httpx#3513
Remove API key exposure from exception logging in all LLM providers.
Lines 77-79 (GeminiProvider), 122-124 (OpenAIProvider), 171-173 (ClaudeProvider), and 214-216 (OllamaProvider) log raw exception objects that may include request details containing sensitive credentials. In GeminiProvider, the API key is embedded directly in the URL (line 41); if httpx raises any exception, the request object attached to that exception could leak the key through exception strings. Implement the proposed fix to prevent credential leakage:
Proposed fix
- except Exception as e: - logger.error(f"Gemini API error: {e}") - raise RuntimeError(f"Gemini provider failed: {e}") + except Exception: + logger.exception("Gemini API request failed") + raise RuntimeError("Gemini provider failed")Apply the same pattern to OpenAIProvider (lines 122-124), ClaudeProvider (lines 171-173), and OllamaProvider (lines 214-216).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/llm_provider.py` around lines 38 - 42, Remove sensitive credential exposure from exception logging across all LLM provider classes. At the exception handling blocks in GeminiProvider (lines 77-79), OpenAIProvider (lines 122-124), ClaudeProvider (lines 171-173), and OllamaProvider (lines 214-216), replace the logging of raw exception objects with sanitized error messages. Avoid logging the complete exception object or passing the exception directly to logger calls, as this may expose the API key embedded in request URLs or headers. Instead, extract and log only the error message string while ensuring the API key and other sensitive data are not included in the logged output.backend/main.py (1)
10-21:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winTurnstile protection can be bypassed due to fail-open config handling.
Line 351 skips verification whenever
TURNSTILE_SECRET_KEYis falsy, and Lines 10-21 load.envafter importing that constant. In.env-driven setups this can disable CAPTCHA checks instead of failing closed.Suggested fix
# backend/main.py -from turnstile import TURNSTILE_SECRET_KEY, verify_turnstile_token +from turnstile import verify_turnstile_token ... async def _verify_turnstile(turnstile_token: str | None, request: Request) -> None: - if TURNSTILE_SECRET_KEY: - client_host = request.client.host if request.client else None - await verify_turnstile_token(turnstile_token, client_host) + if not os.environ.get("TURNSTILE_SECRET_KEY", "").strip(): + raise HTTPException( + status_code=503, + detail="Turnstile is not configured. Set TURNSTILE_SECRET_KEY.", + ) + client_host = request.client.host if request.client else None + await verify_turnstile_token(turnstile_token, client_host)# backend/turnstile.py -TURNSTILE_SECRET_KEY = os.environ.get('TURNSTILE_SECRET_KEY', '') ... - if not TURNSTILE_SECRET_KEY: + turnstile_secret_key = os.environ.get("TURNSTILE_SECRET_KEY", "").strip() + if not turnstile_secret_key: raise HTTPException( status_code=500, detail='Turnstile secret key is not configured. Set TURNSTILE_SECRET_KEY.', ) ... - 'secret': TURNSTILE_SECRET_KEY, + 'secret': turnstile_secret_key,Also applies to: 350-353
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/main.py` around lines 10 - 21, The Turnstile protection is vulnerable because the import of TURNSTILE_SECRET_KEY from the turnstile module happens before the .env file is loaded, causing the constant to be imported with a falsy value in .env-driven setups. Move the load_dotenv call (currently inside the try block) to execute before the import statement for TURNSTILE_SECRET_KEY so that the environment variable is available when the constant is imported. Additionally, review the verification logic around lines 350-353 to ensure it fails closed (rejects requests) rather than failing open (skips verification) when TURNSTILE_SECRET_KEY is missing or falsy.backend/test_auth.py (1)
231-237:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winRate-limit/Turnstile tests are non-assertive and can pass on regressions.
Line 236 only logs unexpected statuses, and Lines 297-300 don’t fail when 429 never occurs. Also, Lines 302-305 assume recovery after 1 second despite a 5/minute policy.
Suggested fix
def test_google_oauth_post_with_invalid_turnstile(): @@ - else: - info(f"Turnstile validation returned {r.status_code}: {r.text[:80]}") + else: + fail(f"Invalid Turnstile token should not pass, got {r.status_code}: {r.text[:120]}") @@ def test_auth_login_rate_limiting(): @@ - if not rate_limit_hit and attempts >= 5: - info("Rate limiting may take time to accumulate; requests might not trigger immediately.") - info(f" (Made {attempts} requests without hitting 429 limit)") + if not rate_limit_hit: + fail(f"Expected 429 after exceeding 5/minute limit (attempts={attempts})") @@ - time.sleep(1) - r_recovery = requests.get(url_get, allow_redirects=False, timeout=10) - if r_recovery.status_code != 429: - info("✓ Rate limit counter appears to reset after brief wait") + # Optional: check Retry-After semantics instead of assuming 1-second reset.Also applies to: 297-306
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/test_auth.py` around lines 231 - 237, The Turnstile and rate-limit tests are non-assertive and only log unexpected behavior instead of failing. In the validation response check (lines 231-237), replace the else clause that logs unexpected status codes with an assertion that fails the test if the status is not 400 or 502. In the rate-limit test sections (lines 297-306), add explicit assertions to verify that a 429 status code is actually returned when expected, and ensure the test fails if rate limiting doesn't occur as required by the 5/minute policy. Replace logging-only patterns with proper test assertions so that regressions in Turnstile token validation and rate limit behavior are caught.README.md (1)
117-120:
⚠️ Potential issue | 🟡 Minor | ⚡ Quick winKeep deployment variable tables consistent with this new Turnstile requirement.
This note introduces required Turnstile keys, but the production env tables later in the README still omit
VITE_TURNSTILE_SITE_KEY(Vercel) andTURNSTILE_SECRET_KEY(backend secrets). Please add them there to prevent broken auth setups during deployment.🧰 Tools
🪛 LanguageTool
[grammar] ~119-~119: Use a hyphen to join words.
Context: ...> Authentication start requests are rate limited in the backend to 5 requests per...(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.22.1)
[warning] 118-118: Blank line inside blockquote
(MD028, no-blanks-blockquote)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` around lines 117 - 120, The README.md file documents required Turnstile environment variables (VITE_TURNSTILE_SITE_KEY for the frontend and TURNSTILE_SECRET_KEY for the backend) in the note section, but these variables are missing from the production environment variable configuration tables that appear later in the document. Add VITE_TURNSTILE_SITE_KEY to the frontend/Vercel environment variables table and TURNSTILE_SECRET_KEY to the backend secrets table to ensure consistency and prevent deployment issues.src/components/ChatAssistant.tsx (1)
187-192:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd accessible names for icon controls and chat input.
Line 187, Line 255, Line 267, and Line 317 use icon-only controls/input without explicit labels, which blocks screen-reader usability.
Proposed fix
<button onClick={() => setIsOpen(false)} className="text-on-surface-variant hover:text-neon cursor-pointer transition-colors" + aria-label="Close chat assistant" + title="Close chat assistant" > <X size={14} /> </button> ... <button onClick={() => handleFeedback(msg.id!, idx, 'up')} disabled={!!msg.feedback} + aria-label="Mark response as helpful" className={`cursor-pointer transition-colors ${ ... <button onClick={() => handleFeedback(msg.id!, idx, 'down')} disabled={!!msg.feedback} + aria-label="Mark response as unhelpful" className={`cursor-pointer transition-colors ${ ... <form ... + <label htmlFor="chat-assistant-input" className="sr-only"> + Message the chat assistant + </label> <input + id="chat-assistant-input" type="text"Also applies to: 255-278, 317-324
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/components/ChatAssistant.tsx` around lines 187 - 192, Icon-only controls and chat input elements lack explicit accessible names for screen readers. Add aria-label attributes to provide descriptive names that convey the purpose of each element. In src/components/ChatAssistant.tsx at line 187-192, add an aria-label to the close button with the X icon describing its function (e.g., "Close chat"). Similarly, apply the same fix to the icon controls and input fields at lines 255-278 and lines 317-324, ensuring each interactive element or input has a descriptive aria-label that explains its purpose to screen reader users.src/lib/useTurnstile.ts (1)
124-128:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard
execute()against concurrent calls that orphan pending promises.
pendingRef.currentis overwritten on each call. Ifexecute()is triggered twice before a callback fires, the first promise can remain unresolved indefinitely. Also, optional chaining onexecutecan silently no-op and leave the new promise hanging.Suggested fix
const execute = async (): Promise<string> => { @@ if (!ready || widgetIdRef.current === null) { throw new Error('Turnstile is not ready yet. Please wait and try again.'); } + if (pendingRef.current) { + throw new Error('Turnstile verification is already in progress.'); + } + if (!window.turnstile) { + throw new Error('Turnstile is unavailable in this environment.'); + } return new Promise((resolve, reject) => { pendingRef.current = { resolve, reject }; try { - window.turnstile?.execute(widgetIdRef.current as number); + window.turnstile.execute(widgetIdRef.current as number); } catch (exc) { pendingRef.current = null; reject(exc instanceof Error ? exc : new Error('Turnstile execution failed.')); } }); };🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/useTurnstile.ts` around lines 124 - 128, The execute() method call in the Promise handler can leave previous promises unresolved when called concurrently. Before overwriting pendingRef.current with new resolve and reject callbacks, check if there is already a pending operation stored in pendingRef.current and handle it appropriately (either reject it or prevent the duplicate call). Additionally, the optional chaining on window.turnstile?.execute() can silently fail without throwing an exception, leaving the promise hanging indefinitely. After the try block, verify that window.turnstile is actually defined and that execute() was successfully called; if either condition fails, reject the promise immediately within the catch block or with an explicit check.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/.env.example (1)
37-44:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winRemove the duplicate
CORS_ALLOW_ALLentry.It is defined twice, so the later value silently overrides the earlier one. Even though both are
truenow, this makes the example fragile and keeps the config lint warning alive; keep the CORS toggle in one place and normalize the LLM keys while you’re here.♻️ Proposed cleanup
# ── CORS ────────────────────────────────────────────────────────────────────── CORS_ALLOW_ALL=true # ── LLM Provider (for AI Chat Assistant) ────────────────────────────────────── # Options: gemini (default), openai, claude, ollama LLM_PROVIDER=gemini GEMINI_API_KEY= +OLLAMA_BASE_URL=http://localhost:11434 OPENAI_API_KEY= CLAUDE_API_KEY= -OLLAMA_BASE_URL=http://localhost:11434 -CORS_ALLOW_ALL=true🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/.env.example` around lines 37 - 44, The backend/.env.example file contains a duplicate CORS_ALLOW_ALL entry where the later value silently overrides the earlier one. Remove the duplicate CORS_ALLOW_ALL definition from the file, keeping only a single instance in the appropriate location. Additionally, ensure the LLM Provider section (containing LLM_PROVIDER, GEMINI_API_KEY, OPENAI_API_KEY, CLAUDE_API_KEY, and OLLAMA_BASE_URL) is cleanly organized and consistent with the rest of the configuration file's formatting conventions.Source: Linters/SAST tools
🧹 Nitpick comments (1)
backend/main.py (1)
122-125: ⚡ Quick winIncorrect exception handler registration for 429 status.
Line 123 registers
_rate_limit_exceeded_handlerfor HTTP status code 429, but this handler expects aRateLimitExceededexception as its second argument. SlowAPI raisesRateLimitExceededexceptions (notHTTPException(status_code=429)), which are caught by the custom handler at lines 127-137. The status-code-based registration at line 123 will never be invoked.Either remove line 123 (since the custom handler below handles everything), or replace line 123 with the correct exception-based registration and remove the custom handler.
Option 1: Remove the redundant line (keep custom handler)
app.state.limiter = limiter -app.add_exception_handler(429, _rate_limit_exceeded_handler) app.add_middleware(SlowAPIMiddleware) app.include_router(chat_router)Option 2: Use built-in handler correctly (remove custom handler)
app.state.limiter = limiter -app.add_exception_handler(429, _rate_limit_exceeded_handler) +app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler) app.add_middleware(SlowAPIMiddleware) app.include_router(chat_router) - -@app.exception_handler(RateLimitExceeded) -async def rate_limit_handler(request: Request, exc: RateLimitExceeded): - return JSONResponse( - status_code=429, - content={ - "error": "rate_limit_exceeded", - "detail": "Too many requests. Please slow down.", - "retry_after": exc.headers.get("Retry-After", "60"), - }, - headers={"Retry-After": exc.headers.get("Retry-After", "60")}, - )Note: Option 1 preserves your custom response format; Option 2 uses SlowAPI's default format.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/main.py` around lines 122 - 125, The exception handler registration on line 123 uses app.add_exception_handler(429, _rate_limit_exceeded_handler) which registers the handler for HTTP status code 429, but SlowAPI raises RateLimitExceeded exceptions (not HTTP 429 status codes), so this registration will never be invoked. Since the custom _rate_limit_exceeded_handler function (defined at lines 127-137) correctly catches RateLimitExceeded exceptions, remove the incorrect status-code-based registration at line 123 entirely. This keeps your custom response format and eliminates the unused handler registration.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@backend/.env.example`:
- Around line 37-44: The backend/.env.example file contains a duplicate
CORS_ALLOW_ALL entry where the later value silently overrides the earlier one.
Remove the duplicate CORS_ALLOW_ALL definition from the file, keeping only a
single instance in the appropriate location. Additionally, ensure the LLM
Provider section (containing LLM_PROVIDER, GEMINI_API_KEY, OPENAI_API_KEY,
CLAUDE_API_KEY, and OLLAMA_BASE_URL) is cleanly organized and consistent with
the rest of the configuration file's formatting conventions.
---
Nitpick comments:
In `@backend/main.py`:
- Around line 122-125: The exception handler registration on line 123 uses
app.add_exception_handler(429, _rate_limit_exceeded_handler) which registers the
handler for HTTP status code 429, but SlowAPI raises RateLimitExceeded
exceptions (not HTTP 429 status codes), so this registration will never be
invoked. Since the custom _rate_limit_exceeded_handler function (defined at
lines 127-137) correctly catches RateLimitExceeded exceptions, remove the
incorrect status-code-based registration at line 123 entirely. This keeps your
custom response format and eliminates the unused handler registration.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 9dca830d-44a4-4a4e-81a0-52e517e0a0dd
📒 Files selected for processing (5)
backend/.env.examplebackend/chat_logger.pybackend/main.pybackend/rag_retriever.pysrc/lib/api.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/chat_logger.py
- backend/rag_retriever.py
|
Successfully added this feature issue no : #62 |
|
Hi @jpdevhub I have completed the feature integration verify it and if any changes required feel free to ask .......... |
|
You have bring back the tursentile commits in this pr ?? |
|
I have created a Seperate branch pranavshankar1221:feat/ai-chatbot-integration for the chatbot Integration |
|
See the commits above the tursentile one is here ! fix the changes in the new branch with fresh pr only the chatbot commits and then push with the commits only for this issue. |
|
I have opened a fresh Pull request #118 feat: add AI chatbot assistant with RAG support |
closes #62
Description
Checklist
npm run lintpasses with no errorsnpm run buildcompiles without TypeScript errorspython -m pytestpasses (including new tests I added).envfiles, API keys, secrets, model weights, or__pycache__in this diffmain, not mergedSummary by CodeRabbit
New Features