100 changes: 100 additions & 0 deletions backend/api/main.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
from fastapi import FastAPI, UploadFile, File, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
from typing import List, Optional
import sys
import os

# Add parent directory to path
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from ingestion.ingestion_manager import IngestionManager
from retrieval.query_processor import QueryProcessor
from generation.lmstudio_generator import LMStudioGenerator
from indexing.vector_store import VectorStore

app = FastAPI(title="NeuraX API", version="1.0.0")

# CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Initialize components
ingestion_manager = IngestionManager()
query_processor = QueryProcessor()
vector_store = VectorStore()
Comment on lines +27 to +30

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, find the QueryProcessor and VectorStore classes
fd -t f "\.py$" | head -20

Repository: thrishank007/NeuraX

Length of output: 578


🏁 Script executed:

# Locate QueryProcessor class definition
rg "class QueryProcessor" -A 10

Repository: thrishank007/NeuraX

Length of output: 832


🏁 Script executed:

# Locate VectorStore class definition
rg "class VectorStore" -A 10

Repository: thrishank007/NeuraX

Length of output: 807


🏁 Script executed:

# Examine the main.py file around lines 27-30
sed -n '1,40p' backend/api/main.py

Repository: thrishank007/NeuraX

Length of output: 1184


🏁 Script executed:

# Check IngestionManager signature
rg "class IngestionManager" -A 10

Repository: thrishank007/NeuraX

Length of output: 786


🏁 Script executed:

# Check the full main.py file to see if there's any post-initialization
sed -n '27,100p' backend/api/main.py

Repository: thrishank007/NeuraX

Length of output: 2273


🏁 Script executed:

# Check if QueryProcessor has a process() method
rg "def process" retrieval/query_processor.py -B 2 -A 5

Repository: thrishank007/NeuraX

Length of output: 971


Component initialization will fail: constructor arguments missing.

  • Line 29: QueryProcessor() requires (embedding_manager, vector_store, config) but called with zero arguments—raises TypeError.
  • Line 30: VectorStore() requires (persist_directory) but called with zero arguments—raises TypeError.
  • Line 79: query_processor.process() method does not exist. QueryProcessor has process_text_query(), process_image_query(), and process_multimodal_query() methods.

Pass required dependencies during initialization and use the correct method name.

🤖 Prompt for AI Agents
In `@backend/api/main.py` around lines 27 - 30, The code currently instantiates
VectorStore(), QueryProcessor(), and then calls query_processor.process(),
causing TypeErrors and a missing method call; fix by constructing VectorStore
with its required persist_directory (pass the configured persist directory),
create or obtain the embedding_manager and config objects, then initialize
QueryProcessor with (embedding_manager, vector_store, config) instead of no
args, and replace the call to query_processor.process(...) with the appropriate
method name from QueryProcessor (use process_text_query(),
process_image_query(), or process_multimodal_query() depending on the input
type) so the correct signature and dependencies are used.
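
A minimal sketch of the wiring this comment asks for, assuming stand-in classes. Only the constructor parameter names (`persist_directory` for `VectorStore`; `embedding_manager, vector_store, config` for `QueryProcessor`) come from the analysis above. `EmbeddingManagerStub`, the `config` dict, and `build_components` are hypothetical names used for illustration, not the project's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class VectorStore:
    """Stand-in: the real class takes persist_directory per the review."""
    persist_directory: str


class EmbeddingManagerStub:
    """Hypothetical placeholder for whatever object produces embeddings."""


@dataclass
class QueryProcessor:
    """Stand-in: the real class takes these three arguments per the review."""
    embedding_manager: Any
    vector_store: VectorStore
    config: Dict[str, Any] = field(default_factory=dict)


def build_components(persist_directory: str, config: Dict[str, Any]) -> Dict[str, Any]:
    # Build dependencies first, then inject them, so no constructor
    # is ever called with missing arguments.
    vector_store = VectorStore(persist_directory=persist_directory)
    embedding_manager = EmbeddingManagerStub()
    query_processor = QueryProcessor(embedding_manager, vector_store, config)
    return {"vector_store": vector_store, "query_processor": query_processor}
```

Building the store once and passing the same instance into the processor also avoids two components pointing at different persistence directories.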


@app.get("/")
async def root():
    return {"message": "NeuraX API is running", "version": "1.0.0"}

@app.get("/health")
async def health_check():
    """Health check endpoint"""
    return {
        "status": "healthy",
        "components": {
            "database": "connected",
            "lm_studio": "checking..."
        }
    }

@app.post("/api/upload")
async def upload_files(files: List[UploadFile] = File(...)):
    """Upload and process files"""
    try:
        results = []
        for file in files:
            # Save file
            file_path = f"./data/uploads/{file.filename}"

⚠️ Potential issue | 🔴 Critical

Path traversal vulnerability: filename is used directly without sanitization.

A malicious filename like ../../etc/passwd could write files outside the upload directory.

Proposed fix
+from pathlib import Path
+import re
+
+def sanitize_filename(filename: str) -> str:
+    """Remove path components and dangerous characters from filename."""
+    # Get only the filename, not any path components
+    name = Path(filename).name
+    # Remove any remaining dangerous characters
+    return re.sub(r'[^\w\-_\.]', '_', name)
+
 @app.post("/api/upload")
 async def upload_files(files: List[UploadFile] = File(...)):
     try:
         results = []
         for file in files:
-            file_path = f"./data/uploads/{file.filename}"
+            safe_filename = sanitize_filename(file.filename)
+            file_path = os.path.join(UPLOAD_DIR, safe_filename)
🤖 Prompt for AI Agents
In `@backend/api/main.py` at line 54, The code currently builds file_path by
concatenating file.filename directly (file_path =
f"./data/uploads/{file.filename}"), which allows path traversal; change the
upload handling to sanitize and constrain filenames: use a safe filename routine
(e.g., secure_filename or strip path components), or generate a server-side name
(uuid/timestamp) and preserve the original extension, then resolve the final
path with pathlib and verify it is inside the uploads directory before saving;
update uses of file_path and file.filename accordingly and ensure the uploads
directory is created if missing.
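
One way to combine the suggestions in this prompt, as a hedged sketch rather than the project's implementation: generate a server-side name, keep only a sanitized extension, and confirm the resolved path stays inside the upload directory. `UPLOAD_DIR` mirrors the hard-coded path in the diff; `safe_upload_path` is a hypothetical helper name. `Path.is_relative_to` requires Python 3.9 or later, which matches the `python:3.9-slim` base image in this PR.

```python
import uuid
from pathlib import Path

# Assumed location, taken from the hard-coded path in the diff.
UPLOAD_DIR = Path("./data/uploads")


def safe_upload_path(original_name: str, upload_dir: Path = UPLOAD_DIR) -> Path:
    """Return a path inside upload_dir that ignores client-supplied directories."""
    upload_dir.mkdir(parents=True, exist_ok=True)
    base = upload_dir.resolve()

    # Keep only the extension of the client-supplied name; treat
    # backslashes as separators so Windows-style paths are handled too.
    suffix = Path(original_name.replace("\\", "/")).suffix.lower()
    if not suffix.replace(".", "").isalnum():
        suffix = ""

    # The stored name is generated server-side, so it cannot contain "..".
    candidate = (base / f"{uuid.uuid4().hex}{suffix}").resolve()

    # Defense in depth: refuse anything that resolves outside the directory.
    if not candidate.is_relative_to(base):
        raise ValueError("resolved path escapes the upload directory")
    return candidate
```

The original filename can still be returned to the client in the response body; it just never reaches the filesystem.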

            with open(file_path, "wb") as f:
                content = await file.read()
                f.write(content)
Comment on lines +52 to +57

⚠️ Potential issue | 🟠 Major

Upload directory is not created before writing files.

If ./data/uploads doesn't exist, the file write at lines 55-57 will raise FileNotFoundError.

Proposed fix
+import os
+
+# Ensure upload directory exists
+UPLOAD_DIR = "./data/uploads"
+os.makedirs(UPLOAD_DIR, exist_ok=True)
+
 @app.post("/api/upload")
 async def upload_files(files: List[UploadFile] = File(...)):
     """Upload and process files"""
     try:
         results = []
         for file in files:
             # Save file
-            file_path = f"./data/uploads/{file.filename}"
+            file_path = os.path.join(UPLOAD_DIR, file.filename)
             with open(file_path, "wb") as f:
🤖 Prompt for AI Agents
In `@backend/api/main.py` around lines 52 - 57, The upload code writes files to
"./data/uploads/{file.filename}" without ensuring the directory exists; before
the loop that iterates over files (the for file in files block) create the
directory (e.g., call os.makedirs("./data/uploads", exist_ok=True) or
Path("./data/uploads").mkdir(parents=True, exist_ok=True)) and do it once
outside the loop so the subsequent file_path writes in the file.read()/f.write()
block won't raise FileNotFoundError.


            # Process file
            result = ingestion_manager.process_file(file_path)
            results.append({
                "filename": file.filename,
                "status": "processed",
                "result": result
            })

        return JSONResponse({"status": "success", "files": results})
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/api/query")
async def process_query(query: dict):
    """Process multimodal query"""
    try:
        query_text = query.get("text", "")
        query_type = query.get("type", "text")

        # Process query
        results = query_processor.process(query_text)
Comment on lines +78 to +79

⚠️ Potential issue | 🔴 Critical

Method query_processor.process() doesn't exist; should be process_text_query().

Per the relevant code snippets from retrieval/query_processor.py, the QueryProcessor class exposes process_text_query(query, filters, k), not process().

Proposed fix
     try:
         query_text = query.get("text", "")
-        query_type = query.get("type", "text")
         
         # Process query
-        results = query_processor.process(query_text)
+        result = query_processor.process_text_query(query_text)
         
         return JSONResponse({
             "status": "success",
-            "results": results
+            "results": result.results if hasattr(result, 'results') else []
         })
🤖 Prompt for AI Agents
In `@backend/api/main.py` around lines 78 - 79, The call to
query_processor.process(query_text) is invalid because QueryProcessor exposes
process_text_query(query, filters, k); update the call in main.py to use
query_processor.process_text_query and supply the expected parameters (at
minimum the query_text, and pass any applicable filters and k from the request
or defaults). Locate usages of QueryProcessor (query_processor) and replace
process(...) with process_text_query(query_text, filters, k) so the signature
matches the class API.
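
A hedged sketch of the call shape described above. Only the signature `process_text_query(query, filters, k)` comes from the review; `StubQueryProcessor`, `run_query`, the `filters` and `k` request keys, and `DEFAULT_K` are assumptions made for illustration. Image and multimodal queries are rejected here because their method signatures are not shown in this thread.

```python
from typing import Any, Dict, Optional

DEFAULT_K = 5  # assumed default; the review does not state one


class StubQueryProcessor:
    """Stand-in exposing only the method signature named in the review."""

    def process_text_query(self, query: str, filters: Optional[Dict[str, Any]], k: int):
        return {"query": query, "filters": filters, "k": k}


def run_query(processor: Any, payload: Dict[str, Any]) -> Any:
    """Validate the request payload and call the text-query method."""
    query_type = payload.get("type", "text")
    if query_type != "text":
        # Other query types need their own methods; fail loudly instead of guessing.
        raise ValueError(f"unsupported query type: {query_type!r}")

    text = payload.get("text", "").strip()
    if not text:
        raise ValueError("query text is empty")

    filters = payload.get("filters")
    k = int(payload.get("k", DEFAULT_K))
    return processor.process_text_query(text, filters, k)
```

In the endpoint, a `ValueError` raised here would map more naturally to a 400 response than to the blanket 500 used in the diff.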


        return JSONResponse({
            "status": "success",
            "results": results
        })
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/api/files")
async def list_files():
    """List uploaded files"""
    try:
        upload_dir = "./data/uploads"
        files = os.listdir(upload_dir) if os.path.exists(upload_dir) else []
        return JSONResponse({"status": "success", "files": files})
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
21 changes: 21 additions & 0 deletions docker/.env.example
@@ -0,0 +1,21 @@
# Docker Environment Configuration

# Backend Configuration
BACKEND_ENV=development
BACKEND_PORT=8000

# Frontend Configuration
FRONTEND_PORT=3000
NEXT_PUBLIC_API_URL=http://localhost:8000

# LM Studio Configuration
LM_STUDIO_URL=http://host.docker.internal:1234

# Database Configuration
CHROMA_DB_HOST=chromadb
CHROMA_DB_PORT=8000
VECTOR_DB_PATH=/app/vector_db

# Performance Configuration
GPU_ENABLED=true
BATCH_SIZE=32
22 changes: 22 additions & 0 deletions docker/Dockerfile.backend
@@ -0,0 +1,22 @@
FROM python:3.9-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    tesseract-ocr \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Expose port
EXPOSE 8000

# Run API
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
18 changes: 18 additions & 0 deletions docker/Dockerfile.frontend
@@ -0,0 +1,18 @@
FROM node:18-alpine

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install dependencies
RUN npm install

# Copy application
COPY . .

# Expose port
EXPOSE 3000

# Run development server
CMD ["npm", "run", "dev"]
59 changes: 59 additions & 0 deletions docker/docker-compose.yml
@@ -0,0 +1,59 @@
version: '3.8'

services:
  backend:
    build:
      context: ./backend
      dockerfile: ../docker/Dockerfile.backend
    ports:
      - "8000:8000"
    volumes:
      - ./backend:/app
      - ./data:/app/data
      - ./vector_db:/app/vector_db
      - ./models:/app/models
    environment:
      - ENVIRONMENT=development
      - API_HOST=0.0.0.0
      - API_PORT=8000
      - LM_STUDIO_URL=http://host.docker.internal:1234
    depends_on:
      - chromadb
    networks:
      - neurax-network

  frontend:
    build:
      context: ./frontend
      dockerfile: ../docker/Dockerfile.frontend
    ports:
      - "3000:3000"
    volumes:
      - ./frontend:/app
      - /app/node_modules
    environment:
      - NEXT_PUBLIC_API_URL=http://localhost:8000
      - NEXT_PUBLIC_LM_STUDIO_URL=http://localhost:1234
    depends_on:
      - backend
    networks:
      - neurax-network

  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8001:8000"
    volumes:
      - ./vector_db:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
    networks:
      - neurax-network

networks:
  neurax-network:
    driver: bridge

volumes:
  vector_db_data:
  models_data:
28 changes: 28 additions & 0 deletions frontend/package.json
@@ -0,0 +1,28 @@
{
  "name": "neurax-frontend",
  "version": "1.0.0",
  "private": true,
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "start": "next start",
    "lint": "next lint"
  },
  "dependencies": {
    "next": "^14.0.0",
    "react": "^18.2.0",
    "react-dom": "^18.2.0",
    "axios": "^1.6.0",
    "tailwindcss": "^3.3.0",
    "postcss": "^8.4.0",
    "autoprefixer": "^10.4.0"
  },
  "devDependencies": {
    "typescript": "^5.0.0",
    "@types/node": "^20.0.0",
    "@types/react": "^18.2.0",
    "@types/react-dom": "^18.2.0",
    "eslint": "^8.0.0",
    "eslint-config-next": "^14.0.0"
  }
}
31 changes: 31 additions & 0 deletions frontend/pages/api/proxy.js
@@ -0,0 +1,31 @@
// Next.js API route for proxying requests to backend
export default function handler(req, res) {
  const { method, body, query } = req

  // Proxy to backend API
  const backendUrl = process.env.NEXT_PUBLIC_API_URL || 'http://localhost:8000'
  const targetUrl = `${backendUrl}${req.url.replace('/api/proxy', '')}`

  // Forward request to backend
  fetch(targetUrl, {
    method: method,
    headers: {
      'Content-Type': 'application/json',
      ...req.headers
    },
    body: method !== 'GET' ? JSON.stringify(body) : undefined
  })
  .then(response => {
    // Forward response status and headers
    const headers = {}
    response.headers.forEach((value, name) => {
      headers[name] = value
    })

    res.status(response.status).json(headers)
  })
  .catch(error => {
    console.error('Proxy error:', error)
    res.status(500).json({ error: 'Proxy request failed' })
  })
Comment on lines +2 to +30

⚠️ Potential issue | 🔴 Critical

Critical: Proxy returns headers instead of response body and doesn't await the fetch.

Multiple issues prevent this proxy from functioning:

  1. Line 25: Returns headers object instead of the actual response body
  2. Missing async/await: The handler returns before fetch completes, causing empty/undefined responses
  3. Header forwarding: Spreading req.headers forwards problematic headers like host and content-length
Proposed fix
 // Next.js API route for proxying requests to backend
-export default function handler(req, res) {
+export default async function handler(req, res) {
   const { method, body, query } = req
   
   // Proxy to backend API
-  const backendUrl = process.env.NEXT_PUBLIC_API_URL || 'http://localhost:8000'
+  const backendUrl = process.env.BACKEND_URL || process.env.NEXT_PUBLIC_API_URL || 'http://localhost:8000'
   const targetUrl = `${backendUrl}${req.url.replace('/api/proxy', '')}`
   
-  // Forward request to backend
-  fetch(targetUrl, {
-    method: method,
-    headers: {
-      'Content-Type': 'application/json',
-      ...req.headers
-    },
-    body: method !== 'GET' ? JSON.stringify(body) : undefined
-  })
-  .then(response => {
-    // Forward response status and headers
-    const headers = {}
-    response.headers.forEach((value, name) => {
-      headers[name] = value
-    })
-    
-    res.status(response.status).json(headers)
-  })
-  .catch(error => {
+  try {
+    const response = await fetch(targetUrl, {
+      method: method,
+      headers: {
+        'Content-Type': 'application/json',
+      },
+      body: method !== 'GET' ? JSON.stringify(body) : undefined
+    })
+    
+    const data = await response.json()
+    res.status(response.status).json(data)
+  } catch (error) {
     console.error('Proxy error:', error)
     res.status(500).json({ error: 'Proxy request failed' })
-  })
+  }
 }
🤖 Prompt for AI Agents
In `@frontend/pages/api/proxy.js` around lines 2 - 30, The handler function
currently fires fetch without awaiting and sends response headers instead of the
response body; update export default function handler(req, res) to be async,
await the fetch(targetUrl, ...) call, read and forward the backend response body
(use response.text() or response.buffer() depending on content-type) and set
res.status(response.status) with the correct body; also sanitize forwarded
headers in the fetch by copying req.headers but removing/overwriting hop-by-hop
and problematic headers like host and content-length (and optionally connection,
keep-alive, transfer-encoding) before passing them to fetch so the backend
receives clean headers.
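
The header filtering described above is language-independent. The sketch below expresses it in Python to match the backend code in this PR; it is an illustration of the rule, not a drop-in for the Next.js handler. `forwardable_headers` is a hypothetical helper name. The hop-by-hop set follows the standard HTTP list, and `host` and `content-length` are dropped because the outgoing client recomputes them.

```python
from typing import Dict, Mapping

# Hop-by-hop headers describe a single connection and are not meant to be forwarded.
HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
})

# Recomputed by the outgoing client, so copying them can corrupt the request.
RECOMPUTED = frozenset({"host", "content-length"})


def forwardable_headers(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Return a lowercase-keyed copy of incoming without connection-scoped headers."""
    lowered = {name.lower(): value for name, value in incoming.items()}

    # Any header named in the Connection field is hop-by-hop as well.
    listed = {
        token.strip().lower()
        for token in lowered.get("connection", "").split(",")
        if token.strip()
    }

    blocked = HOP_BY_HOP | RECOMPUTED | listed
    return {name: value for name, value in lowered.items() if name not in blocked}
```

An allowlist (forwarding only headers such as `authorization` and `content-type`) is stricter and may be preferable when the backend needs few client headers.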

}