Advanced AI Learning Companion for Psychology
Engineered for the OpenStax Psychology 2e Textbook
NeuroNauts is a state-of-the-art interactive learning platform designed to revolutionize how students interact with complex academic material. By leveraging a Cloud-Native Retrieval-Augmented Generation (RAG) architecture, it transforms the OpenStax Psychology 2e textbook into a dynamic, conversational knowledge base.
Unlike standard LLMs which frequently hallucinate or confidently invent incorrect academic facts, NeuroNauts provides hallucination-free answers by grounding every single response in highly-specific textbook segments. It rapidly serves high-quality Generation via Groq, accurate dense vector search via Zilliz Cloud, and seamlessly renders contextual infographics, charts, and scientific illustrations directly via Cloudinary.
| Feature | Description |
|---|---|
| βοΈ Cloud-Native Ecosystem | Built for production performance using Groq (Llama-3.3-70b-versatile), Zilliz Cloud, and Cloudinary CDN image hosting. |
| π High-Fidelity Retrieval | Employs Nomic-Embed-Text on Zilliz Cloud with strict relevance thresholds (Cosine Similarity > 0.3) ensuring zero-hallucinations. |
| π‘οΈ Resilient API Architecture | Wraps all downstream services (Database, CDN, LLM) in robust exception handling logic, rendering premium UI fallback banners for 401 Auth errors, 429 Rate Limits, and 500 Timeouts. |
| π Section-Aware Chunking | Intelligent data ingestion via Docling. Chunks never bleed across sections/chapters, ensuring perfect context integrity. |
| πΌοΈ Intelligent Image Lightbox | Using PyMuPDF, the agent identifies and securely extracts charts/diagrams, serving them from Cloudinary alongside the LLM's text. |
| π§ Context-Aware Memory | Handles complex follow-up questions (e.g., "what are parts of it?") by intelligently resolving pronouns against conversation history. |
| π Headless Eval Suite | Includes headless_eval.py to continuously measure Faithfulness and Answer Relevancy programmatically across the data pipeline. |
NeuroNauts evolved from a local-prototyped RAG into a highly-scalable cloud MVP. The system handles Ingestion, Retrieval, and Generation over distributed nodes to ensure millisecond-level inference times.
graph TD
subgraph "1. Engineering & Ingestion Scripts (One-Time Execution)"
A[Psychology 2e PDF] --> B[Docling Parser]
B --> C[Section-Aware Chunking]
C -->|Raw Images via PyMuPDF| D[Upload to Cloudinary CDN]
C -->|Text via SentenceTransformers| E[Push to Zilliz Cloud DB]
D --> F(image_url_map.json)
F --> G[fix_image_refs.py aligner]
end
subgraph "2. Cloud-Native Retrieval Engine (App)"
H[User Query] --> I[Context Window Management]
I --> J[Zilliz Dense Vector Search]
J --> K[Top-K Segment Extraction]
K --> L[Extract Cloudinary Image URLs]
end
subgraph "3. Contextual Generation & UI"
K --> M[Context Packaging]
M --> N[Groq API: Llama-3.3-70b]
N --> O[Streamlit UI Chatbot]
L --> O
O --> P[Frontend Error Sentinel Catching]
end
- Python 3.10+
- Keys for the following infrastructure:
- Groq API (For lightning-fast LLM generation)
- Zilliz Cloud (For Serverless Vector Search)
- Cloudinary (For Cloud Image CDN hosting)
# Clone the repo
git clone https://github.com/Omen-bit/WCEHackathon2026_NeuroNauts.git
cd WCEHackathon2026_NeuroNauts
# Create and activate environment
python -m venv .venv
# Windows: .venv\Scripts\activate
# Mac/Linux: source .venv/bin/activate
# Install dependencies
pip install -r requirements.txtCreate a .env file in the root directory and populate it with your cloud credentials:
# --- GROQ (LLM Gen) ---
GROQ_API_KEY="your-groq-key"
GROQ_MODEL="llama-3.3-70b-versatile"
# --- CLOUDINARY (Images) ---
CLOUDINARY_CLOUD_NAME="your-cloud-name"
CLOUDINARY_API_KEY="your-key"
CLOUDINARY_API_SECRET="your-secret"
# --- ZILLIZ (Vector DB) ---
ZILLIZ_URI="https://your-zilliz-cluster.cloud.zilliz.com"
ZILLIZ_TOKEN="your-zilliz-token"streamlit run app/app.pyWant to use NeuroNauts for a different textbook? It's incredibly easy to adapt our custom engineering pipeline for any PDF.
- Add Your Book: Place your new PDF in the
data/or root directory. - Run the Ingestion Pipeline:
This uses Docling to intelligently chunk your book specifically by academic headings, and uses PyMuPDF to rip out all the native high-res images to an
python pipeline/run_pipeline.py path/to/your_textbook.pdf
extracted_images/folder. - Upload Assets to CDN:
This securely streams your newly extracted textbook images into Cloudinary.
python scripts/upload_images_to_cloud.py
- Push to Vector DB:
This connects the generated cloud URLs to your dense vector DB chunks and pushes everything securely to Zilliz.
python scripts/migrate_to_zilliz.py python scripts/fix_image_refs.py
- Start Chatting: Your app is now an expert on your unique textbook!
WCEHackathon2026_NeuroNauts/
βββ app/ # Streamlit Frontend & Core RAG Logic
β βββ app.py # Main App, Prompt Engineering & UI Rendering
β βββ retrieve.py # Zilliz Database Connections & Search logic
β βββ generate.py # Groq API Abstraction Layer
β βββ headless_eval.py # Automated Headless Evaluation Script
βββ pipeline/ # Powerful Automated PDF Ingestion Pipeline
β βββ (Docling Parsers, PyMuPDF extractors, Recursive Chunking)
βββ scripts/ # Infrastructure Migration & Cleanup Toolkit
β βββ migrate_to_zilliz.py # Uplift script for moving local DB to Zilliz
β βββ upload_images_to_cloud.py # Asset migration to Cloudinary
β βββ fix_image_refs.py # Cloud DB JSON string serialization rectifier
βββ queries.json # Standardized testing metrics
βββ requirements.txt # Modern, cloud-native project dependencies
This project was built for the WCE Hackathon 2026 by Team NeuroNauts. We follow the MIT License and welcome community feedback.
Distributed under the MIT License. See LICENSE for more information.
Made with β€οΈ by Team NeuroNauts
