The Problem: Students miss out on thousands of dollars in benefits—scholarships, financial aid (FAFSA), university grants, and health resources—because the information is scattered across complex websites and privacy policies they rarely read. Using current AI tools to find these benefits often requires uploading sensitive personal data to third-party servers.
The Solution: We are building a Local-First Student Benefit Analyzer. Our application runs on the student's desktop, using an interactive form to collect profile data that is encrypted and stored locally. A companion browser extension identifies relevant university and scholarship domains, which are processed by stateless cloud workers (mapper and scraper) to find new opportunities without ever exposing the student's private profile. A local LLM matches scraped benefits against the student's profile entirely on-device.
- Josue Aranday
- John Payes
- Alejandro Salinas
- Kevin Gonzalez
Faculty Adviser: Pedro Fonseca
We use a split architecture that keeps user data local while offloading heavy web crawling and scraping to stateless workers designed for cloud deployment.
- Interface: A modern, high-DPI desktop application built with CustomTkinter.
- Function: Users fill out a secure profile (GPA, major, financial needs, citizenship, etc.). This data is stored in an encrypted local database.
- Matching: A local LLM (Ollama with phi3:mini) compares the user's profile against scraped benefit data entirely on-device — personal data never leaves the machine.
- Chat: An AI chat page lets the user ask follow-up questions about their matched benefits using the same local LLM.
- Privacy-First Collection: The extension does not track full browsing history. It passively collects relevant `.edu` and `.gov` domains from the user's browsing using simple heuristics.
- Throttling: Each domain has a 1-week cooldown to avoid redundant submissions, with a 200-item queue cap and periodic retry.
- Native Messaging: The extension communicates only with the local Desktop App via Chrome Native Messaging. No browsing data is sent to the cloud.
- Protocol: Reads/writes length-prefixed JSON over stdin/stdout per the Chrome Native Messaging spec.
- Storage: Stores collected domains and browsing items in a local SQLite database (`local_benefits.db`).
- Setup: `setup_host.py` dynamically configures the NM manifest and registry entry for the current machine.
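The length-prefixed framing the host implements can be sketched as follows (function names are illustrative; the Chrome Native Messaging spec defines the wire format as a 4-byte native-byte-order length header followed by UTF-8 JSON):

```python
import json
import struct
import sys
from typing import Optional


def read_message(stream) -> Optional[dict]:
    """Read one message: a 4-byte native-endian uint32 length header
    followed by that many bytes of UTF-8 JSON (per the NM spec)."""
    raw_length = stream.read(4)
    if len(raw_length) < 4:
        return None                     # Chrome closed the pipe
    (length,) = struct.unpack("=I", raw_length)
    return json.loads(stream.read(length).decode("utf-8"))


def write_message(stream, message: dict) -> None:
    """Frame and write one message, then flush so Chrome sees it."""
    payload = json.dumps(message).encode("utf-8")
    stream.write(struct.pack("=I", len(payload)))
    stream.write(payload)
    stream.flush()

# In the real host: read_message(sys.stdin.buffer), write_message(sys.stdout.buffer, reply)
```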
- Purpose: Discovers all relevant pages on a given domain using a two-phase approach: sitemap pre-seeding (free URL discovery from sitemap.xml) followed by BFS crawl to find pages sitemaps missed.
- Output: Produces `mapped_pages.json` — a mapping of domains to their discovered page URLs.
- Design: Stateless worker with resource-aware tuning (CPU/RAM detection). Designed for cloud deployment; currently runs locally for development and testing.
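The BFS phase can be illustrated with a minimal sketch (`discover_pages` and `get_links` are hypothetical names; the real `map.py` also does sitemap pre-seeding and resource-aware tuning):

```python
from collections import deque


def discover_pages(seed_urls, get_links, max_pages=50):
    """Breadth-first crawl outward from sitemap-seeded URLs,
    following links until max_pages URLs have been discovered.

    seed_urls -- URLs pre-seeded from sitemap.xml (phase 1)
    get_links -- callable returning the links found on a page
    """
    discovered = list(seed_urls)
    seen = set(discovered)
    queue = deque(discovered)
    while queue and len(discovered) < max_pages:
        page = queue.popleft()
        for link in get_links(page):
            if link not in seen and len(discovered) < max_pages:
                seen.add(link)
                discovered.append(link)
                queue.append(link)
    return discovered
```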
- Purpose: A FastAPI app that scrapes mapped pages and extracts clean text content.
- Change Detection: Three-layer system — weekly pack cache, conditional HTTP headers (ETag / If-Modified-Since), and SHA-256 content hashing — so unchanged pages are skipped efficiently.
- Design: Ephemeral, stateless worker with minimal retention. Designed for cloud deployment as a serverless function; currently runs locally for development and testing.
- Purpose: Reads the user's profile from `answers.json` and all scraped text from `scraped_output/`, then sends chunks to the local Ollama LLM to identify matching benefits.
- Output: Produces `matched_benefits.json` — a deduplicated list of benefits with names, descriptions, eligibility, and source URLs.
- Privacy: The LLM runs entirely on-device via Ollama (localhost:11434). No personal data is transmitted.
Browser Extension
→ Native Host (SQLite DB)
→ map.py (controller) → mapped_pages.json
→ scrape_all.py (controller) → scraped_output/
→ match.py (controller) → matched_benefits.json
→ GUI chat (Ollama-powered)
Each stage communicates via files or databases — no direct imports between components.
- Language: Python 3.10+
- GUI Framework: CustomTkinter
- Local Database: SQLite
- Local LLM: Ollama with phi3:mini (2.3 GB)
- Target: Google Chrome / Chromium
- Manifest: V3
- Mechanism: Chrome Native Messaging API
- API Framework: FastAPI + uvicorn
- HTML Parsing: BeautifulSoup4
- Page Discovery: BFS crawl + sitemap XML parsing
- Local Profiling: The user's financial and academic profile never leaves the local device. The cloud workers only see domains and URLs to scrape, not the reason why.
- Local LLM: Benefit matching and chat are powered by Ollama running on localhost. No data is sent to external AI services.
- Ephemeral Processing: Cloud scraping workers are stateless and do not persist request payloads or scraped content beyond processing.
- No Long-Term Log: The extension keeps only a local, short-term list of candidate domains. It does not upload browsing history.
- Python 3.10 or higher
- Google Chrome (for extension)
- Ollama installed with the phi3:mini model
- Git
1. Clone the Repository

   ```
   git clone https://github.com/General-Zilver/LPBD.git
   cd LPBD
   ```

2. Install Python Dependencies

   ```
   pip install -r requirements.txt
   ```

3. Install Ollama and pull the model

   ```
   ollama pull phi3:mini
   ```

4. Set up the Native Host

   ```
   python native_host/setup_host.py
   ```

   Follow the prompts to enter your Chrome extension ID.

5. Load the Browser Extension

   - Open Chrome → `chrome://extensions/`
   - Toggle "Developer mode" (top right)
   - Click "Load unpacked" and select the `browser_extension/` folder
   - Copy the Extension ID and use it in step 4 if you haven't already
Once the extension has collected some domains, run each stage in order:
```
# 1. Map domains from the extension's DB into page URLs
python map.py                       # maps all collected domains
python map.py --max-pages 50        # limit pages per domain for faster runs

# 2. Scrape all mapped pages (auto-starts the scraper API)
python scrape_all.py                # default: 20 pages per domain
python scrape_all.py --all          # scrape everything
python scrape_all.py --max-pages 5  # limit for quick demo

# 3. Match scraped benefits against the user's profile
python match.py                     # requires answers.json from the GUI questionnaire

# 4. Launch the desktop app
python GUI/start.py
```

- Phase 1: UI Prototype — CustomTkinter desktop app with login, signup, questionnaire, and chat pages.
- Phase 2: Extension + Native Host — Manifest V3 extension with Chrome Native Messaging bridge and SQLite storage.
- Phase 3: Mapper — BFS + sitemap crawler for page discovery, with resource-aware worker tuning.
- Phase 4: Scraper — FastAPI ephemeral worker with three-layer change detection and weekly pack caching.
- Phase 5: Local LLM Integration — Ollama-powered benefit matching and chat, entirely on-device.
- Phase 6: Pipeline Controllers — `map.py`, `scrape_all.py`, and `match.py` for the end-to-end demo workflow.
- Phase 7: Cloud Deployment — Deploy mapper and scraper as serverless cloud workers with weekly sync scheduling.
- Phase 8: Database Encryption — Implement SQLCipher encryption for the local profile database.
Distributed under the MIT License. See LICENSE for more information.