# 📘 Multi-Document Embedding Search Engine with Caching

A semantic search engine that uses transformer-based NLP embeddings and cosine similarity to retrieve the most relevant passages across multiple documents. A caching layer stores embeddings so they are not regenerated on every run, which improves performance.

🚀 Features

Multi-document ingestion and preprocessing

Transformer-based embedding generation

Semantic search using cosine similarity

Efficient caching layer (avoids recomputation)

Fast and accurate AI-powered search results

Supports large documents through text chunking

Backend API + Streamlit UI
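
As an illustration of the text chunking feature, here is a minimal sketch of word-based chunking with overlap. The function name `chunk_text` and the `chunk_size` and `overlap` parameters are assumptions made for this example; the project's actual chunking strategy and sizes may differ.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks; neighbours share `overlap` words.

    Hypothetical helper for illustration, not the project's own code.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Overlapping chunks keep a sentence that straddles a boundary fully visible in at least one chunk, at the cost of embedding some words twice.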

🧠 How It Works

Load multiple documents

Split into text chunks

Generate embeddings using ML models

Store embeddings in cache

The user enters a query

The query embedding is compared with the stored embeddings

The top results are returned, ranked by similarity
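
The steps above can be sketched in a few lines. This is a simplified illustration, not the project's implementation: `toy_embed` is a stand-in that builds a bag-of-words count vector instead of calling a transformer model, and `build_index` and `search` are hypothetical names. Only the cosine-similarity ranking mirrors what the steps describe.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: word counts (an assumption, NOT a transformer model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Embed every chunk once so queries only need to embed the query text."""
    return [(chunk, toy_embed(chunk)) for chunk in chunks]


def search(query: str, index: list[tuple[str, Counter]], top_k: int = 3) -> list[tuple[float, str]]:
    """Return up to top_k (score, chunk) pairs, highest similarity first."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


index = build_index(["cats purr and sleep", "dogs bark loudly", "the stock market fell"])
results = search("dogs bark", index, top_k=2)
```

With a real embedding model, `toy_embed` would be replaced by the model's encode call and the vectors would be dense arrays, but the ranking logic stays the same.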

🛠️ Tech Stack

Python

Embedding Models (Sentence Transformers / OpenAI)

NLP Preprocessing

Cosine Similarity

Pickle / SQLite DB Cache

Streamlit

FastAPI

📂 Project Structure

project/
│── src/
│── appx.py              # Backend server
│── ui.py                # User Interface (Streamlit)
│── data/                # Ignored by Git
│── cache/
│   ├── index_meta.pkl
│   ├── embeddings_cache.db
│   └── documents.index
│── README.md
│── requirements.txt
│── .gitignore

🖥️ How to Run the Project

✅ 1. Start Backend (Windows)

cd C:\Users\ssada\project

python appx.py

Backend must remain open and running.

✅ 2. Start User Interface

cd C:\Users\ssada\project

streamlit run ui.py

✅ 3. Start API (FastAPI)

API documentation is available at:

👉 http://127.0.0.1:8000/docs

📦 Cache Files Stored Here

The system stores embeddings and metadata in the cache/ directory:

index_meta.pkl

embeddings_cache.db

documents.index

These files allow fast loading without recomputing embeddings.
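
To illustrate how a cache like `embeddings_cache.db` can avoid recomputation, here is a minimal sketch of an SQLite-backed embedding cache. The table layout, the SHA-256 content key, the JSON serialization, and the `EmbeddingCache` class are all assumptions made for this example; the project's actual schema is not documented here.

```python
import hashlib
import json
import sqlite3
from typing import Callable


class EmbeddingCache:
    """Hypothetical cache: maps a hash of the chunk text to its embedding."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path to persist across runs; ":memory:" is for demos only.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS embeddings ("
            "key TEXT PRIMARY KEY, vector TEXT NOT NULL)"
        )

    @staticmethod
    def _key(text: str) -> str:
        # Hashing the content means edited text gets a fresh embedding.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_or_compute(self, text: str, embed: Callable[[str], list[float]]) -> list[float]:
        key = self._key(text)
        row = self.conn.execute(
            "SELECT vector FROM embeddings WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: skip the model call
        vector = embed(text)  # cache miss: compute once, then store
        self.conn.execute(
            "INSERT INTO embeddings (key, vector) VALUES (?, ?)",
            (key, json.dumps(vector)),
        )
        self.conn.commit()
        return vector
```

Keying on a hash of the text rather than on a file name means unchanged chunks are reused even if a document is renamed, while any edit to a chunk triggers a new embedding.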

📦 Installation

pip install -r requirements.txt

▶️ Run the App

Streamlit UI:

streamlit run ui.py

Backend:

python appx.py
