Almous is a powerful and flexible AI backend designed to serve as the brain for advanced chat applications. It goes beyond simple Q&A by integrating multiple Large Language Model (LLM) providers, Retrieval-Augmented Generation (RAG) for document interaction, and an autonomous web search agent to provide answers based on real-time information from the internet.
Built with Python and Flask, Almous is modular, easy to extend, and ready to power your next-generation AI assistant.
- Multi-Provider LLM Integration: Seamlessly switch between different LLM providers.
  - 🚀 Groq: For incredibly fast inference speeds.
  - 🧠 A4F (AI4Finance): For access to specialized models.
  - 🎨 Pollinations.ai: For creative and diverse model options.
- Retrieval-Augmented Generation (RAG): Chat with your documents.
  - Upload files (PDFs, Markdown, etc.) via a simple API endpoint.
  - Almous processes, chunks, and indexes the content in a ChromaDB vector store.
  - Ask questions and get answers sourced directly from your documents.
- Autonomous Search Agent: Get answers from the web.
  - When activated, the AI first generates relevant search queries based on your prompt.
  - It uses DuckDuckGo to perform searches and crawls the top results.
  - The scraped web content is then used as a knowledge base to generate a comprehensive, up-to-date answer.
- Real-time Streaming: Responses are streamed word-by-word using Server-Sent Events (SSE) for a responsive user experience.
- Modular & Extensible Architecture: The codebase is organized into controllers, services, providers, and tools, making it easy to add new features, LLM providers, or tools.
- Conversation Memory: Remembers the last few turns of the conversation to maintain context.
| Category | Technology |
|---|---|
| Backend Framework | Flask |
| LLM Providers | Groq, A4F, Pollinations.ai |
| RAG & VectorDB | LangChain, ChromaDB |
| Embedding Models | Jina AI, Google Gemini (available) |
| Web Search & Crawling | ddgs (DuckDuckGo Search), crawl4ai |
| Data Validation | Pydantic |
| Document Processing | markitdown (converts various file types to Markdown) |
| Environment Mgmt | python-dotenv |
Follow these instructions to get the Almous backend up and running on your local machine.
- Python 3.10+
- `pip` package manager
- Git
```bash
git clone https://github.com/Medamine-Bahassou/almous.git
cd almous/backend
```

It's highly recommended to use a virtual environment to manage dependencies.
```bash
# Create the virtual environment
python -m venv venv

# Activate it
# On Windows
venv\Scripts\activate

# On macOS/Linux
source venv/bin/activate
```

Installation command:

```bash
pip install -r requirements.txt
```

You'll need API keys for the different services Almous uses.
- Create a file named `.env` in the `backend` directory.
- Add your API keys to this file.
`.env` file example:

```bash
# Groq API Key (https://console.groq.com/keys)
GROQ_API_KEY="gsk_..."

# Pollinations API Key (https://pollinations.ai/)
POLLINATIONS_API_KEY="..."

# A4F API Key (https://a4f.com/)
A4F_API_KEY="..."

# Jina AI API Key for Embeddings (https://jina.ai/embeddings/)
JINA_API_KEY="jina_..."
```
Once the setup is complete, you can start the Flask server.
```bash
flask --app src/app.py run
```

The server will start, typically at `http://127.0.0.1:5000`.
This is the main endpoint for all interactions. It supports standard chat, RAG, and the search agent.
- URL: `/api/chat`
- Method: `POST`
- Content-Type: `application/json`
- Response: `text/event-stream` (streaming)
Request Body:
```json
{
  "provider": "groq",
  "model": "llama3-70b-8192",
  "message": "What is Retrieval-Augmented Generation?",
  "system": "You are a helpful AI assistant.",
  "attachment": [],
  "tools": [],
  "stream": true
}
```

Field Descriptions:
- `provider` (string, required): One of `groq`, `a4f`, or `pollination`.
- `model` (string, required): The specific model ID for the chosen provider.
- `message` (string, required): The user's prompt.
- `system` (string, optional): A system prompt to guide the AI's behavior.
- `attachment` (list, optional): A list of file paths (currently uses the server-side path of the last uploaded file). Leave as `[]` for non-RAG chat.
- `tools` (list, optional): A list of tools to activate. Use `["search"]` to enable the web search agent.
- `stream` (boolean, optional): Should always be `true` for the streaming endpoint.
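For reference, here is a minimal Python client for the streaming endpoint, assuming the `requests` library is installed. The exact SSE payload format (what each `data:` line contains) depends on the controller's implementation, so treat the parsing as a sketch:

```python
import requests

# Minimal sketch of a streaming /api/chat client (assumes `requests` is installed).
payload = {
    "provider": "groq",
    "model": "llama3-70b-8192",
    "message": "What is Retrieval-Augmented Generation?",
    "system": "You are a helpful AI assistant.",
    "attachment": [],
    "tools": [],
    "stream": True,
}

with requests.post("http://127.0.0.1:5000/api/chat", json=payload, stream=True) as resp:
    resp.raise_for_status()
    # SSE frames arrive as lines like `data: <chunk>`; the exact chunk format
    # is an assumption here -- adjust to match what the server actually sends.
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            print(line[len("data:"):].strip(), end="", flush=True)
```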
Use this endpoint to upload a document for RAG.
- URL: `/api/upload`
- Method: `POST`
- Content-Type: `multipart/form-data`
Request Body:
- A form field named `file` containing the document you want to upload.
Example curl command:
```bash
curl -X POST -F "file=@/path/to/your/document.pdf" http://127.0.0.1:5000/api/upload
```

Note: The current implementation clears the upload directory and saves only the latest file. This is suitable for single-user, single-document sessions.
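The same upload can be scripted in Python with `requests` (a sketch; the shape of the response body isn't specified here):

```python
import requests

# Upload a document for RAG -- Python equivalent of the curl command above.
with open("/path/to/your/document.pdf", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:5000/api/upload",
        files={"file": f},  # the form field must be named "file"
    )
resp.raise_for_status()
print(resp.text)
```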
Fetch the list of available models for a specific provider.
- URL: `/api/models`
- Method: `GET`
- Query Parameters: `provider` (e.g., `/api/models?provider=groq`)
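A quick Python sketch of calling this endpoint (it assumes the response is JSON; adjust if the server returns something else):

```python
import requests

# Fetch the list of models available for the "groq" provider.
resp = requests.get(
    "http://127.0.0.1:5000/api/models",
    params={"provider": "groq"},  # becomes /api/models?provider=groq
)
resp.raise_for_status()
print(resp.json())  # assumed to be JSON; shape depends on the provider
```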
Standard chat flow:

- A request hits the `/api/chat` endpoint with no tools or attachments.
- The `chat_controller` validates the request using `ChatRequestDTO`.
- It calls `chat_service_completion`, passing the provider, model, and messages.
- The service retrieves conversation memory and prepares the final prompt.
- It invokes the `completion` method of the selected provider (`GroqProvider`, etc.).
- The provider makes the API call and streams the response back to the client.
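To make the last two steps concrete, here is a hypothetical sketch of what a provider's streaming `completion` method could look like, using the official Groq Python SDK. The actual code in `groq.py` and `global_completion.py` may be structured differently:

```python
from typing import Generator

from groq import Groq  # official Groq SDK


def completion(model: str, messages: list[dict]) -> Generator[str, None, None]:
    """Illustrative provider method: stream completion chunks from Groq."""
    client = Groq()  # reads GROQ_API_KEY from the environment
    stream = client.chat.completions.create(
        model=model,
        messages=messages,  # e.g. [{"role": "user", "content": "..."}]
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta  # the controller wraps each chunk in an SSE event
```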
RAG flow:

- The user first uploads a file to the `/api/upload` endpoint.
- The server saves the file in the `tools/rag/data` directory.
- The user sends a prompt to `/api/chat`. The frontend should indicate which file to use.
- The controller detects an attachment and calls `chat_rag_service_completion`.
- This service triggers `generate_data_store` from `build_database.py`:
  - The document is converted to Markdown (`markitdown`).
  - The text is cleaned and split into chunks (LangChain).
  - The chunks are converted to vector embeddings (`JinaEmbeddings`).
  - The embeddings are stored in a temporary ChromaDB instance.
- The user's query is used to perform a similarity search in the ChromaDB.
- The most relevant chunks are retrieved and inserted into a prompt template as context.
- This final, context-rich prompt is sent to the LLM to generate an answer.
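The indexing and retrieval steps roughly correspond to the following sketch (the name `build_and_query`, the chunk sizes, and the embedding model are assumptions; the real logic lives in `build_database.py` and `query_data.py`):

```python
from langchain_community.embeddings import JinaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter


def build_and_query(markdown_text: str, question: str, k: int = 4) -> list[str]:
    """Chunk a converted document, index it in ChromaDB, and retrieve context."""
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_text(markdown_text)

    # JinaEmbeddings picks up JINA_API_KEY from the environment.
    embeddings = JinaEmbeddings(model_name="jina-embeddings-v2-base-en")
    db = Chroma.from_texts(chunks, embeddings)  # temporary, in-process store

    # Similarity search returns the chunks closest to the user's question,
    # which are then pasted into the prompt template as context.
    docs = db.similarity_search(question, k=k)
    return [d.page_content for d in docs]
```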
Search agent flow:

- A request hits `/api/chat` with `tools: ["search"]`.
- The controller calls `search_agent_service_completion`.
- Step 1 (Query Generation): The agent sends a request to the LLM with a specialized prompt, asking it to generate 1-2 concise search queries based on the user's message.
- Step 2 (Search & Crawl): The agent parses the search queries and uses `ddgs` to get search results from DuckDuckGo. It then uses `crawl4ai` to scrape the content from the top links.
- Step 3 (Index & Query): The scraped web content is treated like a document. It's indexed into a temporary ChromaDB instance on the fly, just like in the RAG flow.
- Step 4 (Answer Generation): The original user message is used to query this new web-sourced vector database, and the LLM generates a final answer based on the retrieved real-time information.
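Steps 1-2 can be approximated with the snippet below (the function name and its shape are hypothetical; the actual orchestration lives in `search_agent.py`, `search.py`, and `crawl.py`):

```python
import asyncio

from crawl4ai import AsyncWebCrawler
from ddgs import DDGS


async def gather_web_context(queries: list[str], per_query: int = 3) -> str:
    """Hypothetical helper: search DuckDuckGo and scrape the top results."""
    urls: list[str] = []
    for query in queries:
        for hit in DDGS().text(query, max_results=per_query):
            urls.append(hit["href"])

    pages: list[str] = []
    async with AsyncWebCrawler() as crawler:
        for url in urls:
            result = await crawler.arun(url=url)
            if result.markdown:  # scraped page converted to Markdown
                pages.append(result.markdown)
    return "\n\n".join(pages)


# The combined text is then indexed into ChromaDB exactly like an uploaded file:
# context = asyncio.run(gather_web_context(["retrieval augmented generation"]))
```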
```
almous/backend/
├── src/
│   ├── controllers/
│   │   └── chat_controller.py     # Flask routes and API logic
│   ├── dtos/
│   │   └── chat_dto.py            # Pydantic data transfer objects
│   ├── providers/
│   │   ├── embed/
│   │   │   ├── jina.py            # Jina AI embedding provider
│   │   │   └── gemini_embed.py    # Google Gemini embedding provider
│   │   ├── a4f.py                 # A4F LLM provider
│   │   ├── global_completion.py   # Main completion logic and memory
│   │   ├── groq.py                # Groq LLM provider
│   │   └── pollination.py         # Pollinations.ai LLM provider
│   ├── services/
│   │   └── chat_service.py        # Business logic for chat, RAG, and search
│   └── tools/
│       ├── rag/
│       │   ├── data/              # Uploaded files for RAG
│       │   ├── db/chroma/         # ChromaDB vector stores
│       │   ├── build_database.py  # Logic for processing and indexing docs
│       │   └── query_data.py      # Logic for querying the vector store
│       └── search/
│           ├── crawl.py           # Web crawling logic
│           ├── search.py          # DuckDuckGo search logic
│           └── search_agent.py    # Orchestrates the search agent flow
└── .env                           # Environment variables (you create this)
```
- Persistent RAG Storage: Implement a more robust system for managing multiple documents and persistent ChromaDB collections.
- Add More Tools: Integrate other tools like a code interpreter or calculator.
- Enhanced Error Handling: Improve error reporting and resilience.
- Unit & Integration Tests: Add a testing suite to ensure code quality.
- Containerization: Add a `Dockerfile` for easy deployment with Docker.






