Almous is a powerful and flexible AI backend designed to serve as the brain for advanced chat applications. It goes beyond simple Q&A by integrating multiple Large Language Model (LLM) providers, Retrieval-Augmented Generation (RAG) for document interaction, and an autonomous web search agent to provide answers based on real-time information from the internet.
Built with Python and Flask, Almous is modular, easy to extend, and ready to power your next-generation AI assistant.
- Multi-Provider LLM Integration: Seamlessly switch between different LLM providers.
  - 🚀 Groq: For incredibly fast inference speeds.
  - 🧠 A4F (AI4Finance): For access to specialized models.
  - 🎨 Pollinations.ai: For creative and diverse model options.
- Retrieval-Augmented Generation (RAG): Chat with your documents.
  - Upload files (PDFs, Markdown, etc.) via a simple API endpoint.
  - Almous processes, chunks, and indexes the content in a ChromaDB vector store.
  - Ask questions and get answers sourced directly from your documents.
- Autonomous Search Agent: Get answers from the web.
  - When activated, the AI first generates relevant search queries based on your prompt.
  - It uses DuckDuckGo to perform searches and crawls the top results.
  - The scraped web content is then used as a knowledge base to generate a comprehensive, up-to-date answer.
- Real-time Streaming: Responses are streamed word-by-word using Server-Sent Events (SSE) for a responsive user experience.
- Modular & Extensible Architecture: The codebase is organized into controllers, services, providers, and tools, making it easy to add new features, LLM providers, or tools.
- Conversation Memory: Remembers the last few turns of the conversation to maintain context.
| Category | Technology |
|---|---|
| Backend Framework | Flask |
| LLM Providers | Groq, A4F, Pollinations.ai |
| RAG & VectorDB | LangChain, ChromaDB |
| Embedding Models | Jina AI, Google Gemini (available) |
| Web Search & Crawling | ddgs (DuckDuckGo Search), crawl4ai |
| Data Validation | Pydantic |
| Document Processing | markitdown (converts various file types to Markdown) |
| Environment Mgmt | python-dotenv |
Follow these instructions to get the Almous backend up and running on your local machine.
- Python 3.10+
- `pip` package manager
- Git
```bash
git clone https://github.com/Medamine-Bahassou/almous.git
cd almous/backend
```

It's highly recommended to use a virtual environment to manage dependencies.
```bash
# Create the virtual environment
python -m venv venv

# Activate it
# On Windows
venv\Scripts\activate

# On macOS/Linux
source venv/bin/activate
```

Installation command:

```bash
pip install -r requirements.txt
```

You'll need API keys for the different services Almous uses.
- Create a file named `.env` in the `backend` directory.
- Add your API keys to this file.
`.env` file example:

```bash
# Groq API Key (https://console.groq.com/keys)
GROQ_API_KEY="gsk_..."

# Pollinations API Key (https://pollinations.ai/)
POLLINATIONS_API_KEY="..."

# A4F API Key (https://a4f.com/)
A4F_API_KEY="..."

# Jina AI API Key for Embeddings (https://jina.ai/embeddings/)
JINA_API_KEY="jina_..."
```
Once the setup is complete, you can start the Flask server.
```bash
flask --app src/app.py run
```

The server will start, typically at `http://127.0.0.1:5000`.
This is the main endpoint for all interactions. It supports standard chat, RAG, and the search agent.
- URL: `/api/chat`
- Method: `POST`
- Content-Type: `application/json`
- Response: `text/event-stream` (streaming)
Request Body:
```json
{
  "provider": "groq",
  "model": "llama3-70b-8192",
  "message": "What is Retrieval-Augmented Generation?",
  "system": "You are a helpful AI assistant.",
  "attachment": [],
  "tools": [],
  "stream": true
}
```

Field Descriptions:
- `provider` (string, required): One of `groq`, `a4f`, or `pollination`.
- `model` (string, required): The specific model ID for the chosen provider.
- `message` (string, required): The user's prompt.
- `system` (string, optional): A system prompt to guide the AI's behavior.
- `attachment` (list, optional): A list of file paths (currently uses the server-side path of the last uploaded file). Leave as `[]` for non-RAG chat.
- `tools` (list, optional): A list of tools to activate. Use `["search"]` to enable the web search agent.
- `stream` (boolean, optional): Should always be `true` for the streaming endpoint.
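For reference, here is a minimal Python client for the streaming endpoint, assuming the `requests` library is installed. The exact SSE payload format (what each `data:` line contains) depends on the controller's implementation, so treat the parsing as a sketch:

```python
import requests

# Minimal sketch of a streaming /api/chat client (assumes `requests` is installed).
payload = {
    "provider": "groq",
    "model": "llama3-70b-8192",
    "message": "What is Retrieval-Augmented Generation?",
    "system": "You are a helpful AI assistant.",
    "attachment": [],
    "tools": [],
    "stream": True,
}

with requests.post("http://127.0.0.1:5000/api/chat", json=payload, stream=True) as resp:
    resp.raise_for_status()
    # SSE frames arrive as lines like `data: <chunk>`; the exact chunk format
    # is an assumption here -- adjust to match what the server actually sends.
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            print(line[len("data:"):].strip(), end="", flush=True)
```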
Use this endpoint to upload a document for RAG.
- URL: `/api/upload`
- Method: `POST`
- Content-Type: `multipart/form-data`
Request Body:
- A form field named `file` containing the document you want to upload.
Example curl command:
```bash
curl -X POST -F "file=@/path/to/your/document.pdf" http://127.0.0.1:5000/api/upload
```

Note: The current implementation clears the upload directory and saves only the latest file. This is suitable for single-user, single-document sessions.
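The same upload can be scripted in Python with `requests` (a sketch; the shape of the response body isn't specified here):

```python
import requests

# Upload a document for RAG -- Python equivalent of the curl command above.
with open("/path/to/your/document.pdf", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:5000/api/upload",
        files={"file": f},  # the form field must be named "file"
    )
resp.raise_for_status()
print(resp.text)
```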
Fetch the list of available models for a specific provider.
- URL: `/api/models`
- Method: `GET`
- Query Parameters: `provider` (e.g., `/api/models?provider=groq`)
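A quick Python sketch of calling this endpoint (it assumes the response is JSON; adjust if the server returns something else):

```python
import requests

# Fetch the list of models available for the "groq" provider.
resp = requests.get(
    "http://127.0.0.1:5000/api/models",
    params={"provider": "groq"},  # becomes /api/models?provider=groq
)
resp.raise_for_status()
print(resp.json())  # assumed to be JSON; shape depends on the provider
```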
Standard chat flow:

- A request hits the `/api/chat` endpoint with no tools or attachments.
- The `chat_controller` validates the request using `ChatRequestDTO`.
- It calls `chat_service_completion`, passing the provider, model, and messages.
- The service retrieves conversation memory and prepares the final prompt.
- It invokes the `completion` method of the selected provider (`GroqProvider`, etc.).
- The provider makes the API call and streams the response back to the client.
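To make the last two steps concrete, here is a hypothetical sketch of what a provider's streaming `completion` method could look like, using the official Groq Python SDK. The actual code in `groq.py` and `global_completion.py` may be structured differently:

```python
from typing import Generator

from groq import Groq  # official Groq SDK


def completion(model: str, messages: list[dict]) -> Generator[str, None, None]:
    """Illustrative provider method: stream completion chunks from Groq."""
    client = Groq()  # reads GROQ_API_KEY from the environment
    stream = client.chat.completions.create(
        model=model,
        messages=messages,  # e.g. [{"role": "user", "content": "..."}]
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta  # the controller wraps each chunk in an SSE event
```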
RAG flow:

- The user first uploads a file to the `/api/upload` endpoint.
- The server saves the file in the `tools/rag/data` directory.
- The user sends a prompt to `/api/chat`. The frontend should indicate which file to use.
- The controller detects an attachment and calls `chat_rag_service_completion`.
- This service triggers `generate_data_store` from `build_database.py`:
  - The document is converted to Markdown (`markitdown`).
  - The text is cleaned and split into chunks (LangChain).
  - The chunks are converted to vector embeddings (`JinaEmbeddings`).
  - The embeddings are stored in a temporary ChromaDB instance.
- The user's query is used to perform a similarity search in the ChromaDB.
- The most relevant chunks are retrieved and inserted into a prompt template as context.
- This final, context-rich prompt is sent to the LLM to generate an answer.
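The indexing and retrieval steps roughly correspond to the following sketch (the name `build_and_query`, the chunk sizes, and the embedding model are assumptions; the real logic lives in `build_database.py` and `query_data.py`):

```python
from langchain_community.embeddings import JinaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter


def build_and_query(markdown_text: str, question: str, k: int = 4) -> list[str]:
    """Chunk a converted document, index it in ChromaDB, and retrieve context."""
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_text(markdown_text)

    # JinaEmbeddings picks up JINA_API_KEY from the environment.
    embeddings = JinaEmbeddings(model_name="jina-embeddings-v2-base-en")
    db = Chroma.from_texts(chunks, embeddings)  # temporary, in-process store

    # Similarity search returns the chunks closest to the user's question,
    # which are then pasted into the prompt template as context.
    docs = db.similarity_search(question, k=k)
    return [d.page_content for d in docs]
```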
Search agent flow:

- A request hits `/api/chat` with `tools: ["search"]`.
- The controller calls `search_agent_service_completion`.
- Step 1 (Query Generation): The agent sends a request to the LLM with a specialized prompt, asking it to generate 1-2 concise search queries based on the user's message.
- Step 2 (Search & Crawl): The agent parses the search queries and uses `ddgs` to get search results from DuckDuckGo. It then uses `crawl4ai` to scrape the content from the top links.
- Step 3 (Index & Query): The scraped web content is treated like a document. It's indexed into a temporary ChromaDB instance on the fly, just like in the RAG flow.
- Step 4 (Answer Generation): The original user message is used to query this new web-sourced vector database, and the LLM generates a final answer based on the retrieved real-time information.
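Steps 1-2 can be approximated with the snippet below (the function name and its shape are hypothetical; the actual orchestration lives in `search_agent.py`, `search.py`, and `crawl.py`):

```python
import asyncio

from crawl4ai import AsyncWebCrawler
from ddgs import DDGS


async def gather_web_context(queries: list[str], per_query: int = 3) -> str:
    """Hypothetical helper: search DuckDuckGo and scrape the top results."""
    urls: list[str] = []
    for query in queries:
        for hit in DDGS().text(query, max_results=per_query):
            urls.append(hit["href"])

    pages: list[str] = []
    async with AsyncWebCrawler() as crawler:
        for url in urls:
            result = await crawler.arun(url=url)
            if result.markdown:  # scraped page converted to Markdown
                pages.append(result.markdown)
    return "\n\n".join(pages)


# The combined text is then indexed into ChromaDB exactly like an uploaded file:
# context = asyncio.run(gather_web_context(["retrieval augmented generation"]))
```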
```
almous/backend/
├── src/
│   ├── controllers/
│   │   └── chat_controller.py     # Flask routes and API logic
│   ├── dtos/
│   │   └── chat_dto.py            # Pydantic data transfer objects
│   ├── providers/
│   │   ├── embed/
│   │   │   ├── jina.py            # Jina AI embedding provider
│   │   │   └── gemini_embed.py    # Google Gemini embedding provider
│   │   ├── a4f.py                 # A4F LLM provider
│   │   ├── global_completion.py   # Main completion logic and memory
│   │   ├── groq.py                # Groq LLM provider
│   │   └── pollination.py         # Pollinations.ai LLM provider
│   ├── services/
│   │   └── chat_service.py        # Business logic for chat, RAG, and search
│   └── tools/
│       ├── rag/
│       │   ├── data/              # Uploaded files for RAG
│       │   ├── db/chroma/         # ChromaDB vector stores
│       │   ├── build_database.py  # Logic for processing and indexing docs
│       │   └── query_data.py      # Logic for querying the vector store
│       └── search/
│           ├── crawl.py           # Web crawling logic
│           ├── search.py          # DuckDuckGo search logic
│           └── search_agent.py    # Orchestrates the search agent flow
└── .env                           # Environment variables (you create this)
```
- Persistent RAG Storage: Implement a more robust system for managing multiple documents and persistent ChromaDB collections.
- Add More Tools: Integrate other tools like a code interpreter or calculator.
- Enhanced Error Handling: Improve error reporting and resilience.
- Unit & Integration Tests: Add a testing suite to ensure code quality.
- Containerization: Add a `Dockerfile` for easy deployment with Docker.






