DakiKobo — AI Agricultural Advisor for Burkina Faso 🌾

title

DakiKobo

sdk

docker

app_port

7860

suggested_hardware

cpu-basic

startup_duration_timeout

1h

preload_from_hub

sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

DakiKobo — AI Agricultural Advisor for Burkina Faso 🌾

DakiKobo is a French-language AI assistant for smallholder farmers in Burkina Faso. It uses a Retrieval-Augmented Generation (RAG) pipeline grounded in agricultural reference documents (FAO, AGRA, WFP and technical guides for the Sahel and Sudanian Savanna zones) so answers stay accurate and source-backed rather than invented.

The focus crops are mil (millet), sorgho (sorghum), maïs (maize), niébé (cowpea) and arachide (groundnut). All output — answers, UI labels and voice — is in French, and the interface is mobile-first for use on phones.

Features

Grounded French answers — RAG over a local document corpus; off-topic questions fall back to an honest "je ne sais pas" instead of hallucinating.
Source citations — each answer shows which document(s) it was drawn from.
Fast inference — Groq-hosted llama-3.3-70b-versatile.
Multilingual retrieval — paraphrase-multilingual-MiniLM-L12-v2 embeddings for good French matching, stored in a persistent ChromaDB (built once, fast on restart).
Hosted warm-up — the Docker Space can prepare RAG in the background after startup so the first public question is less likely to pay the full indexing cost.
Voice output (TTS) — answers can auto-play in French via gTTS and be replayed from their answer bubble.
Voice input (STT) — records a short browser audio clip and transcribes it with Groq Whisper, with native browser speech recognition as a fallback.
Quota-safe public examples — one-tap demo answers for text, fertilizer guidance and a sample image case, without spending live API calls.
Focused mobile UI — examples stay visible, while weather and soil tools sit behind an Outils drawer so the conversation remains the main workspace.
Trust panel — a compact Sources & limites dialog explains evidence, approximate signals, and required field confirmation.
Deterministic fertilizer doses — source-grounded INERA/Burkina recommendations (never invented), with a "confirmez avec votre agent" disclaimer.
Leaf disease screening (optional) — upload a leaf photo for a hedged French screening via Gemini Vision, with a "ceci n'est pas un diagnostic" disclaimer (requires a Gemini API key).
Weather-aware field signals — Open-Meteo rainfall, ET0, soil moisture and short-term forecast cards for selected Burkina Faso locations.
Soil-aware fertilizer context — SoilGrids texture, organic carbon, pH and retention-risk classes combined with deterministic fertilizer guidance.
Feedback capture — 👍 / 👎 under each answer, logged to data/feedback.csv (no database).
Mobile-first responsive UI — fills the screen on phones, input pinned to the bottom.

Tech stack

Component	Technology
Web framework	Flask
LLM inference	Groq — `llama-3.3-70b-versatile`
RAG orchestration	LangChain (core + community)
Embeddings	`sentence-transformers` — multilingual MiniLM L12
Vector store	ChromaDB (persistent)
Knowledge ingestion	Reviewed Markdown primary; PyPDF2 PDF fallback
Text-to-speech	gTTS (French)
Weather data	Open-Meteo Forecast API
Soil indicators	SoilGrids REST API

Setup

Prerequisites

Python 3.10 or 3.11
A Groq API key (from the Groq console)
uv recommended for fast environments (plain venv + pip also works)

1. Create the environment and install dependencies

Using uv (recommended):

uv venv .venv --python 3.11
source .venv/bin/activate
uv pip install -r requirements.txt

Or with standard tools:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

2. Configure your API key

Secrets are loaded from a .env file — never put keys in source code.

cp .env.example .env

Then edit .env and set your key:

GROQ_API_KEY=your_real_key_here

Optional — leaf disease screening needs a Google Gemini key (from Google AI Studio):

GEMINI_API_KEY=your_gemini_api_key_here

Verify it works anytime with:

python scripts/test_gemini.py

Optional overrides (defaults in config.py are fine for development):

# LLM_MODEL=llama-3.3-70b-versatile
# APP_VERSION=0.1.0
# LOG_LEVEL=INFO
# LLM_MAX_TOKENS=512
# LLM_TEMPERATURE=0.1
# STT_MODEL=whisper-large-v3-turbo
# STT_LANGUAGE=fr
# MAX_AUDIO_UPLOAD_MB=5.0
# GEMINI_MODEL=gemini-2.5-flash
# FLASK_DEBUG=true
# PREFER_MARKDOWN_KB=true # use Data/markdown before PDF fallback
# REBUILD_VECTORSTORE=true   # force a fresh index rebuild

3. Add knowledge documents

Place reviewed Markdown documents under Data/markdown/. DakiKobo uses this Markdown corpus first because it is smaller and cleaner than extracting PDFs at startup. Keep original Burkina Faso agriculture PDFs anywhere under Data/ as source files and as a fallback; subfolders are discovered recursively.

4. Run

python app.py

Open http://127.0.0.1:5000 in your browser.

First run: DakiKobo builds the vector index from the reviewed Markdown corpus and saves it to chroma_db/. On CPU this one-time build can take several minutes. Subsequent starts load the saved index and are fast. To rebuild later (e.g. after adding documents or changing the embedding model), start with REBUILD_VECTORSTORE=true to replace the saved index.

Usage

Ask a question about Burkina Faso agriculture in French.
Quick chips above the input send common questions in one tap.
Voice output: tick "Activer la lecture vocale" to hear answers read aloud.
Voice input: tap the microphone, speak, then tap again to stop or wait for auto-stop.
Feedback: use 👍 / 👎 under an answer — entries are appended to data/feedback.csv.

Configuration reference

All tunables live in config.py (overridable via environment variables where shown):

Setting	Default	Purpose
`APP_VERSION`	`0.1.0`	Version string returned by `/version`
`LOG_LEVEL`	`INFO`	Structured JSON application log level
`LLM_MODEL`	`llama-3.3-70b-versatile`	Groq chat model
`EMBEDDING_MODEL`	`paraphrase-multilingual-MiniLM-L12-v2`	Sentence-transformer for retrieval
`SIMILARITY_THRESHOLD`	`0.2`	Min relevance to use a chunk (else fallback)
`CITATION_SCORE_MARGIN`	`0.12`	Drop secondary source cards far below the best match
`MAX_RAG_SOURCES`	`2`	Maximum RAG source cards shown per answer
`CHUNK_SIZE` / `CHUNK_OVERLAP`	`500` / `100`	Document splitting
`VECTORSTORE_DIR`	`chroma_db`	Persisted index location (git-ignored)
`RAG_WARMUP_ON_START`	`false` locally, `true` in Docker	Background RAG warm-up on hosted startup
`DATA_FOLDER`	`Data`	Root folder for source documents
`MARKDOWN_FOLDER`	`Data/markdown`	Reviewed Markdown corpus for RAG
`PREFER_MARKDOWN_KB`	`true`	Use Markdown first; fallback to PDFs if needed
`TTS_LANGUAGE`	`fr`	Voice output language
`TTS_TIMEOUT_SECONDS`	`8.0`	Max wait for gTTS before returning no audio
`STT_MODEL`	`whisper-large-v3-turbo`	Groq model for voice input transcription
`STT_LANGUAGE`	`fr`	Voice input language hint
`MAX_AUDIO_UPLOAD_MB`	`5.0`	Maximum uploaded voice recording size
`REQUEST_COOLDOWN_SECONDS`	`2.0`	Per-session cooldown for `/ask` requests
`VOICE_COOLDOWN_SECONDS`	`2.0`	Per-session cooldown for voice transcription
`IMAGE_COOLDOWN_SECONDS`	`6.0`	Per-session cooldown for image screening
`MAX_IMAGE_UPLOAD_MB`	`5.0`	Maximum uploaded image size

Project layout

dakikobo/
├── app.py               # Flask entry point + routes (/ , /ask , /feedback)
├── config.py            # Central configuration
├── core/
│   ├── llm_chain.py     # LLM + RetrievalQA setup and French prompt
│   └── rag_pipeline.py  # Markdown/PDF ingestion, embeddings, Chroma, TTS
├── templates/index.html # Chat UI
├── static/              # CSS, JS, images, generated audio
├── Data/                # Source PDFs + reviewed Markdown knowledge corpus
└── requirements.txt

Notes

.env, chroma_db/, generated audio and data/feedback.csv are git-ignored.
/healthz reports readiness; /version reports app version, commit if exposed by the host, and key runtime config flags.
Runtime logs are JSON lines with route, status, latency, model/feature, failure type and confidence where available. Raw questions, answers, images and audio are not logged.
This tool gives general guidance; users should confirm specifics (e.g. fertilizer doses) with a local agricultural extension agent.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
Data		Data
core		core
models_sugestions		models_sugestions
scripts		scripts
static		static
templates		templates
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
AGENTS.md		AGENTS.md
CHATGPT_PROJECT_SETUP.md		CHATGPT_PROJECT_SETUP.md
DATA_SOURCES.md		DATA_SOURCES.md
DEPLOYMENT.md		DEPLOYMENT.md
Dockerfile		Dockerfile
IMPLEMENTATION_PLAN.md		IMPLEMENTATION_PLAN.md
KNOWLEDGE_BASE_INDEX.md		KNOWLEDGE_BASE_INDEX.md
MULTI_MODEL_PROMPT.md		MULTI_MODEL_PROMPT.md
PHD_ROADMAP_composer_2.5_suggestion.md		PHD_ROADMAP_composer_2.5_suggestion.md
PROJECT_STATE.md		PROJECT_STATE.md
Procfile		Procfile
README.md		README.md
TODO.md		TODO.md
app.py		app.py
config.py		config.py
gemini-suggestions.md		gemini-suggestions.md
organize.sh		organize.sh
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DakiKobo — AI Agricultural Advisor for Burkina Faso 🌾

Features

Tech stack

Setup

Prerequisites

1. Create the environment and install dependencies

2. Configure your API key

3. Add knowledge documents

4. Run

Usage

Configuration reference

Project layout

Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

DakiKobo — AI Agricultural Advisor for Burkina Faso 🌾

Features

Tech stack

Setup

Prerequisites

1. Create the environment and install dependencies

2. Configure your API key

3. Add knowledge documents

4. Run

Usage

Configuration reference

Project layout

Notes

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages