Transform trending topics into golden blog posts with just a few keywords!
I wanted to write SEO-friendly blog posts for my portfolio website pratim.me about the latest technology trends, especially in AI and software engineering. However, keyword research, content research, and writing well-structured, SEO-friendly blog posts are time-consuming. As a lazy engineer, I decided to leverage an LLM to help automate the process. And thus, BlogForge was created!
### Keyword Research & Search
- Takes trending Google search keywords (up to 5 at a time).
- Performs Google searches using googlesearch-python, focusing on blog posts related to these topics.
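The search step can be sketched roughly like this, using `googlesearch-python`'s `search` helper. The query template and the helper names are illustrative, not taken from the actual source:

```python
def build_blog_query(keyword: str) -> str:
    """Bias results toward blog posts for a trending keyword (illustrative filter)."""
    return f"{keyword} blog post"

def find_blog_urls(keywords: list[str], per_keyword: int = 7) -> list[str]:
    """Search Google for each keyword (capped at 5) and collect result URLs."""
    from googlesearch import search  # pip install googlesearch-python
    urls: list[str] = []
    for kw in keywords[:5]:  # the tool takes up to 5 keywords at a time
        urls.extend(search(build_blog_query(kw), num_results=per_keyword))
    return urls
```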
### Content Crawling
- Crawls the first n pages (configurable, currently set to 7) using Jina AI's Public API.
- Alternatively, uses Crawl4AI's API (configured via Docker Compose) for self-hosted crawling.
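Fetching a page through Jina's public Reader API amounts to prefixing the target URL with `https://r.jina.ai/` and, optionally, sending an API key. A minimal sketch (the function names are mine, not the project's):

```python
import os
import urllib.request

JINA_READER = "https://r.jina.ai/"

def reader_url(page_url: str) -> str:
    """Jina's Reader API is invoked by prefixing the target URL."""
    return JINA_READER + page_url

def fetch_markdown(page_url: str) -> str:
    """Fetch a page as LLM-friendly markdown via Jina's public Reader API."""
    req = urllib.request.Request(reader_url(page_url))
    api_key = os.getenv("JINA_API_KEY")
    if api_key:  # works without a key, but a key raises the rate limits
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```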
### Data Processing & Storage
- Chunks the crawled data.
- Embeds them using Google Gemini’s embeddings.
- Stores the processed data in a local ChromaDB vector store.
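The chunk → embed → store pipeline could look roughly like the sketch below. The chunk sizes, collection name, and embedding model name are assumptions for illustration, not read from the source:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split crawled markdown into overlapping character chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def store_chunks(chunks: list[str], collection_name: str = "blogforge") -> None:
    """Embed chunks with Gemini and persist them in a local ChromaDB collection."""
    import chromadb
    import google.generativeai as genai  # assumes GOOGLE_API_KEY is configured
    client = chromadb.PersistentClient(path="./chroma")
    collection = client.get_or_create_collection(collection_name)
    embeddings = [
        genai.embed_content(model="models/text-embedding-004", content=c)["embedding"]
        for c in chunks  # model name is illustrative
    ]
    collection.add(
        ids=[f"chunk-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embeddings,
    )
```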
### AI-Generated Blog Writing
- Supports multiple LLM providers:
- Gemini via Google AI
- LLaMA models via Groq
- DeepSeek models via DeepSeek AI
- Local models via Ollama
- Generates structured, SEO-optimized blog posts using your preferred LLM
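A provider switch like the one in `rag/llm.py` might be sketched as follows. The LangChain wrappers and the DeepSeek OpenAI-compatible base URL are assumptions about the implementation, not a reading of the actual source:

```python
import os

def make_llm(provider: str, model: str):
    """Return a chat-model client for the configured provider (hypothetical factory)."""
    provider = provider.upper()
    if provider == "GEMINI":
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model=model)
    if provider == "GROQ":
        from langchain_groq import ChatGroq
        return ChatGroq(model=model)
    if provider == "DEEPSEEK":
        from langchain_openai import ChatOpenAI  # DeepSeek exposes an OpenAI-compatible API
        return ChatOpenAI(model=model, base_url="https://api.deepseek.com",
                          api_key=os.environ["DEEP_SEEK_API_KEY"])
    if provider == "OLLAMA":
        from langchain_ollama import ChatOllama
        return ChatOllama(model=model)
    raise ValueError(f"Unknown LLM provider: {provider}")
```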
### Interactive Refinement
- Provides a Streamlit-based UI that lets users refine the blog post interactively.
- Allows users to chat with the AI to fine-tune content.
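The refinement chat can be sketched with Streamlit's chat elements. The prompt wording and function names below are illustrative, assuming a draft-plus-instruction loop:

```python
def refine_prompt(draft: str, instruction: str) -> str:
    """Build the refinement prompt sent back to the LLM (wording is illustrative)."""
    return f"Revise the blog post below.\n\nInstruction: {instruction}\n\nDraft:\n{draft}"

def chat_ui() -> None:
    """Minimal Streamlit chat loop for interactive refinement (sketch)."""
    import streamlit as st
    st.title("BlogForge")
    if "messages" not in st.session_state:
        st.session_state.messages = []
    for msg in st.session_state.messages:  # replay the conversation so far
        with st.chat_message(msg["role"]):
            st.markdown(msg["content"])
    if instruction := st.chat_input("How should the post change?"):
        st.session_state.messages.append({"role": "user", "content": instruction})
        # ...call the configured LLM with refine_prompt(...) and append its reply...
```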
### Session Management & Persistence
- Uses Supabase DB for storing crawled website content and user chat history.
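Persisting a chat turn with the Supabase Python client is roughly a one-liner; the table and column names below are illustrative, not the project's actual schema:

```python
import os

def history_row(session_id: str, role: str, content: str) -> dict:
    """Shape of a chat-history record (table and column names are illustrative)."""
    return {"session_id": session_id, "role": role, "content": content}

def save_message(session_id: str, role: str, content: str) -> None:
    """Persist one chat turn to Supabase."""
    from supabase import create_client  # pip install supabase
    client = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
    client.table("chat_history").insert(history_row(session_id, role, content)).execute()
```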
Install pipenv:

```shell
pip install pipenv
```

Then create the environment and install the dependencies:

```shell
pipenv shell
pipenv install
```
Copy the example environment file:

```shell
cp example.env .env
```
Populate `.env` with the required API keys and settings:

- `GOOGLE_API_KEY` - Required for Google Search and Gemini models
- `GROQ_API_KEY` - Required for Groq LLM models
- `DEEP_SEEK_API_KEY` - Required for DeepSeek models
- `JINA_API_KEY` - Required when using Jina for crawling
- `SUPABASE_URL` and `SUPABASE_KEY` - Required for database functionality
- `LLM_TO_USE` - Choose your LLM provider: "GEMINI", "GROQ", "DEEPSEEK", or "OLLAMA"
- `LLM_MODEL` - Specify the model name for your chosen provider
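A filled-in `.env` might look like this (all values are placeholders, and the model name is just an example):

```env
GOOGLE_API_KEY=your-google-api-key
GROQ_API_KEY=your-groq-api-key
DEEP_SEEK_API_KEY=your-deepseek-api-key
JINA_API_KEY=your-jina-api-key
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-supabase-key
LLM_TO_USE=GEMINI
LLM_MODEL=gemini-1.5-flash
```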
Start the application:

```shell
pipenv run start
```

To use Crawl4AI for self-hosted crawling instead of Jina:

- Set `CRAWLER_PROVIDER=CRAWL4AI` in `.env`.
- Generate a random secret and set it as `CRAWL4AI_API_TOKEN`.
```
├── Pipfile
├── Pipfile.lock
├── README.md
├── __init__.py
├── app.py
├── assets
│   └── logo.webp
├── config.py
├── crawler
│   ├── __init__.py
│   ├── crawler.py
│   └── processor.py
├── db
│   ├── __init__.py
│   ├── supabase.py
│   └── vector_store.py
├── docker-compose.yaml
├── main.py
├── rag
│   ├── __init__.py
│   ├── chains.py
│   ├── chat_history.py
│   ├── embeddings.py
│   ├── llm.py
│   └── prompts.py
├── tools
│   ├── __init__.py
│   ├── crawl4ai.py
│   ├── jina.py
│   └── search.py
└── utils
    ├── __init__.py
    ├── helpers.py
    └── logger.py
```

- ✅ Enable classic LLM chat-like streaming.
- ✅ Improve overall performance.
- ✅ Enable Docker deployment.
We welcome contributions! To contribute:
- Fork the repository.
- Create a new branch (`git checkout -b feature-branch`).
- Commit your changes (`git commit -m 'Add new feature'`).
- Push to your branch (`git push origin feature-branch`).
- Create a pull request.
Please make sure your contributions adhere to best coding practices and include necessary documentation.
This project is licensed under the MIT License.
Big thanks to:
- Jina AI for providing free API access.
- Google AI for Gemini models and embeddings.
- Groq for high-performance LLaMA model inference.
- DeepSeek for their powerful AI models.
- Ollama for local model support.
- Supabase for database storage.
- ChromaDB for the vector store.
- Streamlit for the frontend framework.
Happy Blogging! 🚀
