Sidekick AI is a goal-driven, self-evaluating AI agent designed to work like a real co-worker — not a chatbot.
It was built to solve a core limitation of modern LLMs: they generate answers, but they don’t complete work reliably.
Sidekick focuses on finishing tasks, validating results, and iterating until success.
Even in the era of powerful LLMs:
- Responses are often one-shot
- Errors go unchecked
- Users must manually say “retry”, “fix this”, “go deeper”
- Complex tasks require constant supervision
Sidekick was built to:
- Work toward explicit success criteria
- Use real tools instead of hallucinating
- Evaluate its own output
- Ask for clarification only when genuinely needed
In short:
Less prompting. More completion.
```
User Input + Success Criteria
        ↓
Worker LLM (Task Executor)
        ↓
Tool Usage (Search, Browser, Python, Notion, Files)
        ↓
Worker LLM (Refinement)
        ↓
Evaluator LLM (Judges Output)
        ↓
Done OR Iterate OR Ask User
```
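The loop above can be sketched in plain Python with stub functions standing in for the worker and evaluator LLM calls. All names here are illustrative, not Sidekick's actual API:

```python
# Minimal sketch of the worker -> evaluator loop. The LLM calls are
# stubbed out; in the real agent they would be model invocations.

def run_sidekick(task, success_criteria, worker, evaluator, max_iters=5):
    """Iterate worker/evaluator until the criteria pass or we give up."""
    feedback = None
    for i in range(1, max_iters + 1):
        draft = worker(task, feedback)                 # execute / refine
        verdict = evaluator(draft, success_criteria)   # judge the output
        if verdict["done"]:
            return {"output": draft, "iterations": i, "status": "done"}
        if verdict.get("ask_user"):
            return {"output": draft, "iterations": i, "status": "ask_user"}
        feedback = verdict["feedback"]                 # drive the next pass
    return {"output": draft, "iterations": max_iters, "status": "stuck"}
```

The key point is that feedback flows back into the worker automatically, so the user never has to type "retry" or "fix this" themselves.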
**Worker LLM**
- Plans and executes tasks
- Decides when to use tools
- Refines responses iteratively
**Evaluator LLM**
- Checks output against success criteria
- Provides structured feedback
- Decides whether the task is complete or stuck
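One common way to make the evaluator's verdict machine-readable is a fixed schema. A hypothetical sketch using a dataclass (Sidekick's actual schema may differ):

```python
from dataclasses import dataclass

@dataclass
class Evaluation:
    """Structured verdict returned by the evaluator LLM."""
    success_criteria_met: bool   # did the output satisfy the stated criteria?
    user_input_needed: bool      # is the agent stuck without clarification?
    feedback: str                # concrete guidance for the next refinement

def next_step(ev: Evaluation) -> str:
    """Map a verdict to the three outcomes: done, iterate, or ask the user."""
    if ev.success_criteria_met:
        return "done"
    if ev.user_input_needed:
        return "ask_user"
    return "iterate"
```

Forcing the evaluator to emit this structure (rather than free prose) is what lets the control flow branch deterministically.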
**LangGraph**
- Controls deterministic agent flow
- Enables retry loops and state handling
**Tooling Layer**
- Playwright (real browsing)
- Web search (fresh data)
- Python REPL (computation)
- Notion (long-term memory)
- File system access
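At its core, a tooling layer like this is a registry mapping tool names to callables: the worker LLM names a tool plus arguments, and the agent dispatches the call. A minimal stand-in sketch (real tools like Playwright or Notion are replaced by trivial placeholders; never `eval` untrusted model output in production):

```python
# Hypothetical tool registry. Each entry maps a tool name the worker LLM
# can request to a callable. These are placeholders, not the real tools.
TOOLS = {
    "python_repl": lambda code: str(eval(code)),        # computation (unsafe stand-in)
    "file_read": lambda path: f"<contents of {path}>",  # file system stand-in
}

def dispatch(tool_name: str, argument: str) -> str:
    """Run the named tool, or report an unknown tool back to the worker."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        return f"error: unknown tool '{tool_name}'"
    return tool(argument)
```

Returning errors as strings (rather than raising) lets the worker LLM see the failure and recover on the next iteration.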
| Traditional Chatbots | Sidekick AI |
|---|---|
| One-shot answers | Iterative task completion |
| Hallucinated browsing | Real browser automation |
| No validation | Self-evaluation loop |
| User-driven retries | Agent-driven refinement |
| Stateless | Persistent memory |
Sidekick behaves like a junior engineer or research assistant, not a text generator.
- Research & note-taking automation
- Knowledge base building (Notion)
- Engineering/debugging workflows
- Data analysis via Python
- Personal AI co-worker
- Long, multi-step tasks with quality control
- Replace evaluator with rule-based checks where possible
- Cache intermediate reasoning and tool outputs
- Use smaller models for evaluation steps
- Parallelize tool calls
- Reduce redundant LLM invocations
- Stream partial results
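Caching intermediate tool outputs, for example, can be as simple as memoizing on the tool name and arguments so repeated refinement passes don't re-run the same call. A sketch of the idea (an assumed design, not Sidekick's code):

```python
import functools

@functools.lru_cache(maxsize=256)
def cached_tool_call(tool_name: str, argument: str) -> str:
    """Cache repeated tool invocations across refinement iterations.

    Only appropriate for deterministic tools (e.g. reading a fixed file);
    live web search or browsing should bypass the cache for fresh data.
    """
    # Stand-in for a real tool dispatch.
    return f"{tool_name}({argument}) -> result"
```

Because the worker often re-examines the same inputs while iterating, even a small cache cuts redundant tool latency.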
**Replace OpenAI with:**
- LLaMA / Mistral / Qwen (via vLLM)
- Local inference for private tasks
**Hybrid routing:** small models first, large models only if needed
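Hybrid routing reduces cost by attempting the task with the small model and escalating only when a quick check rejects the draft. A sketch with placeholder model functions (the routing check here is an assumption, not a prescribed design):

```python
def route(task, small_model, large_model, good_enough):
    """Try the cheap model first; escalate to the large model on failure.

    `small_model` and `large_model` are callables (task -> draft);
    `good_enough` is any cheap acceptance check, e.g. a rule-based
    filter or a lightweight evaluator pass.
    """
    draft = small_model(task)
    if good_enough(draft):
        return draft, "small"
    return large_model(task), "large"
```

In the best case most tasks never touch the large model, so average latency and cost drop sharply.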
- Email automation
- Calendar & reminders
- Database / SQL access
- GitHub & Jira integration
- Cloud storage & CRMs
Sidekick AI is not built to talk more — it’s built to work better.
This project represents the shift from LLM chatbots to agentic AI systems designed for real productivity.