An agentic AI system that helps users find the optimal GPU deployment strategy on GMI Cloud. Powered by Kimi K2.5 with autonomous tool use and multi-step reasoning.
Unlike a simple chatbot or dashboard, this system:
- Autonomously decides which tools to call based on your natural language input
- Multi-step reasoning — chains multiple tool calls to build a complete analysis
- Adaptive — asks clarifying questions or makes reasonable assumptions
- Conversational — maintains context across turns for follow-up questions
User (natural language) → Kimi K2.5 (reasoning + planning)
↕ (function calling)
Tool Execution Engine
↕ (structured results)
Kimi K2.5 (synthesis)
→ Recommendation + Charts
| Tool | Description |
|---|---|
get_gpu_catalog |
Browse available GPUs with specs and pricing |
calculate_cost |
Compute monthly costs for specific configurations |
find_cheapest |
Find optimal option within a budget constraint |
compare_deployment_modes |
Find serverless vs dedicated crossover points |
generate_scaling_plan |
Plan infrastructure for traffic growth |
visualize_cost_comparison |
Generate interactive cost comparison charts |
Online (Vercel): https://gmi-gpu-optimizer.vercel.app
Local (Streamlit):
pip install -r requirements.txt
export GMI_API_KEY="your-gmi-api-key"
streamlit run app.pyThen try:
- "I need to deploy a Llama 70B model for a chatbot, expecting 50 QPS with a $15k/month budget"
- "Compare serverless vs dedicated for a 7B model with 5 QPS"
- "Help me plan scaling from 10 QPS to 200 QPS for a 34B code assistant"
- Agent LLM: Kimi K2.5 (Moonshot AI) via GMI Cloud Inference API
- Tool Use: OpenAI-compatible function calling
- Frontend: Streamlit (local) / Vanilla HTML+JS (Vercel)
- Charts: Plotly
- Deployment: Vercel (Python serverless + static frontend)
- API: GMI Cloud (
api.gmi-serving.com)
| GPU | VRAM | Price |
|---|---|---|
| H200 SXM | 141GB HBM3e | $3.99/hr |
| H100 SXM | 80GB HBM3 | $2.49/hr |
| A100 SXM | 80GB HBM2e | $1.89/hr |
| L40S | 48GB GDDR6X | $1.29/hr |
| Serverless | Auto | $0.35/$1.10 per 1M tokens |
GMI Cloud Hackathon 2026