Skip to content

bojiang3/gmi-gpu-optimizer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

⚡ GMI GPU Cost Optimizer Agent

An agentic AI system that helps users find the optimal GPU deployment strategy on GMI Cloud. Powered by Kimi K2.5 with autonomous tool use and multi-step reasoning.

Demo

What Makes This an Agent?

Unlike a simple chatbot or dashboard, this system:

  • Autonomously decides which tools to call based on your natural language input
  • Multi-step reasoning — chains multiple tool calls to build a complete analysis
  • Adaptive — asks clarifying questions or makes reasonable assumptions
  • Conversational — maintains context across turns for follow-up questions

Agent Architecture

User (natural language) → Kimi K2.5 (reasoning + planning)
                              ↕ (function calling)
                         Tool Execution Engine
                              ↕ (structured results)
                         Kimi K2.5 (synthesis)
                              → Recommendation + Charts

Agent Tools

Tool Description
get_gpu_catalog Browse available GPUs with specs and pricing
calculate_cost Compute monthly costs for specific configurations
find_cheapest Find optimal option within a budget constraint
compare_deployment_modes Find serverless vs dedicated crossover points
generate_scaling_plan Plan infrastructure for traffic growth
visualize_cost_comparison Generate interactive cost comparison charts

Quick Start

Online (Vercel): https://gmi-gpu-optimizer.vercel.app

Local (Streamlit):

pip install -r requirements.txt
export GMI_API_KEY="your-gmi-api-key"
streamlit run app.py

Then try:

  • "I need to deploy a Llama 70B model for a chatbot, expecting 50 QPS with a $15k/month budget"
  • "Compare serverless vs dedicated for a 7B model with 5 QPS"
  • "Help me plan scaling from 10 QPS to 200 QPS for a 34B code assistant"

Tech Stack

  • Agent LLM: Kimi K2.5 (Moonshot AI) via GMI Cloud Inference API
  • Tool Use: OpenAI-compatible function calling
  • Frontend: Streamlit (local) / Vanilla HTML+JS (Vercel)
  • Charts: Plotly
  • Deployment: Vercel (Python serverless + static frontend)
  • API: GMI Cloud (api.gmi-serving.com)

GPU Options (Mock Pricing)

GPU VRAM Price
H200 SXM 141GB HBM3e $3.99/hr
H100 SXM 80GB HBM3 $2.49/hr
A100 SXM 80GB HBM2e $1.89/hr
L40S 48GB GDDR6X $1.29/hr
Serverless Auto $0.35/$1.10 per 1M tokens

Built At

GMI Cloud Hackathon 2026

About

⚡ Agentic AI GPU cost optimizer with autonomous tool use — Powered by Kimi K2.5 on GMI Cloud | Built at GMI Hackathon 2026

Topics

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors