A production-ready Rust CLI tool that scaffolds modern, fully-functional FastAPI-based machine learning APIs with built-in templates, automatic dependency management, and smart project initialization.
### 3 Built-in Templates
- Minimal – Single model, simple request/response (ideal for single detectors)
- Multi-model – Model registry pattern for managing multiple models
- Async-heavy – Queue-based inference with background job processing (Redis-ready)
### Framework Support
- PyTorch (`--framework torch`)
- ONNX Runtime (`--framework onnx`)
- TensorFlow Lite (`--framework tflite`)
- Automatic dependency injection based on framework selection
### Smart Project Setup
- Auto-creates Python virtual environment
- Auto-installs all dependencies
- Optional `--no-setup` flag to scaffold files only
### Production-Ready Boilerplate
- Pydantic models for type-safe request/response handling
- FastAPI lifespan events for model loading/cleanup (see the sketch after this list)
- Docker support (Dockerfile included)
- Environment configuration (.env.example)
- Comprehensive README in generated projects
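
For illustration, the lifespan pattern looks roughly like this (a minimal sketch; `load_model()` is a hypothetical stand-in for your framework's loader, and the generated code may differ):

```python
# Sketch of model loading/cleanup via FastAPI lifespan events.
from contextlib import asynccontextmanager

from fastapi import FastAPI

def load_model():
    # Hypothetical stand-in for torch.load(...), onnxruntime.InferenceSession(...), etc.
    return object()

@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.model = load_model()  # load once at startup
    yield
    app.state.model = None          # release resources on shutdown

app = FastAPI(lifespan=lifespan)
```

Loading in the lifespan hook keeps model initialization out of request handlers, so the first request is not penalized.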
### Model Management
- Add new models to existing projects with `penguin add model`
- Each model gets its own inference stub and utilities
- Template-based for consistency
### Project Initialization
- Initialize existing FastAPI projects with `penguin init`
- Automatic venv + dependency setup
- Validates project structure
Quick start:

```bash
cargo install penguin-ml
penguin new my-detector --template minimal --framework torch
cd my-detector && source venv/bin/activate && python src/main.py
```

Visit http://localhost:8000/docs to see your API!
### Installation

From crates.io:

```bash
cargo install penguin-ml
```

Then verify the installation:

```bash
penguin --help
```

From source:

```bash
git clone https://github.com/yourusername/penguin
cd penguin
cargo install --path .
```

From a release binary – download the latest release from GitHub Releases:

```bash
chmod +x penguin
mv penguin ~/.local/bin/  # or /usr/local/bin/
```

### Usage

```bash
# Interactive mode (prompts for template and framework)
penguin new my-detector
# With explicit options
penguin new my-detector --template minimal --framework torch
# Skip venv setup (manual setup later)
penguin new my-detector --template multi-model --framework onnx --no-setup
```

```bash
cd my-detector
source venv/bin/activate
python src/main.py
```

Visit http://localhost:8000/docs to see interactive API documentation (Swagger UI).
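
For reference, the generated `main.py` is essentially a uvicorn entry point, roughly like this sketch (the module path `app:app` and the settings values are assumptions, not the exact template):

```python
# Illustrative src/main.py: start uvicorn serving the FastAPI app.
import uvicorn

if __name__ == "__main__":
    # Host/port/reload typically come from core/config.py in the real template.
    uvicorn.run("app:app", host="0.0.0.0", port=8000, reload=True)
```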
Adding a model:

```bash
penguin add model my_custom_detector
```

This creates:
```text
src/models/my_custom_detector/
├── inference.py   # Model loading and inference logic
├── utils.py       # Preprocessing/postprocessing utilities
└── __init__.py
```
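
The generated `inference.py` is a stub for you to fill in; its rough shape is something like this (illustrative only, the actual template is framework-aware and may differ):

```python
# Rough shape of a generated inference stub (illustrative).
from typing import Any

class MyCustomDetector:
    def __init__(self, model_path: str) -> None:
        self.model_path = model_path
        self.model: Any = None

    def _load_model(self) -> None:
        # Plug in torch.load / onnxruntime.InferenceSession / tf.lite here.
        self.model = object()

    def predict(self, data: Any) -> Any:
        if self.model is None:
            self._load_model()
        return {"prediction": None}  # replace with real inference
```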
If you have an existing FastAPI project and want to add venv + dependencies:
```bash
cd /path/to/existing/project
penguin init
```

### `penguin new`

Scaffold a new FastAPI ML project.
Options:
- `-t, --template <TEMPLATE>` – Choose template: `minimal`, `multi-model`, `async-heavy`
- `-f, --framework <FRAMEWORK>` – Choose framework: `torch`, `onnx`, `tflite`
- `--no-setup` – Skip venv creation and dependency installation
Examples:

```bash
penguin new detector-api --template minimal --framework torch
penguin new multi-model-api --template multi-model --framework onnx --no-setup
penguin new queue-api --template async-heavy --framework tflite
```

### `penguin add model`

Add a new model to an existing penguin project.
Usage:

```bash
cd your-project
penguin add model yolo_v5
penguin add model faster_rcnn
```

Output:
- `src/models/<model-name>/inference.py` – Model loading and prediction logic
- `src/models/<model-name>/utils.py` – Helper functions
- `src/models/<model-name>/__init__.py` – Package initialization
### `penguin init`

Initialize venv and dependencies in an existing FastAPI project.
Usage:

```bash
cd existing-fastapi-project
penguin init
```

Requirements:
- Project must contain `src/app.py` or `src/main.py`
- `requirements.txt` must exist
### Minimal Template

Best for single-model inference tasks.
API Endpoints:

- `GET /` – Root endpoint
- `GET /api/health` – Health check
- `POST /api/predict` – Run inference (a route sketch follows below)
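
For orientation, a predict route of this shape might look like the following sketch (the request/response field names are assumptions, not the exact generated `schemas.py`):

```python
# Illustrative predict route with Pydantic request/response models.
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter(prefix="/api")

class PredictRequest(BaseModel):
    data: str  # e.g. a base64-encoded image

class PredictResponse(BaseModel):
    prediction: list[float]

@router.post("/predict", response_model=PredictResponse)
async def predict(req: PredictRequest) -> PredictResponse:
    # In the generated project this calls into src/models/model_1/inference.py.
    return PredictResponse(prediction=[0.0])
```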
Structure:

```text
src/
├── main.py              # Entry point
├── app.py               # FastAPI app factory
├── api/
│   ├── routes.py        # API endpoints
│   └── schemas.py       # Pydantic models
├── core/
│   └── config.py        # Settings & configuration
└── models/
    └── model_1/
        ├── inference.py # Model inference
        └── utils.py     # Utility functions
```
### Multi-model Template

Best for projects with multiple detectors/models.
API Endpoints:

- `GET /api/health` – Health check
- `GET /api/models` – List available models
- `POST /api/predict/{model_name}` – Run inference with a specific model
Key Components:

- `ModelRegistry` – Central model management (see the sketch after this list)
- `/api/models` endpoint shows registered models
- Route-based model selection
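
A minimal sketch of that registry pattern (illustrative; the generated `model_registry.py` may expose a different interface):

```python
# Register models by name, look them up per request.
from typing import Any

class ModelRegistry:
    def __init__(self) -> None:
        self._models: dict[str, Any] = {}

    def register(self, name: str, model: Any) -> None:
        self._models[name] = model

    def get(self, name: str) -> Any:
        try:
            return self._models[name]
        except KeyError:
            raise KeyError(f"Unknown model: {name!r}") from None

    def list_models(self) -> list[str]:
        return sorted(self._models)

registry = ModelRegistry()
registry.register("person_detector", object())  # placeholder model object
```

Keeping the registry as a single module-level object lets routes resolve `{model_name}` path parameters without per-request loading.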
Structure:

```text
src/
├── main.py
├── app.py
├── api/
│   ├── routes.py
│   └── schemas.py
├── core/
│   ├── config.py
│   └── model_registry.py # Model management
└── models/
    ├── detector_1/
    └── detector_2/
```
### Async-heavy Template

Best for long-running inference tasks (image processing, batch jobs, etc.).
API Endpoints:

- `GET /api/health` – Health check
- `POST /api/submit-job` – Submit an async inference job
- `GET /api/job-status/{job_id}` – Check job progress
- `GET /api/results/{job_id}` – Retrieve job results (when ready)
For Production: Replace in-memory queue with Redis or RabbitMQ for persistence and multi-worker support.
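
Before that swap, the submit/poll pattern with an in-memory store looks roughly like this (a self-contained sketch, not the generated code; the endpoint paths match the list above):

```python
# In-memory job store: fine for a single worker, lost on restart.
import uuid

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
jobs: dict[str, dict] = {}  # job_id -> {"status": ..., "result": ...}

def run_inference(job_id: str, data: str) -> None:
    # Stand-in for the real model call in workers/inference.py.
    jobs[job_id] = {"status": "done", "result": f"processed:{data}"}

@app.post("/api/submit-job")
async def submit_job(payload: dict, background: BackgroundTasks):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "result": None}
    background.add_task(run_inference, job_id, payload.get("data", ""))
    return {"job_id": job_id}

@app.get("/api/job-status/{job_id}")
async def job_status(job_id: str):
    return {"job_id": job_id, "status": jobs[job_id]["status"]}

@app.get("/api/results/{job_id}")
async def results(job_id: str):
    return {"job_id": job_id, "result": jobs[job_id]["result"]}
```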
Structure:

```text
src/
├── main.py
├── app.py
├── api/
│   ├── routes.py
│   └── schemas.py
├── core/
│   ├── config.py
│   └── job_queue.py    # Queue management
└── workers/
    └── inference.py    # Background job processing
```
### PyTorch (`--framework torch`)

Includes:

- `torch==2.11.0`
- `torchvision==0.26.0`
- `numpy==2.4.4`

Example:

```bash
penguin new torch-detector --template minimal --framework torch
```

### ONNX Runtime (`--framework onnx`)

Includes:

- `onnx==1.21.0`
- `onnxruntime==1.24.4`
- `numpy==2.4.4`

Example:

```bash
penguin new onnx-api --template multi-model --framework onnx
```

### TensorFlow Lite (`--framework tflite`)

Includes:

- `tensorflow==2.21.0`
- `numpy==2.4.4`

Example:

```bash
penguin new tflite-server --template async-heavy --framework tflite
```

Each generated project follows this structure:
```text
my-project/
├── src/
│   ├── main.py              # Entry point
│   ├── app.py               # FastAPI app factory
│   ├── api/
│   │   ├── __init__.py
│   │   ├── routes.py        # API endpoints
│   │   └── schemas.py       # Pydantic models
│   ├── core/
│   │   ├── __init__.py
│   │   └── config.py        # Settings (pydantic-settings)
│   └── models/
│       └── model_1/         # Your model(s)
│           ├── __init__.py
│           ├── inference.py # Model class & predict()
│           └── utils.py     # Preprocessing/postprocessing
├── requirements.txt         # Python dependencies (framework-aware)
├── .env.example             # Environment template
├── Dockerfile               # Production containerization
├── README.md                # Project-specific docs
└── venv/                    # Virtual environment (created by penguin new)
```
```bash
# Create minimal project with PyTorch
penguin new face-detector --template minimal --framework torch
cd face-detector
source venv/bin/activate
# Edit src/models/model_1/inference.py to load your face detection model
# Edit src/api/routes.py to handle face detection input/output
# Start server
python src/main.py
```

Visit http://localhost:8000/docs and test the `/api/predict` endpoint.
```bash
# Create multi-model project
penguin new detection-system --template multi-model --framework onnx
cd detection-system
source venv/bin/activate
# Add multiple detectors
penguin add model person_detector
penguin add model vehicle_detector
penguin add model pose_estimator
# Update src/core/config.py to register models
# Update each model's inference.py with actual ONNX logic
# Start server
python src/main.py
```

Now you can:

```bash
curl http://localhost:8000/api/models
# {"models": ["person_detector", "vehicle_detector", "pose_estimator"], "count": 3}
curl -X POST http://localhost:8000/api/predict/person_detector \
  -H "Content-Type: application/json" \
  -d '{"data": "base64-encoded-image"}'
```
```bash
# Create async-heavy project for long-running jobs
penguin new batch-processor --template async-heavy --framework tflite --no-setup
cd batch-processor
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Edit src/workers/inference.py for actual inference logic
python src/main.py
```

Usage:

```bash
# Submit a job
JOB=$(curl -X POST http://localhost:8000/api/submit-job \
-H "Content-Type: application/json" \
-d '{"data": "input"}' | jq -r .job_id)
# Poll for status
curl http://localhost:8000/api/job-status/$JOB
# Get results when complete
curl http://localhost:8000/api/results/$JOB
```
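
The same flow can be driven from Python (a sketch assuming the `requests` package; the exact status strings are an assumption, so check the generated schemas):

```python
# Submit a job, poll until it finishes, then fetch the result.
import time

import requests

BASE = "http://localhost:8000"

job_id = requests.post(
    f"{BASE}/api/submit-job", json={"data": "input"}, timeout=10
).json()["job_id"]

while requests.get(f"{BASE}/api/job-status/{job_id}", timeout=10).json()[
    "status"
] in ("pending", "processing"):
    time.sleep(1)  # back off between polls

print(requests.get(f"{BASE}/api/results/{job_id}", timeout=10).json())
```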
Each project includes a `.env.example` file:

```bash
cp .env.example .env
# Edit .env with your configuration
```

Common Variables:
- `HOST` – Server host (default: `0.0.0.0`)
- `PORT` – Server port (default: `8000`)
- `DEBUG` – Enable debug mode (default: `false`)
- `RELOAD` – Auto-reload on code changes (default: `true`)
See `.env.example` for template-specific variables.
Edit `src/core/config.py` to customize model paths and settings:

```python
class Settings(BaseSettings):
    model_name: str = "model_1"
    model_path: str = "models/model_1"  # Change this
    # ... other settings
```
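
For reference, a self-contained version of how such a settings class is defined and read with pydantic-settings v2 (assumes the `pydantic-settings` package; field names match the snippet above):

```python
# Values come from the environment and .env, falling back to the defaults.
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # protected_namespaces=() silences pydantic's warning about fields
    # whose names start with "model_".
    model_config = SettingsConfigDict(env_file=".env", protected_namespaces=())

    model_name: str = "model_1"
    model_path: str = "models/model_1"

settings = Settings()  # e.g. exporting MODEL_PATH=/opt/models overrides model_path
print(settings.model_path)
```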
Each generated project includes a Dockerfile:

```bash
# Build image
docker build -t my-detector .

# Run container
docker run -p 8000:8000 my-detector

# With environment variables
docker run -p 8000:8000 \
  -e PORT=8000 \
  -e RELOAD=false \
  my-detector
```

### Troubleshooting

Problem: Python 3.8+ is not installed or not on PATH.

Solution:
```bash
# Install Python 3.8+
# macOS
brew install python@3.11
# Ubuntu/Debian
sudo apt-get install python3.11 python3.11-venv
# Then try again
penguin new my-project
```

Problem: dependency installation fails during setup.

Solution: Use `--no-setup` and install manually:

```bash
penguin new my-project --framework torch --no-setup
cd my-project
python -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```

Problem: the generated stub does not load a real model.

Solution: Edit `src/models/<model>/inference.py` and implement the `_load_model()` method:

```python
def _load_model(self):
    """Load your actual model here."""
    import torch
    self.model = torch.load("path/to/model.pth")
```

Problem: the virtual environment is broken or corrupted.

Solution: Remove and recreate it:

```bash
rm -rf venv
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

### Building from Source

```bash
git clone https://github.com/yourusername/penguin
cd penguin
cargo build --release
```

The binary will be at `target/release/penguin`.

Run the test suite with:

```bash
cargo test
```

The project is already published on crates.io:

```bash
# Install from crates.io
cargo install penguin-ml
# To publish updates:
# 1. Update version in Cargo.toml
# 2. cargo publish
```

Current version: v0.1.0 (`penguin-ml` on crates.io)
Contributions welcome! Areas for improvement:
- Remote template registry support
- Additional framework templates (JAX, Hugging Face Transformers)
- Advanced monitoring/metrics
- WebSocket support for streaming inference
- Multi-GPU deployment helpers
### Architecture

Key modules:

- `scaffolder.rs` – Template rendering via Tera, file generation
- `dependencies.rs` – Framework-specific dependency mapping
- `python_env.rs` – Virtual environment and pip integration
- `progress.rs` – CLI progress spinners and feedback
All templates are compiled into the binary using `include_str!()` – no external template files are needed at runtime.
Performance:

- Binary size: ~10 MB (optimized)
- Startup: < 100ms
- Project scaffold: < 1 second
- Dependency install: 1–5 minutes (network-dependent)
License: MIT
For issues, questions, or feature requests:
- Open a GitHub issue
- Check existing issues
- Review generated project READMEs for template-specific docs
### Roadmap

- Remote template registry for custom templates
- `penguin doctor` – Environment diagnostics
- Hot-reload file watcher
- Multiple Python version support
- Pre-commit hooks scaffolding
- CI/CD pipeline templates (GitHub Actions, GitLab CI)
- Kubernetes deployment helpers
- Monitoring integration (Prometheus, DataDog)
Happy building with Penguin!