Semantic caching demo with real-time streaming and a cost & sizing calculator, powered by Azure Managed Redis and Azure OpenAI.
Updated Nov 12, 2025 - Python
Concurrency-aware effective-cost meter for self-hosted vLLM inference — companion artifact to arXiv:2606.11690 (Beyond Per-Token Pricing).
Triton inference benchmark with telemetry, correctness gates, and cost-to-serve modeling.
Driver-based model to estimate infrastructure cost impact of product experiments (Lambda, DynamoDB, CloudWatch).
A full-stack GPU profiling and simulation framework that bridges high-level Python ML code with low-level hardware metrics (SM Banks, Tensor Cores) for precise performance analysis.
Distributed engineering cost modeling and team topology pricing platform for CTO decision making.
Quantifies vehicle design complexity cost using ML and portfolio optimization. Gradient Boosting achieves R²=0.93 on MSRP prediction across 11,914 vehicles.
Reproducible microbenchmark for modeling domain crossing energy in heterogeneous compute systems.