Hi @abdulfatir and the Chronos team,
First, I sincerely apologize for the disruption caused by my previous PRs (#454, #456). I understand that opening significant architectural changes without prior discussion creates unnecessary noise, especially when they deviate from the project's core roadmap.
I am currently deploying Chronos in a high-throughput production environment and have identified two specific bottlenecks. I wanted to share my findings and ask if architectural support for these use cases aligns with your long-term goals.
1. High-Throughput Inference (Removing the CPU-GPU Sync)
I profiled the `predict()` loop and noticed that moving tensors between CPU and GPU at every generation step is a significant bottleneck for low-latency applications.
- Experiment: I implemented a generation loop that keeps the context and predictions entirely on VRAM until completion.
- Result: On local benchmarks (MPS/CUDA), this yielded a ~5x improvement in throughput for batch inference.
- Proposal: Instead of modifying the core `ChronosModel`, would you be open to an optional `ChronosFastPipeline` (or similar utility) specifically designed for production inference where latency is critical?
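To make the idea concrete, here is a minimal sketch of the device-resident loop I benchmarked. It is not Chronos's actual API: `step_fn` is a hypothetical stand-in for one forward pass of the model, and the point is simply that the context and every intermediate prediction stay on the model's device, with at most one host transfer at the very end.

```python
import torch

@torch.no_grad()
def generate_on_device(step_fn, context: torch.Tensor, horizon: int) -> torch.Tensor:
    """Autoregressive generation with no per-step CPU-GPU sync.

    step_fn: callable mapping a (batch, t) context to a (batch, 1) next step.
    All tensors remain on `context.device` until the caller moves the result.
    """
    preds = []
    for _ in range(horizon):
        next_step = step_fn(context)                 # stays on the same device
        preds.append(next_step)
        context = torch.cat([context, next_step], dim=-1)
    # Single concatenation; caller decides if/when to call .cpu() once.
    return torch.cat(preds, dim=-1)

# Toy step function standing in for the model: predict the context mean.
step = lambda ctx: ctx.mean(dim=-1, keepdim=True)
out = generate_on_device(step, torch.ones(2, 16), horizon=8)
print(out.shape)  # torch.Size([2, 8])
```

The existing pipeline could wrap this pattern without touching the model itself, which is why I framed it as a separate utility rather than a change to `ChronosModel`.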
2. Static Covariates for Fine-Tuning
I reviewed the discussion in #352 and understand that pretrained checkpoints do not support static covariates. However, for users fine-tuning on retail datasets (where item metadata is constant), repeating static features across the temporal dimension significantly increases memory usage.
- Proposal: Would you consider accepting a `static_embedding` module in the architecture that is disabled by default?
- Benefit: This would allow advanced users to fine-tune custom models with metadata efficiently, without breaking compatibility for users of the pretrained checkpoints.
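A rough sketch of what I have in mind (names like `StaticEmbedding` and the integer-id interface are my assumptions, not an existing Chronos component): the embedding is looked up once per series and broadcast over the time axis, instead of repeating the static feature at every timestep, and when disabled the module is an exact identity so pretrained checkpoints are unaffected.

```python
import torch
import torch.nn as nn

class StaticEmbedding(nn.Module):
    """Hypothetical optional static-covariate embedding, disabled by default.

    When enabled, each series carries one integer metadata id (e.g. an item
    category). The embedding is added once per series and broadcast over the
    time dimension, so memory cost is O(batch) rather than O(batch * T).
    """
    def __init__(self, num_categories: int = 0, d_model: int = 64):
        super().__init__()
        self.enabled = num_categories > 0
        if self.enabled:
            self.embed = nn.Embedding(num_categories, d_model)

    def forward(self, hidden: torch.Tensor, static_ids=None) -> torch.Tensor:
        # hidden: (batch, T, d_model); static_ids: (batch,) integer ids or None
        if not self.enabled or static_ids is None:
            return hidden  # exact no-op for pretrained-checkpoint users
        return hidden + self.embed(static_ids).unsqueeze(1)  # broadcast over T

# Disabled by default: behaves as an identity, preserving compatibility.
h = torch.zeros(2, 5, 64)
print(torch.equal(StaticEmbedding()(h), h))  # True
```

Fine-tuning users would opt in by passing `num_categories > 0`; everyone else gets the identity path and an unchanged state dict apart from the optional module.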
I am happy to keep these optimizations in my own fork if they are out of scope, but I wanted to offer them properly in case they benefit the community.
Thanks for your work on this SOTA model.