[RFC] Production Optimizations: GPU-Resident Inference & Fine-Tuning Efficiency #460

@JorgeEmiliano80

Hi @abdulfatir and the Chronos team,

First, I sincerely apologize for the disruption caused by my previous PRs (#454, #456). I understand that opening significant architectural changes without prior discussion creates unnecessary noise, especially when they deviate from the project's core roadmap.

I am currently deploying Chronos in a high-throughput production environment and have identified two specific bottlenecks. I wanted to share my findings and ask if architectural support for these use cases aligns with your long-term goals.

1. High-Throughput Inference (Removing the CPU-GPU Sync)

I profiled the predict() loop and found that moving tensors between the CPU and GPU at every generation step forces a host-device synchronization, which is a significant bottleneck for low-latency applications.

  • Experiment: I implemented a generation loop that keeps the context and predictions entirely in device memory (VRAM) until generation completes.
  • Result: On local benchmarks (MPS/CUDA), this yielded a ~5x improvement in throughput for batch inference.
  • Proposal: Instead of modifying the core ChronosModel, would you be open to an optional ChronosFastPipeline (or similar utility) specifically designed for production inference where latency is critical?
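
For illustration only, a device-resident loop could look roughly like the sketch below. This is not the code from the earlier PRs and not the Chronos API: `toy_step`, `pick_device`, and `generate_on_device` are hypothetical names, and `toy_step` is a stand-in for one forward pass of a real model. The point is the data flow: one host-to-device copy before the loop and one device-to-host copy after it.

```python
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


def toy_step(context: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for a model forward pass.

    Predicts the next value as the mean of the last 4 context values.
    Input shape: (batch, context_len). Output shape: (batch, 1).
    """
    return context[:, -4:].mean(dim=1, keepdim=True)


@torch.no_grad()
def generate_on_device(
    context: torch.Tensor, steps: int, device: torch.device
) -> torch.Tensor:
    """Autoregressive loop that keeps all intermediate tensors on `device`."""
    context = context.to(device)  # single host -> device copy
    batch = context.shape[0]
    # Preallocate the output on the device so no step touches host memory.
    out = torch.empty(batch, steps, dtype=context.dtype, device=device)
    for t in range(steps):
        nxt = toy_step(context)
        out[:, t : t + 1] = nxt
        # Slide the window: drop the oldest value, append the prediction.
        context = torch.cat([context[:, 1:], nxt], dim=1)
    return out.cpu()  # single device -> host copy


if __name__ == "__main__":
    history = torch.ones(2, 8)  # (batch=2, context_len=8)
    forecast = generate_on_device(history, steps=3, device=pick_device())
    print(forecast.shape)
```

The sketch deliberately avoids calls such as `.item()`, `.cpu()`, or `.numpy()` inside the loop, since each of those would force the host to wait on the device at every step.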

2. Static Covariates for Fine-Tuning

I reviewed the discussion in #352 and understand that pretrained checkpoints do not support static covariates. However, for users fine-tuning on retail datasets (where item metadata is constant), repeating static features across the temporal dimension significantly increases memory usage.

  • Proposal: Would you consider accepting a static_embedding module in the architecture that is disabled by default?
  • Benefit: This would allow advanced users to fine-tune custom models with metadata efficiently, without breaking compatibility for users of the pretrained checkpoints.
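
As a rough sketch of the idea (all names here are hypothetical and nothing below exists in Chronos today), an opt-in module could project the static features once per series and broadcast the result over the time axis, instead of storing a repeated copy for every time step. I am assuming hidden states of shape `(batch, seq_len, d_model)` and static features of shape `(batch, num_static)`.

```python
from typing import Optional

import torch
from torch import nn


class StaticEmbedding(nn.Module):
    """Hypothetical opt-in module; a no-op unless `enabled=True`."""

    def __init__(self, num_static: int, d_model: int, enabled: bool = False):
        super().__init__()
        self.enabled = enabled
        # Only allocate parameters when the feature is switched on, so
        # pretrained checkpoints load without extra or missing keys.
        self.proj = nn.Linear(num_static, d_model) if enabled else None

    def forward(
        self, hidden: torch.Tensor, static: Optional[torch.Tensor] = None
    ) -> torch.Tensor:
        if not self.enabled or static is None:
            return hidden  # default path: behavior is unchanged
        # (batch, num_static) -> (batch, 1, d_model), then broadcast over time.
        return hidden + self.proj(static).unsqueeze(1)


if __name__ == "__main__":
    hidden = torch.zeros(2, 5, 8)  # (batch=2, seq_len=5, d_model=8)
    static = torch.randn(2, 3)     # (batch=2, num_static=3)
    print(StaticEmbedding(3, 8)(hidden, static).shape)
    print(StaticEmbedding(3, 8, enabled=True)(hidden, static).shape)
```

In this toy setup the static input holds 2 × 3 = 6 values, whereas a copy repeated across the 5 time steps would hold 2 × 5 × 3 = 30. The gap grows linearly with the context length.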

I am happy to keep these optimizations in my own fork if they are out of scope, but I wanted to offer them properly in case they benefit the community.

Thanks for your work on this SOTA model.
