Hi,
Thanks for creating the Everything GPT tutorial. It's been very helpful. I'm noticing that the ONNX-optimized model takes a long time to generate more than 200 tokens. By a long time, I mean anywhere from 10 to 15 minutes, or it even times out. This happens even on the Colab notebook using the V100. When trying to generate tokens on AWS using the configuration you mentioned, it usually times out. I would love to get your thoughts on a resolution for this.