Hi,
Thanks for creating the Everything GPT tutorial. It's been very helpful. I'm noticing that the ONNX-optimized model takes a long time to generate more than 200 tokens. By a long time, I mean anywhere from 10 to 15 minutes, or it even times out. This happens even on the Colab notebook using the V100. When trying to generate tokens on AWS using the configuration you mentioned, it usually times out. I would love to get your thoughts on a resolution for this.