94 commits
ab54824
Added ONNX exporter class to export model to ONNX format
DimaBir Oct 2, 2023
c5c5414
Merge remote-tracking branch 'origin/dev' into dev
DimaBir Oct 2, 2023
8ed72b9
Fixed import typo
DimaBir Oct 2, 2023
6451458
Fixed docstring
DimaBir Oct 2, 2023
2dc2560
Fixed docstring
DimaBir Oct 2, 2023
9019460
Merge remote-tracking branch 'origin/dev' into dev
DimaBir Oct 2, 2023
758f964
Fixed typo
DimaBir Oct 2, 2023
e186b66
Fixed typo
DimaBir Oct 2, 2023
1a6cf6e
Fixed typo
DimaBir Oct 2, 2023
47d6481
Print ONNX model
DimaBir Oct 2, 2023
74577ea
trying set to train to print BN
DimaBir Oct 2, 2023
d00d16f
Removed Conv + BN fusion in exporting PyTorch to ONNX
DimaBir Oct 2, 2023
b07c628
Removed Conv + BN fusion in exporting PyTorch to ONNX
DimaBir Oct 2, 2023
8e9d013
Add ONNX Inference
DimaBir Oct 2, 2023
f04011d
Updated dockerfile include packages
DimaBir Oct 2, 2023
7e04d3b
Fixed ONNX Inference
DimaBir Oct 2, 2023
85aebba
Fixed ONNX input
DimaBir Oct 2, 2023
7bb680d
Fixed ONNX input
DimaBir Oct 2, 2023
3dc11ca
Fixed ONNX input
DimaBir Oct 2, 2023
6e021cd
Fixed ONNX input
DimaBir Oct 2, 2023
5ac5b47
Fixed ONNX input
DimaBir Oct 2, 2023
69f3c9b
Fixed ONNX input
DimaBir Oct 2, 2023
d0f8936
Fixed ONNX input
DimaBir Oct 2, 2023
d665ae4
Fixed ONNX input
DimaBir Oct 2, 2023
f0049d8
Added abstract benchmark class
DimaBir Oct 2, 2023
647d811
Fixed ONNXBenchmark param
DimaBir Oct 2, 2023
125d825
Fixed ONNXBenchmark param
DimaBir Oct 2, 2023
914da47
Fixed ONNXBenchmark param
DimaBir Oct 2, 2023
f3c162d
Fixed ONNXBenchmark param
DimaBir Oct 2, 2023
ff84523
Applied black formatting
DimaBir Oct 2, 2023
057b899
Added image cat3
DimaBir Oct 2, 2023
dd81da6
Enabling optimization for ONNX exporter, using GPU
DimaBir Oct 2, 2023
284671b
Enabling optimization for ONNX exporter, using GPU
DimaBir Oct 2, 2023
e8427c7
Enabling optimization for ONNX exporter, using GPU
DimaBir Oct 2, 2023
5acb999
Fixed ONNX benchmark error
DimaBir Oct 2, 2023
e9b08e9
Fixed ONNX benchmark error
DimaBir Oct 2, 2023
3bb4f70
Fixed ONNX benchmark error
DimaBir Oct 2, 2023
dc574bd
Added requirement, reformatted code
DimaBir Oct 2, 2023
f7a8779
Fixed typo in default image extension
DimaBir Oct 2, 2023
eda33aa
Update inference images
DimaBir Oct 2, 2023
df0bf5f
Update inference images
DimaBir Oct 2, 2023
e1fefc2
Updated README.md
DimaBir Oct 2, 2023
759e92f
Updated README.md
DimaBir Oct 2, 2023
88c2901
Merge branch 'main' into dev
DimaBir Oct 2, 2023
2e5af4c
Added OV (OpenVINO) exporter model and benchmark
DimaBir Oct 5, 2023
15ba744
Update arg parse to handle runtime modes (ONNX, OV, CUDA)
DimaBir Oct 5, 2023
8a1eb3e
Updated dockerfile and requirements for torch-tensorrt
DimaBir Oct 5, 2023
720d793
Fixed OpenVINO exports
DimaBir Oct 5, 2023
75f65cb
Fixed OpenVINO exports
DimaBir Oct 5, 2023
e54f8ab
Fixed OpenVINO exports
DimaBir Oct 5, 2023
0299d79
Fixed OpenVINO typo
DimaBir Oct 5, 2023
8733657
Fixed OpenVINO typo
DimaBir Oct 5, 2023
ddd45a7
Fixed OpenVINO typo
DimaBir Oct 5, 2023
d4df36d
refactored OV code
DimaBir Oct 6, 2023
2a433d0
Fixed typo in the command
DimaBir Oct 6, 2023
3a16603
Updated imports
DimaBir Oct 6, 2023
8bfbcba
Fixed OpenVINO inference by adjusting to the newer API's CompiledMode…
DimaBir Oct 6, 2023
3d2cdb5
Fix
DimaBir Oct 6, 2023
25b98fa
Fix and reformat
DimaBir Oct 6, 2023
1a3a6a9
Fix and reformat
DimaBir Oct 6, 2023
3d1cb87
Make prediction with OVModel
DimaBir Oct 6, 2023
605b195
Update Dockerfile
DimaBir Oct 6, 2023
858859b
Update Dockerfile
DimaBir Oct 6, 2023
17d7114
Update Dockerfile
DimaBir Oct 6, 2023
df31b4e
Update Dockerfile
DimaBir Oct 6, 2023
8649bc1
Update Dockerfile
DimaBir Oct 6, 2023
b3262f9
Update Dockerfile
DimaBir Oct 6, 2023
24766d0
Update Dockerfile
DimaBir Oct 6, 2023
1ea7673
Update Dockerfile
DimaBir Oct 6, 2023
bcc1e01
Fix: Use correct attribute to retrieve input name in OpenVINO's Compi…
DimaBir Oct 6, 2023
1bc6d5e
Added: ploting graph and running all
DimaBir Oct 6, 2023
e263f04
Added: ploting graph and running all
DimaBir Oct 6, 2023
106e9b8
FIX: OVExport
DimaBir Oct 6, 2023
f6ce7e7
FIX: OVExport
DimaBir Oct 6, 2023
7ece256
FIX: OVExport
DimaBir Oct 6, 2023
e7d5b2a
Optimized the `make_prediction` function for clarity and stability
DimaBir Oct 6, 2023
5ffd98a
Revert
DimaBir Oct 6, 2023
a7857b4
Revert
DimaBir Oct 6, 2023
99e8a51
Revert
DimaBir Oct 6, 2023
7e17ef9
Revert
DimaBir Oct 6, 2023
522fe55
Added run_all_benchmark
DimaBir Oct 6, 2023
0cf90b8
Added run_all_benchmark
DimaBir Oct 6, 2023
c0b21d2
Added run_all_benchmark
DimaBir Oct 6, 2023
d99b076
Updated run_all_benchmark
DimaBir Oct 6, 2023
621c0e7
Updated run_all_benchmark
DimaBir Oct 6, 2023
01caadd
Updated run_all_benchmark
DimaBir Oct 6, 2023
386ecf5
Updated run_all_benchmark
DimaBir Oct 6, 2023
04dd216
Enhanced benchmark visualization with value labels on bars.
DimaBir Oct 6, 2023
c2e6251
Enhanced benchmark visualization with value labels on bars.
DimaBir Oct 6, 2023
1c1d000
Changed README.md
DimaBir Oct 6, 2023
8f66fd0
Updated README.md
DimaBir Oct 6, 2023
3960263
Updated README.md
DimaBir Oct 6, 2023
5aa4fe0
Updated README.md
DimaBir Oct 6, 2023
3ea712b
Merge branch 'master' into dev
DimaBir Oct 6, 2023
18 changes: 13 additions & 5 deletions Dockerfile
@@ -4,13 +4,21 @@ FROM nvcr.io/nvidia/tensorrt:23.08-py3
# Install system packages
RUN apt-get update && apt-get install -y \
python3-pip \
git
git \
libjpeg-dev \
libpng-dev

# Copy the requirements.txt file into the container
COPY requirements.txt /workspace/requirements.txt

# Install Python packages
RUN pip3 install --no-cache-dir -r /workspace/requirements.txt

# Install torch-tensorrt from the special location
RUN pip3 install torch-tensorrt -f https://github.com/NVIDIA/Torch-TensorRT/releases

# Set the working directory
WORKDIR /workspace

# Copy local project files to /workspace in the image
COPY . /workspace

# Install Python packages
RUN pip3 install --no-cache-dir -r /workspace/requirements.txt
COPY . /workspace
175 changes: 137 additions & 38 deletions README.md
@@ -1,18 +1,31 @@
# ResNet-50 Inference with ONNX/TensorRT

<img src="./inference/logo.png" width="60%">

## Table of Contents
1. [Overview](#overview)
2. [Requirements](#requirements)
3. [Steps to Run](#steps-to-run)
4. [Example Command](#example-command)
5. [Inference Benchmark Results](#inference-benchmark-results)
- [Example of Results](#example-of-results)
- [Explanation of Results](#explanation-of-results)
6. [ONNX Exporter](#onnx-exporter) ![New](https://img.shields.io/badge/-New-red)
7. [Author](#author)
8. [References](#references)
- [Steps to Run](#steps-to-run)
- [Example Command](#example-command)
5. [RESULTS](#results) ![Static Badge](https://img.shields.io/badge/update-yellow)
- [Results explanation](#results-explanation)
- [Example Input](#example-input)
6. [Benchmark Implementation Details](#benchmark-implementation-details) ![New](https://img.shields.io/badge/-New-red)
- [PyTorch CPU & CUDA](#pytorch-cpu--cuda)
- [TensorRT FP32 & FP16](#tensorrt-fp32--fp16)
- [ONNX](#onnx)
- [OpenVINO](#openvino)
7. [Used methodologies](#used-methodologies) ![New](https://img.shields.io/badge/-New-red)
- [TensorRT Optimization](#tensorrt-optimization)
- [ONNX Exporter](#onnx-exporter)
- [OV Exporter](#ov-exporter)
8. [Author](#author)
9. [References](#references)


<img src="./inference/plot.png" width="70%">

## Overview
This project demonstrates how to perform inference with a PyTorch model and optimize it using ONNX or NVIDIA TensorRT. The script loads a pre-trained ResNet-50 model from torchvision, performs inference on a user-provided image, and prints the top-K predicted classes. Additionally, the script benchmarks the model's performance in the following configurations: CPU, CPU (ONNX), CUDA, TensorRT-FP32, and TensorRT-FP16, providing insights into the speedup gained through optimization.
This project demonstrates how to perform inference with a PyTorch model and optimize it using ONNX, OpenVINO, or NVIDIA TensorRT. The script loads a pre-trained ResNet-50 model from torchvision, performs inference on a user-provided image, and prints the top-K predicted classes. Additionally, the script benchmarks the model's performance in the following configurations: CPU, CUDA, ONNX, OpenVINO, TensorRT-FP32, and TensorRT-FP16, providing insights into the speedup gained through optimization.

## Requirements
- This repo cloned
@@ -21,7 +34,7 @@ This project demonstrates how to perform inference with a PyTorch model and opti
- Python 3.x
- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide) (for running the Docker container with GPU support)

## Steps to Run
### Steps to Run

```sh
# 1. Build the Docker Image
@@ -37,46 +50,132 @@ python src/main.py
### Arguments
- `--image_path`: (Optional) Specifies the path to the image you want to predict.
- `--topk`: (Optional) Specifies the number of top predictions to show. Defaults to 5 if not provided.
- `--onnx`: (Optional) Specifies if we want export ResNet50 model to ONNX and run benchmark only for this model
- `--mode`: Specifies the mode for exporting and running the model. Choices are: `onnx`, `ov`, `all`.

## Example Command
### Example Command
```sh
python src/main.py --image_path ./inference/cat3.jpg --topk 3 --onnx
python src/main.py --topk 3 --mode=all
```

This command will run predictions on the image at the specified path and show the top 3 predictions using both PyTorch and ONNX Runtime models. For the default 5 top predictions, omit the --topk argument or set it to 5.
This command will run predictions on the default image (`./inference/cat3.jpg`), show the top 3 predictions, and run all models (PyTorch CPU, CUDA, ONNX, OV, TRT-FP16, TRT-FP32). At the end, the results plot will be saved to `./inference/plot.png`.

## Inference Benchmark Results
## RESULTS
### Inference Benchmark Results
<img src="./inference/plot.png" width="70%">

The results of the predictions and benchmarks are saved to `model.log`. This log file contains information about the predicted class for the input image and the average batch time for the different configurations during the benchmark.
### Results explanation
- `PyTorch_cpu: 973.52 ms` indicates the average batch time when running the `PyTorch` model on the `CPU` device.
- `PyTorch_cuda: 41.11 ms` indicates the average batch time when running the `PyTorch` model on the `CUDA` device.
- `TRT_fp32: 19.10 ms` shows the average batch time when running the model with `TensorRT` using `float32` precision.
- `TRT_fp16: 7.22 ms` indicates the average batch time when running the model with `TensorRT` using `float16` precision.
- `ONNX: 15.38 ms` indicates the average batch inference time when running the `PyTorch` model converted to `ONNX` on the `CPU` device.
- `OpenVINO: 14.04 ms` indicates the average batch inference time when running the `ONNX` model converted to `OpenVINO` on the `CPU` device.

### Example of Results
Here is an example of the contents of `model.log` after running predictions and benchmarks on this image:
### Example Input
Here is an example of the input image to run predictions and benchmarks on:

<img src="./inference/cat3.jpg" width="20%">

## Benchmark Implementation Details
Here you can see the flow for each model and benchmark.

### PyTorch CPU & CUDA
In the provided code, we perform inference using the native PyTorch framework on both CPU and GPU (CUDA) configurations. This serves as a baseline to compare the performance improvements gained from other optimization techniques.

#### Flow:
1. The ResNet-50 model is loaded from torchvision and, if available, transferred to the GPU.
2. Inference is performed on the provided image using the specified model.
3. Benchmark results, including average inference time, are logged for both the CPU and CUDA setups.
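
A minimal sketch of this baseline timing loop (illustrative only; the PyTorch benchmark in `src/benchmark.py` adds further warm-up and logging details):

```python
import time

import numpy as np
import torch
from torchvision.models import ResNet50_Weights, resnet50

# Load the pre-trained ResNet-50 and move it to the GPU if one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet50(weights=ResNet50_Weights.DEFAULT).eval().to(device)
dummy = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    # Warm-up iterations so cuDNN autotuning and caches do not skew the timings.
    for _ in range(50):
        model(dummy)
    if device == "cuda":
        torch.cuda.synchronize()

    timings = []
    for _ in range(100):
        start = time.time()
        model(dummy)
        if device == "cuda":
            torch.cuda.synchronize()  # wait for the GPU before reading the clock
        timings.append(time.time() - start)

print(f"Average batch time: {np.mean(timings) * 1000:.2f} ms")
```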

### TensorRT FP32 & FP16
TensorRT offers significant performance improvements by optimizing the neural network model. In this code, we utilize TensorRT's capabilities to run benchmarks in both FP32 (single precision) and FP16 (half precision) modes.

#### Flow:
1. Load the ResNet-50 model.
2. Convert the PyTorch model to TensorRT format with the specified precision.
3. Perform inference on the provided image.
4. Log the benchmark results for the specified TensorRT precision mode.
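
A hedged sketch of the compile step with `torch_tensorrt` (the shapes and options here are illustrative; the repository may use different settings):

```python
import torch
import torch_tensorrt
from torchvision.models import ResNet50_Weights, resnet50

# Baseline PyTorch model on the GPU.
model = resnet50(weights=ResNet50_Weights.DEFAULT).eval().to("cuda")

# Compile to a TensorRT engine; swap {torch.float32} for {torch.half} to get the FP16 variant.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
    enabled_precisions={torch.float32},
)

with torch.no_grad():
    out = trt_model(torch.randn(1, 3, 224, 224, device="cuda"))
print(out.shape)  # torch.Size([1, 1000])
```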

### ONNX
The code includes an exporter that converts the PyTorch ResNet-50 model to ONNX format, allowing it to be run with ONNX Runtime. This provides a flexible, cross-platform solution for deploying the model.

#### Flow:
1. The ResNet-50 model is loaded.
2. Using the ONNX exporter utility, the PyTorch model is converted to ONNX format.
3. An ONNX Runtime session is created.
4. Inference is performed on the provided image using the ONNX model.
5. Benchmark results are logged for the ONNX model.
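
A minimal sketch of that flow (file and tensor names are illustrative; it roughly corresponds to the steps wrapped by the project's `ONNXExporter` and `ONNXBenchmark` classes):

```python
import numpy as np
import onnxruntime as ort
import torch
from torchvision.models import ResNet50_Weights, resnet50

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
dummy = torch.randn(1, 3, 224, 224)

# Export the PyTorch model to an ONNX file.
torch.onnx.export(
    model,
    dummy,
    "resnet50.onnx",
    input_names=["input"],
    output_names=["output"],
)

# Create an ONNX Runtime session on the CPU.
session = ort.InferenceSession("resnet50.onnx", providers=["CPUExecutionProvider"])

# Run inference; ONNX Runtime consumes NumPy arrays.
logits = session.run(None, {"input": dummy.numpy().astype(np.float32)})[0]
print(logits.shape)  # (1, 1000)
```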

### OpenVINO
OpenVINO is a toolkit from Intel that optimizes deep learning model inference for Intel CPUs, GPUs, and other hardware. In the code, we convert the ONNX model to OpenVINO's format and then run benchmarks using the OpenVINO runtime.

#### Flow:
1. The ONNX model (created in the previous step) is loaded.
2. The ONNX model is converted to OpenVINO's IR format.
3. An inference engine is created using OpenVINO's runtime.
4. Inference is performed on the provided image using the OpenVINO model.
5. Benchmark results, including average inference time, are logged for the OpenVINO model.
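
A minimal sketch with the OpenVINO Python API (the `"AUTO"` device string and `"input"` tensor name follow the repository's `OVBenchmark`; the ONNX file name is illustrative):

```python
import numpy as np
import openvino as ov

core = ov.Core()

# Read the ONNX model; OpenVINO converts it to its in-memory IR representation.
ov_model = core.read_model("resnet50.onnx")

# Compile for the available hardware ("AUTO" lets OpenVINO pick a device such as the CPU).
compiled = core.compile_model(ov_model, "AUTO")

# Run inference on a dummy batch and read the first (and only) output.
dummy = np.random.randn(1, 3, 224, 224).astype(np.float32)
result = compiled(inputs={"input": dummy})[compiled.output(0)]
print(result.shape)  # (1, 1000)
```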

## Used methodologies
### TensorRT Optimization
TensorRT is a high-performance deep learning inference optimizer and runtime library developed by NVIDIA. It is designed for optimizing and deploying trained neural network models on production environments. This project supports TensorRT optimizations in both FP32 (single precision) and FP16 (half precision) modes, offering different trade-offs between inference speed and model accuracy.

#### Features
- **Performance Boost**: TensorRT can significantly accelerate the inference of neural network models, making it suitable for deployment in resource-constrained environments.
- **Precision Modes**: Supports FP32 for maximum accuracy and FP16 for faster performance with a minor trade-off in accuracy.
- **Layer Fusion**: TensorRT fuses layers and tensors in the neural network to reduce memory access overhead and improve execution speed.
- **Dynamic Tensor Memory**: Efficiently handles varying batch sizes without re-optimization.

#### Usage
To employ TensorRT optimizations in the project, use the `--mode all` argument when running the main script.
This will run all the benchmarks, including the PyTorch model compiled to TensorRT in both `FP16` and `FP32` precision modes, and then run inference on the specified image using the TensorRT-optimized models.
Example:
```sh
python src/main.py --mode all
```
My prediction: %33 tabby
My prediction: %26 Egyptian cat
Running Benchmark for CPU
Average batch time: 942.47 ms
Average ONNX inference time: 15.59 ms
Running Benchmark for CUDA
Average batch time: 41.02 ms
Compiling and Running Inference Benchmark for TensorRT with precision: torch.float32
Average batch time: 19.20 ms
Compiling and Running Inference Benchmark for TensorRT with precision: torch.float16
Average batch time: 7.25 ms
#### Requirements
Ensure you have the TensorRT library and the torch_tensorrt package installed in your environment. Also, for FP16 optimizations, it's recommended to have a GPU that supports half-precision arithmetic (like NVIDIA GPUs with Tensor Cores).

### ONNX Exporter
The ONNX Model Exporter (`ONNXExporter`) utility is incorporated in this project to enable conversion of the native PyTorch model into the ONNX format.
Using the ONNX format, inference and benchmarking can be performed with the ONNX Runtime, which offers platform-agnostic optimizations and is widely supported across numerous platforms and devices.

#### Features
- **Standardized Format**: ONNX provides an open-source format for AI models. It defines an extensible computation graph model, as well as definitions of built-in operators and standard data types.
- **Interoperability**: Models in ONNX format can be used across a variety of frameworks, tools, runtimes, and compilers.
- **Optimizations**: The ONNX Runtime provides performance optimizations for both cloud and edge devices.

#### Usage
To leverage the `ONNXExporter` and conduct inference using the ONNX Runtime, utilize the `--mode onnx` argument when executing the main script.
This will initiate the conversion process and then run inference on the specified image using the ONNX model.
Example:
```sh
python src/main.py --mode onnx
```

### Explanation of Results
- First k lines show the topk predictions. For example, `My prediction: %33 tabby` displays the highest confidence prediction made by the model for the input image, confidence level (`%33`), and the predicted class (`tabby`).
- The following lines provide information about the average batch time for running the model in different configurations:
- `Running Benchmark for CPU` and `Average batch time: 942.47 ms` indicate the average batch time when running the model on the CPU.
- `Average ONNX inference time: 15.59 ms` indicate the average batch time when running the ONNX model on the CPU.
- `Running Benchmark for CUDA` and `Average batch time: 41.02 ms` indicate the average batch time when running the model on CUDA.
- `Compiling and Running Inference Benchmark for TensorRT with precision: torch.float32` and `Average batch time: 19.20 ms` show the average batch time when running the model with TensorRT using `float32` precision.
- `Compiling and Running Inference Benchmark for TensorRT with precision: torch.float16` and `Average batch time: 7.25 ms` indicate the average batch time when running the model with TensorRT using `float16` precision.
#### Requirements
Ensure the ONNX library is installed in your environment to use the ONNXExporter. Additionally, if you want to run inference using the ONNX model, make sure you have the ONNX Runtime installed.
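
For instance, the exported file can be sanity-checked with the `onnx` package before it is benchmarked (the file name below is an assumption):

```python
import onnx

# Load the exported graph and verify that it is a structurally valid ONNX model.
onnx_model = onnx.load("resnet50.onnx")
onnx.checker.check_model(onnx_model)

# Optionally inspect the graph, e.g. to confirm the input/output names.
print(onnx.helper.printable_graph(onnx_model.graph)[:300])
```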

### OV Exporter
The OpenVINO Model Exporter utility (`OVExporter`) has been integrated into this project to facilitate the conversion of the ONNX model to the OpenVINO format.
This enables inference and benchmarking using OpenVINO, a toolkit optimized for Intel hardware that can provide substantial speed improvements, especially on CPUs.

#### Features
- **Model Optimization**: Converts the ONNX model to OpenVINO's Intermediate Representation (IR) format. This optimized format allows for faster inference times on Intel hardware.
- **Versatility**: OpenVINO can target a variety of Intel hardware devices such as CPUs, integrated GPUs, FPGAs, and VPUs.
- **Ease of Use**: The `OVExporter` provides a seamless transition from ONNX to OpenVINO, abstracting the conversion details and providing a straightforward interface.

#### Usage
To utilize `OVExporter` and perform inference using OpenVINO, use the `--mode ov` argument when running the main script.
This will trigger the conversion process and subsequently run inference on the provided image using the optimized OpenVINO model.
Example:
```sh
python src/main.py --mode ov
```

#### Requirements
Ensure you have the OpenVINO Toolkit installed and the necessary dependencies set up to use OpenVINO's model optimizer and inference engine.
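
As a rough sketch of the conversion itself, assuming an OpenVINO 2023.x release that exposes `ov.convert_model` and `ov.save_model` (file names are illustrative):

```python
import openvino as ov

# Convert the ONNX model to OpenVINO's in-memory representation ...
ov_model = ov.convert_model("resnet50.onnx")

# ... and optionally serialize it to IR files (resnet50.xml + resnet50.bin) for reuse.
ov.save_model(ov_model, "resnet50.xml")
```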


## ONNX Exporter
The ONNX Exporter utility is integrated into this project to allow the conversion of the PyTorch model to ONNX format, enabling inference and benchmarking using ONNX Runtime. The ONNX model can provide hardware-agnostic optimizations and is widely supported across various platforms and devices.
Binary file added inference/logo.png
Binary file added inference/plot.png
6 changes: 4 additions & 2 deletions requirements.txt
@@ -1,9 +1,11 @@
torch
torchvision
torch-tensorrt
pandas
Pillow
numpy
packaging
onnx
onnxruntime
onnxruntime
openvino==2023.1.0.dev20230811
seaborn
matplotlib
70 changes: 67 additions & 3 deletions src/benchmark.py
@@ -7,6 +7,7 @@
import torch.backends.cudnn as cudnn
import logging
import onnxruntime as ort
import openvino as ov

# Configure logging
logging.basicConfig(filename="model.log", level=logging.INFO)
@@ -22,7 +23,7 @@ def __init__(self, nruns: int = 100, nwarmup: int = 50):
        self.nwarmup = nwarmup

    @abstractmethod
    def run(self) -> None:
    def run(self):
        """
        Abstract method to run the benchmark.
        """
@@ -58,7 +59,7 @@ def __init__(

        cudnn.benchmark = True  # Enable cuDNN benchmarking optimization

    def run(self) -> None:
    def run(self):
        """
        Run the benchmark with the given model, input shape, and other parameters.
        Log the average batch time and print the input shape and output feature size.
@@ -93,6 +94,7 @@ def run(self) -> None:
print(f"Input shape: {input_data.size()}")
print(f"Output features size: {features.size()}")
logging.info(f"Average batch time: {np.mean(timings) * 1000:.2f} ms")
return np.mean(timings) * 1000


class ONNXBenchmark(Benchmark):
@@ -113,7 +115,8 @@ def __init__(
        self.nwarmup = nwarmup
        self.nruns = nruns

    def run(self) -> None:

    def run(self):
        print("Warming up ...")
        # Adjusting the batch size in the input shape to match the expected input size of the model.
        input_shape = (1,) + self.input_shape[1:]
@@ -133,3 +136,64 @@ def run(self) -> None:

        avg_time = np.mean(timings) * 1000
        logging.info(f"Average ONNX inference time: {avg_time:.2f} ms")
        return avg_time


class OVBenchmark(Benchmark):
    def __init__(
        self, model: ov.Model, input_shape: Tuple[int, int, int, int]
    ):
        """
        Initialize the OVBenchmark with the OpenVINO model and the input shape.

        :param model: ov.Model
            The OpenVINO model to benchmark.
        :param input_shape: Tuple[int, int, int, int]
            The shape of the model input.
        """
        self.ov_model = model
        self.core = ov.Core()
        self.compiled_model = None
        self.input_shape = input_shape
        self.warmup_runs = 50
        self.num_runs = 100
        self.dummy_input = np.random.randn(*input_shape).astype(np.float32)

    def warmup(self):
        """
        Compile the OpenVINO model for optimal execution on available hardware.
        """
        self.compiled_model = self.core.compile_model(self.ov_model, "AUTO")

    def inference(self, input_data) -> dict:
        """
        Perform inference on the input data using the compiled OpenVINO model.

        :param input_data: np.ndarray
            The input data for the model.
        :return: dict
            The model's output as a dictionary.
        """
        outputs = self.compiled_model(inputs={"input": input_data})
        return outputs

    def run(self):
        """
        Run the benchmark on the OpenVINO model. It first warms up by compiling the model
        and running a number of untimed inferences, then measures the average inference
        time over a set number of runs.
        """
        # Warm-up: compile once, then run untimed inferences so the first timed run is not skewed
        logging.info("Warming up ...")
        self.warmup()
        for _ in range(self.warmup_runs):
            _ = self.inference(self.dummy_input)

        # Benchmarking: time num_runs inferences and average them
        total_time = 0.0
        for _ in range(self.num_runs):
            start_time = time.time()
            _ = self.inference(self.dummy_input)
            total_time += time.time() - start_time

        avg_time = total_time / self.num_runs
        logging.info(f"Average inference time: {avg_time * 1000:.2f} ms")
        return avg_time * 1000