diff --git a/README.md b/README.md
index baf1376..1820b10 100644
--- a/README.md
+++ b/README.md
@@ -6,25 +6,18 @@
2. [Requirements](#requirements)
- [Steps to Run](#steps-to-run)
- [Example Command](#example-command)
-3. [GPU-CUDA Results](#gpu-cuda-results) 
- - [Results explanation](#results-explanation)
- - [Example Input](#example-input)
- - [Example prediction results](#example-prediction-results)
- - [PC Setup](#pc-setup)
-4. [Benchmark Implementation Details](#benchmark-implementation-details) 
+3. [CPU Results](#cpu-results) 
+4. [GPU (CUDA) Results](#gpu-cuda-results) 
+5. [Benchmark Implementation Details](#benchmark-implementation-details) 
- [PyTorch CPU & CUDA](#pytorch-cpu--cuda)
- [TensorRT FP32 & FP16](#tensorrt-fp32--fp16)
- [ONNX](#onnx)
- [OpenVINO](#openvino)
-5. [Extra](#extra) 
- - [Linux Server Inference](#linux-server-inference)
- - [Prediction results](#prediction-results)
- - [PC Setup Linux](#pc-setup-linux)
6. [Author](#author)
7. [References](#references)
-
+
## Overview
This project showcases inference with a PyTorch ResNet-50 model and its optimization using ONNX, OpenVINO, and NVIDIA TensorRT. The script runs inference on a user-specified image and displays the top-K predictions. Benchmarking covers configurations like PyTorch CPU, ONNX CPU, OpenVINO CPU, PyTorch CUDA, TensorRT-FP32, and TensorRT-FP16.
@@ -50,13 +43,13 @@ Refer to the [Steps to Run](#steps-to-run) section for Docker instructions.
1. **CPU Deployment**:
For systems without a GPU or CUDA support, simply use the default base image.
```bash
- docker build -t my_image_cpu .
+ docker build -t cpu_img .
```
2. **GPU Deployment**:
If your system has a GPU with CUDA support, you can use the TensorRT base image to leverage GPU acceleration.
```bash
- docker build --build-arg ENVIRONMENT=gpu --build-arg BASE_IMAGE=nvcr.io/nvidia/tensorrt:23.08-py3 -t my_project_image_gpu .
+ docker build --build-arg ENVIRONMENT=gpu --build-arg BASE_IMAGE=nvcr.io/nvidia/tensorrt:23.08-py3 -t gpu_img .
```
### Running the Docker Container
@@ -78,7 +71,7 @@ python main.py [--mode all]
### Arguments
- `--image_path`: (Optional) Specifies the path to the image you want to predict.
- `--topk`: (Optional) Specifies the number of top predictions to show. Defaults to 5 if not provided.
-- `--mode`: (Optional) Specifies the model's mode for exporting and running. Choices are: `onnx`, `ov`, `cuda`, and `all`. If not provided, it defaults to `all`.
+- `--mode`: (Optional) Specifies the model's mode for exporting and running. Choices are: `onnx`, `ov`, `cpu`, `cuda`, `tensorrt`, and `all`. If not provided, it defaults to `all`.
### Example Command
```sh
@@ -87,17 +80,33 @@ python main.py --topk 3 --mode=all --image_path="./inference/train.jpg"
This command will run predictions on the chosen image (`./inference/train.jpg`), show the top 3 predictions, and run all available models. Note: the comparison plot is created only for `--mode=all`; the results are plotted and saved to `./inference/plot.png`.
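To run a single backend, a command like the following can be used (this runs only the ONNX export and benchmark, with the default image and top-5 predictions):
```sh
python main.py --mode=onnx
```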
-## GPU-CUDA Results
+## CPU Results
+
+
+### Prediction results
+```
+#1: 15% Egyptian cat
+#2: 14% tiger cat
+#3: 9% tabby
+#4: 2% doormat
+#5: 2% lynx
+```
+### PC Setup (Linux)
+- CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
+- RAM: 16 GB
+- GPU: None
+
+## GPU (CUDA) Results
### Inference Benchmark Results
-
+
### Results explanation
- - `PyTorch_cpu: 978.71 ms` indicates the average batch time when running the `PyTorch` model on `CPU` device.
- - `PyTorch_cuda: 30.11 ms` indicates the average batch time when running the `PyTorch` model on the `CUDA` device.
- - `TRT_fp32: 19.20 ms` shows the average batch time when running the model with `TensorRT` using `float32` precision.
- - `TRT_fp16: 7.32 ms` indicates the average batch time when running the model with `TensorRT` using `float16` precision.
- - `ONNX: 15.95 ms` indicates the average batch inference time when running the `PyTorch` converted to the `ONNX` model on the `CPU` device.
- - `OpenVINO: 13.37 ms` indicates the average batch inference time when running the `ONNX` model converted to `OpenVINO` on the `CPU` device.
+ - `PyTorch_cpu: 32.83 ms` indicates the average inference time when running the `PyTorch` model on the `CPU` device.
+ - `PyTorch_cuda: 5.59 ms` indicates the average inference time when running the `PyTorch` model on the `CUDA` device.
+ - `TRT_fp32: 1.69 ms` shows the average inference time when running the model with `TensorRT` using `float32` precision.
+ - `TRT_fp16: 1.69 ms` indicates the average inference time when running the model with `TensorRT` using `float16` precision.
+ - `ONNX: 16.01 ms` indicates the average inference time when running the `PyTorch` model converted to `ONNX` on the `CPU` device.
+ - `OpenVINO: 15.65 ms` indicates the average inference time when running the `ONNX` model converted to `OpenVINO` on the `CPU` device.
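+ - Throughput values shown in the second panel of the plot are derived from the same timings, roughly `samples/sec ≈ 1000 / avg_time_ms` for the single-image batches used in the benchmark loop.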
### Example Input
Here is an example of the input image to run predictions and benchmarks on:
@@ -158,33 +167,6 @@ OpenVINO is a toolkit from Intel that optimizes deep learning model inference fo
4. Perform inference on the provided image using the OpenVINO model.
5. Benchmark results, including average inference time, are logged for the OpenVINO model.
-## Extra
-### Linux Server Inference
-
-
-### Prediction results
-`model.log` file content
-```
-Running prediction for OV model
-#1: 15% Egyptian cat
-#2: 14% tiger cat
-#3: 9% tabby
-#4: 2% doormat
-#5: 2% lynx
-
-
-Running prediction for ONNX model
-#1: 15% Egyptian cat
-#2: 14% tiger cat
-#3: 9% tabby
-#4: 2% doormat
-#5: 2% lynx
-```
-### PC Setup Linux
-- CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
-- RAM: 16 GB
-- GPU: None
-
## Author
[DimaBir](https://github.com/DimaBir)
diff --git a/benchmark/__init__.py b/benchmark/__init__.py
deleted file mode 100644
index e69de29..0000000
diff --git a/benchmark/benchmark_models.py b/benchmark/benchmark_models.py
deleted file mode 100644
index c56064b..0000000
--- a/benchmark/benchmark_models.py
+++ /dev/null
@@ -1,247 +0,0 @@
-import time
-from typing import Tuple
-
-from abc import ABC, abstractmethod
-import numpy as np
-import torch
-import torch.backends.cudnn as cudnn
-import logging
-import onnxruntime as ort
-import openvino as ov
-
-# Configure logging
-logging.basicConfig(filename="model.log", level=logging.INFO)
-
-
-class Benchmark(ABC):
- """
- Abstract class representing a benchmark.
- """
-
- def __init__(self, nruns: int = 100, nwarmup: int = 50):
- self.nruns = nruns
- self.nwarmup = nwarmup
-
- @abstractmethod
- def run(self):
- """
- Abstract method to run the benchmark.
- """
- pass
-
-
-class PyTorchBenchmark:
- def __init__(
- self,
- model: torch.nn.Module,
- device: str = "cuda",
- input_shape: Tuple[int, int, int, int] = (32, 3, 224, 224),
- dtype: torch.dtype = torch.float32,
- nwarmup: int = 50,
- nruns: int = 100,
- ) -> None:
- """
- Initialize the Benchmark object.
-
- :param model: The model to be benchmarked.
- :param device: The device to run the benchmark on ("cpu" or "cuda").
- :param input_shape: The shape of the input data.
- :param dtype: The data type to be used in the benchmark (typically torch.float32 or torch.float16).
- :param nwarmup: The number of warmup runs before timing.
- :param nruns: The number of runs for timing.
- """
- self.model = model
- self.device = device
- self.input_shape = input_shape
- self.dtype = dtype
- self.nwarmup = nwarmup
- self.nruns = nruns
-
- cudnn.benchmark = True # Enable cuDNN benchmarking optimization
-
- def run(self):
- """
- Run the benchmark with the given model, input shape, and other parameters.
- Log the average batch time and print the input shape and output feature size.
- """
- # Prepare input data
- input_data = torch.randn(self.input_shape).to(self.device).to(self.dtype)
-
- # Warm up
- print("Warm up ...")
- with torch.no_grad():
- for _ in range(self.nwarmup):
- features = self.model(input_data)
-
- if self.device == "cuda":
- torch.cuda.synchronize()
-
- # Start timing
- print("Start timing ...")
- timings = []
- with torch.no_grad():
- for i in range(1, self.nruns + 1):
- start_time = time.time()
- features = self.model(input_data)
- if self.device == "cuda":
- torch.cuda.synchronize()
- end_time = time.time()
- timings.append(end_time - start_time)
-
- if i % 10 == 0:
- print(
- f"Iteration {i}/{self.nruns}, ave batch time {np.mean(timings) * 1000:.2f} ms"
- )
-
- logging.info(f"Average batch time: {np.mean(timings) * 1000:.2f} ms")
- return np.mean(timings) * 1000
-
-
-class ONNXBenchmark(Benchmark):
- """
- A class used to benchmark the performance of an ONNX model.
- """
-
- def __init__(
- self,
- ort_session: ort.InferenceSession,
- input_shape: tuple,
- nruns: int = 100,
- nwarmup: int = 50,
- ):
- super().__init__(nruns)
- self.ort_session = ort_session
- self.input_shape = input_shape
- self.nwarmup = nwarmup
- self.nruns = nruns
-
- def run(self):
- print("Warming up ...")
- # Adjusting the batch size in the input shape to match the expected input size of the model.
- input_shape = (1,) + self.input_shape[1:]
- input_data = np.random.randn(*input_shape).astype(np.float32)
-
- for _ in range(self.nwarmup): # Warm-up runs
- _ = self.ort_session.run(None, {"input": input_data})
-
- print("Starting benchmark ...")
- timings = []
-
- for i in range(1, self.nruns + 1):
- start_time = time.time()
- _ = self.ort_session.run(None, {"input": input_data})
- end_time = time.time()
- timings.append(end_time - start_time)
-
- if i % 10 == 0:
- print(
- f"Iteration {i}/{self.nruns}, ave batch time {np.mean(timings) * 1000:.2f} ms"
- )
-
- avg_time = np.mean(timings) * 1000
- logging.info(f"Average ONNX inference time: {avg_time:.2f} ms")
- return avg_time
-
-
-class OVBenchmark(Benchmark):
- def __init__(
- self, model: ov.frontend.FrontEnd, input_shape: Tuple[int, int, int, int]
- ):
- """
- Initialize the OVBenchmark with the OpenVINO model and the input shape.
-
- :param model: ov.frontend.FrontEnd
- The OpenVINO model.
- :param input_shape: Tuple[int, int, int, int]
- The shape of the model input.
- """
- self.ov_model = model
- self.core = ov.Core()
- self.compiled_model = None
- self.input_shape = input_shape
- self.nwarmup = 50
- self.nruns = 100
- self.dummy_input = np.random.randn(*input_shape).astype(np.float32)
-
- def warmup(self):
- """
- Compile the OpenVINO model for optimal execution on available hardware.
- """
- self.compiled_model = self.core.compile_model(self.ov_model, "AUTO")
-
- def inference(self, input_data) -> dict:
- """
- Perform inference on the input data using the compiled OpenVINO model.
-
- :param input_data: np.ndarray
- The input data for the model.
- :return: dict
- The model's output as a dictionary.
- """
- outputs = self.compiled_model(inputs={"input": input_data})
- return outputs
-
- def run(self):
- """
- Run the benchmark on the OpenVINO model. It first warms up by compiling the model and then measures
- the average inference time over a set number of runs.
- """
- # Warm-up runs
- logging.info("Warming up ...")
- for _ in range(self.nwarmup):
- self.warmup()
-
- # Benchmarking
- total_time = 0
- for i in range(1, self.nruns + 1):
- start_time = time.time()
- _ = self.inference(self.dummy_input)
- total_time += time.time() - start_time
-
- if i % 10 == 0:
- print(
- f"Iteration {i}/{self.nruns}, ave batch time {total_time / i * 1000:.2f} ms"
- )
-
- avg_time = total_time / self.nruns
- logging.info(f"Average inference time: {avg_time * 1000:.2f} ms")
- return avg_time * 1000
-
-
-def benchmark_onnx_model(ort_session: ort.InferenceSession):
- run_benchmark(None, None, None, ort_session, onnx=True)
-
-
-def benchmark_ov_model(ov_model: ov.CompiledModel) -> OVBenchmark:
- ov_benchmark = OVBenchmark(ov_model, input_shape=(1, 3, 224, 224))
- ov_benchmark.run()
- return ov_benchmark
-
-
-def benchmark_cuda_model(cuda_model: torch.nn.Module, device: str, dtype: torch.dtype):
- run_benchmark(cuda_model, device, dtype)
-
-
-def run_benchmark(
- model: torch.nn.Module,
- device: str,
- dtype: torch.dtype,
- ort_session: ort.InferenceSession = None,
- onnx: bool = False,
-) -> None:
- """
- Run and log the benchmark for the given model, device, and dtype.
-
- :param onnx:
- :param ort_session:
- :param model: The model to be benchmarked.
- :param device: The device to run the benchmark on ("cpu" or "cuda").
- :param dtype: The data type to be used in the benchmark (typically torch.float32 or torch.float16).
- """
- if onnx:
- logging.info(f"Running Benchmark for ONNX")
- benchmark = ONNXBenchmark(ort_session, input_shape=(32, 3, 224, 224))
- else:
- logging.info(f"Running Benchmark for {device.upper()} and precision {dtype}")
- benchmark = PyTorchBenchmark(model, device=device, dtype=dtype)
- benchmark.run()
\ No newline at end of file
diff --git a/benchmark/benchmark_utils.py b/benchmark/benchmark_utils.py
deleted file mode 100644
index 2957d2b..0000000
--- a/benchmark/benchmark_utils.py
+++ /dev/null
@@ -1,108 +0,0 @@
-import logging
-
-import numpy as np
-import pandas as pd
-import matplotlib.pyplot as plt
-import seaborn as sns
-from typing import Dict, Any
-import torch
-
-from benchmark.benchmark_models import PyTorchBenchmark, ONNXBenchmark, OVBenchmark
-
-
-def run_all_benchmarks(
- models: Dict[str, Any], img_batch: np.ndarray
-) -> Dict[str, float]:
- """
- Run benchmarks for all models and return a dictionary of average inference times.
-
- :param models: Dictionary of models. Key is model type ("onnx", "ov", "pytorch", "trt_fp32", "trt_fp16"), value is the model.
- :param img_batch: The batch of images to run the benchmark on.
- :return: Dictionary of average inference times. Key is model type, value is average inference time.
- """
- results = {}
-
- # ONNX benchmark
- logging.info(f"Running benchmark inference for ONNX model")
- onnx_benchmark = ONNXBenchmark(models["onnx"], img_batch.shape)
- avg_time_onnx = onnx_benchmark.run()
- results["ONNX"] = avg_time_onnx
-
- # OpenVINO benchmark
- logging.info(f"Running benchmark inference for OpenVINO model")
- ov_benchmark = OVBenchmark(models["ov"], img_batch.shape)
- avg_time_ov = ov_benchmark.run()
- results["OpenVINO"] = avg_time_ov
-
- # PyTorch + TRT benchmark
- configs = [
- ("cpu", torch.float32, False),
- ("cuda", torch.float32, False),
- ("cuda", torch.float32, True),
- ("cuda", torch.float16, True),
- ]
- for device, precision, is_trt in configs:
- if not torch.cuda.is_available() and device == "cuda":
- continue
-
- model_to_use = models[f"PyTorch_{device}"].to(device)
-
- if not is_trt:
- pytorch_benchmark = PyTorchBenchmark(
- model_to_use, device=device, dtype=precision
- )
- logging.info(f"Running benchmark inference for PyTorch_{device} model")
- avg_time_pytorch = pytorch_benchmark.run()
- results[f"PyTorch_{device}"] = avg_time_pytorch
-
- else:
- # TensorRT benchmarks
- if precision == torch.float32 or precision == torch.float16:
- mode = "fp32" if precision == torch.float32 else "fp16"
- logging.info(f"Running benchmark inference for TRT_{mode} model")
- trt_benchmark = PyTorchBenchmark(
- models[f"trt_{mode}"], device=device, dtype=precision
- )
- avg_time_trt = trt_benchmark.run()
- results[f"TRT_{mode}"] = avg_time_trt
-
- return results
-
-
-def plot_benchmark_results(results: Dict[str, float]):
- """
- Plot the benchmark results using Seaborn.
-
- :param results: Dictionary of average inference times. Key is model type, value is average inference time.
- """
- # Convert dictionary to two lists for plotting
- models = list(results.keys())
- times = list(results.values())
-
- # Create a DataFrame for plotting
- data = pd.DataFrame({"Model": models, "Time": times})
-
- # Sort the DataFrame by Time
- data = data.sort_values("Time", ascending=True)
-
- # Plot
- plt.figure(figsize=(10, 6))
- ax = sns.barplot(
- x=data["Time"],
- y=data["Model"],
- hue=data["Model"],
- palette="rocket",
- legend=False,
- )
-
- # Adding the actual values on the bars
- for index, value in enumerate(data["Time"]):
- ax.text(value, index, f"{value:.2f} ms", color="black", ha="left", va="center")
-
- plt.xlabel("Average Inference Time (ms)")
- plt.ylabel("Model Type")
- plt.title("ResNet50 - Inference Benchmark Results")
-
- # Save the plot to a file
- plt.savefig("./inference/plot.png", bbox_inches="tight")
- plt.show()
diff --git a/common/utils.py b/common/utils.py
index 8cadf76..a8e0a33 100644
--- a/common/utils.py
+++ b/common/utils.py
@@ -1,65 +1,113 @@
import argparse
-import openvino as ov
-import torch
-from src.model import ModelLoader
-from src.onnx_exporter import ONNXExporter
-from src.ov_exporter import OVExporter
-import onnxruntime as ort
+import pandas as pd
+import matplotlib.pyplot as plt
+import seaborn as sns
+from typing import Dict, Tuple
-def export_onnx_model(
- onnx_path: str, model_loader: ModelLoader, device: torch.device
-) -> None:
- onnx_exporter = ONNXExporter(model_loader.model, device, onnx_path)
- onnx_exporter.export_model()
+def plot_benchmark_results(results: Dict[str, Tuple[float, float]]):
+ """
+ Plot the benchmark results using Seaborn.
+ :param results: Dictionary where the key is the model type and the value is a tuple (average inference time, throughput).
+ """
+ plot_path = "./inference/plot.png"
-def init_onnx_model(
- onnx_path: str, model_loader: ModelLoader, device: torch.device
-) -> ort.InferenceSession:
- export_onnx_model(onnx_path=onnx_path, model_loader=model_loader, device=device)
- return ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
+ # Extract data from the results
+ models = list(results.keys())
+ times = [value[0] for value in results.values()]
+ throughputs = [value[1] for value in results.values()]
+ # Create DataFrames for plotting
+ time_data = pd.DataFrame({"Model": models, "Time": times})
+ throughput_data = pd.DataFrame({"Model": models, "Throughput": throughputs})
-def init_ov_model(onnx_path: str) -> ov.CompiledModel:
- ov_exporter = OVExporter(onnx_path)
- return ov_exporter.export_model()
+ # Sort the DataFrames
+ time_data = time_data.sort_values("Time", ascending=True)
+ throughput_data = throughput_data.sort_values("Throughput", ascending=False)
+ # Create subplots
+ fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 6))
-def init_cuda_model(
- model_loader: ModelLoader, device: torch.device, dtype: torch.dtype
-) -> torch.nn.Module:
- cuda_model = model_loader.model.to(device)
- if device == "cuda":
- cuda_model = torch.jit.trace(
- cuda_model, [torch.randn((1, 3, 224, 224)).to(device)]
- )
- return cuda_model
+ # Plot inference times
+ sns.barplot(
+ x=time_data["Time"],
+ y=time_data["Model"],
+ hue=time_data["Model"],
+ palette="rocket",
+ ax=ax1,
+ legend=False,
+ )
+ ax1.set_xlabel("Average Inference Time (ms)")
+ ax1.set_ylabel("Model Type")
+ ax1.set_title("ResNet50 - Inference Benchmark Results")
+ for index, value in enumerate(time_data["Time"]):
+ ax1.text(value, index, f"{value:.2f} ms", color="black", ha="left", va="center")
+
+ # Plot throughputs
+ sns.barplot(
+ x=throughput_data["Throughput"],
+ y=throughput_data["Model"],
+ hue=throughput_data["Model"],
+ palette="viridis",
+ ax=ax2,
+ legend=False,
+ )
+ ax2.set_xlabel("Throughput (samples/sec)")
+ ax2.set_ylabel("")
+ ax2.set_title("ResNet50 - Throughput Benchmark Results")
+ for index, value in enumerate(throughput_data["Throughput"]):
+ ax2.text(value, index, f"{value:.2f}", color="black", ha="left", va="center")
+
+ # Save the plot to a file
+ plt.tight_layout()
+ plt.savefig(plot_path, bbox_inches="tight")
+ plt.show()
+
+ print(f"Plot saved to {plot_path}")
def parse_arguments():
# Initialize ArgumentParser with description
parser = argparse.ArgumentParser(description="PyTorch Inference")
+
parser.add_argument(
"--image_path",
type=str,
default="./inference/cat3.jpg",
help="Path to the image to predict",
)
+
parser.add_argument(
"--topk", type=int, default=5, help="Number of top predictions to show"
)
+
parser.add_argument(
"--onnx_path",
type=str,
- default="./inference/model.onnx",
+ default="./models/model.onnx",
help="Path where model in ONNX format will be exported",
)
+
+ parser.add_argument(
+ "--ov_path",
+ type=str,
+ default="./models/model.ov",
+ help="Path where model in OpenVINO format will be exported",
+ )
+
parser.add_argument(
"--mode",
- choices=["onnx", "ov", "cuda", "all"],
+ choices=["onnx", "ov", "cpu", "cuda", "tensorrt", "all"],
default="all",
- help="Mode for exporting and running the model. Choices are: onnx, ov, cuda or all.",
+ help="Mode for exporting and running the model. Choices are: onnx, ov, cuda, tensorrt or all.",
+ )
+
+ parser.add_argument(
+ "-D",
+ "--DEBUG",
+ action="store_true",
+ help="Enable or disable debug capabilities.",
)
return parser.parse_args()
diff --git a/inference/IMG1.jpg b/inference/IMG1.jpg
deleted file mode 100644
index 34f0d6e..0000000
Binary files a/inference/IMG1.jpg and /dev/null differ
diff --git a/inference/IMG2.jpg b/inference/IMG2.jpg
deleted file mode 100644
index 39f35aa..0000000
Binary files a/inference/IMG2.jpg and /dev/null differ
diff --git a/inference/IMG3.jpg b/inference/IMG3.jpg
deleted file mode 100644
index 5d257b6..0000000
Binary files a/inference/IMG3.jpg and /dev/null differ
diff --git a/inference/cat2.jpg b/inference/cat2.jpg
deleted file mode 100644
index b29d714..0000000
Binary files a/inference/cat2.jpg and /dev/null differ
diff --git a/inference/image-3.jpg b/inference/image-3.jpg
deleted file mode 100644
index c1f993c..0000000
Binary files a/inference/image-3.jpg and /dev/null differ
diff --git a/inference/image-4.jpg b/inference/image-4.jpg
deleted file mode 100644
index d838912..0000000
Binary files a/inference/image-4.jpg and /dev/null differ
diff --git a/inference/logo.png b/inference/logo.png
deleted file mode 100644
index 02b10dd..0000000
Binary files a/inference/logo.png and /dev/null differ
diff --git a/inference/plot.png b/inference/plot.png
index 43bb7bb..940f9a5 100644
Binary files a/inference/plot.png and b/inference/plot.png differ
diff --git a/inference/plot_infer_thrp.png b/inference/plot_infer_thrp.png
new file mode 100644
index 0000000..38a3343
Binary files /dev/null and b/inference/plot_infer_thrp.png differ
diff --git a/inference/plot_latest.png b/inference/plot_latest.png
deleted file mode 100644
index d64d727..0000000
Binary files a/inference/plot_latest.png and /dev/null differ
diff --git a/inference/plot_linux_server.png b/inference/plot_linux_server.png
deleted file mode 100755
index ef401ae..0000000
Binary files a/inference/plot_linux_server.png and /dev/null differ
diff --git a/inference/plot_new_gpu.png b/inference/plot_new_gpu.png
new file mode 100644
index 0000000..793ee8c
Binary files /dev/null and b/inference/plot_new_gpu.png differ
diff --git a/main.py b/main.py
index 1ec21bb..7ef512e 100644
--- a/main.py
+++ b/main.py
@@ -1,26 +1,23 @@
import logging
-import os.path
import torch
+from src.onnx_inference import ONNXInference
+from src.ov_inference import OVInference
+from src.pytorch_inference import PyTorchInference
+
+from src.tensorrt_inference import TensorRTInference
+
CUDA_AVAILABLE = False
if torch.cuda.is_available():
try:
import torch_tensorrt
+
CUDA_AVAILABLE = True
except ImportError:
print("torch-tensorrt is not installed. Running on CPU mode only.")
-from benchmark.benchmark_models import benchmark_onnx_model, benchmark_ov_model
-from benchmark.benchmark_utils import run_all_benchmarks, plot_benchmark_results
-from common.utils import (
- parse_arguments,
- init_onnx_model,
- init_ov_model,
- init_cuda_model,
- export_onnx_model,
-)
+from common.utils import parse_arguments, plot_benchmark_results
from src.image_processor import ImageProcessor
-from prediction.prediction_models import *
from src.model import ModelLoader
import warnings
@@ -28,7 +25,7 @@
warnings.filterwarnings("ignore", category=UserWarning, module="torchvision.io.image")
# Configure logging
-logging.basicConfig(filename="model.log", level=logging.INFO)
+logging.basicConfig(filename="inference.log", level=logging.INFO)
def main():
@@ -38,8 +35,12 @@ def main():
"""
args = parse_arguments()
+ if args.DEBUG:
+ print("Debug mode is enabled")
+
# Model and Image Initialization
- models = {}
+ benchmark_results = {}
+
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_loader = ModelLoader(device=device)
img_processor = ImageProcessor(img_path=args.image_path, device=device)
@@ -47,83 +48,61 @@ def main():
# ONNX
if args.mode in ["onnx", "all"]:
- ort_session = init_onnx_model(args.onnx_path, model_loader, device)
- if args.mode != "all":
- benchmark_onnx_model(ort_session)
- predict_onnx_model(
- ort_session, img_batch, args.topk, model_loader.categories
- )
+ onnx_inference = ONNXInference(
+ model_loader, args.onnx_path, debug_mode=args.DEBUG
+ )
+
+ benchmark_results["ONNX (CPU)"] = onnx_inference.benchmark(img_batch)
+ onnx_inference.predict(img_batch)
# OpenVINO
if args.mode in ["ov", "all"]:
- # Check if ONNX model wasn't exported previously
- if not os.path.isfile(args.onnx_path):
- export_onnx_model(
- onnx_path=args.onnx_path, model_loader=model_loader, device=device
+ ov_inference = OVInference(model_loader, args.ov_path, debug_mode=args.DEBUG)
+
+ benchmark_results["OpenVINO (CPU)"] = ov_inference.benchmark(img_batch)
+ ov_inference.predict(img_batch)
+
+ # PyTorch CPU
+ if args.mode in ["cpu", "all"]:
+ pytorch_cpu_inference = PyTorchInference(
+ model_loader, device="cpu", debug_mode=args.DEBUG
+ )
+
+ benchmark_results["PyTorch (CPU)"] = pytorch_cpu_inference.benchmark(img_batch)
+ pytorch_cpu_inference.predict(img_batch)
+
+ # PyTorch CUDA + TRT
+ if torch.cuda.is_available():
+ if args.mode in ["cuda", "all"]:
+ print("Inside inference for CUDA...")
+ pytorch_cuda_inference = PyTorchInference(
+ model_loader, device=device, debug_mode=args.DEBUG
)
- ov_model = init_ov_model(args.onnx_path)
- if args.mode != "all":
- ov_benchmark = benchmark_ov_model(ov_model)
- predict_ov_model(
- ov_benchmark.compiled_model,
- img_batch,
- args.topk,
- model_loader.categories,
+ benchmark_results["PyTorch (CUDA)"] = pytorch_cuda_inference.benchmark(
+ img_batch
)
-
- # CUDA
- if args.mode in ["cuda", "all"]:
- # CUDA configurations
- cuda_configs = [
- {"device": "cpu", "precision": torch.float32, "is_trt": False},
- {"device": "cuda", "precision": torch.float32, "is_trt": False},
- {"device": "cuda", "precision": torch.float32, "is_trt": True},
- {"device": "cuda", "precision": torch.float16, "is_trt": True},
- ]
-
- for config in cuda_configs:
- device = config["device"]
- precision = config["precision"]
- is_trt = config["is_trt"]
-
- # check if CUDA is available
- if device.lower() == "cuda" and not CUDA_AVAILABLE:
- continue
-
- model = init_cuda_model(model_loader, device, precision)
-
- # If the configuration is not for TensorRT, store the model under a PyTorch key
- if not is_trt:
- models[f"PyTorch_{device}"] = model
- model = model.to(device)
- img_batch = img_batch.to(device)
- else:
- print("Compiling TensorRT model")
- batch_size = 1 if args.mode == "cuda" else 32
- model = torch_tensorrt.compile(
- model,
- inputs=[torch_tensorrt.Input((batch_size, 3, 224, 224), dtype=precision)],
- enabled_precisions={precision},
- truncate_long_and_double=True,
- require_full_compilation=True,
+ pytorch_cuda_inference.predict(img_batch)
+
+ # TensorRT
+ if args.mode in ["tensorrt", "all"]:
+ precisions = [torch.float16, torch.float32]
+ for precision in precisions:
+ tensorrt_inference = TensorRTInference(
+ model_loader,
+ device=device,
+ precision=precision,
+ debug_mode=args.DEBUG,
)
- # If it is for TensorRT, determine the mode (FP32 or FP16) and store under a TensorRT key
- mode = "fp32" if precision == torch.float32 else "fp16"
- models[f"trt_{mode}"] = model
- if args.mode != "all":
- predict_cuda_model(
- model, img_batch, args.topk, model_loader.categories, precision
+ benchmark_results[f"TRT_{precision}"] = tensorrt_inference.benchmark(
+ img_batch
)
+ tensorrt_inference.predict(img_batch)
- # Aggregate Benchmark (if mode is "all")
+ # Plot graph combining all results
if args.mode == "all":
- models["onnx"] = ort_session
- models["ov"] = ov_model
-
- results = run_all_benchmarks(models, img_batch)
- plot_benchmark_results(results)
+ plot_benchmark_results(benchmark_results)
if __name__ == "__main__":
diff --git a/prediction/__init__.py b/prediction/__init__.py
deleted file mode 100644
index e69de29..0000000
diff --git a/prediction/prediction_models.py b/prediction/prediction_models.py
deleted file mode 100644
index aaaf230..0000000
--- a/prediction/prediction_models.py
+++ /dev/null
@@ -1,32 +0,0 @@
-import onnxruntime as ort
-import openvino as ov
-import numpy as np
-import torch
-from typing import List
-from prediction.prediction_utils import make_prediction
-
-
-# Prediction Functions
-def predict_onnx_model(
- ort_session: ort.InferenceSession,
- img_batch: np.ndarray,
- topk: int,
- categories: List[str],
-):
- make_prediction(ort_session, img_batch.cpu().numpy(), topk, categories)
-
-
-def predict_ov_model(
- ov_model: ov.CompiledModel, img_batch: np.ndarray, topk: int, categories: List[str]
-):
- make_prediction(ov_model, img_batch.cpu().numpy(), topk, categories)
-
-
-def predict_cuda_model(
- cuda_model: torch.nn.Module,
- img_batch: torch.Tensor,
- topk: int,
- categories: List[str],
- precision: torch.dtype,
-):
- make_prediction(cuda_model, img_batch, topk, categories, precision)
diff --git a/prediction/prediction_utils.py b/prediction/prediction_utils.py
deleted file mode 100644
index 7b0f2e0..0000000
--- a/prediction/prediction_utils.py
+++ /dev/null
@@ -1,89 +0,0 @@
-import logging
-from typing import List, Tuple, Union, Dict, Any
-import openvino as ov
-import torch
-import onnxruntime as ort
-import numpy as np
-
-
-def make_prediction(
- model: Union[torch.nn.Module, ort.InferenceSession, ov.CompiledModel],
- img_batch: Union[torch.Tensor, np.ndarray],
- topk: int,
- categories: List[str],
- precision: torch.dtype = None,
-) -> None:
- """
- Make and print predictions for the given model, img_batch, topk, and categories.
-
- :param model: The model to make predictions with.
- :param img_batch: The batch of images to make predictions on.
- :param topk: The number of top predictions to show.
- :param categories: The list of categories to label the predictions.
- :param precision: The data type to be used for the predictions (typically torch.float32 or torch.float16) for PyTorch models.
- """
- is_onnx_model = isinstance(model, ort.InferenceSession)
- is_ov_model = isinstance(model, ov.CompiledModel)
-
- if is_onnx_model:
- logging.info(f"Running prediction for ONNX model")
- # Get the input name for the ONNX model.
- input_name = model.get_inputs()[0].name
-
- # Run the model with the properly named input.
- ort_inputs = {input_name: img_batch}
- ort_outs = model.run(None, ort_inputs)
-
- # Assuming the model returns a list with one array of class probabilities.
- if len(ort_outs) > 0:
- prob = ort_outs[0]
-
- # Checking if prob has more than one dimension and selecting the right one.
- if prob.ndim > 1:
- prob = prob[0]
-
- # Apply Softmax to get probabilities
- prob = np.exp(prob) / np.sum(np.exp(prob))
- elif is_ov_model:
- logging.info(f"Running prediction for OV model")
- # For OV, the input name is usually the first input
- input_name = next(iter(model.inputs))
- outputs = model(inputs={input_name: img_batch})
-
- # Assuming the model returns a dictionary with one key for class probabilities
- prob_key = next(iter(outputs))
- prob = outputs[prob_key]
-
- # Apply Softmax to get probabilities
- prob = np.exp(prob[0]) / np.sum(np.exp(prob[0]))
-
- else: # PyTorch Model
- params = list(model.parameters())
- if params:
- logging.info(f"Running prediction for PyTorch_{params[0].device}")
- elif isinstance(model, torch.nn.Module):
- logging.info(f"Running prediction for TensorRT_{precision} model")
- else:
- raise ValueError("Running prediction for an unknown model type")
-
- if isinstance(img_batch, np.ndarray):
- img_batch = torch.tensor(img_batch)
- else:
- img_batch = img_batch.clone().to(precision)
- model.eval()
- with torch.no_grad():
- outputs = model(img_batch.to(precision))
- prob = torch.nn.functional.softmax(outputs[0], dim=0)
- prob = prob.cpu().numpy()
-
- top_indices = prob.argsort()[-topk:][::-1]
- top_probs = prob[top_indices]
-
- for i in range(topk):
- probability = top_probs[i]
- if is_onnx_model:
- # Accessing the DataFrame by row number using .iloc[]
- class_label = categories.iloc[top_indices[i]].item()
- else:
- class_label = categories[0][int(top_indices[i])]
- logging.info(f"#{i + 1}: {int(probability * 100)}% {class_label}")
diff --git a/src/benchmark_class.py b/src/benchmark_class.py
index 47e8069..b1800cf 100644
--- a/src/benchmark_class.py
+++ b/src/benchmark_class.py
@@ -124,7 +124,7 @@ def run(self):
print("Starting benchmark ...")
timings = []
- for i in range(1, self.nruns+1):
+ for i in range(1, self.nruns + 1):
start_time = time.time()
_ = self.ort_session.run(None, {"input": input_data})
end_time = time.time()
@@ -190,7 +190,7 @@ def run(self):
# Benchmarking
total_time = 0
- for i in range(1, self.nruns+1):
+ for i in range(1, self.nruns + 1):
start_time = time.time()
_ = self.inference(self.dummy_input)
total_time += time.time() - start_time
diff --git a/src/inference_base.py b/src/inference_base.py
new file mode 100644
index 0000000..45f5c02
--- /dev/null
+++ b/src/inference_base.py
@@ -0,0 +1,127 @@
+import time
+import logging
+import numpy as np
+import torch
+
+
+class InferenceBase:
+ def __init__(
+ self,
+ model_loader,
+ onnx_path=None,
+ ov_path=None,
+ topk=5,
+ debug_mode=False,
+ batch_size=8,
+ ):
+ """
+ Base class for inference.
+
+ :param model_loader: Object responsible for loading the model and categories.
+ :param onnx_path: Path to the ONNX model (if applicable).
+ :param ov_path: Path to the OpenVINO model (if applicable).
+ :param topk: Number of top predictions to return.
+ :param debug_mode: If True, print additional debug information.
+ :param batch_size: How many input images to stack for benchmark
+ """
+ self.model_loader = model_loader
+ self.onnx_path = onnx_path
+ self.ov_path = ov_path
+ self.categories = model_loader.categories
+ self.model = self.load_model()
+ self.topk = topk
+ self.debug_mode = debug_mode
+ self.batch_size = batch_size
+
+ def load_model(self):
+ """
+ Load the model. This method should be implemented by subclasses.
+ """
+ raise NotImplementedError
+
+ def predict(self, input_data, is_benchmark=False):
+ """
+ Run prediction on the input data.
+
+ :param input_data: Data to run the prediction on.
+ :param is_benchmark: If True, the prediction is part of a benchmark run.
+ """
+ if not is_benchmark:
+ logging.info(f"Running prediction for {self.__class__.__name__} model")
+ if self.debug_mode:
+ print(f"Running prediction for {self.__class__.__name__} model")
+
+ def benchmark(self, input_data, num_runs=100, warmup_runs=50):
+ """
+ Benchmark the prediction performance.
+
+ :param input_data: Data to run the benchmark on.
+ :param num_runs: Number of runs for the benchmark.
+ :param warmup_runs: Number of warmup runs before the benchmark.
+ :return: Average inference time in milliseconds.
+ """
+ # Stack identical copies of the input image into a batch of size batch_size to load the system during benchmarking
+ if len(input_data.shape) == 4:
+ input_data = input_data.squeeze(0)
+ input_batch = torch.stack([input_data] * self.batch_size)
+
+ # Warmup
+ logging.info(f"Starting warmup for {self.__class__.__name__} inference...")
+ for _ in range(warmup_runs):
+ for img in input_batch:
+ self.predict(img.unsqueeze(0), is_benchmark=True)
+
+ # Benchmark
+ logging.info(f"Starting benchmark for {self.__class__.__name__} inference...")
+ start_time = time.time()
+ for _ in range(num_runs):
+ for img in input_batch:
+ self.predict(img.unsqueeze(0), is_benchmark=True)
+ total_time_seconds = time.time() - start_time
+ total_samples = num_runs * self.batch_size
+ avg_time = total_time_seconds / total_samples * 1000  # Average time per sample in ms
+
+ logging.info(f"Average inference time for {num_runs} runs: {avg_time:.4f} ms")
+ if self.debug_mode:
+ print(
+ f"Average inference time for {self.__class__.__name__} and {num_runs} runs: {avg_time:.4f} ms"
+ )
+
+ # Calculate throughput as samples processed per second over the timed runs
+ throughput = total_samples / total_time_seconds
+
+ logging.info(
+ f"Throughput for {self.__class__.__name__}: {throughput:.2f} samples/sec"
+ )
+ if self.debug_mode:
+ print(
+ f"Throughput for {self.__class__.__name__}: {throughput:.2f} samples/sec"
+ )
+
+ return avg_time, throughput
+
+ def get_top_predictions(self, prob: np.ndarray, is_benchmark=False):
+ """
+ Get the top predictions based on the probabilities.
+
+ :param prob: Array of probabilities.
+ :param is_benchmark: If True, the method is called during a benchmark run.
+ :return: Array of probabilities.
+ """
+ if is_benchmark:
+ return None
+
+ # Get the top indices and probabilities
+ top_indices = prob.argsort()[-self.topk :][::-1]
+ top_probs = prob[top_indices]
+
+ # Log and print the top predictions
+ for i in range(self.topk):
+ probability = top_probs[i]
+ class_label = self.categories[0][int(top_indices[i])]
+ logging.info(f"#{i + 1}: {int(probability * 100)}% {class_label}")
+ if self.debug_mode:
+ print(f"#{i + 1}: {int(probability * 100)}% {class_label}")
+ return prob
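+
+# Usage sketch (see main.py in this change): subclasses implement load_model() and
+# predict(), and callers typically do:
+#   inference = ONNXInference(model_loader, "./models/model.onnx")
+#   avg_ms, throughput = inference.benchmark(img_batch)
+#   inference.predict(img_batch)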
diff --git a/src/model.py b/src/model.py
index 7ec31a0..4eb1415 100644
--- a/src/model.py
+++ b/src/model.py
@@ -1,5 +1,4 @@
import pandas as pd
-import torch
from torchvision import models
@@ -18,20 +17,3 @@ def __init__(self, device: str = "cuda") -> None:
"https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt",
header=None,
)
-
- def predict(self, img_batch: torch.Tensor) -> torch.Tensor:
- """
- Make a prediction on the provided image batch.
-
- :param img_batch: A batch of images to make predictions on.
- :return: A tensor representing the probabilities of the predictions.
- """
- # Set the model to evaluation mode and make a prediction
- self.model.eval()
- with torch.no_grad():
- outputs = self.model(img_batch)
-
- # Compute the softmax probabilities
- prob = torch.nn.functional.softmax(outputs[0], dim=0)
-
- return prob
diff --git a/src/onnx_exporter.py b/src/onnx_exporter.py
index c11bec2..7b051e0 100644
--- a/src/onnx_exporter.py
+++ b/src/onnx_exporter.py
@@ -1,6 +1,6 @@
+import os
import torch
from torch.onnx import export, TrainingMode
-from torchvision import models
class ONNXExporter:
@@ -15,6 +15,9 @@ def export_model(self):
# Define dummy input tensor
x = torch.randn(1, 3, 224, 224).to(self.device)
+ # Make sure the export directory exists before writing the ONNX file
+ os.makedirs(os.path.dirname(self.onnx_path) or ".", exist_ok=True)
+
# Export model as ONNX
export(
self.model,
diff --git a/src/onnx_inference.py b/src/onnx_inference.py
new file mode 100644
index 0000000..1329fcf
--- /dev/null
+++ b/src/onnx_inference.py
@@ -0,0 +1,64 @@
+import os
+import logging
+import onnxruntime as ort
+import numpy as np
+from src.inference_base import InferenceBase
+from src.onnx_exporter import ONNXExporter
+
+
+class ONNXInference(InferenceBase):
+ def __init__(self, model_loader, model_path, debug_mode=False):
+ """
+ Initialize the ONNXInference object.
+
+ :param model_loader: Object responsible for loading the model and categories.
+ :param model_path: Path to the ONNX model.
+ :param debug_mode: If True, print additional debug information.
+ """
+ super().__init__(model_loader, onnx_path=model_path, debug_mode=debug_mode)
+
+ def load_model(self):
+ """
+ Load the ONNX model. If the model does not exist, export it.
+
+ :return: Loaded ONNX model.
+ """
+ if not os.path.exists(self.onnx_path):
+ onnx_exporter = ONNXExporter(
+ self.model_loader.model, self.model_loader.device, self.onnx_path
+ )
+ onnx_exporter.export_model()
+ return ort.InferenceSession(self.onnx_path, providers=["CPUExecutionProvider"])
+
+ def predict(self, input_data, is_benchmark=False):
+ """
+ Run prediction on the input data using the ONNX model.
+
+ :param input_data: Data to run the prediction on.
+ :param is_benchmark: If True, the prediction is part of a benchmark run.
+ :return: Top predictions based on the probabilities.
+ """
+ super().predict(input_data, is_benchmark)
+
+ input_name = self.model.get_inputs()[0].name
+ ort_inputs = {input_name: input_data.cpu().numpy()}
+ ort_outs = self.model.run(None, ort_inputs)
+
+ # Extract probabilities from the output and normalize them
+ if len(ort_outs) > 0:
+ prob = ort_outs[0]
+ if prob.ndim > 1:
+ prob = prob[0]
+ prob = np.exp(prob) / np.sum(np.exp(prob))
+ return self.get_top_predictions(prob, is_benchmark)
+
+ def benchmark(self, input_data, num_runs=100, warmup_runs=50):
+ """
+ Benchmark the prediction performance using the ONNX model.
+
+ :param input_data: Data to run the benchmark on.
+ :param num_runs: Number of runs for the benchmark.
+ :param warmup_runs: Number of warmup runs before the benchmark.
+ :return: Average inference time in milliseconds.
+ """
+ return super().benchmark(input_data, num_runs, warmup_runs)
diff --git a/src/ov_inference.py b/src/ov_inference.py
new file mode 100644
index 0000000..5d94bb6
--- /dev/null
+++ b/src/ov_inference.py
@@ -0,0 +1,71 @@
+import os
+import numpy as np
+import openvino as ov
+from src.inference_base import InferenceBase
+from src.onnx_exporter import ONNXExporter
+from src.ov_exporter import OVExporter
+
+
+class OVInference(InferenceBase):
+ def __init__(self, model_loader, model_path, debug_mode=False):
+ """
+ Initialize the OVInference object.
+
+ :param model_loader: Object responsible for loading the model and categories.
+ :param model_path: Path to the OpenVINO model.
+ :param debug_mode: If True, print additional debug information.
+ """
+ super().__init__(model_loader, ov_path=model_path, debug_mode=debug_mode)
+ self.core = ov.Core()
+ self.ov_model = self.load_model()
+ self.compiled_model = self.core.compile_model(self.ov_model, "AUTO")
+
+ def load_model(self):
+ """
+ Load the OpenVINO model. If the ONNX model does not exist, export it.
+
+ :return: Loaded OpenVINO model.
+ """
+ # Determine the path for the ONNX model
+ self.onnx_path = self.ov_path.replace(".ov", ".onnx")
+
+ # Export ONNX model if it doesn't exist
+ if not os.path.exists(self.onnx_path):
+ onnx_exporter = ONNXExporter(
+ self.model_loader.model, self.model_loader.device, self.onnx_path
+ )
+ onnx_exporter.export_model()
+
+ ov_exporter = OVExporter(self.onnx_path)
+ return ov_exporter.export_model()
+
+ def predict(self, input_data, is_benchmark=False):
+ """
+ Run prediction on the input data using the OpenVINO model.
+
+ :param input_data: Data to run the prediction on.
+ :param is_benchmark: If True, the prediction is part of a benchmark run.
+ :return: Top predictions based on the probabilities.
+ """
+ super().predict(input_data, is_benchmark=is_benchmark)
+
+ input_name = next(iter(self.compiled_model.inputs))
+ outputs = self.compiled_model(inputs={input_name: input_data.cpu().numpy()})
+
+ # Extract probabilities from the output and normalize them
+ prob_key = next(iter(outputs))
+ prob = outputs[prob_key]
+ prob = np.exp(prob[0]) / np.sum(np.exp(prob[0]))
+
+ return self.get_top_predictions(prob, is_benchmark)
+
+ def benchmark(self, input_data, num_runs=100, warmup_runs=50):
+ """
+ Benchmark the prediction performance using the OpenVINO model.
+
+ :param input_data: Data to run the benchmark on.
+ :param num_runs: Number of runs for the benchmark.
+ :param warmup_runs: Number of warmup runs before the benchmark.
+ :return: Average inference time in milliseconds.
+ """
+ return super().benchmark(input_data, num_runs, warmup_runs)
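+
+# Note: load_model() derives the ONNX path from the .ov path, so the default
+# "./models/model.ov" produces "./models/model.onnx" as an intermediate export.
+# Usage sketch (mirrors main.py):
+#   ov_inference = OVInference(model_loader, "./models/model.ov")
+#   avg_ms, throughput = ov_inference.benchmark(img_batch)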
diff --git a/src/pytorch_inference.py b/src/pytorch_inference.py
new file mode 100644
index 0000000..9984594
--- /dev/null
+++ b/src/pytorch_inference.py
@@ -0,0 +1,55 @@
+import torch
+from src.inference_base import InferenceBase
+
+
+class PyTorchInference(InferenceBase):
+ def __init__(self, model_loader, device="cpu", debug_mode=False):
+ """
+ Initialize the PyTorchInference object.
+
+ :param model_loader: Object responsible for loading the model and categories.
+ :param device: The device to load the model on ("cpu" or "cuda").
+ :param debug_mode: If True, print additional debug information.
+ """
+ self.device = device
+ super().__init__(model_loader, debug_mode=debug_mode)
+ self.model = self.load_model()
+
+ def load_model(self):
+ """
+ Load the PyTorch model to the specified device.
+
+ :return: Loaded PyTorch model.
+ """
+ return self.model_loader.model.to(self.device)
+
+ def predict(self, input_data, is_benchmark=False):
+ """
+ Run prediction on the input data using the PyTorch model.
+
+ :param input_data: Data to run the prediction on.
+ :param is_benchmark: If True, the prediction is part of a benchmark run.
+ :return: Top predictions based on the probabilities.
+ """
+ super().predict(input_data, is_benchmark=is_benchmark)
+
+ self.model.eval()
+ with torch.no_grad():
+ outputs = self.model(input_data.to(self.device))
+
+ # Compute the softmax probabilities
+ prob = torch.nn.functional.softmax(outputs[0], dim=0)
+ prob = prob.cpu().numpy()
+
+ return self.get_top_predictions(prob, is_benchmark)
+
+ def benchmark(self, input_data, num_runs=100, warmup_runs=50):
+ """
+ Benchmark the prediction performance using the PyTorch model.
+
+ :param input_data: Data to run the benchmark on.
+ :param num_runs: Number of runs for the benchmark.
+ :param warmup_runs: Number of warmup runs before the benchmark.
+ :return: Average inference time in milliseconds.
+ """
+ return super().benchmark(input_data, num_runs, warmup_runs)
diff --git a/src/tensorrt_inference.py b/src/tensorrt_inference.py
new file mode 100644
index 0000000..c68ef93
--- /dev/null
+++ b/src/tensorrt_inference.py
@@ -0,0 +1,80 @@
+import torch
+import logging
+from src.inference_base import InferenceBase
+
+# Check for CUDA and TensorRT availability
+CUDA_AVAILABLE = torch.cuda.is_available()
+if CUDA_AVAILABLE:
+ try:
+ import torch_tensorrt as trt
+ except ImportError:
+ logging.warning("torch-tensorrt is not installed. Running on CPU mode only.")
+ CUDA_AVAILABLE = False
+
+
+class TensorRTInference(InferenceBase):
+ def __init__(self, model_loader, device, precision=torch.float32, debug_mode=False):
+ """
+ Initialize the TensorRTInference object.
+
+ :param model_loader: Object responsible for loading the model and categories.
+ :param device: The device to compile and run the model on (expected to be "cuda").
+ :param precision: Precision mode for TensorRT (default is torch.float32).
+ :param debug_mode: If True, print additional debug information.
+ """
+ self.precision = precision
+ self.device = device
+ super().__init__(model_loader, debug_mode=debug_mode)
+ if CUDA_AVAILABLE:
+ self.load_model()
+
+ def load_model(self):
+ """
+ Load and convert the PyTorch model to TensorRT format.
+ """
+ # Load the PyTorch model
+ self.model = self.model_loader.model.to(self.device).eval()
+
+ # Convert the model to the desired precision
+ if self.precision == torch.float16:
+ self.model = self.model.half()
+ elif self.precision == torch.float32:
+ self.model = self.model.float()
+
+ # Convert the input tensor for tracing to the desired precision
+ tracing_input = torch.randn((1, 3, 224, 224)).to(self.device).to(self.precision)
+
+ self.model = torch.jit.trace(self.model, [tracing_input])
+
+ # Convert the PyTorch model to TensorRT
+ self.model = trt.ts.compile(
+ self.model, inputs=[trt.Input((1, 3, 224, 224), dtype=self.precision)]
+ )
+
+ def predict(self, input_data, is_benchmark=False):
+ """
+ Run prediction on the input data using the TensorRT model.
+
+ :param input_data: Data to run the prediction on.
+ :param is_benchmark: If True, the prediction is part of a benchmark run.
+ :return: Top predictions based on the probabilities.
+ """
+ super().predict(input_data, is_benchmark=is_benchmark)
+
+ with torch.no_grad():
+ outputs = self.model(input_data.to(self.device).to(dtype=self.precision))
+
+ # Compute the softmax probabilities
+ prob = torch.nn.functional.softmax(outputs[0], dim=0)
+ prob = prob.cpu().numpy()
+
+ return self.get_top_predictions(prob, is_benchmark)
+
+ def benchmark(self, input_data, num_runs=100, warmup_runs=50):
+ """
+ Benchmark the prediction performance using the TensorRT model.
+
+ :param input_data: Data to run the benchmark on.
+ :param num_runs: Number of runs for the benchmark.
+ :param warmup_runs: Number of warmup runs before the benchmark.
+ :return: Average inference time in milliseconds.
+ """
+ return super().benchmark(input_data, num_runs, warmup_runs)
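+
+# Usage sketch (mirrors main.py; requires CUDA and torch-tensorrt):
+#   trt_fp16 = TensorRTInference(model_loader, device="cuda", precision=torch.float16)
+#   avg_ms, throughput = trt_fp16.benchmark(img_batch)
+#   trt_fp16.predict(img_batch)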