diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/.gitignore b/Qwen-Qwen2.5-1.5B-Instruct/aitk/.gitignore
new file mode 100644
index 00000000..48c03882
--- /dev/null
+++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/.gitignore
@@ -0,0 +1,5 @@
+__pycache__
+/cache
+/history/*/*
+!/history/*/history.config
+!/history/*/olive_config.json
diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/README.md b/Qwen-Qwen2.5-1.5B-Instruct/aitk/README.md
new file mode 100644
index 00000000..bd256350
--- /dev/null
+++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/README.md
@@ -0,0 +1,160 @@
+# Qwen2.5-1.5B-Instruct Model Optimization
+
+This repository demonstrates the optimization of the [Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into three main workflows:
+
+- QDQ for AMD NPU
+- PTQ + AOT for QNN NPU
+   + This process extends the QDQ flow and compiles the model specifically for **Qualcomm NPUs**
+- OpenVINO for Intel NPU
+   + This process uses OpenVINO-specific passes such as `OpenVINOOptimumConversion`, `OpenVINOIoUpdate` and `OpenVINOEncapsulation`
+
+## **QDQ Model with 4-bit Weights & 16-bit Activations**
+
+This workflow produces an ONNX QDQ model that is agnostic to the target hardware and accelerator, making it suitable for general inference.
+
+### **Optimization Process**
+
+The model is optimized using **weight-only quantization** and **activation quantization** for efficient deployment. The process includes:
+
+1. **Weight Rotation ([QuaRot](https://arxiv.org/abs/2404.00456))**
+   - Reduces outliers in weights and hidden states to enhance quantization efficiency.
+
+2. **4-bit Per-Channel Symmetric Quantization ([GPTQ](https://arxiv.org/abs/2210.17323))**
+   - Reduces transformer layer size while preserving accuracy.
+
+3. **ONNX Graph Capture**
+   - Exports the model to ONNX for further optimization.
+
+4. **4-bit Block-wise Quantization**
+   - Applies weight-only quantization to the **embedding layer** and **language modeling head**.
+
+5. **16-bit Activation Quantization**
+   - Uses 16-bit activations to balance precision and efficiency.
+
+The final output is a **QDQ model** with **4-bit weights** and **16-bit activations**. This model also leverages [GroupQueryAttention (GQA)](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.GroupQueryAttention) for efficient long-context processing and long-sequence generation.
+
+### **Handling Dynamic and Static Input Shapes**
+
+NPUs require **precompiled graphs**, meaning the model must use **static input shapes**. However, **text generation** involves two distinct processing stages:
+
+- **Prefill (Prompt Processing)**: Processes multiple tokens simultaneously.
+- **Token Generation (Iteration)**: Processes one token at a time.
+
+To support both efficiently, we create **two model instances**:
+1. **Prefill model**: Optimized for batch processing.
+2. **Token generation model**: Optimized for one-token-at-a-time inference.
+
+## **PTQ + AOT Compilation for Qualcomm NPUs using QNN EP**
+
+This process extends the [**QDQ Model with 4-bit Weights & 16-bit Activations**](#qdq-model-with-4-bit-weights--16-bit-activations) by compiling it specifically for **Qualcomm NPUs** using the **QNN Execution Provider**.
+
+### **Resource Optimization Strategy**
+
+To maximize efficiency while supporting dynamic input handling:
+
+- **Embedding Layer & Language Model Head** → Executed on CPU (handles dynamic input).
+- **Transformer Layers** → Executed on NPU (requires static input shapes).
+- **Weight Sharing** → Prefill & token generation models reuse weights to minimize memory usage.
+
+> ⚠️ **Note:** GQA is an ONNX Runtime *contrib operator* and must be executed on the CPU. The model graph is partitioned into **CPU (GQA nodes)** and **NPU (other nodes)** for execution.
+
+### **Compilation for Qualcomm NPU Deployment**
+
+Once optimized, the model is compiled for Qualcomm NPUs using **ONNX Runtime QNNExecutionProvider**. The steps include:
+
+1. **Split the Quantized Model** → Divide into three parts:
+   - **Embedding Layer**
+   - **Transformer Layers**
+   - **Language Model Head**
+2. **Set Static Input Shapes**:
+   - **(1, 64)** for prefill (batch size, sequence length).
+   - **(1, 1)** for token generation.
+3. **Compile using QNNExecutionProvider**:
+   - Leverages **weight sharing** across the prefill and token generation models.
+
+### **Usage**
+
+This workflow is configured using the `qwen2_5_qnn_config.json` file, which contains all of the quantization and compilation steps. It requires two separate Python environments, described below.
+
+#### A known working configuration
+
+- python=3.10
+- CUDA=12.1
+- cudnn=9.2.0
+
+#### Quantization Python Environment Setup
+
+Quantization is resource-intensive and requires GPU acceleration. In an [x64 Python environment with Olive installed](https://github.com/microsoft/Olive/blob/main/examples/README.md#important), install the required packages:
+
+```bash
+# Install common dependencies
+pip install -r requirements.txt
+
+# Install ONNX Runtime GPU packages
+pip install "onnxruntime-gpu>=1.21.0" "onnxruntime-genai-cuda>=0.6.0"
+
+# AutoGPTQ: Install from source (stable package may be slow for weight packing)
+# Disable CUDA extension build (not required)
+# Linux
+export BUILD_CUDA_EXT=0
+# Windows
+# set BUILD_CUDA_EXT=0
+
+# Install AutoGPTQ from source
+pip install --no-build-isolation git+https://github.com/PanQiWei/AutoGPTQ.git
+```
+
+> ⚠️ Only set up the environment and install the packages. Do not run the `olive run` command at this point.
+
+#### AOT Compilation Python Environment Setup
+
+Model compilation using the QNN Execution Provider requires a Python environment with onnxruntime-qnn installed. In a separate Python environment with Olive installed, install the required packages:
+
+```bash
+# Install ONNX Runtime QNN
+pip install -r https://raw.githubusercontent.com/microsoft/onnxruntime/refs/heads/main/requirements.txt
+pip install -U --pre --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple onnxruntime-qnn --no-deps
+```
+
+Replace `/path/to/qnn/env/bin` in `qwen2_5_qnn_config.json` with the path to the directory containing your QNN environment's Python executable. This path can be found by running the following command in the environment:
+
+```bash
+# Linux
+command -v python
+# Windows
+# where python
+```
+
+This command returns the path to the Python executable. Use the executable's parent directory as the value that replaces `/path/to/qnn/env/bin` in the config file.
+
+#### **Run the Quantization + Compilation Config**
+
+Activate the **Quantization Python Environment** and run the workflow:
+
+```bash
+olive run --config qwen2_5_qnn_config.json
+```
+
+Olive will run the AOT compilation step in the **AOT Compilation Python Environment** specified in the config file using a subprocess. All other steps will run in the **Quantization Python Environment** natively.
+
+✅ Optimized model saved in: `./model`
+
+> ⚠️ If optimization fails because of an out-of-memory error, remove `calibration_providers` from the config file.
+
+> ⚠️ If optimization fails during context binary generation, rerun the command. The process will resume from the last completed step.
+
+### **Inference**
+
+The optimized model can be used for inference using ONNX Runtime QNNExecutionProvider and ONNX Runtime GenAI. **Inference must be run on a Windows Copilot+ PC with a Qualcomm NPU.**
+
+#### **Install Required Packages (arm64 Python)**
+```bash
+pip install -r https://raw.githubusercontent.com/microsoft/onnxruntime/refs/heads/main/requirements.txt
+pip install -U --pre --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple onnxruntime-qnn --no-deps
+pip install "onnxruntime-genai>=0.7.0rc2"
+```
+
+#### **Run Console-Based Chat Interface**
+Execute the provided `inference_sample.ipynb` notebook.
+
+
diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/_copy.json.config b/Qwen-Qwen2.5-1.5B-Instruct/aitk/_copy.json.config
new file mode 100644
index 00000000..c28c58db
--- /dev/null
+++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/_copy.json.config
@@ -0,0 +1,144 @@
+{
+    "copies": [
+        {
+            "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/model_project.config",
+            "dst": "model_project.config",
+            "replacements": [
+                {
+                    "find": "deepseek_qnn_config",
+                    "replace": "qwen2_5_qnn_config"
+                },
+                {
+                    "find": "deepseek_vitis_ai_config",
+                    "replace": "qwen2_5_vitis_ai_config"
+                },
+                {
+                    "find": "deepseek_ov_config",
+                    "replace": "qwen2_5_ov_config"
+                },
+                {
+                    "find": "deepseek_dml_config",
+                    "replace": "qwen2_5_dml_config"
+                }
+            ]
+        },
+        {
+            "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_qnn_config.json",
+            "dst": "qwen2_5_qnn_config.json",
+            "replacements": [
+                {
+                    "find": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
+                    "replace": "Qwen/Qwen2.5-1.5B-Instruct"
+                },
+                {
+                    "find": "model/deepseek",
+                    "replace": "model/qwen2_5"
+                }
+            ]
+        },
+        {
+            "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_qnn_config.json.config",
+            "dst": "qwen2_5_qnn_config.json.config",
+            "replacements": [
+            ]
+        },
+        {
+            "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_vitis_ai_config.json",
+            "dst": "qwen2_5_vitis_ai_config.json",
+            "replacements": [
+                {
+                    "find": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
+
"replace": "Qwen/Qwen2.5-1.5B-Instruct" + }, + { + "find": "model/deepseek", + "replace": "model/qwen2_5" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_vitis_ai_config.json.config", + "dst": "qwen2_5_vitis_ai_config.json.config", + "replacements": [ + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_ov_config.json", + "dst": "qwen2_5_ov_config.json", + "replacements": [ + { + "find": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", + "replace": "Qwen/Qwen2.5-1.5B-Instruct" + }, + { + "find": "model/deepseek", + "replace": "model/qwen2_5" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_ov_config.json.config", + "dst": "qwen2_5_ov_config.json.config", + "replacements": [ + { + "find": "deepseek/openvino/DeepSeek-R1-Distill-Qwen-1.5B_context_ov_dynamic_sym_gs128_bkp_int8_sym_r1.json", + "replace": "qwen2_5/openvino/Qwen2.5-1.5B-instruct_context_ov_dynamic_sym_bkp_int8_sym_r1.json" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_dml_config.json", + "dst": "qwen2_5_dml_config.json", + "replacements": [ + { + "find": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", + "replace": "Qwen/Qwen2.5-1.5B-Instruct" + }, + { + "find": "model/deepseek", + "replace": "model/qwen2_5" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_dml_config.json.config", + "dst": "qwen2_5_dml_config.json.config", + "replacements": [ + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/README.md", + "dst": "README.md", + "replacements": [ + { + "find": "# DeepSeek-R1-Distill-Qwen-1.5B Model Optimization", + "replace": "# Qwen2.5-1.5B-Instruct Model Optimization" + }, + { + "find": "[DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)", + "replace": "[Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)" + }, + { + "find": "> 
⚠️ If got 6033 error, replace `genai_config.json` in `./model` folder", + "replace": "" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/requirements.txt", + "dst": "requirements.txt", + "replacements": [ + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/inference_sample.ipynb", + "dst": "inference_sample.ipynb", + "replacements": [ + { + "find": "<|User|>{input}<|Assistant|>", + "replace": "<|im_start|>user\\\\n{input}<|im_end|>\\\\n<|im_start|>assistant\\\\n" + } + ] + } + ] +} diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/inference_model.json b/Qwen-Qwen2.5-1.5B-Instruct/aitk/inference_model.json new file mode 100644 index 00000000..7a3bb4a0 --- /dev/null +++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/inference_model.json @@ -0,0 +1,31 @@ +{ + "Name": "Qwen2.5-1.5B-Instruct", + "PromptTemplate": { + "assistant": "{Content}", + "prompt": "<|im_start|>user\n{Content}<|im_end|>\n<|im_start|>assistant\n" + }, + "ParameterSchema": { + "enabled": [ + { + "name": "max_tokens", + "default": 512 + }, + { + "name": "temperature", + "default": 0.6 + }, + { + "name": "top_p", + "default": 0.95 + }, + { + "name": "top_k", + "default": 5 + }, + { + "name": "random_seed", + "default": 3328 + } + ] + } +} diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/inference_sample.ipynb b/Qwen-Qwen2.5-1.5B-Instruct/aitk/inference_sample.ipynb new file mode 100644 index 00000000..7757249e --- /dev/null +++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/inference_sample.ipynb @@ -0,0 +1,131 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "text = 'Who is Isaac Newton?'\n", + "ExecutionProvider=\"QNNExecutionProvider\"\n", + "model_folder = \"./model\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime_genai as og\n", + "import json\n", + "import time\n", + "from pathlib import Path\n", + "\n", + "def 
get_session_options(obj):\n", + " if type(obj) is dict:\n", + " for k, v in obj.items():\n", + " if k == \"session_options\":\n", + " yield v\n", + " else:\n", + " for x in get_session_options(v):\n", + " yield x\n", + " elif type(obj) is list:\n", + " for v in obj:\n", + " for x in get_session_options(v):\n", + " yield x\n", + "\n", + "\n", + "def remove_provider_options(model_path):\n", + " genai_config_path = Path(model_path) / \"genai_config.json\"\n", + " data = json.loads(genai_config_path.read_text())\n", + " for session_option in get_session_options(data):\n", + " if 'provider_options' in session_option:\n", + " session_option['provider_options'] = [{k: dict() for k in opts.keys()} for opts in session_option['provider_options']]\n", + "\n", + " json.dump(data, genai_config_path.open(\"w\"), indent=4)\n", + "\n", + "if ExecutionProvider == \"QNNExecutionProvider\":\n", + " remove_provider_options(model_folder)\n", + "\n", + "# Load the base model and tokenizer\n", + "model = og.Model(model_folder)\n", + "tokenizer = og.Tokenizer(model)\n", + "tokenizer_stream = tokenizer.create_stream()\n", + "\n", + "# Set the max length to something sensible by default,\n", + "# since otherwise it will be set to the entire context length\n", + "search_options = {}\n", + "search_options[\"max_length\"] = 200\n", + "\n", + "chat_template = \"<|im_start|>user\\n{input}<|im_end|>\\n<|im_start|>assistant\\n\"\n", + "\n", + "# Generate prompt (prompt template + input)\n", + "prompt = f\"{chat_template.format(input=text)}\"\n", + "\n", + "# Encode the prompt using the tokenizer\n", + "input_tokens = tokenizer.encode(prompt)\n", + "\n", + "# Create params and generator\n", + "params = og.GeneratorParams(model)\n", + "params.set_search_options(**search_options)\n", + "generator = og.Generator(model, params)\n", + "\n", + "# Append input tokens to the generator\n", + "generator.append_tokens(input_tokens)\n", + "\n", + "print(\"\")\n", + "print(\"Output: \", end=\"\", 
flush=True)\n", + "\n", + "token_times = []\n", + "\n", + "# Stream the output\n", + "while not generator.is_done():\n", + " start_time = time.time()\n", + " generator.generate_next_token()\n", + " end_time = time.time()\n", + " \n", + " # Record the time for this token generation\n", + " token_time = end_time - start_time\n", + " token_times.append(token_time)\n", + "\n", + " new_token = generator.get_next_tokens()[0]\n", + " print(tokenizer_stream.decode(new_token), end=\"\", flush=True)\n", + "\n", + "print()\n", + "\n", + "# Calculate and display timing statistics\n", + "if token_times:\n", + " total_tokens = len(token_times)\n", + " avg_time = sum(token_times) / total_tokens\n", + " \n", + " print(f\"Total tokens generated: {total_tokens}\")\n", + " print(f\"Average time per token: {avg_time:.4f} seconds\")\n", + " print(f\"Tokens per second: {total_tokens / sum(token_times):.2f}\")\n", + "\n", + "del generator\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/info.yml b/Qwen-Qwen2.5-1.5B-Instruct/aitk/info.yml new file mode 100644 index 00000000..8e284e83 --- /dev/null +++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/info.yml @@ -0,0 +1,20 @@ +keywords: + aitk +arch: qwen2 +recipes: + - file: "qwen2_5_qnn_config.json" + device: npu + ep: QNNExecutionProvider + - file: "qwen2_5_vitis_ai_config.json" + device: npu + ep: VitisAIExecutionProvider + - file: "qwen2_5_ov_config.json" + device: npu + ep: OpenVINOExecutionProvider + - file: "qwen2_5_dml_config.json" + device: gpu + ep: DmlExecutionProvider +aitk: + modelInfo: + id: 
"huggingface/Qwen/Qwen2.5-1.5B-Instruct" + version: 1 diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/model_project.config b/Qwen-Qwen2.5-1.5B-Instruct/aitk/model_project.config new file mode 100644 index 00000000..68672843 --- /dev/null +++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/model_project.config @@ -0,0 +1,24 @@ +{ + "workflows": [ + { + "file": "qwen2_5_qnn_config.json", + "templateName": "qwen2_5_qnn_config" + }, + { + "file": "qwen2_5_vitis_ai_config.json", + "templateName": "qwen2_5_vitis_ai_config" + }, + { + "file": "qwen2_5_ov_config.json", + "templateName": "qwen2_5_ov_config" + }, + { + "file": "qwen2_5_dml_config.json", + "templateName": "qwen2_5_dml_config" + } + ], + "modelInfo": { + "id": "huggingface/Qwen/Qwen2.5-1.5B-Instruct", + "version": 1 + } +} diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_dml_config.json b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_dml_config.json new file mode 100644 index 00000000..4e7b0265 --- /dev/null +++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_dml_config.json @@ -0,0 +1,46 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "Qwen/Qwen2.5-1.5B-Instruct" + }, + "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device":"cpu", + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + }, + "target_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device":"gpu", + "execution_providers": [ + "DmlExecutionProvider" + ] + } + ] + } + }, + "passes": { + "q": { + "type": "AutoAWQQuantizer" + }, + "mb": { + "type": "ModelBuilder", + "precision": "int4" + } + }, + "host": "host_system", + "target": "target_system", + "log_severity_level": 1, + "output_dir": "model/qwen2_5", + "cache_dir": "cache", + "no_artifacts": true, + "evaluate_input_model": false +} diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_dml_config.json.config b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_dml_config.json.config new file mode 100644 index 00000000..5778ef75 --- /dev/null +++ 
b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_dml_config.json.config @@ -0,0 +1,48 @@ +{ + "name": "Convert to DirectML", + "isLLM": true, + "debugInfo": { + "autoGenerated": true, + "useModelBuilder": "mb" + }, + "isGPURequired": true, + "executeRuntimeFeatures": [ + "AutoAwq" + ], + "evaluationRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "DirectML" + ], + "path": "systems.target_system.accelerators.0.execution_providers.0", + "values": [ + "DmlExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_ov_config.json b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_ov_config.json new file mode 100644 index 00000000..1d55d610 --- /dev/null +++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_ov_config.json @@ -0,0 +1,56 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "Qwen/Qwen2.5-1.5B-Instruct" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "OpenVINOExecutionProvider" + ] + } + ] + } + }, + "passes": { + "optimum_convert": { + "type": "OpenVINOOptimumConversion", + "extra_args": { + "device": "npu" + }, + "ov_quant_config": { + "weight_format": "int4", + "group_size": 128, + "dataset": "wikitext2", + "ratio": 1, + "sym": true, + "trust_remote_code": true, + "awq": false, + "scale_estimation": false, + "sensitivity_metric": "weight_quantization_error", + "backup_precision": "int8_asym" + } + }, + "io_update": { + "type": "OpenVINOIoUpdate", + "static": false, + "reuse_cache": true + }, + "encapsulation": { + "type": "OpenVINOEncapsulation", + 
"target_device": "npu", + "keep_ov_dynamic_dims": true, + "ov_version": "2025.1", + "reuse_cache": true + } + }, + "search_strategy": false, + "host": "local_system", + "target": "local_system", + "cache_dir": "cache", + "evaluate_input_model": false, + "output_dir": "model/qwen2_5" +} diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_ov_config.json.config b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_ov_config.json.config new file mode 100644 index 00000000..b95b828a --- /dev/null +++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_ov_config.json.config @@ -0,0 +1,153 @@ +{ + "name": "Convert to Intel CPU/NPU/GPU", + "oliveFile": "qwen2_5/openvino/Qwen2.5-1.5B-instruct_context_ov_dynamic_sym_bkp_int8_sym_r1.json", + "isLLM": true, + "isIntel": true, + "debugInfo": { + "autoGenerated": true, + "useOpenVINOOptimumConversion": "optimum_convert" + }, + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "systems.local_system.accelerators.0.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ], + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ], + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ] + ], + "readOnly": false + }, + "runtimeInConversion": { + "autoGenerated": true, + "name": "Convert/Quantize to", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "passes.optimum_convert.extra_args.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": 
"cpu" + } + ], + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "gpu" + } + ], + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "npu" + } + ] + ] + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "passes.optimum_convert.ov_quant_config.dataset", + "values": [ + "wikitext2" + ], + "template": { + "path": "passes.optimum_convert.ov_quant_config.dataset", + "values": [ + "wikitext2" + ], + "template": "QuantizationDataset" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_qnn_config.json b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_qnn_config.json new file mode 100644 index 00000000..d84eb1fa --- /dev/null +++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_qnn_config.json @@ -0,0 +1,132 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "Qwen/Qwen2.5-1.5B-Instruct" + }, + "systems": { + "qnn_system": { + "type": "PythonEnvironment", + "python_environment_path": "/path/to/qnn/env/bin", + "accelerators": [ + { + "execution_providers": [ + "QNNExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "wikitext2_train", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "wikitext", + "subset": "wikitext-2-raw-v1", + "split": "train" + }, + "pre_process_data_config": { + "strategy": "line-by-line", + "add_special_tokens": false, + 
"max_samples": 128, + "max_seq_len": 512 + } + } + ], + "passes": { + "q": { + "type": "QuaRot" + }, + "g": { + "type": "GptqQuantizer", + "sym": true, + "group_size": -1 + }, + "cs": { + "type": "CaptureSplitInfo", + "num_splits": 4, + "unique_embeds_lm_head_splits": true + }, + "mb": { + "type": "ModelBuilder", + "precision": "int4", + "int4_block_size": 32, + "int4_accuracy_level": 4, + "int4_op_types_to_quantize": [ + "MatMul", + "Gather" + ], + "save_as_external_data": true + }, + "mq": { + "type": "MatMulNBitsToQDQ", + "use_int4": true, + "add_zero_point": true, + "nodes_to_exclude": [ + "/lm_head/MatMul_Q4" + ], + "save_as_external_data": true + }, + "gs": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "RemoveRopeMultiCache" + }, + { + "surgeon": "AttentionMaskToSequenceLengths" + }, + { + "surgeon": "SimplifiedLayerNormToL2Norm" + } + ], + "save_as_external_data": true + }, + "sq": { + "type": "OnnxStaticQuantization", + "data_config": "wikitext2_train", + "activation_type": "uint16", + "precision": "uint8", + "calibration_providers": [ + "CUDAExecutionProvider" + ], + "quant_preprocess": true, + "op_types_to_exclude": [ + "GatherBlockQuantized", + "GroupQueryAttention", + "MatMulNBits" + ], + "save_as_external_data": true + }, + "sp": { + "type": "SplitModel" + }, + "st": { + "type": "StaticLLM", + "batch_size": 1, + "context_length": 64 + }, + "cb": { + "type": "EPContextBinaryGenerator", + "provider_options": { + "htp_performance_mode": "burst", + "htp_graph_finalization_optimization_mode": "3", + "soc_model": "60" + }, + "session_options": { + "intra_op_num_threads": 2, + "inter_op_num_threads": 1 + }, + "weight_sharing": true + }, + "cp": { + "type": "ComposeOnnxModels" + } + }, + "target": "qnn_system", + "log_severity_level": 1, + "output_dir": "model/qwen2_5", + "cache_dir": "cache", + "no_artifacts": true, + "evaluate_input_model": false +} diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_qnn_config.json.config 
b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_qnn_config.json.config new file mode 100644 index 00000000..032429d1 --- /dev/null +++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_qnn_config.json.config @@ -0,0 +1,197 @@ +{ + "name": "Convert to Qualcomm NPU", + "oliveFile": "phi3_5/qnn_config.json", + "isLLM": true, + "debugInfo": { + "autoGenerated": true, + "useModelBuilder": "mb" + }, + "isQNNLLM": true, + "isGPURequired": true, + "runtimeOverwrite": { + "autoGenerated": true, + "pyEnvPath": "systems.qnn_system.python_environment_path", + "executeEp": "CUDAExecutionProvider", + "evaluateUsedInExecute": true + }, + "executeRuntimeFeatures": [ + "AutoGptq" + ], + "pyEnvRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Qualcomm NPU" + ], + "path": "systems.qnn_system.accelerators.0.execution_providers.0", + "values": [ + "QNNExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Subset", + "tags": [ + "QuantizationDatasetSubset", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": { + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": "QuantizationDatasetSubset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + 
"QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_vitis_ai_config.json b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_vitis_ai_config.json new file mode 100644 index 00000000..d49375ec --- /dev/null +++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_vitis_ai_config.json @@ -0,0 +1,134 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "Qwen/Qwen2.5-1.5B-Instruct" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "wikitext2_train", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "wikitext", + "subset": "wikitext-2-raw-v1", + "split": "train" + }, + "pre_process_data_config": { + "strategy": "line-by-line", + "add_special_tokens": false, + "max_samples": 128, + "max_seq_len": 512 + } + } + ], + "passes": { + "q": { + "type": "QuaRot" + }, + "g": { + "type": "GptqQuantizer", + "sym": true, + "group_size": -1 + }, + "cs": { + "type": "CaptureSplitInfo", + "num_splits": 1, + "unique_embeds_lm_head_splits": true + }, + "mb": { + "type": "ModelBuilder", + "precision": "int4", + "int4_block_size": 32, + "int4_accuracy_level": 4, + "int4_op_types_to_quantize": [ + "MatMul", + 
"Gather" + ], + "save_as_external_data": true + }, + "mq": { + "type": "MatMulNBitsToQDQ", + "use_int4": true, + "add_zero_point": true, + "nodes_to_exclude": [ + "/lm_head/MatMul_Q4" + ], + "save_as_external_data": true + }, + "gs": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "RemoveRopeMultiCache" + }, + { + "surgeon": "AttentionMaskToSequenceLengths" + }, + { + "surgeon": "SimplifiedLayerNormToL2Norm" + } + ], + "save_as_external_data": true + }, + "sq": { + "type": "OnnxStaticQuantization", + "data_config": "wikitext2_train", + "activation_type": "uint16", + "precision": "uint8", + "calibration_providers": [ + "CUDAExecutionProvider" + ], + "quant_preprocess": true, + "op_types_to_exclude": [ + "GatherBlockQuantized", + "GroupQueryAttention", + "MatMulNBits" + ], + "save_as_external_data": true + }, + "addmetadata": { + "type": "VitisAIAddMetaData", + "config_meta_data_keys": [ + "architectures", + "model_type" + ], + "activation_type": "uint16", + "weight_type": "int4", + "quant_type": "QuaRot" + }, + "sp": { + "type": "SplitModel" + }, + "st": { + "type": "StaticLLM", + "batch_size": 1, + "context_length": 64, + "group_session_options": { + "log_id": "onnxruntime-genai", + "provider_options": [ + { + "VitisAI": {} + } + ], + "graph_optimization_level": "ORT_ENABLE_ALL" + } + } + }, + "target": "local_system", + "log_severity_level": 1, + "output_dir": "model/qwen2_5", + "cache_dir": "cache", + "no_artifacts": true, + "evaluate_input_model": false +} diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_vitis_ai_config.json.config b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_vitis_ai_config.json.config new file mode 100644 index 00000000..f6624c83 --- /dev/null +++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/qwen2_5_vitis_ai_config.json.config @@ -0,0 +1,191 @@ +{ + "name": "Convert to AMD NPU", + "oliveFile": "phi3_5/qdq_config_vitis_ai.json", + "isLLM": true, + "evalRuntime": "AMDNPU", + "debugInfo": { + "autoGenerated": true, + "useModelBuilder": "mb" 
+ }, + "isGPURequired": true, + "runtimeOverwrite": { + "executeEp": "CUDAExecutionProvider" + }, + "executeRuntimeFeatures": [ + "AutoGptq" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "CPU" + ], + "path": "systems.local_system.accelerators.0.execution_providers.0", + "values": [ + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Subset", + "tags": [ + "QuantizationDatasetSubset", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": { + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": "QuantizationDatasetSubset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + 
"name": "Quantize model", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +}
diff --git a/Qwen-Qwen2.5-1.5B-Instruct/aitk/requirements.txt b/Qwen-Qwen2.5-1.5B-Instruct/aitk/requirements.txt
new file mode 100644
index 00000000..03275c3e
--- /dev/null
+++ b/Qwen-Qwen2.5-1.5B-Instruct/aitk/requirements.txt
@@ -0,0 +1,2 @@
+datasets
+optimum
diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/.gitignore b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/.gitignore
new file mode 100644
index 00000000..48c03882
--- /dev/null
+++ b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/.gitignore
@@ -0,0 +1,5 @@
+__pycache__
+/cache
+/history/*/*
+!/history/*/history.config
+!/history/*/olive_config.json
diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/README.md b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/README.md
new file mode 100644
index 00000000..8de94d1c
--- /dev/null
+++ b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/README.md
@@ -0,0 +1,160 @@
+# DeepSeek-R1-Distill-Qwen-1.5B Model Optimization
+
+This repository demonstrates the optimization of the [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into three main workflows:
+
+- QDQ for AMD NPU
+- PTQ + AOT for QNN NPU
+
+  This process extends the QDQ flow and compiles the model specifically for **Qualcomm NPUs**.
+- OpenVINO for Intel NPU
+
+  This process uses OpenVINO-specific passes such as `OpenVINOOptimumConversion`, `OpenVINOIoUpdate`, and `OpenVINOEncapsulation`.
+
+## **QDQ Model with 4-bit Weights & 16-bit Activations**
+
+This workflow produces an ONNX QDQ model that is agnostic to the target hardware and accelerator, making it suitable for general inference.
+
+### **Optimization Process**
+
+The model is optimized using **weight-only quantization** and **activation quantization** for efficient deployment. The process includes:
+
+1. **Weight Rotation ([QuaRot](https://arxiv.org/abs/2404.00456))**
+   - Reduces outliers from weights and hidden states to enhance quantization efficiency.
+
+2. **4-bit Per-Channel Symmetric Quantization ([GPTQ](https://arxiv.org/abs/2210.17323))**
+   - Reduces transformer layer size while preserving accuracy.
+
+3. **ONNX Graph Capture**
+   - Exports the model to ONNX for further optimization.
+
+4. **4-bit Block-wise Quantization**
+   - Applies weight-only quantization to the **embedding layer** and **language modeling head**.
+
+5. **16-bit Activation Quantization**
+   - Uses 16-bit activations to balance precision and efficiency.
+
+The final output is a **QDQ model** with **4-bit weights** and **16-bit activations**. This model also leverages [GroupQueryAttention (GQA)](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.GroupQueryAttention) for efficient long-context processing and long-sequence generation.
+
+### **Handling Dynamic and Static Input Shapes**
+
+NPUs require **precompiled graphs**, meaning the model must use **static input shapes**. However, **text generation** involves two distinct processing stages:
+
+- **Prefill (Prompt Processing)**: Processes multiple tokens simultaneously.
+- **Token Generation (Iteration)**: Processes one token at a time.
+
+To support both efficiently, we create **two model instances**:
+1. **Prefill model**: Optimized for batch processing.
+2. **Token generation model**: Optimized for one-token-at-a-time inference.
+
+## **PTQ + AOT Compilation for Qualcomm NPUs using QNN EP**
+
+This process extends the [**QDQ Model with 4-bit Weights & 16-bit Activations**](#qdq-model-with-4-bit-weights--16-bit-activations) by compiling it specifically for **Qualcomm NPUs** using the **QNN Execution Provider**.
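The prefill and token-generation split described under "Handling Dynamic and Static Input Shapes" can be illustrated with a small pure-Python sketch. This is an illustrative addition, not code from this change: `split_into_prefill_chunks` is a hypothetical helper, and the right-padding and the pad id of `0` are assumptions. The actual runtime may pad or mask differently.

```python
from typing import List, Tuple


def split_into_prefill_chunks(
    token_ids: List[int], prefill_len: int, pad_id: int = 0
) -> List[Tuple[List[int], int]]:
    """Cut a prompt into fixed-length chunks for a static-shape prefill graph.

    Returns (chunk, valid_count) pairs. Every chunk has exactly prefill_len
    entries; the last one is right-padded with pad_id (an assumption).
    """
    if prefill_len <= 0:
        raise ValueError("prefill_len must be positive")
    chunks = []
    for start in range(0, len(token_ids), prefill_len):
        piece = token_ids[start:start + prefill_len]
        valid = len(piece)
        # Pad so the chunk always matches the precompiled input shape.
        chunks.append((piece + [pad_id] * (prefill_len - valid), valid))
    return chunks


if __name__ == "__main__":
    prompt = list(range(1, 71))  # 70 made-up token ids
    for chunk, valid in split_into_prefill_chunks(prompt, prefill_len=64):
        print(len(chunk), valid)
```

With a prefill length of 64 (the `context_length` set for the `StaticLLM` pass in the configs in this change), a 70-token prompt yields two chunks of shape (1, 64), the second carrying only 6 valid tokens. After prefill, generation would continue with the (1, 1) token-generation graph.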
+
+### **Resource Optimization Strategy**
+
+To maximize efficiency while supporting dynamic input handling:
+
+- **Embedding Layer & Language Model Head** → Executed on CPU (handles dynamic input).
+- **Transformer Layers** → Executed on NPU (requires static input shapes).
+- **Weight Sharing** → Prefill & token generation models reuse weights to minimize memory usage.
+
+> ⚠️ **Note:** GQA is an ONNX Runtime *contrib operator* and must be executed on the CPU. The model graph is partitioned into **CPU (GQA nodes)** and **NPU (other nodes)** for execution.
+
+### **Compilation for Qualcomm NPU Deployment**
+
+Once optimized, the model is compiled for Qualcomm NPUs using **ONNX Runtime QNNExecutionProvider**. The steps include:
+
+1. **Split the Quantized Model** → Divide into three parts:
+   - **Embedding Layer**
+   - **Transformer Layers**
+   - **Language Model Head**
+2. **Set Static Input Shapes**:
+   - **(1, 64)** for prefill (batch size, sequence length).
+   - **(1, 1)** for token generation.
+3. **Compile using QNNExecutionProvider**:
+   - Leverages **weight sharing** across the prefill and token generation models.
+
+### **Usage**
+
+This workflow is configured using the `qnn_config.json` file. It contains all of the quantization and compilation steps. It requires two separate Python environments, described below.
+
+#### A Known Working Configuration
+
+- python=3.10
+- CUDA=12.1
+- cudnn=9.2.0
+
+#### Quantization Python Environment Setup
+
+Quantization is resource-intensive and requires GPU acceleration. In an [x64 Python environment with Olive installed](https://github.com/microsoft/Olive/blob/main/examples/README.md#important), install the required packages:
+
+```bash
+# Install common dependencies
+pip install -r requirements.txt
+
+# Install ONNX Runtime GPU packages
+pip install "onnxruntime-gpu>=1.21.0" "onnxruntime-genai-cuda>=0.6.0"
+
+# AutoGPTQ: Install from source (stable package may be slow for weight packing)
+# Disable CUDA extension build (not required)
+# Linux
+export BUILD_CUDA_EXT=0
+# Windows
+# set BUILD_CUDA_EXT=0
+
+# Install AutoGPTQ from source
+pip install --no-build-isolation git+https://github.com/PanQiWei/AutoGPTQ.git
+```
+
+> ⚠️ Only set up the environment and install the packages. Do not run the `olive run` command at this point.
+
+#### AOT Compilation Python Environment Setup
+
+Model compilation using the QNN Execution Provider requires a Python environment with `onnxruntime-qnn` installed. In a separate Python environment with Olive installed, install the required packages:
+
+```bash
+# Install ONNX Runtime QNN
+pip install -r https://raw.githubusercontent.com/microsoft/onnxruntime/refs/heads/main/requirements.txt
+pip install -U --pre --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple onnxruntime-qnn --no-deps
+```
+
+Replace `/path/to/qnn/env/bin` in `qnn_config.json` with the path to the directory containing your QNN environment's Python executable. This path can be found by running the following command in the environment:
+
+```bash
+# Linux
+command -v python
+# Windows
+# where python
+```
+
+This command prints the path to the Python executable. Use the executable's parent directory in place of `/path/to/qnn/env/bin` in the config file.
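As a cross-platform alternative to `command -v python` / `where python`, the directory can also be printed from inside the QNN environment with a few lines of Python. This is a minimal sketch added for illustration, not part of the original README; `python_env_dir` is a hypothetical helper name.

```python
import sys
from pathlib import Path


def python_env_dir(executable: str = sys.executable) -> Path:
    """Return the directory that contains the given Python executable."""
    # Symlinks are deliberately not resolved, so a virtual environment
    # reports its own bin/Scripts directory rather than the base install.
    return Path(executable).parent


if __name__ == "__main__":
    # Run with the QNN environment's interpreter and use the printed
    # directory in place of /path/to/qnn/env/bin in the config file.
    print(python_env_dir())
```

Run it with the QNN environment activated so that `sys.executable` points at that environment's interpreter.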
+
+#### **Run the Quantization + Compilation Config**
+
+Activate the **Quantization Python Environment** and run the workflow:
+
+```bash
+olive run --config qnn_config.json
+```
+
+Olive will run the AOT compilation step in the **AOT Compilation Python Environment** specified in the config file using a subprocess. All other steps will run in the **Quantization Python Environment** natively.
+
+✅ Optimized model saved in: `./model`
+
+> ⚠️ If optimization fails with an out-of-memory error, remove `calibration_providers` from the config file.
+
+> ⚠️ If optimization fails during context binary generation, rerun the command. The process will resume from the last completed step.
+
+### **Inference**
+
+The optimized model can be used for inference using ONNX Runtime QNNExecutionProvider and ONNX Runtime GenAI. **Inference must be run on a Windows Copilot+ PC with a Qualcomm NPU.**
+
+#### **Install Required Packages (arm64 Python)**
+```bash
+pip install -r https://raw.githubusercontent.com/microsoft/onnxruntime/refs/heads/main/requirements.txt
+pip install -U --pre --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple onnxruntime-qnn --no-deps
+pip install "onnxruntime-genai>=0.7.0rc2"
+```
+
+#### **Run Console-Based Chat Interface**
+Execute the provided `inference_sample.ipynb` notebook.
+
+> ⚠️ If you get error 6033, replace `genai_config.json` in the `./model` folder.
diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_dml_config.json b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_dml_config.json
new file mode 100644
index 00000000..e0e26360
--- /dev/null
+++ b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_dml_config.json
@@ -0,0 +1,46 @@
+{ + "input_model": { + "type": "HfModel", + "model_path": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" + }, + "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device":"cpu", + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + }, + "target_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device":"gpu", + "execution_providers": [ + "DmlExecutionProvider" + ] + } + ] + } + }, + "passes": { + "q": { + "type": "AutoAWQQuantizer" + }, + "mb": { + "type": "ModelBuilder", + "precision": "int4" + } + }, + "host": "host_system", + "target": "target_system", + "log_severity_level": 1, + "output_dir": "model/deepseek", + "cache_dir": "cache", + "no_artifacts": true, + "evaluate_input_model": false +}
diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_dml_config.json.config b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_dml_config.json.config
new file mode 100644
index 00000000..5778ef75
--- /dev/null
+++ b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_dml_config.json.config
@@ -0,0 +1,48 @@
+{ + "name": "Convert to DirectML", + "isLLM": true, + "debugInfo": { + "autoGenerated": true, + "useModelBuilder": "mb" + }, + "isGPURequired": true, + "executeRuntimeFeatures": [ + "AutoAwq" + ], + "evaluationRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "DirectML" + ], + "path": "systems.target_system.accelerators.0.execution_providers.0", + "values": [ + "DmlExecutionProvider" + ], +
"readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_ov_config.json b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_ov_config.json new file mode 100644 index 00000000..7cd11cf6 --- /dev/null +++ b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_ov_config.json @@ -0,0 +1,56 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "OpenVINOExecutionProvider" + ] + } + ] + } + }, + "passes": { + "optimum_convert": { + "type": "OpenVINOOptimumConversion", + "extra_args": { + "device": "npu" + }, + "ov_quant_config": { + "weight_format": "int4", + "group_size": 128, + "dataset": "wikitext2", + "ratio": 1, + "sym": true, + "trust_remote_code": true, + "awq": false, + "scale_estimation": false, + "sensitivity_metric": "weight_quantization_error", + "backup_precision": "int8_asym" + } + }, + "io_update": { + "type": "OpenVINOIoUpdate", + "static": false, + "reuse_cache": true + }, + "encapsulation": { + "type": "OpenVINOEncapsulation", + "target_device": "npu", + "keep_ov_dynamic_dims": true, + "ov_version": "2025.1", + "reuse_cache": true + } + }, + "search_strategy": false, + "host": "local_system", + "target": "local_system", + "cache_dir": "cache", + "evaluate_input_model": false, + "output_dir": "model/deepseek" +} diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_ov_config.json.config b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_ov_config.json.config new file mode 100644 index 00000000..d39f9f91 --- /dev/null +++ 
b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_ov_config.json.config @@ -0,0 +1,153 @@ +{ + "name": "Convert to Intel CPU/NPU/GPU", + "oliveFile": "deepseek/openvino/DeepSeek-R1-Distill-Qwen-1.5B_context_ov_dynamic_sym_gs128_bkp_int8_sym_r1.json", + "isLLM": true, + "isIntel": true, + "debugInfo": { + "autoGenerated": true, + "useOpenVINOOptimumConversion": "optimum_convert" + }, + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "systems.local_system.accelerators.0.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ], + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ], + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ] + ], + "readOnly": false + }, + "runtimeInConversion": { + "autoGenerated": true, + "name": "Convert/Quantize to", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "passes.optimum_convert.extra_args.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "cpu" + } + ], + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "gpu" + } + ], + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "npu" + } + ] + ] + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": 
"passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "passes.optimum_convert.ov_quant_config.dataset", + "values": [ + "wikitext2" + ], + "template": { + "path": "passes.optimum_convert.ov_quant_config.dataset", + "values": [ + "wikitext2" + ], + "template": "QuantizationDataset" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_qnn_config.json b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_qnn_config.json new file mode 100644 index 00000000..616a0e74 --- /dev/null +++ b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_qnn_config.json @@ -0,0 +1,132 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" + }, + "systems": { + "qnn_system": { + "type": "PythonEnvironment", + "python_environment_path": "/path/to/qnn/env/bin", + "accelerators": [ + { + "execution_providers": [ + "QNNExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "wikitext2_train", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "wikitext", + "subset": "wikitext-2-raw-v1", + "split": "train" + }, + "pre_process_data_config": { + "strategy": "line-by-line", + "add_special_tokens": false, + "max_samples": 128, + "max_seq_len": 512 + } + } + ], + "passes": { + "q": { + "type": "QuaRot" + }, + "g": { + "type": "GptqQuantizer", + "sym": true, + "group_size": -1 + }, + "cs": { + "type": "CaptureSplitInfo", + "num_splits": 4, + "unique_embeds_lm_head_splits": true + }, + "mb": { + "type": "ModelBuilder", + "precision": "int4", + "int4_block_size": 32, + 
"int4_accuracy_level": 4, + "int4_op_types_to_quantize": [ + "MatMul", + "Gather" + ], + "save_as_external_data": true + }, + "mq": { + "type": "MatMulNBitsToQDQ", + "use_int4": true, + "add_zero_point": true, + "nodes_to_exclude": [ + "/lm_head/MatMul_Q4" + ], + "save_as_external_data": true + }, + "gs": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "RemoveRopeMultiCache" + }, + { + "surgeon": "AttentionMaskToSequenceLengths" + }, + { + "surgeon": "SimplifiedLayerNormToL2Norm" + } + ], + "save_as_external_data": true + }, + "sq": { + "type": "OnnxStaticQuantization", + "data_config": "wikitext2_train", + "activation_type": "uint16", + "precision": "uint8", + "calibration_providers": [ + "CUDAExecutionProvider" + ], + "quant_preprocess": true, + "op_types_to_exclude": [ + "GatherBlockQuantized", + "GroupQueryAttention", + "MatMulNBits" + ], + "save_as_external_data": true + }, + "sp": { + "type": "SplitModel" + }, + "st": { + "type": "StaticLLM", + "batch_size": 1, + "context_length": 64 + }, + "cb": { + "type": "EPContextBinaryGenerator", + "provider_options": { + "htp_performance_mode": "burst", + "htp_graph_finalization_optimization_mode": "3", + "soc_model": "60" + }, + "session_options": { + "intra_op_num_threads": 2, + "inter_op_num_threads": 1 + }, + "weight_sharing": true + }, + "cp": { + "type": "ComposeOnnxModels" + } + }, + "target": "qnn_system", + "log_severity_level": 1, + "output_dir": "model/deepseek", + "cache_dir": "cache", + "no_artifacts": true, + "evaluate_input_model": false +} diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_qnn_config.json.config b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_qnn_config.json.config new file mode 100644 index 00000000..032429d1 --- /dev/null +++ b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_qnn_config.json.config @@ -0,0 +1,197 @@ +{ + "name": "Convert to Qualcomm NPU", + "oliveFile": "phi3_5/qnn_config.json", + "isLLM": true, + "debugInfo": { + 
"autoGenerated": true, + "useModelBuilder": "mb" + }, + "isQNNLLM": true, + "isGPURequired": true, + "runtimeOverwrite": { + "autoGenerated": true, + "pyEnvPath": "systems.qnn_system.python_environment_path", + "executeEp": "CUDAExecutionProvider", + "evaluateUsedInExecute": true + }, + "executeRuntimeFeatures": [ + "AutoGptq" + ], + "pyEnvRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Qualcomm NPU" + ], + "path": "systems.qnn_system.accelerators.0.execution_providers.0", + "values": [ + "QNNExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Subset", + "tags": [ + "QuantizationDatasetSubset", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": { + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": "QuantizationDatasetSubset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + 
"name": "Quantize model", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_vitis_ai_config.json b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_vitis_ai_config.json new file mode 100644 index 00000000..e4e30711 --- /dev/null +++ b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_vitis_ai_config.json @@ -0,0 +1,134 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "wikitext2_train", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "wikitext", + "subset": "wikitext-2-raw-v1", + "split": "train" + }, + "pre_process_data_config": { + "strategy": "line-by-line", + "add_special_tokens": false, + "max_samples": 128, + "max_seq_len": 512 + } + } + ], + "passes": { + "q": { + "type": "QuaRot" + }, + "g": { + "type": "GptqQuantizer", + "sym": true, + "group_size": -1 + }, + "cs": { + "type": "CaptureSplitInfo", + "num_splits": 1, + "unique_embeds_lm_head_splits": true + }, + "mb": { + "type": "ModelBuilder", + "precision": "int4", + "int4_block_size": 32, + "int4_accuracy_level": 4, + "int4_op_types_to_quantize": [ + "MatMul", + "Gather" + ], + "save_as_external_data": true + }, + "mq": { + "type": "MatMulNBitsToQDQ", + "use_int4": true, + "add_zero_point": true, + "nodes_to_exclude": [ + "/lm_head/MatMul_Q4" + ], + "save_as_external_data": true + }, + "gs": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "RemoveRopeMultiCache" + }, + { + "surgeon": "AttentionMaskToSequenceLengths" + }, + { + "surgeon": "SimplifiedLayerNormToL2Norm" + } + ], + "save_as_external_data": true + }, + "sq": { + "type": "OnnxStaticQuantization", + "data_config": 
"wikitext2_train", + "activation_type": "uint16", + "precision": "uint8", + "calibration_providers": [ + "CUDAExecutionProvider" + ], + "quant_preprocess": true, + "op_types_to_exclude": [ + "GatherBlockQuantized", + "GroupQueryAttention", + "MatMulNBits" + ], + "save_as_external_data": true + }, + "addmetadata": { + "type": "VitisAIAddMetaData", + "config_meta_data_keys": [ + "architectures", + "model_type" + ], + "activation_type": "uint16", + "weight_type": "int4", + "quant_type": "QuaRot" + }, + "sp": { + "type": "SplitModel" + }, + "st": { + "type": "StaticLLM", + "batch_size": 1, + "context_length": 64, + "group_session_options": { + "log_id": "onnxruntime-genai", + "provider_options": [ + { + "VitisAI": {} + } + ], + "graph_optimization_level": "ORT_ENABLE_ALL" + } + } + }, + "target": "local_system", + "log_severity_level": 1, + "output_dir": "model/deepseek", + "cache_dir": "cache", + "no_artifacts": true, + "evaluate_input_model": false +} diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_vitis_ai_config.json.config b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_vitis_ai_config.json.config new file mode 100644 index 00000000..f6624c83 --- /dev/null +++ b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/deepseek_vitis_ai_config.json.config @@ -0,0 +1,191 @@ +{ + "name": "Convert to AMD NPU", + "oliveFile": "phi3_5/qdq_config_vitis_ai.json", + "isLLM": true, + "evalRuntime": "AMDNPU", + "debugInfo": { + "autoGenerated": true, + "useModelBuilder": "mb" + }, + "isGPURequired": true, + "runtimeOverwrite": { + "executeEp": "CUDAExecutionProvider" + }, + "executeRuntimeFeatures": [ + "AutoGptq" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "CPU" + ], + "path": "systems.local_system.accelerators.0.execution_providers.0", + "values": [ + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", 
+ "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Subset", + "tags": [ + "QuantizationDatasetSubset", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": { + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": "QuantizationDatasetSubset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + 
"name": "Quantize model", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/inference_model.json b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/inference_model.json new file mode 100644 index 00000000..cf831c1e --- /dev/null +++ b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/inference_model.json @@ -0,0 +1,31 @@ +{ + "Name": "DeepSeek-R1-Distill-Qwen-1.5B", + "PromptTemplate": { + "assistant": "{Content}", + "prompt": "<|User|>{Content}<|Assistant|>" + }, + "ParameterSchema": { + "enabled": [ + { + "name": "max_tokens", + "default": 512 + }, + { + "name": "temperature", + "default": 0.6 + }, + { + "name": "top_p", + "default": 0.9 + }, + { + "name": "top_k", + "default": 5 + }, + { + "name": "random_seed", + "default": 5687 + } + ] + } +} diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/inference_sample.ipynb b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/inference_sample.ipynb new file mode 100644 index 00000000..67a72436 --- /dev/null +++ b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/inference_sample.ipynb @@ -0,0 +1,131 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "text = 'Who is Isaac Newton?'\n", + "ExecutionProvider=\"QNNExecutionProvider\"\n", + "model_folder = \"./model\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime_genai as og\n", + "import json\n", + "import time\n", + "from pathlib import Path\n", + "\n", + "def get_session_options(obj):\n", + " if type(obj) is dict:\n", + " for k, v in obj.items():\n", + " if k == \"session_options\":\n", + " yield v\n", + " else:\n", + " for x in get_session_options(v):\n", + " yield x\n", + " elif type(obj) is list:\n", + " for v in obj:\n", + " for x in get_session_options(v):\n", + " yield x\n", + "\n", + "\n", + 
"def remove_provider_options(model_path):\n", + " genai_config_path = Path(model_path) / \"genai_config.json\"\n", + " data = json.loads(genai_config_path.read_text())\n", + " for session_option in get_session_options(data):\n", + " if 'provider_options' in session_option:\n", + " session_option['provider_options'] = [{k: dict() for k in opts.keys()} for opts in session_option['provider_options']]\n", + "\n", + " json.dump(data, genai_config_path.open(\"w\"), indent=4)\n", + "\n", + "if ExecutionProvider == \"QNNExecutionProvider\":\n", + " remove_provider_options(model_folder)\n", + "\n", + "# Load the base model and tokenizer\n", + "model = og.Model(model_folder)\n", + "tokenizer = og.Tokenizer(model)\n", + "tokenizer_stream = tokenizer.create_stream()\n", + "\n", + "# Set the max length to something sensible by default,\n", + "# since otherwise it will be set to the entire context length\n", + "search_options = {}\n", + "search_options[\"max_length\"] = 200\n", + "\n", + "chat_template = \"<|User|>{input}<|Assistant|>\"\n", + "\n", + "# Generate prompt (prompt template + input)\n", + "prompt = f\"{chat_template.format(input=text)}\"\n", + "\n", + "# Encode the prompt using the tokenizer\n", + "input_tokens = tokenizer.encode(prompt)\n", + "\n", + "# Create params and generator\n", + "params = og.GeneratorParams(model)\n", + "params.set_search_options(**search_options)\n", + "generator = og.Generator(model, params)\n", + "\n", + "# Append input tokens to the generator\n", + "generator.append_tokens(input_tokens)\n", + "\n", + "print(\"\")\n", + "print(\"Output: \", end=\"\", flush=True)\n", + "\n", + "token_times = []\n", + "\n", + "# Stream the output\n", + "while not generator.is_done():\n", + " start_time = time.time()\n", + " generator.generate_next_token()\n", + " end_time = time.time()\n", + " \n", + " # Record the time for this token generation\n", + " token_time = end_time - start_time\n", + " token_times.append(token_time)\n", + "\n", + " new_token = 
generator.get_next_tokens()[0]\n", + " print(tokenizer_stream.decode(new_token), end=\"\", flush=True)\n", + "\n", + "print()\n", + "\n", + "# Calculate and display timing statistics\n", + "if token_times:\n", + " total_tokens = len(token_times)\n", + " avg_time = sum(token_times) / total_tokens\n", + " \n", + " print(f\"Total tokens generated: {total_tokens}\")\n", + " print(f\"Average time per token: {avg_time:.4f} seconds\")\n", + " print(f\"Tokens per second: {total_tokens / sum(token_times):.2f}\")\n", + "\n", + "del generator\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/info.yml b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/info.yml new file mode 100644 index 00000000..7c05c28d --- /dev/null +++ b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/info.yml @@ -0,0 +1,20 @@ +keywords: + aitk +arch: deepseek +recipes: + - file: "deepseek_qnn_config.json" + device: npu + ep: QNNExecutionProvider + - file: "deepseek_vitis_ai_config.json" + device: npu + ep: VitisAIExecutionProvider + - file: "deepseek_ov_config.json" + device: npu + ep: OpenVINOExecutionProvider + - file: "deepseek_dml_config.json" + device: gpu + ep: DmlExecutionProvider +aitk: + modelInfo: + id: "huggingface/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" + version: 1 diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/model_project.config b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/model_project.config new file mode 100644 index 00000000..dab152a5 --- /dev/null +++ b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/model_project.config @@ -0,0 +1,24 
@@ +{ + "workflows": [ + { + "file": "deepseek_qnn_config.json", + "templateName": "deepseek_qnn_config" + }, + { + "file": "deepseek_vitis_ai_config.json", + "templateName": "deepseek_vitis_ai_config" + }, + { + "file": "deepseek_ov_config.json", + "templateName": "deepseek_ov_config" + }, + { + "file": "deepseek_dml_config.json", + "templateName": "deepseek_dml_config" + } + ], + "modelInfo": { + "id": "huggingface/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", + "version": 1 + } +} diff --git a/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/requirements.txt b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/requirements.txt new file mode 100644 index 00000000..7af84714 --- /dev/null +++ b/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk/requirements.txt @@ -0,0 +1,4 @@ +# This file will be installed together with AITK runtime requirements +# For the full requirements, see AITK +datasets +optimum diff --git a/google-bert-bert-base-multilingual-cased/aitk/.gitignore b/google-bert-bert-base-multilingual-cased/aitk/.gitignore new file mode 100644 index 00000000..48c03882 --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/.gitignore @@ -0,0 +1,5 @@ +__pycache__ +/cache +/history/*/* +!/history/*/history.config +!/history/*/olive_config.json diff --git a/google-bert-bert-base-multilingual-cased/aitk/README.md b/google-bert-bert-base-multilingual-cased/aitk/README.md new file mode 100644 index 00000000..e00d9063 --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/README.md @@ -0,0 +1,22 @@ +# BERT Optimization + +This folder contains examples of BERT optimization using different workflows. + +- QDQ for Qualcomm NPU / AMD NPU +- OpenVINO for Intel NPU + +## BERT Quantization QDQ + +This workflow quantizes the model. 
It performs the following pipeline: +- *HF Model -> ONNX Model -> Quantized ONNX Model* + +Config file: `bert-base-multilingual-cased_qdq.json` + +### Latency / Throughput + +| Model Version | Latency (ms/sample) | Throughput (tokens per second) | Dataset | +|-----------------------|----------------------|------------------------------|---------------| +| PyTorch FP32 | 1162 | 0.81 | facebook/xnli | +| ONNX INT8 (QDQ) | 590 | 1.75 | facebook/xnli | + +*Note: Latency can vary significantly depending on the hardware and system environment. The values provided here are for reference only and may not reflect performance on all devices.* diff --git a/google-bert-bert-base-multilingual-cased/aitk/_copy.json.config b/google-bert-bert-base-multilingual-cased/aitk/_copy.json.config new file mode 100644 index 00000000..ff27826d --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/_copy.json.config @@ -0,0 +1,18 @@ +{ + "copies": [ + { + "src": "bert-base-multilingual-cased_qdq_amd.json.config", + "dst": "bert-base-multilingual-cased_qdq_qnn.json.config", + "replacements": [ + { + "find": "bert/google_bert_qdq_vitis_ai.json", + "replace": "bert/google_bert_qdq.json" + }, + { + "find": "Convert to AMD NPU", + "replace": "Convert to Qualcomm NPU" + } + ] + } + ] +} diff --git a/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_context_ov_static.json b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_context_ov_static.json new file mode 100644 index 00000000..ba5d70d8 --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_context_ov_static.json @@ -0,0 +1,97 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "google-bert/bert-base-multilingual-cased", + "task": "fill-mask" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "npu", + "execution_providers": [ + "OpenVINOExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { 
+ "name": "quantize_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "bert_base_multilingual_cased_dataset", + "data_name": "wikipedia", + "split": "train", + "max_samples": 300 + }, + "dataloader_config": { + "batch_size": 1, + "drop_last": true + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "latency", + "type": "latency", + "sub_types": [ + { "name": "avg", "priority": 1, "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } }, + { "name": "p90", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } } + ] + } + ] + } + }, + "passes": { + "optimum_convert": { + "type": "OpenVINOOptimumConversion", + "extra_args": { + "device": "npu", + "task": "feature-extraction" + } + }, + "io_update": { + "type": "OpenVINOIoUpdate", + "input_shapes": [ + [ + 1, + 128 + ], + [ + 1, + 128 + ], + [ + 1, + 128 + ] + ], + "static": true + }, + "ov_quantize": { + "type": "OpenVINOQuantization", + "target_device": "npu", + "data_config": "quantize_data_config", + "model_type": "TRANSFORMER", + "user_script": "user_script.py", + "transform_fn": "custom_transform_func" + }, + "encapsulation": { + "type": "OpenVINOEncapsulation", + "target_device": "npu", + "ov_version": "2025.1" + } + }, + "search_strategy": false, + "host": "local_system", + "target": "local_system", + "cache_dir": "cache", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "output_dir": "model/bert-base-multilingual-cased_context_ov_static" +} diff --git a/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_context_ov_static.json.config b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_context_ov_static.json.config new file mode 100644 index 00000000..801c3bc6 --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_context_ov_static.json.config @@ -0,0 +1,182 @@ +{ + "name": "Convert to Intel CPU/NPU/GPU", + "oliveFile": 
"bert/openvino/bert_base_multilingual_cased/bert-base-multilingual-cased_context_ov_static.json", + "isIntel": true, + "debugInfo": { + "autoGenerated": true, + "useOpenVINOOptimumConversion": "optimum_convert" + }, + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "systems.local_system.accelerators.0.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "readOnly": false + }, + "runtimeInConversion": { + "autoGenerated": true, + "name": "Convert/Quantize to", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "passes.optimum_convert.extra_args.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "update", + "path": "passes.ov_quantize.target_device", + "value": "cpu" + }, + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "cpu" + } + ], + [ + { + "type": "update", + "path": "passes.ov_quantize.target_device", + "value": "gpu" + }, + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "gpu" + } + ], + [ + { + "type": "update", + "path": "passes.ov_quantize.target_device", + "value": "npu" + }, + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "npu" + } + ] + ] + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikipedia" + ], + "template": { + "path": 
"data_configs[0].load_dataset_config.data_name", + "values": [ + "wikipedia" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].load_dataset_config.max_samples", + "template": { + "path": "data_configs[0].load_dataset_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_dml.json b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_dml.json new file mode 100644 index 00000000..72e4e129 --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_dml.json @@ -0,0 +1,139 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "google-bert/bert-base-multilingual-cased", + "task": "feature-extraction" + }, + "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "cpu", + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + }, + "target_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "gpu", + "execution_providers": [ + 
"DmlExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "evaluation_data_config", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "facebook/xnli", + "subset": "en", + "split": "validation" + }, + "pre_process_data_config": { + "input_cols": [ + "premise" + ], + "padding": "max_length", + "max_length": 128, + "max_samples": 10 + }, + "dataloader_config": { + "batch_size": 1 + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "latency", + "type": "latency", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "avg", + "priority": 1, + "goal": { + "type": "percent-min-improvement", + "value": 0.1 + } + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + }, + { + "name": "throughput", + "type": "throughput", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "avg" + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "save_as_external_data": true + }, + "transformer_optimizer": { + "type": "OrtTransformersOptimization", + "model_type": "bert", + "opt_level": 0, + "float16": true, + "use_gpu": true, + "keep_io_types": false, + "optimization_options": { + "enable_gelu": true, + "enable_layer_norm": true, + "enable_attention": true, + "use_multi_head_attention": true, + "enable_skip_layer_norm": false, + "enable_embed_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_bias_gelu": false, + "enable_gelu_approximation": false, + "enable_qordered_matmul": false, + "enable_shape_inference": true, + "enable_gemm_fast_gelu": false, + "enable_nhwc_conv": false, + "enable_group_norm": false, + "enable_bias_splitgelu": false, + "enable_packed_qkv": true, + "enable_packed_kv": true, + "enable_bias_add": false, + "enable_rotary_embeddings": true + }, + "save_as_external_data": true + } + }, + "host": "host_system", + "target": 
"target_system", + "evaluator": "common_evaluator", + "cache_dir": "cache", + "output_dir": "model/google_bert", + "evaluate_input_model": false +} diff --git a/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_dml.json.config b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_dml.json.config new file mode 100644 index 00000000..ca319b9d --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_dml.json.config @@ -0,0 +1,126 @@ +{ + "name": "Convert to DirectML", + "evaluationRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "DirectML" + ], + "path": "systems.target_system.accelerators.0.execution_providers.0", + "values": [ + "DmlExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "facebook/xnli" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "facebook/xnli" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Subset", + "tags": [ + "EvaluationDatasetSubset", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "en", + "all_languages" + ], + "template": { + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "en", + "all_languages" + ], + "template": "EvaluationDatasetSubset" + } + }, + { + 
"name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_qdq_amd.json b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_qdq_amd.json new file mode 100644 index 00000000..7e5e9c73 --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_qdq_amd.json @@ -0,0 +1,168 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "google-bert/bert-base-multilingual-cased", + "task": "feature-extraction" + }, + "systems": { + "qnn_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "npu", + "execution_providers": [ + "VitisAIExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quantization_data_config", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "facebook/xnli", + "subset": "en", + "split": "validation" + }, + "pre_process_data_config": { + "input_cols": [ + "premise" + ], + "padding": "max_length", + "max_length": 128, + "max_samples": 10 + }, + "dataloader_config": { + "batch_size": 1 + } + }, + { + "name": "evaluation_data_config", + "type": "HuggingfaceContainer", + "load_dataset_config": { 
+ "data_name": "facebook/xnli", + "subset": "en", + "split": "validation" + }, + "pre_process_data_config": { + "input_cols": [ + "premise" + ], + "padding": "max_length", + "max_length": 128, + "max_samples": 10 + }, + "dataloader_config": { + "batch_size": 1 + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "latency", + "type": "latency", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "avg", + "priority": 1, + "goal": { + "type": "percent-min-improvement", + "value": 0.1 + } + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + }, + { + "name": "throughput", + "type": "throughput", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "avg" + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + }, + "transformer_optimizer": { + "type": "orttransformersoptimization", + "model_type": "bert", + "opt_level": 1, + "optimization_options": { + "enable_gelu": true, + "enable_bias_gelu": false, + "enable_layer_norm": true, + "enable_skip_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_attention": false + }, + "save_as_external_data": true + }, + "dynamic_shape_to_fixed": { + "type": "DynamicToFixedShape", + "dim_param": [ + "batch_size", + "sequence_length" + ], + "dim_value": [ + 1, + 128 + ] + }, + "surgery": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "ReplaceAttentionMaskValue" + } + ] + }, + "OnnxQuantization": { + "type": "OnnxStaticQuantization", + "data_config": "quantization_data_config", + "activation_type": "uint16", + "precision": "uint8", + "save_as_external_data": true + }, + "addmetadata": { + "type": "VitisAIAddMetaData", + "config_meta_data_keys": [ + "architectures", + "model_type" + ], + "activation_type": "uint16", + "weight_type": "uint8", + "quant_type": "OnnxStaticQuantization" + } + }, + 
"host": "qnn_system", + "target": "qnn_system", + "evaluator": "common_evaluator", + "cache_dir": "cache", + "output_dir": "model/google_bert", + "evaluate_input_model": false +} diff --git a/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_qdq_amd.json.config b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_qdq_amd.json.config new file mode 100644 index 00000000..19476bf7 --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_qdq_amd.json.config @@ -0,0 +1,273 @@ +{ + "name": "Convert to AMD NPU", + "oliveFile": "bert/google_bert_qdq_vitis_ai.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "AMD NPU", + "CPU" + ], + "path": "systems.qnn_system.accelerators.0.execution_providers.0", + "values": [ + "VitisAIExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.OnnxQuantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.OnnxQuantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.OnnxQuantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.OnnxQuantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "facebook/xnli" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "facebook/xnli" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Subset", + "tags": [ + "QuantizationDatasetSubset", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "en", + "all_languages" + ], + "template": { + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "en", + "all_languages" + ], + "template": "QuantizationDatasetSubset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + 
"type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.OnnxQuantization", + "actions": [ + [], + [ + { + "type": "update", + "path": "passes", + "value": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "facebook/xnli" + ], + "template": { + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "facebook/xnli" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Subset", + "tags": [ + "EvaluationDatasetSubset", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.subset", + "values": [ + "en", + "all_languages" + ], + "template": { + "path": "data_configs[1].load_dataset_config.subset", + "values": [ + "en", + "all_languages" + ], + "template": "EvaluationDatasetSubset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[1].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + 
}, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_qdq_qnn.json b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_qdq_qnn.json new file mode 100644 index 00000000..da4c6d4f --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_qdq_qnn.json @@ -0,0 +1,163 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "google-bert/bert-base-multilingual-cased", + "task": "feature-extraction" + }, + "systems": { + "qnn_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "npu", + "execution_providers": [ + "QNNExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quantization_data_config", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "facebook/xnli", + "subset": "en", + "split": "validation" + }, + "pre_process_data_config": { + "input_cols": [ + "premise" + ], + "padding": "max_length", + "max_length": 128, + "max_samples": 10 + }, + "dataloader_config": { + "batch_size": 1 + } + }, + { + "name": "evaluation_data_config", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "facebook/xnli", + "subset": "en", + "split": "validation" + }, + "pre_process_data_config": { + "input_cols": [ + "premise" + ], + "padding": "max_length", + "max_length": 128, + "max_samples": 10 + }, + "dataloader_config": { + "batch_size": 1 + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": 
"latency", + "type": "latency", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "avg", + "priority": 1, + "goal": { + "type": "percent-min-improvement", + "value": 0.1 + } + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + }, + { + "name": "throughput", + "type": "throughput", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "avg" + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "save_as_external_data": true + }, + "to_fixed_shape": { + "type": "DynamicToFixedShape", + "dim_param": [ + "batch_size", + "sequence_length" + ], + "dim_value": [ + 1, + 128 + ] + }, + "surgery": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "ReplaceAttentionMaskValue", + "replacement": -100.0 + }, + { + "surgeon": "MatMulAddToGemm" + } + ] + }, + "transformer_optimizer": { + "type": "OrtTransformersOptimization", + "model_type": "bert", + "opt_level": 1, + "optimization_options": { + "enable_gelu": true, + "enable_bias_gelu": false, + "enable_layer_norm": true, + "enable_skip_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_attention": false + }, + "save_as_external_data": true + }, + "OnnxQuantization": { + "type": "OnnxStaticQuantization", + "data_config": "quantization_data_config", + "quant_preprocess": true, + "activation_type": "uint16", + "precision": "uint8", + "save_as_external_data": true + } + }, + "host": "qnn_system", + "target": "qnn_system", + "evaluator": "common_evaluator", + "cache_dir": "cache", + "output_dir": "model/google_bert", + "evaluate_input_model": false +} diff --git a/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_qdq_qnn.json.config b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_qdq_qnn.json.config new file mode 100644 index 00000000..45b6868c --- /dev/null +++ 
b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_qdq_qnn.json.config @@ -0,0 +1,273 @@ +{ + "name": "Convert to Qualcomm NPU", + "oliveFile": "bert/google_bert_qdq.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Qualcomm NPU", + "CPU" + ], + "path": "systems.qnn_system.accelerators.0.execution_providers.0", + "values": [ + "QNNExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.OnnxQuantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.OnnxQuantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.OnnxQuantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.OnnxQuantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "facebook/xnli" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "facebook/xnli" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Subset", + "tags": [ + "QuantizationDatasetSubset", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "en", + "all_languages" + ], + "template": { + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "en", + "all_languages" + ], + "template": "QuantizationDatasetSubset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.OnnxQuantization", + 
"actions": [ + [], + [ + { + "type": "update", + "path": "passes", + "value": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "save_as_external_data": true + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "facebook/xnli" + ], + "template": { + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "facebook/xnli" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Subset", + "tags": [ + "EvaluationDatasetSubset", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.subset", + "values": [ + "en", + "all_languages" + ], + "template": { + "path": "data_configs[1].load_dataset_config.subset", + "values": [ + "en", + "all_languages" + ], + "template": "EvaluationDatasetSubset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[1].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_trtrtx.json 
b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_trtrtx.json new file mode 100644 index 00000000..5994f683 --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_trtrtx.json @@ -0,0 +1,128 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "google-bert/bert-base-multilingual-cased", + "task": "feature-extraction" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "gpu", + "execution_providers": [ + "NvTensorRTRTXExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "xnli", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "facebook/xnli", + "subset": "en", + "split": "validation" + }, + "pre_process_data_config": { + "input_cols": [ + "premise" + ], + "padding": "max_length", + "max_length": 128, + "max_samples": 10 + }, + "dataloader_config": { + "batch_size": 1 + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "latency", + "type": "latency", + "data_config": "xnli", + "sub_types": [ + { + "name": "avg", + "priority": 1, + "goal": { + "type": "percent-min-improvement", + "value": 0.1 + } + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + }, + { + "name": "throughput", + "type": "throughput", + "data_config": "xnli", + "sub_types": [ + { + "name": "avg" + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + }, + "onnx_float_to_float16": { + "type": "OnnxFloatToFloat16", + "save_as_external_data": true + }, + "dynamic_shape_to_fixed": { + "type": "DynamicToFixedShape", + "dim_param": [ + "batch_size", + "sequence_length" + ], + "dim_value": [ + 1, + 128 + ] + }, + "surgery": { + "type": "GraphSurgeries", + "save_as_external_data": true, + "surgeries": [ + { + "surgeon": "ReplaceAttentionMaskValue" + } + ] + }, + 
"session_params_tuning": { + "type": "OrtSessionParamsTuning", + "io_bind": false, + "data_config": "xnli" + } + }, + "host": "local_system", + "target": "local_system", + "evaluator": "common_evaluator", + "cache_dir": "cache", + "output_dir": "model/google_bert_trtrtx", + "log_severity_level": 0, + "evaluate_input_model": false +} diff --git a/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_trtrtx.json.config b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_trtrtx.json.config new file mode 100644 index 00000000..90a60833 --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/bert-base-multilingual-cased_trtrtx.json.config @@ -0,0 +1,125 @@ +{ + "name": "Convert to NVIDIA TRT for RTX", + "oliveFile": "bert/google_bert_trtrtx.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "NVIDIA TensorRT for RTX", + "CPU" + ], + "path": "systems.local_system.accelerators.0.execution_providers.0", + "values": [ + "NvTensorRTRTXExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "facebook/xnli" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "facebook/xnli" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Subset", + "tags": [ + "EvaluationDatasetSubset", + "DependsOnDataset" + ], + "type": "enum", + "path": 
"data_configs[0].load_dataset_config.subset", + "values": [ + "en", + "all_languages" + ], + "template": { + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "en", + "all_languages" + ], + "template": "EvaluationDatasetSubset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/google-bert-bert-base-multilingual-cased/aitk/inference_sample.ipynb b/google-bert-bert-base-multilingual-cased/aitk/inference_sample.ipynb new file mode 100644 index 00000000..0ff50f9d --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/inference_sample.ipynb @@ -0,0 +1,150 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "\n", + "ExecutionProvider=\"QNNExecutionProvider\"\n", + "if ExecutionProvider == \"OpenVINOExecutionProvider\":\n", + " onnx_model_path = \"./model/openvino_model_st_quant.onnx\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "inputs = \"This is an example sentence.\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": 
[], + "source": [ + "import onnxruntime as ort\n", + "import torch\n", + "import torch.nn.functional as F\n", + "\n", + "from transformers import AutoModel, AutoTokenizer" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def mean_pooling(model_output, attention_mask):\n", + " token_embeddings = torch.tensor(model_output[0])\n", + " input_mask_expanded = attention_mask.unsqueeze(-1).expand_as(token_embeddings).float()\n", + " return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tokenizer = AutoTokenizer.from_pretrained('google-bert/bert-base-multilingual-cased')\n", + "encoded_input = tokenizer(\n", + " inputs,\n", + " padding=\"max_length\",\n", + " max_length=128,\n", + " truncation=True,\n", + " add_special_tokens=True,\n", + " return_tensors=\"pt\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + "\n", + "\n", + "session_options = ort.SessionOptions()\n", + "\n", + "add_ep_for_device(session_options, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "\n", + "session = ort.InferenceSession(\n", + " onnx_model_path, # a model with QNN EPContext nodes\n", + " sess_options=session_options,\n", + ")\n", + "\n", + "input_ids = encoded_input[\"input_ids\"]\n", + "attention_mask = encoded_input[\"attention_mask\"]\n", + "token_type_ids = 
encoded_input[\"token_type_ids\"]\n", + "inputs = {\n", + " \"input_ids\": input_ids.long().cpu().numpy(),\n", + " \"attention_mask\": attention_mask.long().cpu().numpy(),\n", + " \"token_type_ids\": token_type_ids.long().cpu().numpy()\n", + "}\n", + "\n", + "outputs = session.run(None, inputs)\n", + "embeds_1 = mean_pooling(outputs, encoded_input['attention_mask'])\n", + "embeds_1 = F.normalize(embeds_1, p=2, dim=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get the text embedding from the original model as ground truth.\n", + "model = AutoModel.from_pretrained('google-bert/bert-base-multilingual-cased').eval()\n", + "with torch.no_grad():\n", + " outputs = model(**encoded_input)\n", + " embeds_2 = mean_pooling(outputs, encoded_input['attention_mask'])\n", + " embeds_2 = F.normalize(embeds_2, p=2, dim=1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "similarity = F.cosine_similarity(embeds_1, embeds_2).item()\n", + "print(\"Similarity: \", similarity)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/google-bert-bert-base-multilingual-cased/aitk/info.yml b/google-bert-bert-base-multilingual-cased/aitk/info.yml new file mode 100644 index 00000000..c5771102 --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/info.yml @@ -0,0 +1,23 @@ +keywords: + aitk +arch: bert +recipes: + - file: "bert-base-multilingual-cased_qdq_qnn.json" + device: npu + ep: QNNExecutionProvider + - file: 
"bert-base-multilingual-cased_qdq_amd.json" + 
device: npu + ep: VitisAIExecutionProvider + - file: "bert-base-multilingual-cased_context_ov_static.json" + device: npu + ep: OpenVINOExecutionProvider + - file: "bert-base-multilingual-cased_trtrtx.json" + device: gpu + ep: NvTensorRTRTXExecutionProvider + - file: "bert-base-multilingual-cased_dml.json" + device: gpu + ep: DmlExecutionProvider +aitk: + modelInfo: + id: "huggingface/google-bert/bert-base-multilingual-cased" + version: 1 diff --git a/google-bert-bert-base-multilingual-cased/aitk/model_project.config b/google-bert-bert-base-multilingual-cased/aitk/model_project.config new file mode 100644 index 00000000..41846e12 --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/model_project.config @@ -0,0 +1,28 @@ +{ + "workflows": [ + { + "file": "bert-base-multilingual-cased_qdq_qnn.json", + "templateName": "bert-base-multilingual-cased_qdq_qnn" + }, + { + "file": "bert-base-multilingual-cased_qdq_amd.json", + "templateName": "bert-base-multilingual-cased_qdq_amd" + }, + { + "file": "bert-base-multilingual-cased_context_ov_static.json", + "templateName": "bert-base-multilingual-cased_context_ov_static" + }, + { + "file": "bert-base-multilingual-cased_trtrtx.json", + "templateName": "bert-base-multilingual-cased_trtrtx" + }, + { + "file": "bert-base-multilingual-cased_dml.json", + "templateName": "bert-base-multilingual-cased_dml" + } + ], + "modelInfo": { + "id": "huggingface/google-bert/bert-base-multilingual-cased", + "version": 1 + } +} diff --git a/google-bert-bert-base-multilingual-cased/aitk/requirements.txt b/google-bert-bert-base-multilingual-cased/aitk/requirements.txt new file mode 100644 index 00000000..b02be515 --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/requirements.txt @@ -0,0 +1,5 @@ +# This file will be installed together with AITK runtime requirements +# For the full requirements, see AITK +olive-ai +datasets +optimum diff --git a/google-bert-bert-base-multilingual-cased/aitk/user_script.py 
b/google-bert-bert-base-multilingual-cased/aitk/user_script.py new file mode 100644 index 00000000..f7442c2f --- /dev/null +++ b/google-bert-bert-base-multilingual-cased/aitk/user_script.py @@ -0,0 +1,83 @@ +# ------------------------------------------------------------------------- +# Copyright (c) Intel Corporation. All rights reserved. +# Licensed under the MIT License. +# -------------------------------------------------------------------------- +import datasets +import numpy as np +import torch +from transformers import BertTokenizer + +from olive.data.registry import Registry + +# ------------------------------------------------------------------------- +# Common Dataset +# ------------------------------------------------------------------------- + +seed = 0 +# seed everything to 0 for reproducibility, https://pytorch.org/docs/stable/notes/randomness.html +# do not set random seed and np.random.seed for aml test, since it will cause aml job name conflict +torch.manual_seed(seed) +# the following are needed only for GPU +torch.cuda.manual_seed(seed) +torch.backends.cudnn.deterministic = True +torch.backends.cudnn.benchmark = False + +# set max sequence length +MAX_SEQ_LENGTH = 128 + +# define the tokenizer +tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-multilingual-cased") +VOCAB_SIZE = len(tokenizer) + +# set default input +default_input = torch.ones(1, MAX_SEQ_LENGTH, dtype=torch.int64) + +# define model inputs +model_inputs = { + "input_ids": default_input, + "attention_mask": default_input, + "token_type_ids": default_input, +} + +# capture input names +INPUT_NAMES = list(model_inputs) + + +@Registry.register_dataset() +def bert_base_multilingual_cased_dataset(data_name, split, max_samples): + # load the raw wikipedia dataset for tuning. Load only the first max_samples examples for speed. 
+ raw_dataset = datasets.load_dataset(data_name, "20220301.en", split=f"{split}[:{max_samples}]", trust_remote_code=True) + + def _preprocess_fn(examples): + return tokenizer( + examples["text"], + padding="max_length", + max_length=MAX_SEQ_LENGTH, + truncation=True, + ) + + # preprocess the dataset + return raw_dataset.map(_preprocess_fn, batched=True, batch_size=1) + + +def custom_transform_func(data_item): + return { + name: np.asarray([np.array([g.flatten() for g in data_item[name]]).flatten()], dtype=np.int64) + for name in INPUT_NAMES + } + + +def custom_example_func(): + vocab_size = VOCAB_SIZE + batch_size = 1 + sequence_length = MAX_SEQ_LENGTH + + input_ids = torch.randint(0, vocab_size, (batch_size, sequence_length)) + + # Use an all-ones attention_mask (every position is treated as an actual token, none as padding) + attention_mask = default_input + + # Use all-ones token_type_ids (every position is assigned to segment 1) + token_type_ids = default_input + + return [input_ids, attention_mask, token_type_ids] diff --git a/google-vit-base-patch16-224/aitk/.gitignore b/google-vit-base-patch16-224/aitk/.gitignore new file mode 100644 index 00000000..48c03882 --- /dev/null +++ b/google-vit-base-patch16-224/aitk/.gitignore @@ -0,0 +1,5 @@ +__pycache__ +/cache +/history/*/* +!/history/*/history.config +!/history/*/olive_config.json diff --git a/google-vit-base-patch16-224/aitk/README.md b/google-vit-base-patch16-224/aitk/README.md new file mode 100644 index 00000000..f6d32027 --- /dev/null +++ b/google-vit-base-patch16-224/aitk/README.md @@ -0,0 +1,14 @@ +# Vision Transformer (ViT) Optimization + +This folder contains examples of ViT optimization using different workflows. + +- QDQ for Qualcomm NPU / AMD NPU +- OpenVINO for Intel NPU + +## Optimization Workflows + +### ViT optimization with QDQ + +This example performs ViT optimization in one workflow. 
It performs the optimization pipeline: + +- *Huggingface Model -> Onnx Model -> Quantized Onnx Model* diff --git a/google-vit-base-patch16-224/aitk/_copy.json.config b/google-vit-base-patch16-224/aitk/_copy.json.config new file mode 100644 index 00000000..6a948f91 --- /dev/null +++ b/google-vit-base-patch16-224/aitk/_copy.json.config @@ -0,0 +1,42 @@ +{ + "copies": [ + { + "src": "vit-base-patch16-224_qdq_amd.json.config", + "dst": "vit-base-patch16-224_qdq_qnn.json.config", + "replacements": [ + { + "find": "vit/vit_qdq_vitis_ai.json", + "replace": "vit/vit_qdq.json" + }, + { + "find": "Convert to AMD NPU", + "replace": "Convert to Qualcomm NPU" + } + ] + }, + { + "src": "inference_sample.ipynb", + "dst": "vit-base-patch16-224_dml_inference_sample.ipynb", + "replacements": [ + { + "find": "QNNExecutionProvider", + "replace": "DmlExecutionProvider" + }, + { + "find": "input_name: image", + "replace": "input_name: image.astype(np.float16)" + } + ] + }, + { + "src": "vit-base-patch16-224_dml_inference_sample.ipynb", + "dst": "vit-base-patch16-224_trtrtx_inference_sample.ipynb", + "replacements": [ + { + "find": "DmlExecutionProvider", + "replace": "NvTensorRTRTXExecutionProvider" + } + ] + } + ] +} diff --git a/google-vit-base-patch16-224/aitk/inference_sample.ipynb b/google-vit-base-patch16-224/aitk/inference_sample.ipynb new file mode 100644 index 00000000..650f381d --- /dev/null +++ b/google-vit-base-patch16-224/aitk/inference_sample.ipynb @@ -0,0 +1,209 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "\n", + "ExecutionProvider=\"QNNExecutionProvider\"\n", + "if ExecutionProvider == \"OpenVINOExecutionProvider\":\n", + " onnx_model_path = \"./model/ov_model_st_quant.onnx\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import onnxruntime as ort\n", 
+ "import time\n", + "import torch\n", + "import torchvision.transforms as transforms\n", + "from datasets import load_dataset\n", + "from transformers import ViTFeatureExtractor, ViTForImageClassification" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "num_samples = 256" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load datasets\n", + "\n", + "feature_extractor = ViTFeatureExtractor.from_pretrained(\"google/vit-base-patch16-224\")\n", + "preprocess = transforms.Compose([\n", + " transforms.Lambda(lambda img: img.convert(\"RGB\")),\n", + " transforms.Resize((224, 224)),\n", + " transforms.ToTensor(),\n", + " transforms.Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std),\n", + "])\n", + "\n", + "def imageTransform(example):\n", + " example[\"image\"] = preprocess(example[\"image\"])\n", + " return example\n", + "datasetStream = load_dataset(\"timm/mini-imagenet\", split=\"validation\", streaming=True, trust_remote_code=True)\n", + "iterable_dataset = iter(datasetStream)\n", + "selected_samples = [next(iterable_dataset) for _ in range(num_samples)]\n", + "selected_samples = list(map(imageTransform, selected_samples))\n", + "\n", + "def get_imagenet_label_map():\n", + " import json\n", + " from pathlib import Path\n", + " cache_file = Path(f\"../../cache/data/imagenet_class_index.json\")\n", + " if not cache_file.exists():\n", + " import requests \n", + " imagenet_class_index_url = (\n", + " \"https://raw.githubusercontent.com/pytorch/vision/main/gallery/assets/imagenet_class_index.json\"\n", + " )\n", + " response = requests.get(imagenet_class_index_url)\n", + " response.raise_for_status() # Ensure the request was successful\n", + " content = response.json()\n", + " cache_file.parent.resolve().mkdir(parents=True, exist_ok=True)\n", + " with open(cache_file, \"w\") as f:\n", + " json.dump(content, f)\n", 
+ " else:\n", + " with open(cache_file) as f:\n", + " content = json.loads(f.read())\n", + "\n", + " return {v[0]: int(k) for k, v in content.items()}\n", + "\n", + "label_map = get_imagenet_label_map()\n", + "label_names = datasetStream.features[\"label\"].names\n", + "\n", + "def mini_to_imagenet_label(mini_label):\n", + " class_name = label_names[mini_label]\n", + " return label_map[class_name]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Original model metrics\n", + "\n", + "def evaluate_torch(model, selected_samples, device):\n", + " model.eval()\n", + " correct, total = 0, 0\n", + " latencies = []\n", + " with torch.no_grad():\n", + " for example in selected_samples:\n", + " image = example[\"image\"].unsqueeze(0).to(device)\n", + " label = torch.tensor(example[\"label\"]).to(device)\n", + " label = mini_to_imagenet_label(label.item())\n", + " \n", + " start_time = time.time()\n", + " output = model(image)\n", + " end_time = time.time()\n", + " \n", + " latencies.append((end_time - start_time))\n", + " pred = torch.argmax(output.logits, dim=1)\n", + " correct += (pred == label).sum().item()\n", + " total += 1\n", + " \n", + " accuracy = correct / total\n", + " avg_latency = np.mean(latencies)\n", + " return accuracy, avg_latency\n", + "\n", + "device = torch.device(\"cpu\")\n", + "model = ViTForImageClassification.from_pretrained(\"google/vit-base-patch16-224\").to(device)\n", + "accuracy, avg_latency = evaluate_torch(model, selected_samples, device)\n", + "\n", + "print(f\"Original Model Accuracy: {accuracy * 100:.2f}%\")\n", + "print(f\"Original Model Average Latency Per Image: {avg_latency * 1000:.2f} ms\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Quantized model metrics\n", + "\n", + "def evaluate_onnx(session, selected_samples):\n", + " correct, total = 0, 0\n", + " latencies = []\n", + " input_name = 
session.get_inputs()[0].name\n", + " output_name = session.get_outputs()[0].name\n", + "\n", + " for example in selected_samples:\n", + " image = np.expand_dims(example[\"image\"], axis=0)\n", + " label = example[\"label\"]\n", + " label = mini_to_imagenet_label(label)\n", + " \n", + " start_time = time.time()\n", + " output = session.run([output_name], {input_name: image})[0]\n", + " end_time = time.time()\n", + " \n", + " latencies.append((end_time - start_time))\n", + " pred = np.argmax(output, axis=1)[0]\n", + " correct += (pred == label)\n", + " total += 1\n", + " \n", + " accuracy = correct / total\n", + " avg_latency = np.mean(latencies)\n", + " return accuracy, avg_latency\n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + "\n", + "\n", + "session_options = ort.SessionOptions()\n", + "\n", + "add_ep_for_device(session_options, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "\n", + "session = ort.InferenceSession(\n", + " onnx_model_path, # a model with QNN EPContext nodes\n", + " sess_options=session_options,\n", + ")\n", + "\n", + "accuracy, avg_latency = evaluate_onnx(session, selected_samples)\n", + "\n", + "print(f\"Quantized Model Accuracy: {accuracy * 100:.2f}%\")\n", + "print(f\"Quantized Model Average Latency Per Image: {avg_latency * 1000:.2f} ms\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python-WCR-win32-x64-3.12.9", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": 
"python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/google-vit-base-patch16-224/aitk/info.yml b/google-vit-base-patch16-224/aitk/info.yml new file mode 100644 index 00000000..26289b59 --- /dev/null +++ b/google-vit-base-patch16-224/aitk/info.yml @@ -0,0 +1,23 @@ +keywords: + aitk +arch: vit +recipes: + - file: "vit-base-patch16-224_qdq_qnn.json" + device: npu + ep: QNNExecutionProvider + - file: "vit-base-patch16-224_qdq_amd.json" + device: npu + ep: VitisAIExecutionProvider + - file: "vit_base_patch16_224_context_ov_static.json" + device: npu + ep: OpenVINOExecutionProvider + - file: "vit-base-patch16-224_trtrtx.json" + device: gpu + ep: NvTensorRTRTXExecutionProvider + - file: "vit-base-patch16-224_dml.json" + device: gpu + ep: DmlExecutionProvider +aitk: + modelInfo: + id: "huggingface/google/vit-base-patch16-224" + version: 1 diff --git a/google-vit-base-patch16-224/aitk/model_project.config b/google-vit-base-patch16-224/aitk/model_project.config new file mode 100644 index 00000000..7ec62cd3 --- /dev/null +++ b/google-vit-base-patch16-224/aitk/model_project.config @@ -0,0 +1,28 @@ +{ + "workflows": [ + { + "file": "vit-base-patch16-224_qdq_qnn.json", + "templateName": "vit-base-patch16-224_qdq_qnn" + }, + { + "file": "vit-base-patch16-224_qdq_amd.json", + "templateName": "vit-base-patch16-224_qdq_amd" + }, + { + "file": "vit_base_patch16_224_context_ov_static.json", + "templateName": "vit_base_patch16_224_context_ov_static" + }, + { + "file": "vit-base-patch16-224_trtrtx.json", + "templateName": "vit-base-patch16-224_trtrtx" + }, + { + "file": "vit-base-patch16-224_dml.json", + "templateName": "vit-base-patch16-224_dml" + } + ], + "modelInfo": { + "id": "huggingface/google/vit-base-patch16-224", + "version": 1 + } +} diff --git a/google-vit-base-patch16-224/aitk/requirements.txt b/google-vit-base-patch16-224/aitk/requirements.txt new file mode 100644 index 00000000..8992d27f --- 
/dev/null +++ b/google-vit-base-patch16-224/aitk/requirements.txt @@ -0,0 +1,5 @@ +# This file will be installed together with AITK runtime requirements +# For the full requirements, see AITK +olive-ai +datasets +torchvision diff --git a/google-vit-base-patch16-224/aitk/vit-base-patch16-224.py b/google-vit-base-patch16-224/aitk/vit-base-patch16-224.py new file mode 100644 index 00000000..92751ca1 --- /dev/null +++ b/google-vit-base-patch16-224/aitk/vit-base-patch16-224.py @@ -0,0 +1,100 @@ +# ------------------------------------------------------------------------- +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. +# -------------------------------------------------------------------------- +from logging import getLogger +from pathlib import Path + +import numpy as np +import torchvision.transforms as transforms +import transformers +from torch import from_numpy +from torch.utils.data import Dataset + +from olive.data.registry import Registry + +logger = getLogger(__name__) + +def get_imagenet_label_map(): + import json + cache_file = Path(f"./cache/data/imagenet_class_index.json") + if not cache_file.exists(): + import requests + imagenet_class_index_url = ( + "https://raw.githubusercontent.com/pytorch/vision/main/gallery/assets/imagenet_class_index.json" + ) + response = requests.get(imagenet_class_index_url) + response.raise_for_status() # Ensure the request was successful + content = response.json() + cache_file.parent.resolve().mkdir(parents=True, exist_ok=True) + with open(cache_file, "w") as f: + json.dump(content, f) + else: + with open(cache_file) as f: + content = json.loads(f.read()) + + return {v[0]: int(k) for k, v in content.items()} + +def adapt_label_for_mini_imagenet(labels: list, label_names: list): + label_map = get_imagenet_label_map() + return [label_map[label_names[x]] for x in labels] + +class ImagenetDataset(Dataset): + def __init__(self, data): + self.images = from_numpy(data["images"]) + 
self.labels = from_numpy(data["labels"]) + + def __len__(self): + return min(len(self.images), len(self.labels)) + + def __getitem__(self, idx): + return {"pixel_values": self.images[idx]}, self.labels[idx] + + +@Registry.register_post_process() +def dataset_post_process(output): + return ( + output.logits.argmax(axis=1) + if isinstance(output, transformers.modeling_outputs.ModelOutput) + else output.argmax(axis=1) + ) + +from transformers import AutoImageProcessor +processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224", use_fast=True) + +@Registry.register_pre_process() +def dataset_pre_process(output_data, **kwargs): + shuffle = kwargs.get("shuffle", True) + if shuffle: + seed = kwargs.get("seed", 42) + output_data = output_data.shuffle(seed=seed) + cache_key = kwargs.get("cache_key") + size = kwargs.get("size", 256) + cache_file = None + if cache_key: + cache_file = Path(f"./cache/data/{cache_key}_{output_data.info.dataset_name}_{size}.npz") + if cache_file.exists(): + with np.load(Path(cache_file)) as data: + return ImagenetDataset(data) + + labels = [] + images = [] + for i, sample in enumerate(output_data): + if i >= size: + break + image = sample["image"] + label = sample["label"] + image = image.convert("RGB") + image = processor(image)["pixel_values"][0] + images.append(image) + labels.append(label) + + if(output_data.info.dataset_name == "mini-imagenet"): + labels = adapt_label_for_mini_imagenet(labels, output_data.features["label"].names) + result_data = ImagenetDataset({"images": np.array(images), "labels": np.array(labels)}) + + if cache_file: + cache_file.parent.resolve().mkdir(parents=True, exist_ok=True) + np.savez(cache_file, images=np.array(images), labels=np.array(labels)) + + return result_data diff --git a/google-vit-base-patch16-224/aitk/vit-base-patch16-224_dml.json b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_dml.json new file mode 100644 index 00000000..14b49d34 --- /dev/null +++ 
b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_dml.json @@ -0,0 +1,143 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "google/vit-base-patch16-224", + "task": "image-classification", + "io_config": { + "input_names": [ + "pixel_values" + ], + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "output_names": [ + "output" + ] + } + }, + "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "cpu", + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + }, + "target_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "gpu", + "execution_providers": [ + "DmlExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "evaluation_data_config", + "type": "HuggingfaceContainer", + "user_script": "vit-base-patch16-224.py", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "validation", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 1000, + "cache_key": "imagedata_evaluation" + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "accuracy_score", + "priority": 1, + "metric_config": { + "task": "multiclass", + "num_classes": 1000 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "avg", + "priority": 2 + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "save_as_external_data": true + }, + "transformer_optimizer": { + "type": "OrtTransformersOptimization", + "model_type": "vit", + "opt_level": 0, + "float16": true, + "use_gpu": true, + "keep_io_types": false, + "optimization_options": { + "enable_gelu": true, + 
"enable_layer_norm": true, + "enable_attention": true, + "use_multi_head_attention": true, + "enable_skip_layer_norm": false, + "enable_embed_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_bias_gelu": false, + "enable_gelu_approximation": false, + "enable_qordered_matmul": false, + "enable_shape_inference": true, + "enable_gemm_fast_gelu": false, + "enable_nhwc_conv": false, + "enable_group_norm": false, + "enable_bias_splitgelu": false, + "enable_packed_qkv": true, + "enable_packed_kv": true, + "enable_bias_add": false, + "enable_rotary_embeddings": true + }, + "save_as_external_data": true + } + }, + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "host": "host_system", + "target": "target_system", + "cache_dir": "cache", + "output_dir": "model/vit" +} diff --git a/google-vit-base-patch16-224/aitk/vit-base-patch16-224_dml.json.config b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_dml.json.config new file mode 100644 index 00000000..7216c02e --- /dev/null +++ b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_dml.json.config @@ -0,0 +1,107 @@ +{ + "name": "Convert to DirectML", + "evaluationRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "DirectML" + ], + "path": "systems.target_system.accelerators.0.execution_providers.0", + "values": [ + "DmlExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", 
+ "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.size", + "template": { + "path": "data_configs[0].pre_process_data_config.size", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/google-vit-base-patch16-224/aitk/vit-base-patch16-224_dml_inference_sample.ipynb b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_dml_inference_sample.ipynb new file mode 100644 index 00000000..369bdc7b --- /dev/null +++ b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_dml_inference_sample.ipynb @@ -0,0 +1,209 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "\n", + "ExecutionProvider=\"DmlExecutionProvider\"\n", + "if ExecutionProvider == \"OpenVINOExecutionProvider\":\n", + " onnx_model_path = \"./model/ov_model_st_quant.onnx\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import onnxruntime as ort\n", + "import time\n", + "import torch\n", + "import torchvision.transforms as transforms\n", + 
"from datasets import load_dataset\n", + "from transformers import ViTFeatureExtractor, ViTForImageClassification" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "num_samples = 256" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load datasets\n", + "\n", + "feature_extractor = ViTFeatureExtractor.from_pretrained(\"google/vit-base-patch16-224\")\n", + "preprocess = transforms.Compose([\n", + " transforms.Lambda(lambda img: img.convert(\"RGB\")),\n", + " transforms.Resize((224, 224)),\n", + " transforms.ToTensor(),\n", + " transforms.Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std),\n", + "])\n", + "\n", + "def imageTransform(example):\n", + " example[\"image\"] = preprocess(example[\"image\"])\n", + " return example\n", + "datasetStream = load_dataset(\"timm/mini-imagenet\", split=\"validation\", streaming=True, trust_remote_code=True)\n", + "iterable_dataset = iter(datasetStream)\n", + "selected_samples = [next(iterable_dataset) for _ in range(num_samples)]\n", + "selected_samples = list(map(imageTransform, selected_samples))\n", + "\n", + "def get_imagenet_label_map():\n", + " import json\n", + " from pathlib import Path\n", + " cache_file = Path(f\"../../cache/data/imagenet_class_index.json\")\n", + " if not cache_file.exists():\n", + " import requests \n", + " imagenet_class_index_url = (\n", + " \"https://raw.githubusercontent.com/pytorch/vision/main/gallery/assets/imagenet_class_index.json\"\n", + " )\n", + " response = requests.get(imagenet_class_index_url)\n", + " response.raise_for_status() # Ensure the request was successful\n", + " content = response.json()\n", + " cache_file.parent.resolve().mkdir(parents=True, exist_ok=True)\n", + " with open(cache_file, \"w\") as f:\n", + " json.dump(content, f)\n", + " else:\n", + " with open(cache_file) as f:\n", + " content = json.loads(f.read())\n", + 
"\n", + " return {v[0]: int(k) for k, v in content.items()}\n", + "\n", + "label_map = get_imagenet_label_map()\n", + "label_names = datasetStream.features[\"label\"].names\n", + "\n", + "def mini_to_imagenet_label(mini_label):\n", + " class_name = label_names[mini_label]\n", + " return label_map[class_name]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Original model metrics\n", + "\n", + "def evaluate_torch(model, selected_samples, device):\n", + " model.eval()\n", + " correct, total = 0, 0\n", + " latencies = []\n", + " with torch.no_grad():\n", + " for example in selected_samples:\n", + " image = example[\"image\"].unsqueeze(0).to(device)\n", + " label = torch.tensor(example[\"label\"]).to(device)\n", + " label = mini_to_imagenet_label(label.item())\n", + " \n", + " start_time = time.time()\n", + " output = model(image)\n", + " end_time = time.time()\n", + " \n", + " latencies.append((end_time - start_time))\n", + " pred = torch.argmax(output.logits, dim=1)\n", + " correct += (pred == label).sum().item()\n", + " total += 1\n", + " \n", + " accuracy = correct / total\n", + " avg_latency = np.mean(latencies)\n", + " return accuracy, avg_latency\n", + "\n", + "device = torch.device(\"cpu\")\n", + "model = ViTForImageClassification.from_pretrained(\"google/vit-base-patch16-224\").to(device)\n", + "accuracy, avg_latency = evaluate_torch(model, selected_samples, device)\n", + "\n", + "print(f\"Original Model Accuracy: {accuracy * 100:.2f}%\")\n", + "print(f\"Original Model Average Latency Per Image: {avg_latency * 1000:.2f} ms\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Quantized model metrics\n", + "\n", + "def evaluate_onnx(session, selected_samples):\n", + " correct, total = 0, 0\n", + " latencies = []\n", + " input_name = session.get_inputs()[0].name\n", + " output_name = session.get_outputs()[0].name\n", + "\n", + " 
for example in selected_samples:\n", + " image = np.expand_dims(example[\"image\"], axis=0)\n", + " label = example[\"label\"]\n", + " label = mini_to_imagenet_label(label)\n", + " \n", + " start_time = time.time()\n", + " output = session.run([output_name], {input_name: image.astype(np.float16)})[0]\n", + " end_time = time.time()\n", + " \n", + " latencies.append((end_time - start_time))\n", + " pred = np.argmax(output, axis=1)[0]\n", + " correct += (pred == label)\n", + " total += 1\n", + " \n", + " accuracy = correct / total\n", + " avg_latency = np.mean(latencies)\n", + " return accuracy, avg_latency\n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + "\n", + "\n", + "session_options = ort.SessionOptions()\n", + "\n", + "add_ep_for_device(session_options, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "\n", + "session = ort.InferenceSession(\n", + " onnx_model_path, # a model with QNN EPContext nodes\n", + " sess_options=session_options,\n", + ")\n", + "\n", + "accuracy, avg_latency = evaluate_onnx(session, selected_samples)\n", + "\n", + "print(f\"Quantized Model Accuracy: {accuracy * 100:.2f}%\")\n", + "print(f\"Quantized Model Average Latency Per Image: {avg_latency * 1000:.2f} ms\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python-WCR-win32-x64-3.12.9", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + 
"nbformat": 4, + "nbformat_minor": 4 +} diff --git a/google-vit-base-patch16-224/aitk/vit-base-patch16-224_qdq_amd.json b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_qdq_amd.json new file mode 100644 index 00000000..4e519509 --- /dev/null +++ b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_qdq_amd.json @@ -0,0 +1,157 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "google/vit-base-patch16-224", + "task": "image-classification", + "io_config": { + "input_names": [ + "pixel_values" + ], + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "output_names": [ + "output" + ] + } + }, + "systems": { + "qnn_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "npu", + "execution_providers": [ + "VitisAIExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quantization_data_config", + "type": "HuggingfaceContainer", + "user_script": "vit-base-patch16-224.py", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "train", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 256, + "cache_key": "imagedata_quantization" + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + }, + { + "name": "evaluation_data_config", + "type": "HuggingfaceContainer", + "user_script": "vit-base-patch16-224.py", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "validation", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 1000, + "cache_key": "imagedata_evaluation" + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "accuracy_score", + "priority": 1, + "metric_config": { + "task": "multiclass", + 
"num_classes": 1000 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "avg", + "priority": 2 + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "device": "cpu", + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true, + "all_tensors_to_one_file": true, + "use_dynamo_exporter": false + }, + "transformer_optimizer": { + "type": "orttransformersoptimization", + "model_type": "vit", + "opt_level": 1, + "optimization_options": { + "enable_gelu": true, + "enable_bias_gelu": false, + "enable_layer_norm": true, + "enable_skip_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_attention": false + }, + "save_as_external_data": true + }, + "OnnxQuantization": { + "type": "OnnxQuantization", + "data_config": "quantization_data_config", + "activation_type": "uint16", + "precision": "uint8", + "calibrate_method": "MinMax", + "quant_preprocess": true, + "save_as_external_data": true + }, + "addmetadata": { + "type": "VitisAIAddMetaData", + "config_meta_data_keys": [ + "architectures", + "model_type" + ], + "activation_type": "uint16", + "weight_type": "uint8", + "quant_type": "OnnxStaticQuantization" + } + }, + "host": "qnn_system", + "target": "qnn_system", + "evaluator": "common_evaluator", + "output_dir": "model/vit", + "evaluate_input_model": false, + "cache_dir": "cache" +} diff --git a/google-vit-base-patch16-224/aitk/vit-base-patch16-224_qdq_amd.json.config b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_qdq_amd.json.config new file mode 100644 index 00000000..e217a030 --- /dev/null +++ b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_qdq_amd.json.config @@ -0,0 +1,238 @@ +{ + "name": "Convert to AMD NPU", + "oliveFile": "vit/vit_qdq_vitis_ai.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "AMD NPU", + "CPU" + ], + "path": 
"systems.qnn_system.accelerators.0.execution_providers.0", + "values": [ + "VitisAIExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.OnnxQuantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.OnnxQuantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.OnnxQuantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.OnnxQuantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.size", + "template": { + "path": "data_configs[0].pre_process_data_config.size", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.OnnxQuantization", + "actions": [ + [], + [ + { + "type": "update", + "path": "passes", + "value": { + "conversion": { + "device": "cpu", + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true, + "all_tensors_to_one_file": true, + "use_dynamo_exporter": false + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + 
"name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[1].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].pre_process_data_config.size", + "template": { + "path": "data_configs[1].pre_process_data_config.size", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/google-vit-base-patch16-224/aitk/vit-base-patch16-224_qdq_qnn.json b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_qdq_qnn.json new file mode 100644 index 00000000..b6048eec --- /dev/null +++ b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_qdq_qnn.json @@ -0,0 +1,151 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "google/vit-base-patch16-224", + "task": "image-classification", + "io_config": { + "input_names": [ + "pixel_values" + ], + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "output_names": [ + "output" + ] + } + }, + "systems": { + "qnn_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "npu", + "execution_providers": [ + "QNNExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": 
"quantization_data_config", + "type": "HuggingfaceContainer", + "user_script": "vit-base-patch16-224.py", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "train", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 256, + "cache_key": "imagedata_quantization" + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + }, + { + "name": "evaluation_data_config", + "type": "HuggingfaceContainer", + "user_script": "vit-base-patch16-224.py", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "validation", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 1000, + "cache_key": "imagedata_evaluation" + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "accuracy_score", + "priority": 1, + "metric_config": { + "task": "multiclass", + "num_classes": 1000 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "avg", + "priority": 2 + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "save_as_external_data": true + }, + "surgery": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "MatMulAddToGemm" + } + ] + }, + "transformer_optimizer": { + "type": "OrtTransformersOptimization", + "model_type": "vit", + "opt_level": 1, + "optimization_options": { + "enable_gelu": true, + "enable_bias_gelu": false, + "enable_layer_norm": true, + "enable_skip_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_attention": false + }, + "save_as_external_data": true + }, + "OnnxQuantization": { + "type": 
"OnnxQuantization", + "data_config": "quantization_data_config", + "quant_preprocess": true, + "activation_type": "uint16", + "precision": "uint8", + "save_as_external_data": true + } + }, + "host": "qnn_system", + "target": "qnn_system", + "evaluator": "common_evaluator", + "output_dir": "model/vit", + "evaluate_input_model": false, + "cache_dir": "cache" +} diff --git a/google-vit-base-patch16-224/aitk/vit-base-patch16-224_qdq_qnn.json.config b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_qdq_qnn.json.config new file mode 100644 index 00000000..1a1e2951 --- /dev/null +++ b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_qdq_qnn.json.config @@ -0,0 +1,235 @@ +{ + "name": "Convert to Qualcomm NPU", + "oliveFile": "vit/vit_qdq.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Qualcomm NPU", + "CPU" + ], + "path": "systems.qnn_system.accelerators.0.execution_providers.0", + "values": [ + "QNNExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.OnnxQuantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.OnnxQuantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.OnnxQuantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.OnnxQuantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": 
"data_configs[0].pre_process_data_config.size", + "template": { + "path": "data_configs[0].pre_process_data_config.size", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.OnnxQuantization", + "actions": [ + [], + [ + { + "type": "update", + "path": "passes", + "value": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "save_as_external_data": true + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[1].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].pre_process_data_config.size", + "template": { + "path": "data_configs[1].pre_process_data_config.size", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/google-vit-base-patch16-224/aitk/vit-base-patch16-224_trtrtx.json b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_trtrtx.json new file mode 100644 index 00000000..dd5972c5 --- /dev/null +++ 
b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_trtrtx.json @@ -0,0 +1,113 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "google/vit-base-patch16-224", + "task": "image-classification", + "io_config": { + "input_names": [ + "pixel_values" + ], + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "output_names": [ + "logits" + ] + } + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "gpu", + "execution_providers": [ + "NvTensorRTRTXExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quantize_data_config", + "type": "HuggingfaceContainer", + "user_script": "vit-base-patch16-224.py", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "validation", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 256, + "cache_key": "imagedata_quantization" + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "data_config": "quantize_data_config", + "sub_types": [ + { + "name": "accuracy_score", + "priority": 1, + "metric_config": { + "task": "multiclass", + "num_classes": 1001 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "quantize_data_config", + "sub_types": [ + { + "name": "avg", + "priority": 2 + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true, + "all_tensors_to_one_file": true, + "use_dynamo_exporter": false + }, + "onnx_float_to_float16": { + "type": "OnnxFloatToFloat16", + "save_as_external_data": true + }, + "session_params_tuning": { + "type": "OrtSessionParamsTuning", + "io_bind": false, + "data_config": "quantize_data_config" + } + }, + "host": "local_system", + "target": "local_system", + "evaluator": 
"common_evaluator", + "output_dir": "model/vit-base-patch16-224", + "cache_dir": "cache", + "evaluate_input_model": false +} diff --git a/google-vit-base-patch16-224/aitk/vit-base-patch16-224_trtrtx.json.config b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_trtrtx.json.config new file mode 100644 index 00000000..2d9d01c2 --- /dev/null +++ b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_trtrtx.json.config @@ -0,0 +1,106 @@ +{ + "name": "Convert to NVIDIA TRT for RTX", + "oliveFile": "vit/vit_trtrtx.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "NVIDIA TensorRT for RTX", + "CPU" + ], + "path": "systems.local_system.accelerators.0.execution_providers.0", + "values": [ + "NvTensorRTRTXExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + 
"type": "int", + "path": "data_configs[0].pre_process_data_config.size", + "template": { + "path": "data_configs[0].pre_process_data_config.size", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/google-vit-base-patch16-224/aitk/vit-base-patch16-224_trtrtx_inference_sample.ipynb b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_trtrtx_inference_sample.ipynb new file mode 100644 index 00000000..b74e7976 --- /dev/null +++ b/google-vit-base-patch16-224/aitk/vit-base-patch16-224_trtrtx_inference_sample.ipynb @@ -0,0 +1,209 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "\n", + "ExecutionProvider=\"NvTensorRTRTXExecutionProvider\"\n", + "if ExecutionProvider == \"OpenVINOExecutionProvider\":\n", + " onnx_model_path = \"./model/ov_model_st_quant.onnx\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import onnxruntime as ort\n", + "import time\n", + "import torch\n", + "import torchvision.transforms as transforms\n", + "from datasets import load_dataset\n", + "from transformers import ViTFeatureExtractor, ViTForImageClassification" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "num_samples = 256" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Load datasets\n", + "\n", + "feature_extractor = ViTFeatureExtractor.from_pretrained(\"google/vit-base-patch16-224\")\n", + "preprocess = transforms.Compose([\n", + " transforms.Lambda(lambda img: img.convert(\"RGB\")),\n", + " 
transforms.Resize((224, 224)),\n", + " transforms.ToTensor(),\n", + " transforms.Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std),\n", + "])\n", + "\n", + "def imageTransform(example):\n", + " example[\"image\"] = preprocess(example[\"image\"])\n", + " return example\n", + "datasetStream = load_dataset(\"timm/mini-imagenet\", split=\"validation\", streaming=True, trust_remote_code=True)\n", + "iterable_dataset = iter(datasetStream)\n", + "selected_samples = [next(iterable_dataset) for _ in range(num_samples)]\n", + "selected_samples = list(map(imageTransform, selected_samples))\n", + "\n", + "def get_imagenet_label_map():\n", + " import json\n", + " from pathlib import Path\n", + " cache_file = Path(f\"../../cache/data/imagenet_class_index.json\")\n", + " if not cache_file.exists():\n", + " import requests \n", + " imagenet_class_index_url = (\n", + " \"https://raw.githubusercontent.com/pytorch/vision/main/gallery/assets/imagenet_class_index.json\"\n", + " )\n", + " response = requests.get(imagenet_class_index_url)\n", + " response.raise_for_status() # Ensure the request was successful\n", + " content = response.json()\n", + " cache_file.parent.resolve().mkdir(parents=True, exist_ok=True)\n", + " with open(cache_file, \"w\") as f:\n", + " json.dump(content, f)\n", + " else:\n", + " with open(cache_file) as f:\n", + " content = json.loads(f.read())\n", + "\n", + " return {v[0]: int(k) for k, v in content.items()}\n", + "\n", + "label_map = get_imagenet_label_map()\n", + "label_names = datasetStream.features[\"label\"].names\n", + "\n", + "def mini_to_imagenet_label(mini_label):\n", + " class_name = label_names[mini_label]\n", + " return label_map[class_name]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Original model metrics\n", + "\n", + "def evaluate_torch(model, selected_samples, device):\n", + " model.eval()\n", + " correct, total = 0, 0\n", + " latencies = 
[]\n", + " with torch.no_grad():\n", + " for example in selected_samples:\n", + " image = example[\"image\"].unsqueeze(0).to(device)\n", + " label = torch.tensor(example[\"label\"]).to(device)\n", + " label = mini_to_imagenet_label(label.item())\n", + " \n", + " start_time = time.time()\n", + " output = model(image)\n", + " end_time = time.time()\n", + " \n", + " latencies.append((end_time - start_time))\n", + " pred = torch.argmax(output.logits, dim=1)\n", + " correct += (pred == label).sum().item()\n", + " total += 1\n", + " \n", + " accuracy = correct / total\n", + " avg_latency = np.mean(latencies)\n", + " return accuracy, avg_latency\n", + "\n", + "device = torch.device(\"cpu\")\n", + "model = ViTForImageClassification.from_pretrained(\"google/vit-base-patch16-224\").to(device)\n", + "accuracy, avg_latency = evaluate_torch(model, selected_samples, device)\n", + "\n", + "print(f\"Original Model Accuracy: {accuracy * 100:.2f}%\")\n", + "print(f\"Original Model Average Latency Per Image: {avg_latency * 1000:.2f} ms\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Quantized model metrics\n", + "\n", + "def evaluate_onnx(session, selected_samples):\n", + " correct, total = 0, 0\n", + " latencies = []\n", + " input_name = session.get_inputs()[0].name\n", + " output_name = session.get_outputs()[0].name\n", + "\n", + " for example in selected_samples:\n", + " image = np.expand_dims(example[\"image\"], axis=0)\n", + " label = example[\"label\"]\n", + " label = mini_to_imagenet_label(label)\n", + " \n", + " start_time = time.time()\n", + " output = session.run([output_name], {input_name: image.astype(np.float16)})[0]\n", + " end_time = time.time()\n", + " \n", + " latencies.append((end_time - start_time))\n", + " pred = np.argmax(output, axis=1)[0]\n", + " correct += (pred == label)\n", + " total += 1\n", + " \n", + " accuracy = correct / total\n", + " avg_latency = np.mean(latencies)\n", + " return 
accuracy, avg_latency\n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + "\n", + "\n", + "session_options = ort.SessionOptions()\n", + "\n", + "add_ep_for_device(session_options, ExecutionProvider, ort.OrtHardwareDeviceType.GPU)\n", + "\n", + "session = ort.InferenceSession(\n", + " onnx_model_path, # path to the converted ONNX model\n", + " sess_options=session_options,\n", + ")\n", + "\n", + "accuracy, avg_latency = evaluate_onnx(session, selected_samples)\n", + "\n", + "print(f\"Quantized Model Accuracy: {accuracy * 100:.2f}%\")\n", + "print(f\"Quantized Model Average Latency Per Image: {avg_latency * 1000:.2f} ms\")" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python-WCR-win32-x64-3.12.9", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/google-vit-base-patch16-224/aitk/vit_base_patch16_224_context_ov_static.json b/google-vit-base-patch16-224/aitk/vit_base_patch16_224_context_ov_static.json new file mode 100644 index 00000000..2deef015 --- /dev/null +++ b/google-vit-base-patch16-224/aitk/vit_base_patch16_224_context_ov_static.json @@ -0,0 +1,144 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "google/vit-base-patch16-224", + "task": "image-classification" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device":
"npu", + "execution_providers": [ + "OpenVINOExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quantization_data_config", + "type": "HuggingfaceContainer", + "user_script": "vit-base-patch16-224.py", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "train", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 256, + "cache_key": "imagedata_quantization" + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + }, + { + "name": "evaluation_data_config", + "type": "HuggingfaceContainer", + "user_script": "vit-base-patch16-224.py", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "validation", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 1000, + "cache_key": "imagedata_evaluation" + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "accuracy_score", + "priority": 1, + "metric_config": { + "task": "multiclass", + "num_classes": 1000 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "avg", + "priority": 2 + } + ] + } + ] + } + }, + "passes": { + "ov_convert": { + "type": "OpenVINOConversion", + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "output_model": "vit_base_patch16_224", + "compress_to_fp16": true + }, + "io_update": { + "type": "OpenVINOIoUpdate", + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "static": true + }, + "ov_quantize": { + "type": "OpenVINOQuantization", + "target_device": "npu", + "data_config": "quantization_data_config", + "model_type": "TRANSFORMER", + "extra_configs": [ + { + 
"advanced_quantization_parameters": { + "smooth_quant_alpha": 0.6 + } + } + ] + }, + "encapsulation": { + "type": "OpenVINOEncapsulation", + "target_device": "npu", + "ov_version": "2025.1" + } + }, + "search_strategy": false, + "host": "local_system", + "target": "local_system", + "evaluator": "common_evaluator", + "cache_dir": "cache", + "evaluate_input_model": false, + "output_dir": "model/vit_base_patch16_224_ov_static" +} diff --git a/google-vit-base-patch16-224/aitk/vit_base_patch16_224_context_ov_static.json.config b/google-vit-base-patch16-224/aitk/vit_base_patch16_224_context_ov_static.json.config new file mode 100644 index 00000000..0186b350 --- /dev/null +++ b/google-vit-base-patch16-224/aitk/vit_base_patch16_224_context_ov_static.json.config @@ -0,0 +1,217 @@ +{ + "name": "Convert to Intel CPU/NPU/GPU", + "oliveFile": "vit/openvino/vit_base_patch16_224_context_ov_static.json", + "isIntel": true, + "debugInfo": { + "autoGenerated": true, + "useOpenVINOConversion": "ov_convert" + }, + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "systems.local_system.accelerators.0.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "readOnly": false + }, + "runtimeInConversion": { + "autoGenerated": true, + "name": "Convert/Quantize to", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "passes.ov_quantize.target_device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "cpu" + } + ], + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "gpu" + } + ], + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "npu" + } + ] + ] + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + 
"parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.ov_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.size", + "template": { + "path": "data_configs[0].pre_process_data_config.size", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.ov_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + 
"DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[1].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].pre_process_data_config.size", + "template": { + "path": "data_configs[1].pre_process_data_config.size", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/intel-bert-base-uncased-mrpc/LICENSE b/intel-bert-base-uncased-mrpc/LICENSE deleted file mode 100644 index 9b2c5698..00000000 --- a/intel-bert-base-uncased-mrpc/LICENSE +++ /dev/null @@ -1,237 +0,0 @@ ---- -title: Apache License 2.0 -spdx-id: Apache-2.0 -redirect_from: /licenses/apache/ -featured: true -hidden: false - -description: A permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code. - -how: Create a text file (typically named LICENSE or LICENSE.txt) in the root of your source code and copy the text of the license into the file. - -note: The Apache Software Foundation recommends taking the additional step of adding a boilerplate notice to the header of each source file. You can find the notice in the appendix at the very end of the license text. 
- -using: - Kubernetes: https://github.com/kubernetes/kubernetes/blob/master/LICENSE - PDF.js: https://github.com/mozilla/pdf.js/blob/master/LICENSE - Swift: https://github.com/apple/swift/blob/main/LICENSE.txt - -permissions: - - commercial-use - - modifications - - distribution - - patent-use - - private-use - -conditions: - - include-copyright - - document-changes - -limitations: - - trademark-use - - liability - - warranty - ---- - - Apache License - Version 2.0, January 2004 - http://www.apache.org/licenses/ - - TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION - - 1. Definitions. - - "License" shall mean the terms and conditions for use, reproduction, - and distribution as defined by Sections 1 through 9 of this document. - - "Licensor" shall mean the copyright owner or entity authorized by - the copyright owner that is granting the License. - - "Legal Entity" shall mean the union of the acting entity and all - other entities that control, are controlled by, or are under common - control with that entity. For the purposes of this definition, - "control" means (i) the power, direct or indirect, to cause the - direction or management of such entity, whether by contract or - otherwise, or (ii) ownership of fifty percent (50%) or more of the - outstanding shares, or (iii) beneficial ownership of such entity. - - "You" (or "Your") shall mean an individual or Legal Entity - exercising permissions granted by this License. - - "Source" form shall mean the preferred form for making modifications, - including but not limited to software source code, documentation - source, and configuration files. - - "Object" form shall mean any form resulting from mechanical - transformation or translation of a Source form, including but - not limited to compiled object code, generated documentation, - and conversions to other media types. 
- - "Work" shall mean the work of authorship, whether in Source or - Object form, made available under the License, as indicated by a - copyright notice that is included in or attached to the work - (an example is provided in the Appendix below). - - "Derivative Works" shall mean any work, whether in Source or Object - form, that is based on (or derived from) the Work and for which the - editorial revisions, annotations, elaborations, or other modifications - represent, as a whole, an original work of authorship. For the purposes - of this License, Derivative Works shall not include works that remain - separable from, or merely link (or bind by name) to the interfaces of, - the Work and Derivative Works thereof. - - "Contribution" shall mean any work of authorship, including - the original version of the Work and any modifications or additions - to that Work or Derivative Works thereof, that is intentionally - submitted to Licensor for inclusion in the Work by the copyright owner - or by an individual or Legal Entity authorized to submit on behalf of - the copyright owner. For the purposes of this definition, "submitted" - means any form of electronic, verbal, or written communication sent - to the Licensor or its representatives, including but not limited to - communication on electronic mailing lists, source code control systems, - and issue tracking systems that are managed by, or on behalf of, the - Licensor for the purpose of discussing and improving the Work, but - excluding communication that is conspicuously marked or otherwise - designated in writing by the copyright owner as "Not a Contribution." - - "Contributor" shall mean Licensor and any individual or Legal Entity - on behalf of whom a Contribution has been received by Licensor and - subsequently incorporated within the Work. - - 2. Grant of Copyright License. 
Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - copyright license to reproduce, prepare Derivative Works of, - publicly display, publicly perform, sublicense, and distribute the - Work and such Derivative Works in Source or Object form. - - 3. Grant of Patent License. Subject to the terms and conditions of - this License, each Contributor hereby grants to You a perpetual, - worldwide, non-exclusive, no-charge, royalty-free, irrevocable - (except as stated in this section) patent license to make, have made, - use, offer to sell, sell, import, and otherwise transfer the Work, - where such license applies only to those patent claims licensable - by such Contributor that are necessarily infringed by their - Contribution(s) alone or by combination of their Contribution(s) - with the Work to which such Contribution(s) was submitted. If You - institute patent litigation against any entity (including a - cross-claim or counterclaim in a lawsuit) alleging that the Work - or a Contribution incorporated within the Work constitutes direct - or contributory patent infringement, then any patent licenses - granted to You under this License for that Work shall terminate - as of the date such litigation is filed. - - 4. Redistribution. 
You may reproduce and distribute copies of the - Work or Derivative Works thereof in any medium, with or without - modifications, and in Source or Object form, provided that You - meet the following conditions: - - (a) You must give any other recipients of the Work or - Derivative Works a copy of this License; and - - (b) You must cause any modified files to carry prominent notices - stating that You changed the files; and - - (c) You must retain, in the Source form of any Derivative Works - that You distribute, all copyright, patent, trademark, and - attribution notices from the Source form of the Work, - excluding those notices that do not pertain to any part of - the Derivative Works; and - - (d) If the Work includes a "NOTICE" text file as part of its - distribution, then any Derivative Works that You distribute must - include a readable copy of the attribution notices contained - within such NOTICE file, excluding those notices that do not - pertain to any part of the Derivative Works, in at least one - of the following places: within a NOTICE text file distributed - as part of the Derivative Works; within the Source form or - documentation, if provided along with the Derivative Works; or, - within a display generated by the Derivative Works, if and - wherever such third-party notices normally appear. The contents - of the NOTICE file are for informational purposes only and - do not modify the License. You may add Your own attribution - notices within Derivative Works that You distribute, alongside - or as an addendum to the NOTICE text from the Work, provided - that such additional attribution notices cannot be construed - as modifying the License. 
- - You may add Your own copyright statement to Your modifications and - may provide additional or different license terms and conditions - for use, reproduction, or distribution of Your modifications, or - for any such Derivative Works as a whole, provided Your use, - reproduction, and distribution of the Work otherwise complies with - the conditions stated in this License. - - 5. Submission of Contributions. Unless You explicitly state otherwise, - any Contribution intentionally submitted for inclusion in the Work - by You to the Licensor shall be under the terms and conditions of - this License, without any additional terms or conditions. - Notwithstanding the above, nothing herein shall supersede or modify - the terms of any separate license agreement you may have executed - with Licensor regarding such Contributions. - - 6. Trademarks. This License does not grant permission to use the trade - names, trademarks, service marks, or product names of the Licensor, - except as required for reasonable and customary use in describing the - origin of the Work and reproducing the content of the NOTICE file. - - 7. Disclaimer of Warranty. Unless required by applicable law or - agreed to in writing, Licensor provides the Work (and each - Contributor provides its Contributions) on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or - implied, including, without limitation, any warranties or conditions - of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A - PARTICULAR PURPOSE. You are solely responsible for determining the - appropriateness of using or redistributing the Work and assume any - risks associated with Your exercise of permissions under this License. - - 8. Limitation of Liability. 
In no event and under no legal theory, - whether in tort (including negligence), contract, or otherwise, - unless required by applicable law (such as deliberate and grossly - negligent acts) or agreed to in writing, shall any Contributor be - liable to You for damages, including any direct, indirect, special, - incidental, or consequential damages of any character arising as a - result of this License or out of the use or inability to use the - Work (including but not limited to damages for loss of goodwill, - work stoppage, computer failure or malfunction, or any and all - other commercial damages or losses), even if such Contributor - has been advised of the possibility of such damages. - - 9. Accepting Warranty or Additional Liability. While redistributing - the Work or Derivative Works thereof, You may choose to offer, - and charge a fee for, acceptance of support, warranty, indemnity, - or other liability obligations and/or rights consistent with this - License. However, in accepting such obligations, You may act only - on Your own behalf and on Your sole responsibility, not on behalf - of any other Contributor, and only if You agree to indemnify, - defend, and hold each Contributor harmless for any liability - incurred by, or claims asserted against, such Contributor by reason - of your accepting any such warranty or additional liability. - - END OF TERMS AND CONDITIONS - - APPENDIX: How to apply the Apache License to your work. - - To apply the Apache License to your work, attach the following - boilerplate notice, with the fields enclosed by brackets "[]" - replaced with your own identifying information. (Don't include - the brackets!) The text should be enclosed in the appropriate - comment syntax for the file format. We also recommend that a - file or class name and description of purpose be included on the - same "printed page" as the copyright notice for easier - identification within third-party archives. 
- - Copyright [yyyy] [name of copyright owner] - - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. diff --git a/intel-bert-base-uncased-mrpc/README.md b/intel-bert-base-uncased-mrpc/README.md index 186af082..6e89e05a 100644 --- a/intel-bert-base-uncased-mrpc/README.md +++ b/intel-bert-base-uncased-mrpc/README.md @@ -1,5 +1,5 @@ -# BERT Optimization -This folder contains examples of BERT optimization using different workflows. - -- `aitk` recipe folder contains recipes used by AI Toolkit. -- `oci` recipe folder contains recipes used by Olive CI. \ No newline at end of file +# BERT Optimization +This folder contains examples of BERT optimization using different workflows. + +- `aitk` recipe folder contains recipes used by AI Toolkit. +- `oci` recipe folder contains recipes used by Olive CI. 
diff --git a/intel-bert-base-uncased-mrpc/aitk/bert_dml.json b/intel-bert-base-uncased-mrpc/aitk/bert_dml.json new file mode 100644 index 00000000..28eafa9e --- /dev/null +++ b/intel-bert-base-uncased-mrpc/aitk/bert_dml.json @@ -0,0 +1,131 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "Intel/bert-base-uncased-mrpc", + "task": "text-classification", + "load_kwargs": { + "attn_implementation": "eager" + } + }, + "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "cpu", + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + }, + "target_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "gpu", + "execution_providers": [ + "DmlExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "glue_mrpc_eval", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "glue", + "subset": "mrpc", + "split": "validation" + }, + "pre_process_data_config": { + "max_length": 128, + "padding": "max_length", + "input_cols": [ + "sentence1", + "sentence2" + ], + "max_samples": 100 + }, + "dataloader_config": { + "batch_size": 1 + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "data_config": "glue_mrpc_eval", + "sub_types": [ + { + "name": "accuracy_score", + "priority": 1 + }, + { + "name": "f1_score" + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "glue_mrpc_eval", + "sub_types": [ + { + "name": "avg", + "priority": 2 + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "save_as_external_data": true + }, + "transformer_optimizer": { + "type": "OrtTransformersOptimization", + "model_type": "bert", + "opt_level": 0, + "float16": true, + "use_gpu": true, + "keep_io_types": false, + "optimization_options": { + "enable_gelu": true, + "enable_layer_norm": true, + "enable_attention": true, + "use_multi_head_attention": 
true, + "enable_skip_layer_norm": false, + "enable_embed_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_bias_gelu": false, + "enable_gelu_approximation": false, + "enable_qordered_matmul": false, + "enable_shape_inference": true, + "enable_gemm_fast_gelu": false, + "enable_nhwc_conv": false, + "enable_group_norm": false, + "enable_bias_splitgelu": false, + "enable_packed_qkv": true, + "enable_packed_kv": true, + "enable_bias_add": false, + "enable_rotary_embeddings": true + }, + "save_as_external_data": true + } + }, + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "host": "host_system", + "target": "target_system", + "cache_dir": "cache", + "output_dir": "model/bert_dml" +} \ No newline at end of file diff --git a/intel-bert-base-uncased-mrpc/aitk/bert_dml.json.config b/intel-bert-base-uncased-mrpc/aitk/bert_dml.json.config new file mode 100644 index 00000000..a0925b99 --- /dev/null +++ b/intel-bert-base-uncased-mrpc/aitk/bert_dml.json.config @@ -0,0 +1,105 @@ +{ + "name": "Convert to DirectML", + "evaluationRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "DirectML" + ], + "path": "systems.target_system.accelerators.0.execution_providers.0", + "values": [ + "DmlExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "glue" + ], + "template": { + "path": 
"data_configs[0].load_dataset_config.data_name", + "values": [ + "glue" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} \ No newline at end of file diff --git a/intel-bert-base-uncased-mrpc/aitk/info.yml b/intel-bert-base-uncased-mrpc/aitk/info.yml index 638cf88d..77efd2a2 100644 --- a/intel-bert-base-uncased-mrpc/aitk/info.yml +++ b/intel-bert-base-uncased-mrpc/aitk/info.yml @@ -24,6 +24,9 @@ recipes: - file: "bert_trtrtx.json" device: gpu ep: NvTensorRTRTXExecutionProvider + - file: "bert_dml.json" + device: gpu + ep: DmlExecutionProvider aitk: modelInfo: id: "huggingface/Intel/bert-base-uncased-mrpc" @@ -33,3 +36,4 @@ aitk: - file: "bert_qdq_amd.json" - file: "bert_ov.json" - file: "bert_trtrtx.json" + - file: "bert_dml.json" diff --git a/intel-bert-base-uncased-mrpc/aitk/model_project.config b/intel-bert-base-uncased-mrpc/aitk/model_project.config new file mode 100644 index 00000000..ca302634 --- /dev/null +++ b/intel-bert-base-uncased-mrpc/aitk/model_project.config @@ -0,0 +1,28 @@ +{ + "workflows": [ + { + "file": "bert_qdq_qnn.json", + "templateName": "bert_qdq_qnn" + }, + { + "file": "bert_qdq_amd.json", + "templateName": "bert_qdq_amd" + }, + { 
+            "file": "bert_ov.json",
+            "templateName": "bert_ov"
+        },
+        {
+            "file": "bert_trtrtx.json",
+            "templateName": "bert_trtrtx"
+        },
+        {
+            "file": "bert_dml.json",
+            "templateName": "bert_dml"
+        }
+    ],
+    "modelInfo": {
+        "id": "huggingface/Intel/bert-base-uncased-mrpc",
+        "version": 1
+    }
+}
diff --git a/intel-bert-base-uncased-mrpc/aitk/requirements.txt b/intel-bert-base-uncased-mrpc/aitk/requirements.txt
index 0ce2fda0..bad441ca 100644
--- a/intel-bert-base-uncased-mrpc/aitk/requirements.txt
+++ b/intel-bert-base-uncased-mrpc/aitk/requirements.txt
@@ -1,3 +1,4 @@
-# For a full requirements, see AITK
+# This file will be installed together with AITK runtime requirements
+# For the full requirements, see AITK
 olive-ai
 optimum
diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/.gitignore b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/.gitignore
new file mode 100644
index 00000000..48c03882
--- /dev/null
+++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/.gitignore
@@ -0,0 +1,5 @@
+__pycache__
+/cache
+/history/*/*
+!/history/*/history.config
+!/history/*/olive_config.json
diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/README.md b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/README.md
new file mode 100644
index 00000000..6ae6ffb0
--- /dev/null
+++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/README.md
@@ -0,0 +1,48 @@
+# Laion Clip optimization
+
+This folder contains examples of Laion Clip optimization using different workflows.
+
+- Text and vision model QDQ for Qualcomm NPU
+- QDQ for AMD NPU
+- OpenVINO for Intel NPU
+
+## Laion Clip text optimization with QDQ for Qualcomm NPU
+
+This example optimizes the Laion Clip text model with QDQ in a single workflow. The optimization pipeline is:
+
+- *PyTorch Model -> ONNX Model -> Quantized ONNX Model*
+
+### Evaluation result
+
+Quantization uses 256 samples from the train split of the imagenet-1k dataset, and evaluation uses 256 samples from the test split of the same dataset.
+
+
+| Activation Type | Weight Type | Size | Latency ms (avg) |
+| --------------- | ----------- | ---- | ---------------- |
+| QUInt16         | QUInt8      | 100  | 6.53724          |
+
+## Laion Clip vision optimization with QDQ for Qualcomm NPU
+
+This example optimizes the Laion Clip vision model with QDQ in a single workflow. The optimization pipeline is:
+
+- *PyTorch Model -> ONNX Model -> Quantized ONNX Model*
+
+### Evaluation result
+
+Quantization uses 256 samples from the train split of the imagenet-1k dataset, and evaluation uses 256 samples from the test split of the same dataset.
+
+
+| Activation Type | Weight Type | Size | Latency ms (avg) |
+| --------------- | ----------- | ---- | ---------------- |
+| QUInt16         | QUInt8      | 100  | 20.13231         |
+
+
+## Laion Clip optimization with QDQ for AMD NPU
+
+This example optimizes Laion Clip with QDQ in a single workflow. The optimization pipeline is:
+
+- *PyTorch Model -> ONNX Model -> Quantized ONNX Model*
+
+## Laion Clip optimization with OpenVINO
+
+This example optimizes Laion Clip with OpenVINO in a single workflow for Intel NPU.
diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/_copy.json.config b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/_copy.json.config new file mode 100644 index 00000000..4629da4e --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/_copy.json.config @@ -0,0 +1,224 @@ +{ + "copies": [ + { + "src": "../../../openai/clip-vit-base-patch16/1/model_project.config", + "dst": "model_project.config", + "replacements": [ + { + "find": "openai_clip", + "replace": "laion_clip" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_text_qnn_inference_sample.ipynb", + "dst": "laion_clip_text_qnn_inference_sample.ipynb", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_text_qnn.json", + "dst": "laion_clip_text_qnn.json", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_text_qnn.json.config", + "dst": "laion_clip_text_qnn.json.config", + "replacements": [ + { + "find": "clip/qdq/openai_clip_text_b16_qdq.json", + "replace": "clip/qdq/laion_clip_text_b32_qdq.json" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_vision_qnn_inference_sample.ipynb", + "dst": "laion_clip_vision_qnn_inference_sample.ipynb", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_vision_qnn.json", + "dst": "laion_clip_vision_qnn.json", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_vision_qnn.json.config", + "dst": "laion_clip_vision_qnn.json.config", + 
"replacements": [ + { + "find": "clip/qdq/openai_clip_vision_b16_qdq.json", + "replace": "clip/qdq/laion_clip_vision_b32_qdq.json" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_ov_inference_sample.ipynb", + "dst": "laion_clip_ov_inference_sample.ipynb", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_ov.json", + "dst": "laion_clip_ov.json", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + }, + { + "find": "openai_clip", + "replace": "laion_clip" + }, + { + "find": "\"device\": \"npu\"\n", + "replace": "\"device\": \"npu\", \"library\": \"transformers\"\n" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_ov.json.config", + "dst": "laion_clip_ov.json.config", + "replacements": [ + { + "find": "clip/openvino/clip_vit_base_patch16_context_ov_static.json", + "replace": "clip/openvino/clip_vit_b32_laion2b_s34B_b79k_context_ov_static.json" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_qdq_amd_inference_sample.ipynb", + "dst": "laion_clip_qdq_amd_inference_sample.ipynb", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_qdq_amd.json", + "dst": "laion_clip_qdq_amd.json", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_qdq_amd.json.config", + "dst": "laion_clip_qdq_amd.json.config", + "replacements": [ + { + "find": "clip/openai_clip-vit-base-patch16_ptq_qdq_vitis_ai.json", + "replace": "clip/laion_CLIP-ViT-B-32-laion2B-s34B-b79K_ptq_qdq_vitis_ai.json" + } + ] + }, + { + 
"src": "../../../openai/clip-vit-base-patch16/1/openai_clip_trtrtx.json", + "dst": "laion_clip_trtrtx.json", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_trtrtx.json.config", + "dst": "laion_clip_trtrtx.json.config", + "replacements": [ + { + "find": "clip/openai_clip-vit-base-patch16_trtrtx.json", + "replace": "clip/laion_CLIP-ViT-B-32-laion2B-s34B-b79K_trtrtx.json" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_trtrtx_inference_sample.ipynb", + "dst": "laion_clip_trtrtx_inference_sample.ipynb", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_dml.json", + "dst": "laion_clip_dml.json", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_dml.json.config", + "dst": "laion_clip_dml.json.config", + "replacements": [ + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_dml_inference_sample.ipynb", + "dst": "laion_clip_dml_inference_sample.ipynb", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + } + ] + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/clip_script.py", + "dst": "clip_script.py" + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/user_script.py", + "dst": "user_script.py" + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/openai_clip_ov.py", + "dst": "laion_clip_ov.py" + }, + { + "src": "../../../openai/clip-vit-base-patch16/1/README.md", + "dst": "README.md", + "replacements": [ + { + "find": "Openai", + "replace": "Laion" + } + ] + }, + { + "src": 
"../../../openai/clip-vit-base-patch16/1/requirements.txt", + "dst": "requirements.txt" + } + ] +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/clip_script.py b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/clip_script.py new file mode 100644 index 00000000..6f775697 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/clip_script.py @@ -0,0 +1,151 @@ +from __future__ import annotations + +from collections import OrderedDict +from itertools import chain + +import torch +from transformers import ( + AutoProcessor, + CLIPTextModelWithProjection, + CLIPVisionModelWithProjection, +) + +from olive.data.component.dataset import BaseDataset +from olive.data.registry import Registry + +HF_MODEL_SUBFOLDER_MAPPING = { + "sentence-transformers/clip-ViT-B-32": "0_CLIPModel", +} + + +def load_image_encoder(model_name): + return CLIPVisionModelWithProjection.from_pretrained( + model_name, + subfolder=HF_MODEL_SUBFOLDER_MAPPING.get(model_name, ""), + ).eval() + + +def load_text_encoder(model_name): + if model_name == "sentence-transformers/clip-ViT-B-32-multilingual-v1": + from sbert_clip_script import SDistilBertTextEncoder + + return SDistilBertTextEncoder(model_name).eval() + + return CLIPTextModelWithProjection.from_pretrained( + model_name, + subfolder=HF_MODEL_SUBFOLDER_MAPPING.get(model_name, ""), + ).eval() + + +def hfdataset_pre_process_for_clip( + dataset, + processor, + torch_model=None, + image_col: str | None = None, + caption_col: str | None = None, + label_col: str = "label", + max_samples: int | None = None, + max_length: int = 77, + batch_size: int = 32, +): + def generate_inputs(sample, indices): + captions = sample.get(caption_col, None) + images = sample.get(image_col, None) + + kwargs = { + "padding": "max_length", + "max_length": max_length, + "truncation": True, + "add_special_tokens": True, + "return_tensors": "pt", + } + if images: + kwargs["images"] = [img.convert("RGB") for img in images] + if captions: + kwargs["text"] = 
list(chain([x[0] for x in captions])) + + encoded_input = processor(**kwargs) + + return { + **encoded_input, + label_col: torch_model(**encoded_input)[0] if torch_model else sample.get(label_col, indices), + } + + if max_samples is not None and max_samples < len(dataset): + dataset = dataset.select(range(max_samples)) + + tokenized_datasets = dataset.map( + generate_inputs, + batched=True, + batch_size=batch_size, + with_indices=True, + remove_columns=dataset.column_names, + desc="Processing dataset", + ) + tokenized_datasets.set_format("torch", output_all_columns=True) + + return tokenized_datasets + + +@Registry.register_pre_process() +def pre_process_dataset( + dataset, + model_name: str, + generate_ground_truth: bool = False, + image_col: str | None = None, + caption_col: str | None = None, + label_col: str = "label", + max_samples: int | None = None, + max_length: int = 77, + **kwargs, +): + if image_col is None and caption_col is None: + raise ValueError("Either image_col or caption_col must be provided.") + + if generate_ground_truth: + if image_col and caption_col: + raise ValueError("Can not generate two types of embedding at the same time.") + + torch_model = load_image_encoder(model_name) if image_col else load_text_encoder(model_name) + else: + torch_model = None + + processor = AutoProcessor.from_pretrained(model_name) + dataset = hfdataset_pre_process_for_clip( + dataset, + processor, + torch_model=torch_model, + image_col=image_col, + caption_col=caption_col, + label_col=label_col, + max_length=max_length, + max_samples=max_samples, + ) + return BaseDataset(dataset, label_col) + + +@Registry.register_post_process() +def embed_post_process(output): + """Post-processing for CLIP output.""" + match output: + case dict() | OrderedDict() as out: + if "embeds" in out: + return out["embeds"] + elif "text_embeds" in out: + return out["text_embeds"] + elif "image_embeds" in out: + return out["image_embeds"] + case torch.Tensor(): + return 
output.argmax(dim=-1) + raise ValueError(f"Unsupported output type: {type(output)}") + + +def eval_similarity_degrad(output, targets, batch_size=1024): + import torch.nn.functional as F + + preds = output.preds + scores = [ + F.cosine_similarity(preds[i : i + batch_size], targets[i : i + batch_size]) + for i in range(0, preds.size(0), batch_size) + ] + return {"percentage": f"{100.0 - torch.mean(torch.cat(scores)) * 100.0:.2f}"} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/info.yml b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/info.yml new file mode 100644 index 00000000..70edec7c --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/info.yml @@ -0,0 +1,28 @@ +keywords: + aitk +arch: clip +recipes: + - file: "laion_clip_text_qnn.json" + device: npu + ep: QNNExecutionProvider + name: "laion-CLIP-ViT-B-32-laion2B-s34B-b79K (Text)" + - file: "laion_clip_vision_qnn.json" + device: npu + ep: QNNExecutionProvider + name: "laion-CLIP-ViT-B-32-laion2B-s34B-b79K (Vision)" + - file: "laion_clip_qdq_amd.json" + device: npu + ep: VitisAIExecutionProvider + - file: "laion_clip_ov.json" + device: npu + ep: OpenVINOExecutionProvider + - file: "laion_clip_trtrtx.json" + device: gpu + ep: NvTensorRTRTXExecutionProvider + - file: "laion_clip_dml.json" + device: gpu + ep: DmlExecutionProvider +aitk: + modelInfo: + id: "huggingface/laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + version: 1 diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_dml.json b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_dml.json new file mode 100644 index 00000000..27847826 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_dml.json @@ -0,0 +1,192 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "task": "zero-shot-image-classification", + "load_kwargs": { + "attn_implementation": "eager" + }, + "io_config": { + "input_names": [ + "input_ids", + "pixel_values", + "attention_mask" + ], + 
"input_shapes": [ + [ + 10, + 77 + ], + [ + 1, + 3, + 224, + 224 + ], + [ + 10, + 77 + ] + ], + "input_types": [ + "int64", + "float32", + "int64" + ], + "output_names": [ + "logits_per_image" + ], + "output_shapes": [ + [ + 1, + 2 + ] + ] + } + }, + "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "cpu", + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + }, + "target_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "gpu", + "execution_providers": [ + "DmlExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "metric_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "dataset_name": "nlphuji/flickr30k", + "start": 0, + "end": 10 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + }, + "post_process_data_config": { + "type": "clip_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "backend": "huggingface_metrics", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "accuracy", + "priority": 1, + "goal": { + "type": "max-degradation", + "value": 0.05 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg", + "goal": { + "type": "percent-min-improvement", + "value": 0.1 + } + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + }, + { + "name": "throughput", + "type": "throughput", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg" + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + }, + "transformer_optimizer": { + "type": "orttransformersoptimization", + "model_type": "clip", + "opt_level": 0, + 
"float16": true, + "use_gpu": true, + "keep_io_types": false, + "optimization_options": { + "enable_gelu": true, + "enable_layer_norm": true, + "enable_attention": true, + "use_multi_head_attention": true, + "enable_skip_layer_norm": false, + "enable_embed_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_bias_gelu": false, + "enable_gelu_approximation": false, + "enable_qordered_matmul": false, + "enable_shape_inference": true, + "enable_gemm_fast_gelu": false, + "enable_nhwc_conv": false, + "enable_group_norm": false, + "enable_bias_splitgelu": false, + "enable_packed_qkv": true, + "enable_packed_kv": true, + "enable_bias_add": false, + "enable_rotary_embeddings": true + }, + "save_as_external_data": true + } + }, + "search_strategy": false, + "host": "host_system", + "target": "target_system", + "cache_dir": "cache", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "output_dir": "model/clip" +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_dml.json.config b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_dml.json.config new file mode 100644 index 00000000..ed09dcf4 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_dml.json.config @@ -0,0 +1,87 @@ +{ + "name": "Convert to DirectML", + "evaluationRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "DirectML" + ], + "path": "systems.target_system.accelerators.0.execution_providers.0", + "values": [ + "DmlExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + 
"name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[0].load_dataset_config.end", + "template": { + "path": "data_configs[0].load_dataset_config.end", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_dml_inference_sample.ipynb b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_dml_inference_sample.ipynb new file mode 100644 index 00000000..c33db85d --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_dml_inference_sample.ipynb @@ -0,0 +1,90 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "aeb33f1a", + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "ExecutionProvider=\"DmlExecutionProvider\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "307fcca8", + "metadata": {}, + "outputs": [], + "source": [ + "from PIL import Image\n", + "import requests\n", + " \n", + "from transformers import CLIPProcessor\n", + "import onnxruntime as ort\n", + "import numpy as np\n", + "import torch\n", + " \n", + "processor = CLIPProcessor.from_pretrained(\"laion/CLIP-ViT-B-32-laion2B-s34B-b79K\", use_fast=False)\n", + " \n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + " \n", + "inputs = processor(text=[\"a 
photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\"],\n", + " images=image, return_tensors=\"np\", padding=\"max_length\",\n", + " max_length= 77, truncation=True)\n", + " \n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + " \n", + "opts = ort.SessionOptions()\n", + " \n", + "add_ep_for_device(opts, ExecutionProvider, ort.OrtHardwareDeviceType.GPU)\n", + "assert opts.has_providers()\n", + "\n", + "# options = ort.SessionOptions()\n", + "session = ort.InferenceSession(onnx_model_path,\n", + " sess_options=opts,\n", + " # providers=[ExecutionProvider],\n", + " # provider_options=[provider_options]\n", + ")\n", + "logits_per_image = session.run([\"logits_per_image\"],\n", + " {\n", + " \"input_ids\": inputs['input_ids'].astype(np.int64),\n", + " \"attention_mask\": inputs['attention_mask'].astype(np.int64),\n", + " \"pixel_values\": inputs['pixel_values'].astype(np.float16)\n", + " })\n", + " \n", + "probs = torch.tensor(logits_per_image[0]).softmax(dim=1)\n", + "print(\"Label probs:\", probs)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "winml", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git 
a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_ov.json b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_ov.json new file mode 100644 index 00000000..b1c07ec0 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_ov.json @@ -0,0 +1,125 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "npu", + "execution_providers": [ + "OpenVINOExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quantize_data_config", + "user_script": "laion_clip_ov.py", + "load_dataset_config": { + "type": "conceptual_captions_dataset", + "data_name": "google-research-datasets/conceptual_captions", + "model_path": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" + }, + "dataloader_config": { + "batch_size": 1, + "drop_last": true + } + }, + { + "name": "metric_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "dataset_name": "nlphuji/flickr30k", + "start": 10, + "end": 20 + }, + "dataloader_config": { "type": "no_auto_batch_dataloader" }, + "post_process_data_config": { "type": "clip_post_process" } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "backend": "huggingface_metrics", + "data_config": "metric_data_config", + "sub_types": [ + { "name": "accuracy", "priority": 1, "goal": { "type": "max-degradation", "value": 0.05 } } + ] + }, + { + "name": "latency", + "type": "latency", + "sub_types": [ + { "name": "avg", "priority": 2, "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } }, + { "name": "p90", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } } + ] + } + ] + } + }, + "passes": { + "optimum_convert": { + "type": "OpenVINOOptimumConversion", + "extra_args": { + "device": "npu", "library": 
"transformers" + } + }, + "ov_quantize": { + "type": "OpenVINOQuantization", + "target_device": "npu", + "data_config": "quantize_data_config", + "model_type": "TRANSFORMER", + "user_script": "laion_clip_ov.py", + "transform_fn": "custom_transform_func", + "extra_configs": [ + { + "advanced_quantization_parameters": { + "smooth_quant_alpha": 0.6 + } + } + ] + }, + "io_update": { + "type": "OpenVINOIoUpdate", + "input_shapes": [ + [ + 10, + 77 + ], + [ + 1, + 3, + 224, + 224 + ], + [ + 10, + 77 + ] + ], + "static": true + }, + "encapsulation": { + "type": "OpenVINOEncapsulation", + "target_device": "npu", + "ov_version": "2025.1" + } + }, + "search_strategy": false, + "host": "local_system", + "target": "local_system", + "cache_dir": "cache", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "output_dir": "model/clip_vit_base_patch16_context_ov_static" +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_ov.json.config b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_ov.json.config new file mode 100644 index 00000000..fb471c3b --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_ov.json.config @@ -0,0 +1,174 @@ +{ + "name": "Convert to Intel CPU/NPU/GPU", + "oliveFile": "clip/openvino/clip_vit_b32_laion2b_s34B_b79k_context_ov_static.json", + "isIntel": true, + "debugInfo": { + "autoGenerated": true, + "useOpenVINOOptimumConversion": "optimum_convert" + }, + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "systems.local_system.accelerators.0.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "readOnly": false + }, + "runtimeInConversion": { + "autoGenerated": true, + "name": "Convert/Quantize to", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "passes.optimum_convert.extra_args.device", + "values": [ + "cpu", + "gpu", + 
"npu" + ], + "actions": [ + [ + { + "type": "update", + "path": "passes.ov_quantize.target_device", + "value": "cpu" + }, + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "cpu" + } + ], + [ + { + "type": "update", + "path": "passes.ov_quantize.target_device", + "value": "gpu" + }, + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "gpu" + } + ], + [ + { + "type": "update", + "path": "passes.ov_quantize.target_device", + "value": "npu" + }, + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "npu" + } + ] + ] + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "google-research-datasets/conceptual_captions" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "google-research-datasets/conceptual_captions" + ], + "template": "QuantizationDataset" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" 
+ } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_ov.py b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_ov.py new file mode 100644 index 00000000..d1971b50 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_ov.py @@ -0,0 +1,124 @@ +from io import BytesIO + +import requests +import torch +from datasets import load_dataset +from PIL import Image +from requests.packages.urllib3.exceptions import InsecureRequestWarning +from tqdm import tqdm +from transformers import CLIPModel, CLIPProcessor + +from olive.data.registry import Registry + +requests.packages.urllib3.disable_warnings(InsecureRequestWarning) + +# ------------------------------------------------------------------------- +# Common Dataset +# ------------------------------------------------------------------------- + +seed = 0 +# seed everything to 0 for reproducibility, https://pytorch.org/docs/stable/notes/randomness.html +# do not set random seed and np.random.seed for aml test, since it will cause aml job name conflict +torch.manual_seed(seed) +# the following are needed only for GPU +torch.cuda.manual_seed(seed) +torch.backends.cudnn.deterministic = True +torch.backends.cudnn.benchmark = False + + +def check_text_data(data): + """Check if the given data is text-based.""" + if isinstance(data, str): + return True + if isinstance(data, list): + return all(isinstance(x, str) for x in data) + return False + + +def get_pil_from_url(url): + """Download and convert an image from a URL to a PIL Image object.""" + response = requests.get(url, verify=True, timeout=20) + image = Image.open(BytesIO(response.content)) + return image.convert("RGB") + + +def wrap_collate_fn(processor, max_length): + def collate_fn(example, 
image_column="image_url", text_column="caption"): + """Preprocess an example by loading and transforming image and text data. + + Check if the text data in the example is valid by calling the `check_text_data` function. + Download the image specified by the URL in the image_column by calling the `get_pil_from_url` function. + If there is any error during the download process, return None. + Return the preprocessed inputs with transformed image and text data. + """ + if len(example) != 1: + raise ValueError(f"Expected 'example' to have exactly one element, but got {len(example)}.") + example = example[0] + + if not check_text_data(example[text_column]): + raise ValueError("Text data is not valid") + + url = example[image_column] + try: + image = get_pil_from_url(url) + w, h = image.size + if h == 1 or w == 1: + return None + except Exception: + return None + + inputs = processor(text=example[text_column], images=[image], return_tensors="pt", padding=True) + if inputs["input_ids"].shape[1] > max_length: + return None + return inputs + + return collate_fn + + +def prepare_calibration_data(dataloader, init_steps): + """Prepare calibration data from a dataloader for a specified number of initialization steps. + + Iterate over the dataloader, fetching batches and storing the relevant data. 
+ """ + data = [] + with tqdm(total=init_steps) as pbar: + for batch in dataloader: + if len(data) == init_steps: + break + if batch: + pbar.update(1) + with torch.no_grad(): + data.append( + { + "input_ids": batch["input_ids"].to("cpu"), + "pixel_values": batch["pixel_values"].to("cpu"), + "attention_mask": batch["attention_mask"].to("cpu"), + } + ) + return data + + +@Registry.register_dataset() +def conceptual_captions_dataset(data_name, opt_init_steps=200, max_train_samples=1000, **kwargs): + """Prepare a vision-text dataset for quantization.""" + dataset = load_dataset(data_name, trust_remote_code=True) + model_path = kwargs.get("model_path") + if not model_path: + raise ValueError( + "The 'model_path' parameter is required in data_configs.load_dataset_config but was not provided." + ) + model = CLIPModel.from_pretrained(model_path) + processor = CLIPProcessor.from_pretrained(model_path) + max_length = model.config.text_config.max_position_embeddings + train_dataset = dataset["train"].shuffle(seed=seed) + collate_fn = wrap_collate_fn(processor, max_length) + dataloader = torch.utils.data.DataLoader(train_dataset, collate_fn=collate_fn, batch_size=1) + return prepare_calibration_data(dataloader, opt_init_steps) + + +def custom_transform_func(data_item): + np_inputs = {} + for inp in data_item: + # Drop the first dimension using slicing + np_inputs[inp] = data_item[inp].numpy()[0, ...]
+ return np_inputs diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_ov_inference_sample.ipynb b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_ov_inference_sample.ipynb new file mode 100644 index 00000000..df300a10 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_ov_inference_sample.ipynb @@ -0,0 +1,84 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "aeb33f1a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/openvino_model_quant_st.onnx\"\n", + "ExecutionProvider=\"OpenVINOExecutionProvider\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "307fcca8", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "from PIL import Image\n", + "import requests\n", + " \n", + "from transformers import CLIPProcessor\n", + "import onnxruntime as ort\n", + "import numpy as np\n", + "import torch\n", + " \n", + "processor = CLIPProcessor.from_pretrained(\"laion/CLIP-ViT-B-32-laion2B-s34B-b79K\", use_fast=False)\n", + " \n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + " \n", + "inputs = processor(text=[\"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\"],\n", + " images=image, return_tensors=\"np\", padding=\"max_length\",\n", + " max_length= 77, truncation=True)\n", + " \n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " 
session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + " \n", + "opts = ort.SessionOptions()\n", + " \n", + "add_ep_for_device(opts, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "assert opts.has_providers()\n", + "\n", + "# options = ort.SessionOptions()\n", + "session = ort.InferenceSession(onnx_model_path,\n", + " sess_options=opts,\n", + " # providers=[ExecutionProvider],\n", + " # provider_options=[provider_options]\n", + ")\n", + "logits_per_image = session.run([\"logits_per_image\"],\n", + " {\n", + " \"input_ids\": inputs['input_ids'].astype(np.int64),\n", + " \"attention_mask\": inputs['attention_mask'].astype(np.int64),\n", + " \"pixel_values\": inputs['pixel_values']\n", + " })\n", + " \n", + "probs = torch.tensor(logits_per_image[0]).softmax(dim=1)\n", + "print(\"Label probs:\", probs)" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_qdq_amd.json b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_qdq_amd.json new file mode 100644 index 00000000..173e2962 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_qdq_amd.json @@ -0,0 +1,209 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "task": "zero-shot-image-classification", + "load_kwargs": { + "attn_implementation": "eager" + }, + "io_config": { + "input_names": [ + "input_ids", + "pixel_values", + "attention_mask" + ], + "input_shapes": [ + [ + 10, + 77 + ], + [ + 1, + 3, + 224, + 224 + ], + [ + 10, + 77 + ] + ], + "input_types": [ + "int64", + "float32", + "int64" + ], + "output_names": [ + "logits_per_image" + ], + "output_shapes": [ + [ + 1, + 2 + ] + ] + } + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "npu", + "execution_providers": [ + "VitisAIExecutionProvider" + ] + } 
+ ] + } + }, + "data_configs": [ + { + "name": "quant_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "dataset_name": "nlphuji/flickr30k", + "start": 0, + "end": 10 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + } + }, + { + "name": "metric_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "dataset_name": "nlphuji/flickr30k", + "start": 0, + "end": 10 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + }, + "post_process_data_config": { + "type": "clip_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "backend": "huggingface_metrics", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "accuracy", + "priority": 1, + "goal": { + "type": "max-degradation", + "value": 0.05 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg", + "goal": { + "type": "percent-min-improvement", + "value": 0.1 + } + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + }, + { + "name": "throughput", + "type": "throughput", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg" + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + }, + "transformer_optimizer": { + "type": "orttransformersoptimization", + "model_type": "clip", + "opt_level": 1, + "optimization_options": { + "enable_gelu": true, + "enable_bias_gelu": false, + "enable_layer_norm": true, + "enable_skip_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_attention": false + }, + "save_as_external_data": true + 
}, + "surgery": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "ReplaceAttentionMaskValue" + }, + { + "surgeon": "PowReduceSumPowDiv2LpNorm" + } + ] + }, + "quantization": { + "type": "OnnxStaticQuantization", + "quant_preprocess": true, + "data_config": "quant_data_config", + "activation_type": "uint16", + "precision": "uint8", + "calibrate_method": "MinMax", + "save_as_external_data": true + }, + "addmetadata": { + "type": "VitisAIAddMetaData", + "config_meta_data_keys": [ + "architectures", + "model_type" + ], + "activation_type": "uint16", + "weight_type": "uint8", + "quant_type": "OnnxStaticQuantization" + } + }, + "search_strategy": false, + "host": "local_system", + "target": "local_system", + "cache_dir": "cache", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "output_dir": "model/clip_vit_base_patch16" +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_qdq_amd.json.config b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_qdq_amd.json.config new file mode 100644 index 00000000..a85c1762 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_qdq_amd.json.config @@ -0,0 +1,195 @@ +{ + "name": "Convert to AMD NPU", + "oliveFile": "clip/laion_CLIP-ViT-B-32-laion2B-s34B-b79K_ptq_qdq_vitis_ai.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "AMD NPU", + "CPU" + ], + "path": "systems.local_system.accelerators.0.execution_providers.0", + "values": [ + "VitisAIExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation 
Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].load_dataset_config.end", + "template": { + "path": "data_configs[0].load_dataset_config.end", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.quantization", + "actions": [ + [], + [ + { + "type": "update", + "path": 
"passes", + "value": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].load_dataset_config.end", + "template": { + "path": "data_configs[1].load_dataset_config.end", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_qdq_amd_inference_sample.ipynb b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_qdq_amd_inference_sample.ipynb new file mode 100644 index 00000000..b5dd1398 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_qdq_amd_inference_sample.ipynb @@ -0,0 +1,84 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "aeb33f1a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "ExecutionProvider=\"VitisAIExecutionProvider\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "307fcca8", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "from PIL import Image\n", + "import requests\n", + " \n", + "from transformers import CLIPProcessor\n", + "import onnxruntime as ort\n", + "import 
numpy as np\n", + "import torch\n", + " \n", + "processor = CLIPProcessor.from_pretrained(\"laion/CLIP-ViT-B-32-laion2B-s34B-b79K\", use_fast=False)\n", + " \n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + " \n", + "inputs = processor(text=[\"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\"],\n", + " images=image, return_tensors=\"np\", padding=\"max_length\",\n", + " max_length= 77, truncation=True)\n", + " \n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + " \n", + "opts = ort.SessionOptions()\n", + " \n", + "add_ep_for_device(opts, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "assert opts.has_providers()\n", + "\n", + "# options = ort.SessionOptions()\n", + "session = ort.InferenceSession(onnx_model_path,\n", + " sess_options=opts,\n", + " # providers=[ExecutionProvider],\n", + " # provider_options=[provider_options]\n", + ")\n", + "logits_per_image = session.run([\"logits_per_image\"],\n", + " {\n", + " \"input_ids\": inputs['input_ids'].astype(np.int64),\n", + " \"attention_mask\": inputs['attention_mask'].astype(np.int64),\n", + " \"pixel_values\": inputs['pixel_values']\n", + " })\n", + " \n", + "probs = torch.tensor(logits_per_image[0]).softmax(dim=1)\n", + "print(\"Label probs:\", probs)" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git 
a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_text_qnn.json b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_text_qnn.json new file mode 100644 index 00000000..47cfd022 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_text_qnn.json @@ -0,0 +1,193 @@ +{ + "input_model": { + "type": "PytorchModel", + "model_path": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "generative": false, + "io_config": { + "input_names": [ + "input_ids", + "attention_mask" + ], + "input_shapes": [ + [ + 1, + 77 + ], + [ + 1, + 77 + ] + ], + "input_types": [ + "int32", + "int32" + ], + "output_names": [ + "embeds", + "last_hidden_state" + ] + }, + "model_loader": "load_text_encoder", + "model_script": "clip_script.py" + }, + "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "QNNExecutionProvider" + ] + } + ] + } + }, + "host": "host_system", + "target": "host_system", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "log_to_file": false, + "data_configs": [ + { + "name": "calib_data", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "nlphuji/flickr30k", + "split": "test" + }, + "pre_process_data_config": { + "type": "pre_process_dataset", + "model_name": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "caption_col": "caption", + "max_length": 77, + "max_samples": 12 + }, + "dataloader_config": { + "batch_size": 1 + }, + "user_script": "clip_script.py" + }, + { + "name": "eval_data", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "nlphuji/flickr30k", + "split": "test" + }, + "pre_process_data_config": { + "type": "pre_process_dataset", + "model_name": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "generate_ground_truth": true, + "caption_col": "caption", + "max_length": 77, + "max_samples": 100 + }, + "post_process_data_config": { + "type": "embed_post_process" + }, + "dataloader_config": { + "batch_size": 1 + }, + 
"user_script": "clip_script.py" + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "degrad", + "type": "custom", + "data_config": "eval_data", + "sub_types": [ + { + "name": "percentage", + "priority": 1, + "higher_is_better": false + } + ], + "user_config": { + "user_script": "clip_script.py", + "metric_func": "eval_similarity_degrad" + } + }, + { + "name": "latency", + "type": "latency", + "sub_types": [ + { + "name": "avg", + "priority": 2, + "metric_config": { + "warmup_num": 20, + "repeat_test_num": 100 + } + }, + { + "name": "p90", + "metric_config": { + "warmup_num": 20, + "repeat_test_num": 100 + } + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "dynamic": true, + "use_dynamo_exporter": false, + "save_as_external_data": true + }, + "to_fixed_shape": { + "type": "DynamicToFixedShape", + "dim_param": [ + "batch_size", + "sequence_length" + ], + "dim_value": [ + 1, + 77 + ] + }, + "surgery": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "ReplaceAttentionMaskValue", + "replacement": -100.0 + }, + { + "surgeon": "MatMulAddToGemm" + } + ] + }, + "transformer_optimizer": { + "type": "OrtTransformersOptimization", + "model_type": "bert", + "opt_level": 1, + "optimization_options": { + "enable_gelu": false, + "enable_bias_gelu": false, + "enable_layer_norm": true, + "enable_skip_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_attention": false + }, + "save_as_external_data": true + }, + "quantization": { + "type": "OnnxStaticQuantization", + "data_config": "calib_data", + "quant_preprocess": true, + "activation_type": "uint16", + "precision": "uint8", + "save_as_external_data": true + } + }, + "cache_dir": "cache", + "output_dir": "model/clip_text" +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_text_qnn.json.config b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_text_qnn.json.config new file mode 100644 
index 00000000..d312cc7a --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_text_qnn.json.config @@ -0,0 +1,235 @@ +{ + "name": "Convert Text Model to Qualcomm NPU", + "oliveFile": "clip/qdq/laion_clip_text_b32_qdq.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Qualcomm NPU", + "CPU" + ], + "path": "systems.host_system.accelerators.0.execution_providers.0", + "values": [ + "QNNExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "test" + ], + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.quantization", + "actions": [ + [], + [ + { + "type": "update", + "path": "passes", + "value": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "dynamic": true, + "use_dynamo_exporter": false, + "save_as_external_data": true + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + 
"type": "enum", + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "test" + ], + "template": { + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "test" + ], + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_text_qnn_inference_sample.ipynb b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_text_qnn_inference_sample.ipynb new file mode 100644 index 00000000..293b9b1f --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_text_qnn_inference_sample.ipynb @@ -0,0 +1,141 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "43751a72", + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "ExecutionProvider=\"QNNExecutionProvider\"" + ] + }, + { + "cell_type": "markdown", + "id": "897ffb42-3569-4d78-b99d-355a38fdce35", + "metadata": {}, + "source": [ + "### Data Processor" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fa8d84cd-4853-4746-bce3-b281bfc23d8b", + "metadata": {}, + "outputs": [], + "source": [ + 
"from transformers import CLIPProcessor\n", + "\n", + "processor = CLIPProcessor.from_pretrained(\"laion/CLIP-ViT-B-32-laion2B-s34B-b79K\")" + ] + }, + { + "cell_type": "markdown", + "id": "5568eb71-5812-4c74-989c-c12271d33b12", + "metadata": {}, + "source": [ + "### Model Inference with ORT-QNN" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "02bad4ec-f477-4659-8584-00735f6ed5a9", + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime as ort\n", + "import torch\n", + "import numpy as np\n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + "\n", + "\n", + "session_options = ort.SessionOptions()\n", + "\n", + "add_ep_for_device(session_options, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "\n", + "text_model = ort.InferenceSession(\n", + " onnx_model_path, # a model with QNN EPContext nodes\n", + " sess_options=session_options,\n", + ")\n", + "\n", + "def get_text_embedding(text):\n", + " inputs = processor(\n", + " text=text,\n", + " padding=\"max_length\",\n", + " max_length=77,#text_model.sequence_length,\n", + " truncation=True,\n", + " add_special_tokens=True,\n", + " return_tensors=\"np\",\n", + " )\n", + " output = text_model.run(None, {\n", + " \"input_ids\": inputs[\"input_ids\"].astype(np.int32),\n", + " \"attention_mask\": inputs[\"attention_mask\"].astype(np.int32),\n", + " })\n", + " return torch.from_numpy(output[0])\n", + "\n", + "def calculate_score(emb_1, emb_2):\n", + " emb_1 /= torch.norm(emb_1, dim=-1, keepdim=True)\n", + " emb_2 /= torch.norm(emb_2, dim=-1, keepdim=True)\n", + " return torch.matmul(emb_1, emb_2.T) * 100.0\n", + "\n", + 
"# Get source embedding and calculate the similarity score for each target\n", + "# We need to process one by one because to static quantization, we fixed the batch size to 1\n", + "def ask(source, targets):\n", + " source_emb = get_text_embedding(source)\n", + " scores = []\n", + " for i, target in enumerate(targets):\n", + " target_emb = get_text_embedding(target)\n", + " score = calculate_score(source_emb, target_emb)\n", + " print(f\"Similarity score of sentence {i}:{score.item()}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "3477e36c-2e72-432b-ae81-602073a3754c", + "metadata": {}, + "source": [ + "### Play with Samples" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8cdc2a6-4c81-4f93-8426-065ee4c2b013", + "metadata": {}, + "outputs": [], + "source": [ + "ask(\"a photo containing two cats\", [\"a photo of tshirt\", \"a photo of two cats\"])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_trtrtx.json b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_trtrtx.json new file mode 100644 index 00000000..f5c79241 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_trtrtx.json @@ -0,0 +1,173 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "task": "zero-shot-image-classification", + "load_kwargs": { + "attn_implementation": "eager" + }, + "io_config": { + "input_names": [ + "input_ids", + "pixel_values", + "attention_mask" + ], + "input_shapes": [ + [ + 10, + 77 + ], + [ + 1, + 3, + 224, + 224 + ], + [ 
+ 10, + 77 + ] + ], + "input_types": [ + "int64", + "float32", + "int64" + ], + "output_names": [ + "logits_per_image" + ], + "output_shapes": [ + [ + 1, + 2 + ] + ] + } + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "gpu", + "execution_providers": [ + "NvTensorRTRTXExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quant_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "dataset_name": "nlphuji/flickr30k", + "start": 0, + "end": 10 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + } + }, + { + "name": "metric_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "dataset_name": "nlphuji/flickr30k", + "start": 10, + "end": 20 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + }, + "post_process_data_config": { + "type": "clip_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "backend": "huggingface_metrics", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "accuracy", + "priority": 1, + "goal": { + "type": "max-degradation", + "value": 0.05 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg", + "goal": { + "type": "percent-min-improvement", + "value": 0.1 + } + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + }, + { + "name": "throughput", + "type": "throughput", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg" + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + }, + 
"onnx_float_to_float16": { + "type": "OnnxFloatToFloat16", + "save_as_external_data": true + }, + "session_params_tuning": { + "type": "OrtSessionParamsTuning", + "io_bind": false, + "data_config": "quant_data_config" + } + }, + "host": "local_system", + "target": "local_system", + "evaluator": "common_evaluator", + "cache_dir": "cache", + "output_dir": "model/clip-vit-base-patch16", + "evaluate_input_model": false +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_trtrtx.json.config b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_trtrtx.json.config new file mode 100644 index 00000000..55b6e418 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_trtrtx.json.config @@ -0,0 +1,86 @@ +{ + "name": "Convert to NVIDIA TRT for RTX", + "oliveFile": "clip/laion_CLIP-ViT-B-32-laion2B-s34B-b79K_trtrtx.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "NVIDIA TensorRT for RTX", + "CPU" + ], + "path": "systems.local_system.accelerators.0.execution_providers.0", + "values": [ + "NvTensorRTRTXExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": 
"data_configs[1].load_dataset_config.end", + "template": { + "path": "data_configs[1].load_dataset_config.end", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_trtrtx_inference_sample.ipynb b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_trtrtx_inference_sample.ipynb new file mode 100644 index 00000000..c4c32324 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_trtrtx_inference_sample.ipynb @@ -0,0 +1,90 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "aeb33f1a", + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "ExecutionProvider=\"NvTensorRTRTXExecutionProvider\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "307fcca8", + "metadata": {}, + "outputs": [], + "source": [ + "from PIL import Image\n", + "import requests\n", + " \n", + "from transformers import CLIPProcessor\n", + "import onnxruntime as ort\n", + "import numpy as np\n", + "import torch\n", + " \n", + "processor = CLIPProcessor.from_pretrained(\"laion/CLIP-ViT-B-32-laion2B-s34B-b79K\", use_fast=False)\n", + " \n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + " \n", + "inputs = processor(text=[\"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\"],\n", + " images=image, return_tensors=\"np\", padding=\"max_length\",\n", + " max_length= 77, truncation=True)\n", + " \n", + "\n", + "def add_ep_for_device(session_options, ep_name, 
device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + " \n", + "opts = ort.SessionOptions()\n", + " \n", + "add_ep_for_device(opts, ExecutionProvider, ort.OrtHardwareDeviceType.GPU)\n", + "assert opts.has_providers()\n", + "\n", + "# options = ort.SessionOptions()\n", + "session = ort.InferenceSession(onnx_model_path,\n", + " sess_options=opts,\n", + " # providers=[ExecutionProvider],\n", + " # provider_options=[provider_options]\n", + ")\n", + "logits_per_image = session.run([\"logits_per_image\"],\n", + " {\n", + " \"input_ids\": inputs['input_ids'].astype(np.int64),\n", + " \"attention_mask\": inputs['attention_mask'].astype(np.int64),\n", + " \"pixel_values\": inputs['pixel_values'].astype(np.float16)\n", + " })\n", + " \n", + "probs = torch.tensor(logits_per_image[0]).softmax(dim=1)\n", + "print(\"Label probs:\", probs)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "winml", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_vision_qnn.json b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_vision_qnn.json new file mode 100644 index 00000000..20f32514 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_vision_qnn.json @@ -0,0 +1,186 @@ +{ + "input_model": { + "type": "PytorchModel", + "model_path": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "generative": 
false, + "io_config": { + "input_names": [ + "pixel_values" + ], + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "output_names": [ + "embeds" + ] + }, + "model_loader": "load_image_encoder", + "model_script": "clip_script.py" + }, + "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "QNNExecutionProvider" + ] + } + ] + } + }, + "host": "host_system", + "target": "host_system", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "log_to_file": false, + "data_configs": [ + { + "name": "calib_data", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "test" + }, + "pre_process_data_config": { + "type": "pre_process_dataset", + "model_name": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "image_col": "image", + "max_samples": 12 + }, + "dataloader_config": { + "batch_size": 1 + }, + "user_script": "clip_script.py" + }, + { + "name": "eval_data", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "test" + }, + "pre_process_data_config": { + "type": "pre_process_dataset", + "model_name": "laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "generate_ground_truth": true, + "image_col": "image", + "max_samples": 100 + }, + "post_process_data_config": { + "type": "embed_post_process" + }, + "dataloader_config": { + "batch_size": 1 + }, + "user_script": "clip_script.py" + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "degrad", + "type": "custom", + "data_config": "eval_data", + "sub_types": [ + { + "name": "percentage", + "priority": 1, + "higher_is_better": false + } + ], + "user_config": { + "user_script": "clip_script.py", + "metric_func": "eval_similarity_degrad", + "metric_func_kwargs": { + "batch_size": 32 + } + } + }, + { + "name": "latency", + "type": "latency", + "sub_types": [ + { + "name": "avg", + "priority": 2, + "metric_config": { + 
"warmup_num": 20, + "repeat_test_num": 100 + } + }, + { + "name": "p90", + "metric_config": { + "warmup_num": 20, + "repeat_test_num": 100 + } + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "dynamic": true, + "use_dynamo_exporter": false, + "save_as_external_data": true + }, + "to_fixed_shape": { + "type": "DynamicToFixedShape", + "dim_param": [ + "batch_size", + "num_channels", + "height", + "width" + ], + "dim_value": [ + 1, + 3, + 224, + 224 + ] + }, + "surgery": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "MatMulAddToGemm" + } + ] + }, + "transformer_optimizer": { + "type": "OrtTransformersOptimization", + "model_type": "vit", + "opt_level": 1, + "optimization_options": { + "enable_gelu": false, + "enable_bias_gelu": false, + "enable_layer_norm": true, + "enable_skip_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_attention": false + }, + "save_as_external_data": true + }, + "quantization": { + "type": "OnnxStaticQuantization", + "data_config": "calib_data", + "quant_preprocess": true, + "activation_type": "uint16", + "precision": "uint8", + "save_as_external_data": true + } + }, + "cache_dir": "cache", + "output_dir": "model/clip_vision" +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_vision_qnn.json.config b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_vision_qnn.json.config new file mode 100644 index 00000000..6308658b --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_vision_qnn.json.config @@ -0,0 +1,237 @@ +{ + "name": "Convert Vision Model to Qualcomm NPU", + "oliveFile": "clip/qdq/laion_clip_vision_b32_qdq.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Qualcomm NPU", + "CPU" + ], + "path": "systems.host_system.accelerators.0.execution_providers.0", + "values": [ + "QNNExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": 
false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "nlphuji/flickr30k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.quantization", + "actions": [ + [], + [ + { + "type": "update", + "path": "passes", + "value": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "dynamic": true, + "use_dynamo_exporter": false, + "save_as_external_data": true + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation 
Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[1].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_vision_qnn_inference_sample.ipynb b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_vision_qnn_inference_sample.ipynb new file mode 100644 index 00000000..02cfa10a --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/laion_clip_vision_qnn_inference_sample.ipynb @@ -0,0 +1,170 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "3c18a7d6", + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "\n", + "ExecutionProvider=\"QNNExecutionProvider\"" + ] + }, + { + "cell_type": "markdown", + "id": "897ffb42-3569-4d78-b99d-355a38fdce35", + "metadata": {}, + "source": [ + "### Data Processor" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "fa8d84cd-4853-4746-bce3-b281bfc23d8b", + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import CLIPProcessor\n", + "\n", + "processor = CLIPProcessor.from_pretrained(\"laion/CLIP-ViT-B-32-laion2B-s34B-b79K\")" + ] + }, + { + "cell_type": "markdown", + "id": "5568eb71-5812-4c74-989c-c12271d33b12", + "metadata": {}, + "source": [ + "### Model Inference with ORT-QNN" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "02bad4ec-f477-4659-8584-00735f6ed5a9", + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime as ort\n", + "import torch\n", + "import numpy as np\n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + "\n", + "\n", + "session_options = ort.SessionOptions()\n", + "\n", + "add_ep_for_device(session_options, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "\n", + "vision_model = ort.InferenceSession(\n", + " onnx_model_path, # a model with QNN EPContext nodes\n", + " sess_options=session_options,\n", + ")\n", + "\n", + "def get_image_embedding(image):\n", + " inputs = processor(images=image, return_tensors=\"np\")\n", + " output = vision_model.run(None, { \"pixel_values\": inputs[\"pixel_values\"] })\n", + " return torch.from_numpy(output[0])\n", + "\n", + "def calculate_score(emb_1, emb_2):\n", + " emb_1 /= torch.norm(emb_1, dim=-1, keepdim=True)\n", + " emb_2 /= torch.norm(emb_2, dim=-1, keepdim=True)\n", + " return torch.matmul(emb_1, emb_2.T) * 100.0\n", + "\n", + "# Get source embedding and calculate the similarity score for each target\n", + "# We need to process one by one because to static 
quantization, we fixed the batch size to 1\n", + "def ask(source, targets):\n", + " source_emb = get_image_embedding(source)\n", + " for i, target in enumerate(targets):\n", + " target_emb = get_image_embedding(target)\n", + " score = calculate_score(source_emb, target_emb)\n", + " print(f\"Similarity score of image {i}:{score.item()}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "3477e36c-2e72-432b-ae81-602073a3754c", + "metadata": {}, + "source": [ + "### Play with Samples" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16868fbd-e447-4866-af7d-eb6e49975bcc", + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "from PIL import Image\n", + "\n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + "image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "07076b9a", + "metadata": {}, + "outputs": [], + "source": [ + "url = \"http://images.cocodataset.org/train2017/000000208833.jpg\"\n", + "image1 = Image.open(requests.get(url, stream=True).raw)\n", + "image1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c10de7cd", + "metadata": {}, + "outputs": [], + "source": [ + "url = \"http://images.cocodataset.org/train2017/000000125690.jpg\"\n", + "image2 = Image.open(requests.get(url, stream=True).raw)\n", + "image2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8cdc2a6-4c81-4f93-8426-065ee4c2b013", + "metadata": {}, + "outputs": [], + "source": [ + "ask(image, [image1, image2])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } 
+ }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/model_project.config b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/model_project.config new file mode 100644 index 00000000..add4c18d --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/model_project.config @@ -0,0 +1,32 @@ +{ + "workflows": [ + { + "file": "laion_clip_text_qnn.json", + "templateName": "laion_clip_text_qnn" + }, + { + "file": "laion_clip_vision_qnn.json", + "templateName": "laion_clip_vision_qnn" + }, + { + "file": "laion_clip_qdq_amd.json", + "templateName": "laion_clip_qdq_amd" + }, + { + "file": "laion_clip_ov.json", + "templateName": "laion_clip_ov" + }, + { + "file": "laion_clip_trtrtx.json", + "templateName": "laion_clip_trtrtx" + }, + { + "file": "laion_clip_dml.json", + "templateName": "laion_clip_dml" + } + ], + "modelInfo": { + "id": "huggingface/laion/CLIP-ViT-B-32-laion2B-s34B-b79K", + "version": 1 + } +} diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/requirements.txt b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/requirements.txt new file mode 100644 index 00000000..163d793e --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/requirements.txt @@ -0,0 +1,7 @@ +# This file will be installed together with AITK runtime requirements +# For the full requirements, see AITK +olive-ai +cachetools==5.5.0 +nltk>=3.9.1 +accelerate>=1.4.0 +pillow>=10.0.1 diff --git a/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/user_script.py b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/user_script.py new file mode 100644 index 00000000..2d0051f0 --- /dev/null +++ b/laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk/user_script.py @@ -0,0 +1,64 @@ +# ------------------------------------------------------------------------- +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. 
+# -------------------------------------------------------------------------- +import numpy as np +import torch +from datasets import load_dataset +from torch.utils.data import Dataset +from transformers import CLIPProcessor + +from olive.data.registry import Registry + + +class CLIPDataset(Dataset): + def __init__( + self, + model_name, + dataset_name, + start=0, + end=500, + image_size=(224, 224), + ): + assert 0 <= start < end + self.start = start + self.end = end + self.model_name = model_name + self.dataset_name = dataset_name + self.processor = CLIPProcessor.from_pretrained(self.model_name) + self.length = self.end - self.start + self.image_size = image_size + self.dataset = load_dataset(self.dataset_name, split=f"test[0:{self.end + 10}]") + + def __len__(self): + return self.length + + def __getitem__(self, idx): + text_inputs = self.processor( + text=[" ".join(item) for item in self.dataset[self.start + idx : self.start + idx + 10]["caption"]], + return_tensors="np", + padding="max_length", + truncation=True, + ) + + image_input = self.processor(images=self.dataset[self.start + idx]["image"].resize(self.image_size), return_tensors="np") + model_inputs = [ + { + "input_ids": text_inputs["input_ids"].astype(np.int64), + "pixel_values": image_input["pixel_values"], + "attention_mask": text_inputs["attention_mask"].astype(np.int64), + } + ] + + target = torch.Tensor([0]).to(torch.int32) + return model_inputs[0], target + + +@Registry.register_dataset() +def clip_dataset(**kwargs): + return CLIPDataset(**kwargs) + + +@Registry.register_post_process() +def clip_post_process(output): + return output["logits_per_image"].argmax(axis=-1) diff --git a/meta-llama-Llama-3.2-1B-Instruct/aitk/.gitignore b/meta-llama-Llama-3.2-1B-Instruct/aitk/.gitignore new file mode 100644 index 00000000..48c03882 --- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/.gitignore @@ -0,0 +1,5 @@ +__pycache__ +/cache +/history/*/* +!/history/*/history.config +!/history/*/olive_config.json diff --git 
a/meta-llama-Llama-3.2-1B-Instruct/aitk/README.md b/meta-llama-Llama-3.2-1B-Instruct/aitk/README.md new file mode 100644 index 00000000..1355c54f --- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/README.md @@ -0,0 +1,160 @@ +# Llama-3.2-1B-Instruct Model Optimization + +This repository demonstrates the optimization of the [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) model using **post-training quantization (PTQ)** techniques. The optimization process is divided into three main workflows: + +- QDQ for AMD NPU +- PTQ + AOT for QNN NPU + + This process extends the QDQ flow and compiles the model specifically for **Qualcomm NPUs** +- OpenVINO for Intel NPU + + This process uses OpenVINO-specific passes such as `OpenVINOOptimumConversion`, `OpenVINOIoUpdate` and `OpenVINOEncapsulation` + +## **QDQ Model with 4-bit Weights & 16-bit Activations** + +This workflow produces an ONNX QDQ model that is agnostic to the target hardware and accelerator, making it suitable for general inference. + +### **Optimization Process** + +The model is optimized using **weight-only quantization** and **activation quantization** for efficient deployment. The process includes: + +1. **Weight Rotation ([QuaRot](https://arxiv.org/abs/2404.00456))** + - Reduces outliers in weights and hidden states to enhance quantization efficiency. + +2. **4-bit Per-Channel Symmetric Quantization ([GPTQ](https://arxiv.org/abs/2210.17323))** + - Reduces transformer layer size while preserving accuracy. + +3. **ONNX Graph Capture** + - Exports the model to ONNX for further optimization. + +4. **4-bit Block-wise Quantization** + - Applies weight-only quantization to the **embedding layer** and **language modeling head**. + +5. **16-bit Activation Quantization** + - Uses 16-bit activations to balance precision and efficiency. + +The final output is a **QDQ model** with **4-bit weights** and **16-bit activations**. 
This model also leverages [GroupQueryAttention (GQA)](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.GroupQueryAttention) for efficient long-context processing and long-sequence generation. + +### **Handling Dynamic and Static Input Shapes** + +NPUs require **precompiled graphs**, meaning the model must use **static input shapes**. However, **text generation** involves two distinct processing stages: + +- **Prefill (Prompt Processing)**: Processes multiple tokens simultaneously. +- **Token Generation (Iteration)**: Processes one token at a time. + +To support both efficiently, we create **two model instances**: +1. **Prefill model**: Optimized for batch processing. +2. **Token generation model**: Optimized for one-token-at-a-time inference. + +## **PTQ + AOT Compilation for Qualcomm NPUs using QNN EP** + +This process extends the [**QDQ Model with 4-bit Weights & 16-bit Activations**](#qdq-model-with-4-bit-weights--16-bit-activations) by compiling it specifically for **Qualcomm NPUs** using the **QNN Execution Provider**. + +### **Resource Optimization Strategy** + +To maximize efficiency while supporting dynamic input handling: + +- **Embedding Layer & Language Model Head** → Executed on CPU (handles dynamic input). +- **Transformer Layers** → Executed on NPU (requires static input shapes). +- **Weight Sharing** → Prefill & token generation models reuse weights to minimize memory usage. + +> ⚠️ **Note:** GQA is an ONNX Runtime *contrib operator* and must be executed on the CPU. The model graph is partitioned into **CPU (GQA nodes)** and **NPU (other nodes)** for execution. + +### **Compilation for Qualcomm NPU Deployment** + +Once optimized, the model is compiled for Qualcomm NPUs using **ONNX Runtime QNNExecutionProvider**. The steps include: + +1. **Split the Quantized Model** → Divide into three parts: + - **Embedding Layer** + - **Transformer Layers** + - **Language Model Head** +2. 
**Set Static Input Shapes**: + - **(1, 64)** for prefill (batch size, sequence length). + - **(1, 1)** for token generation. +3. **Compile using QNNExecutionProvider**: + - Leverages **weight sharing** across the prefill and token generation models. + +### **Usage** + +This workflow is configured using the `qnn_config.json` file. It contains all of the quantization and compilation steps. It requires two separate Python environments described below. + +#### A workable version + +- python=3.10 +- CUDA=12.1 +- cudnn=9.2.0 + +#### Quantization Python Environment Setup + +Quantization is resource-intensive and requires GPU acceleration. In an [x64 Python environment with Olive installed](https://github.com/microsoft/Olive/blob/main/examples/README.md#important), install the required packages: + +```bash +# Install common dependencies +pip install -r requirements.txt + +# Install ONNX Runtime GPU packages +pip install "onnxruntime-gpu>=1.21.0" "onnxruntime-genai-cuda>=0.6.0" + +# AutoGPTQ: Install from source (stable package may be slow for weight packing) +# Disable CUDA extension build (not required) +# Linux +export BUILD_CUDA_EXT=0 +# Windows +# set BUILD_CUDA_EXT=0 + +# Install AutoGPTQ from source +pip install --no-build-isolation git+https://github.com/PanQiWei/AutoGPTQ.git +``` + +> ⚠️ Only set up the environment and install the packages. Do not run the `olive run` command at this point. + +#### AOT Compilation Python Environment Setup + +Model compilation using QNN Execution Provider requires a Python environment with onnxruntime-qnn installed. 
In a separate Python environment with Olive installed, install the required packages: + +```bash +# Install ONNX Runtime QNN +pip install -r https://raw.githubusercontent.com/microsoft/onnxruntime/refs/heads/main/requirements.txt +pip install -U --pre --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple onnxruntime-qnn --no-deps +``` + +Replace `/path/to/qnn/env/bin` in `qnn_config.json` with the path to the directory containing your QNN environment's Python executable. This path can be found by running the following command in the environment: + +```bash +# Linux +command -v python +# Windows +# where python +``` + +This command returns the path to the Python executable. Use the parent directory of the executable in place of `/path/to/qnn/env/bin` in the config file. + +#### **Run the Quantization + Compilation Config** + +Activate the **Quantization Python Environment** and run the workflow: + +```bash +olive run --config qnn_config.json +``` + +Olive will run the AOT compilation step in the **AOT Compilation Python Environment** specified in the config file using a subprocess. All other steps will run in the **Quantization Python Environment** natively. + +✅ Optimized model saved in: `./model` + +> ⚠️ If optimization fails because the system runs out of memory, remove `calibration_providers` from the config file. + +> ⚠️ If optimization fails during context binary generation, rerun the command. The process will resume from the last completed step. + +### **Inference** + +The optimized model can be used for inference using ONNX Runtime QNNExecutionProvider and ONNX Runtime GenAI. 
**Inference must be run on a Windows Copilot+ PC with a Qualcomm NPU.** + +#### **Install Required Packages (arm64 Python)** +```bash +pip install -r https://raw.githubusercontent.com/microsoft/onnxruntime/refs/heads/main/requirements.txt +pip install -U --pre --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple onnxruntime-qnn --no-deps +pip install "onnxruntime-genai>=0.7.0rc2" +``` + +#### **Run Console-Based Chat Interface** +Execute the provided `inference_sample.ipynb` notebook. + + diff --git a/meta-llama-Llama-3.2-1B-Instruct/aitk/_copy.json.config b/meta-llama-Llama-3.2-1B-Instruct/aitk/_copy.json.config new file mode 100644 index 00000000..b6457585 --- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/_copy.json.config @@ -0,0 +1,160 @@ +{ + "copies": [ + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/model_project.config", + "dst": "model_project.config", + "replacements": [ + { + "find": "deepseek_qnn_config", + "replace": "llama3_2_qnn_config" + }, + { + "find": "deepseek_vitis_ai_config", + "replace": "llama3_2_vitis_ai_config" + }, + { + "find": "deepseek_ov_config", + "replace": "llama3_2_ov_config" + }, + { + "find": "deepseek_dml_config", + "replace": "llama3_2_dml_config" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_qnn_config.json", + "dst": "llama3_2_qnn_config.json", + "replacements": [ + { + "find": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", + "replace": "meta-llama/Llama-3.2-1B-Instruct" + }, + { + "find": "model/deepseek", + "replace": "model/llama3_2" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_qnn_config.json.config", + "dst": "llama3_2_qnn_config.json.config", + "replacements": [ + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_vitis_ai_config.json", + "dst": "llama3_2_vitis_ai_config.json", + "replacements": [ + { + "find": 
"deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", + "replace": "meta-llama/Llama-3.2-1B-Instruct" + }, + { + "find": "model/deepseek", + "replace": "model/llama3_2" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_vitis_ai_config.json.config", + "dst": "llama3_2_vitis_ai_config.json.config", + "replacements": [ + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_ov_config.json", + "dst": "llama3_2_ov_config.json", + "replacements": [ + { + "find": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", + "replace": "meta-llama/Llama-3.2-1B-Instruct" + }, + { + "find": "model/deepseek", + "replace": "model/llama3_2" + }, + { + "find": "\"awq\": false", + "replace": "\"awq\": true" + }, + { + "find": "\"scale_estimation\": false", + "replace": "\"scale_estimation\": true" + }, + { + "find": "\"sensitivity_metric\": \"weight_quantization_error\",", + "replace": "" + }, + { + "find": "\"backup_precision\": \"int8_asym\"", + "replace": "\"backup_precision\": \"int8_sym\"" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_ov_config.json.config", + "dst": "llama3_2_ov_config.json.config", + "replacements": [ + { + "find": "deepseek/openvino/DeepSeek-R1-Distill-Qwen-1.5B_context_ov_dynamic_sym_gs128_bkp_int8_sym_r1.json", + "replace": "llama3/openvino/Llama-3.2-1B-Instruct_context_ov_dynamic_sym_bkp_int8_sym.json" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_dml_config.json", + "dst": "llama3_2_dml_config.json", + "replacements": [ + { + "find": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", + "replace": "meta-llama/Llama-3.2-1B-Instruct" + }, + { + "find": "model/deepseek", + "replace": "model/llama3_2" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_dml_config.json.config", + "dst": "llama3_2_dml_config.json.config", + "replacements": [ + ] + }, + { + "src": 
"../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/README.md", + "dst": "README.md", + "replacements": [ + { + "find": "# DeepSeek-R1-Distill-Qwen-1.5B Model Optimization", + "replace": "# Llama-3.2-1B-Instruct Model Optimization" + }, + { + "find": "[DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)", + "replace": "[Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)" + }, + { + "find": "> ⚠️ If got 6033 error, replace `genai_config.json` in `./model` folder", + "replace": "" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/requirements.txt", + "dst": "requirements.txt", + "replacements": [ + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/inference_sample.ipynb", + "dst": "inference_sample.ipynb", + "replacements": [ + { + "find": "<|User|>{input}<|Assistant|>", + "replace": "<|start_header_id|>user<|end_header_id|>\\\\n{input}<|start_header_id|>assistant<|end_header_id|>\\\\n" + } + ] + } + ] +} diff --git a/meta-llama-Llama-3.2-1B-Instruct/aitk/inference_model.json b/meta-llama-Llama-3.2-1B-Instruct/aitk/inference_model.json new file mode 100644 index 00000000..5ec359e2 --- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/inference_model.json @@ -0,0 +1,31 @@ +{ + "Name": "Llama-3.2-1B-Instruct", + "PromptTemplate": { + "assistant": "{Content}", + "prompt":"<|start_header_id|>user<|end_header_id|>\n{Content}<|start_header_id|>assistant<|end_header_id|>\n" + }, + "ParameterSchema": { + "enabled": [ + { + "name": "max_tokens", + "default": 512 + }, + { + "name": "temperature", + "default": 0.6 + }, + { + "name": "top_p", + "default": 0.9 + }, + { + "name": "top_k", + "default": 5 + }, + { + "name": "random_seed", + "default": 42 + } + ] + } +} diff --git a/meta-llama-Llama-3.2-1B-Instruct/aitk/inference_sample.ipynb b/meta-llama-Llama-3.2-1B-Instruct/aitk/inference_sample.ipynb new file mode 100644 index 00000000..77a3070b 
--- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/inference_sample.ipynb @@ -0,0 +1,131 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "text = 'Who is Isaac Newton?'\n", + "ExecutionProvider=\"QNNExecutionProvider\"\n", + "model_folder = \"./model\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime_genai as og\n", + "import json\n", + "import time\n", + "from pathlib import Path\n", + "\n", + "def get_session_options(obj):\n", + " if type(obj) is dict:\n", + " for k, v in obj.items():\n", + " if k == \"session_options\":\n", + " yield v\n", + " else:\n", + " for x in get_session_options(v):\n", + " yield x\n", + " elif type(obj) is list:\n", + " for v in obj:\n", + " for x in get_session_options(v):\n", + " yield x\n", + "\n", + "\n", + "def remove_provider_options(model_path):\n", + " genai_config_path = Path(model_path) / \"genai_config.json\"\n", + " data = json.loads(genai_config_path.read_text())\n", + " for session_option in get_session_options(data):\n", + " if 'provider_options' in session_option:\n", + " session_option['provider_options'] = [{k: dict() for k in opts.keys()} for opts in session_option['provider_options']]\n", + "\n", + " genai_config_path.write_text(json.dumps(data, indent=4))\n", + "\n", + "if ExecutionProvider == \"QNNExecutionProvider\":\n", + " remove_provider_options(model_folder)\n", + "\n", + "# Load the base model and tokenizer\n", + "model = og.Model(model_folder)\n", + "tokenizer = og.Tokenizer(model)\n", + "tokenizer_stream = tokenizer.create_stream()\n", + "\n", + "# Set the max length to something sensible by default,\n", + "# since otherwise it will be set to the entire context length\n", + "search_options = {}\n", + "search_options[\"max_length\"] = 200\n", + "\n", + "chat_template =
\"<|start_header_id|>user<|end_header_id|>\\n{input}<|start_header_id|>assistant<|end_header_id|>\\n\"\n", + "\n", + "# Generate prompt (prompt template + input)\n", + "prompt = f\"{chat_template.format(input=text)}\"\n", + "\n", + "# Encode the prompt using the tokenizer\n", + "input_tokens = tokenizer.encode(prompt)\n", + "\n", + "# Create params and generator\n", + "params = og.GeneratorParams(model)\n", + "params.set_search_options(**search_options)\n", + "generator = og.Generator(model, params)\n", + "\n", + "# Append input tokens to the generator\n", + "generator.append_tokens(input_tokens)\n", + "\n", + "print(\"\")\n", + "print(\"Output: \", end=\"\", flush=True)\n", + "\n", + "token_times = []\n", + "\n", + "# Stream the output\n", + "while not generator.is_done():\n", + " start_time = time.time()\n", + " generator.generate_next_token()\n", + " end_time = time.time()\n", + " \n", + " # Record the time for this token generation\n", + " token_time = end_time - start_time\n", + " token_times.append(token_time)\n", + "\n", + " new_token = generator.get_next_tokens()[0]\n", + " print(tokenizer_stream.decode(new_token), end=\"\", flush=True)\n", + "\n", + "print()\n", + "\n", + "# Calculate and display timing statistics\n", + "if token_times:\n", + " total_tokens = len(token_times)\n", + " avg_time = sum(token_times) / total_tokens\n", + " \n", + " print(f\"Total tokens generated: {total_tokens}\")\n", + " print(f\"Average time per token: {avg_time:.4f} seconds\")\n", + " print(f\"Tokens per second: {total_tokens / sum(token_times):.2f}\")\n", + "\n", + "del generator\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + 
"nbformat": 4, + "nbformat_minor": 2 +} diff --git a/meta-llama-Llama-3.2-1B-Instruct/aitk/info.yml b/meta-llama-Llama-3.2-1B-Instruct/aitk/info.yml new file mode 100644 index 00000000..59e77800 --- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/info.yml @@ -0,0 +1,20 @@ +keywords: + aitk +arch: llama +recipes: + - file: "llama3_2_qnn_config.json" + device: npu + ep: QNNExecutionProvider + - file: "llama3_2_vitis_ai_config.json" + device: npu + ep: VitisAIExecutionProvider + - file: "llama3_2_ov_config.json" + device: npu + ep: OpenVINOExecutionProvider + - file: "llama3_2_dml_config.json" + device: gpu + ep: DmlExecutionProvider +aitk: + modelInfo: + id: "huggingface/meta-llama/Llama-3.2-1B-Instruct" + version: 1 diff --git a/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_dml_config.json b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_dml_config.json new file mode 100644 index 00000000..6965e946 --- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_dml_config.json @@ -0,0 +1,46 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "meta-llama/Llama-3.2-1B-Instruct" + }, + "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device":"cpu", + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + }, + "target_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device":"gpu", + "execution_providers": [ + "DmlExecutionProvider" + ] + } + ] + } + }, + "passes": { + "q": { + "type": "AutoAWQQuantizer" + }, + "mb": { + "type": "ModelBuilder", + "precision": "int4" + } + }, + "host": "host_system", + "target": "target_system", + "log_severity_level": 1, + "output_dir": "model/llama3_2", + "cache_dir": "cache", + "no_artifacts": true, + "evaluate_input_model": false +} diff --git a/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_dml_config.json.config b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_dml_config.json.config new file mode 100644 index 00000000..5778ef75 --- /dev/null +++ 
b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_dml_config.json.config @@ -0,0 +1,48 @@ +{ + "name": "Convert to DirectML", + "isLLM": true, + "debugInfo": { + "autoGenerated": true, + "useModelBuilder": "mb" + }, + "isGPURequired": true, + "executeRuntimeFeatures": [ + "AutoAwq" + ], + "evaluationRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "DirectML" + ], + "path": "systems.target_system.accelerators.0.execution_providers.0", + "values": [ + "DmlExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_ov_config.json b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_ov_config.json new file mode 100644 index 00000000..73cd8c82 --- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_ov_config.json @@ -0,0 +1,56 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "meta-llama/Llama-3.2-1B-Instruct" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "OpenVINOExecutionProvider" + ] + } + ] + } + }, + "passes": { + "optimum_convert": { + "type": "OpenVINOOptimumConversion", + "extra_args": { + "device": "npu" + }, + "ov_quant_config": { + "weight_format": "int4", + "group_size": 128, + "dataset": "wikitext2", + "ratio": 1, + "sym": true, + "trust_remote_code": true, + "awq": true, + "scale_estimation": true, + + "backup_precision": "int8_sym" + } + }, + "io_update": { + "type": "OpenVINOIoUpdate", + "static": false, + "reuse_cache": true + }, + "encapsulation": { + "type": "OpenVINOEncapsulation", + "target_device": "npu", + 
"keep_ov_dynamic_dims": true, + "ov_version": "2025.1", + "reuse_cache": true + } + }, + "search_strategy": false, + "host": "local_system", + "target": "local_system", + "cache_dir": "cache", + "evaluate_input_model": false, + "output_dir": "model/llama3_2" +} diff --git a/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_ov_config.json.config b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_ov_config.json.config new file mode 100644 index 00000000..d594204f --- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_ov_config.json.config @@ -0,0 +1,153 @@ +{ + "name": "Convert to Intel CPU/NPU/GPU", + "oliveFile": "llama3/openvino/Llama-3.2-1B-Instruct_context_ov_dynamic_sym_bkp_int8_sym.json", + "isLLM": true, + "isIntel": true, + "debugInfo": { + "autoGenerated": true, + "useOpenVINOOptimumConversion": "optimum_convert" + }, + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "systems.local_system.accelerators.0.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ], + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ], + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ] + ], + "readOnly": false + }, + "runtimeInConversion": { + "autoGenerated": true, + "name": "Convert/Quantize to", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "passes.optimum_convert.extra_args.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "cpu" + } 
+ ], + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "gpu" + } + ], + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "npu" + } + ] + ] + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "passes.optimum_convert.ov_quant_config.dataset", + "values": [ + "wikitext2" + ], + "template": { + "path": "passes.optimum_convert.ov_quant_config.dataset", + "values": [ + "wikitext2" + ], + "template": "QuantizationDataset" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_qnn_config.json b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_qnn_config.json new file mode 100644 index 00000000..4a699670 --- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_qnn_config.json @@ -0,0 +1,132 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "meta-llama/Llama-3.2-1B-Instruct" + }, + "systems": { + "qnn_system": { + "type": "PythonEnvironment", + "python_environment_path": "/path/to/qnn/env/bin", + "accelerators": [ + { + "execution_providers": [ + "QNNExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "wikitext2_train", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "wikitext", + "subset": "wikitext-2-raw-v1", + "split": "train" + }, + "pre_process_data_config": { + "strategy": "line-by-line", + 
"add_special_tokens": false, + "max_samples": 128, + "max_seq_len": 512 + } + } + ], + "passes": { + "q": { + "type": "QuaRot" + }, + "g": { + "type": "GptqQuantizer", + "sym": true, + "group_size": -1 + }, + "cs": { + "type": "CaptureSplitInfo", + "num_splits": 4, + "unique_embeds_lm_head_splits": true + }, + "mb": { + "type": "ModelBuilder", + "precision": "int4", + "int4_block_size": 32, + "int4_accuracy_level": 4, + "int4_op_types_to_quantize": [ + "MatMul", + "Gather" + ], + "save_as_external_data": true + }, + "mq": { + "type": "MatMulNBitsToQDQ", + "use_int4": true, + "add_zero_point": true, + "nodes_to_exclude": [ + "/lm_head/MatMul_Q4" + ], + "save_as_external_data": true + }, + "gs": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "RemoveRopeMultiCache" + }, + { + "surgeon": "AttentionMaskToSequenceLengths" + }, + { + "surgeon": "SimplifiedLayerNormToL2Norm" + } + ], + "save_as_external_data": true + }, + "sq": { + "type": "OnnxStaticQuantization", + "data_config": "wikitext2_train", + "activation_type": "uint16", + "precision": "uint8", + "calibration_providers": [ + "CUDAExecutionProvider" + ], + "quant_preprocess": true, + "op_types_to_exclude": [ + "GatherBlockQuantized", + "GroupQueryAttention", + "MatMulNBits" + ], + "save_as_external_data": true + }, + "sp": { + "type": "SplitModel" + }, + "st": { + "type": "StaticLLM", + "batch_size": 1, + "context_length": 64 + }, + "cb": { + "type": "EPContextBinaryGenerator", + "provider_options": { + "htp_performance_mode": "burst", + "htp_graph_finalization_optimization_mode": "3", + "soc_model": "60" + }, + "session_options": { + "intra_op_num_threads": 2, + "inter_op_num_threads": 1 + }, + "weight_sharing": true + }, + "cp": { + "type": "ComposeOnnxModels" + } + }, + "target": "qnn_system", + "log_severity_level": 1, + "output_dir": "model/llama3_2", + "cache_dir": "cache", + "no_artifacts": true, + "evaluate_input_model": false +} diff --git 
a/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_qnn_config.json.config b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_qnn_config.json.config new file mode 100644 index 00000000..032429d1 --- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_qnn_config.json.config @@ -0,0 +1,197 @@ +{ + "name": "Convert to Qualcomm NPU", + "oliveFile": "phi3_5/qnn_config.json", + "isLLM": true, + "debugInfo": { + "autoGenerated": true, + "useModelBuilder": "mb" + }, + "isQNNLLM": true, + "isGPURequired": true, + "runtimeOverwrite": { + "autoGenerated": true, + "pyEnvPath": "systems.qnn_system.python_environment_path", + "executeEp": "CUDAExecutionProvider", + "evaluateUsedInExecute": true + }, + "executeRuntimeFeatures": [ + "AutoGptq" + ], + "pyEnvRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Qualcomm NPU" + ], + "path": "systems.qnn_system.accelerators.0.execution_providers.0", + "values": [ + "QNNExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Subset", + "tags": [ + "QuantizationDatasetSubset", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": { + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": "QuantizationDatasetSubset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + 
"QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_vitis_ai_config.json b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_vitis_ai_config.json new file mode 100644 index 00000000..97a54b26 --- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_vitis_ai_config.json @@ -0,0 +1,134 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "meta-llama/Llama-3.2-1B-Instruct" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "wikitext2_train", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "wikitext", + "subset": "wikitext-2-raw-v1", + "split": "train" + }, + "pre_process_data_config": { + "strategy": "line-by-line", + "add_special_tokens": false, + "max_samples": 128, + "max_seq_len": 512 + } + } + ], + "passes": { + "q": { + "type": "QuaRot" + }, + "g": { + "type": "GptqQuantizer", + "sym": true, + "group_size": -1 + }, + "cs": { + "type": "CaptureSplitInfo", + "num_splits": 1, + "unique_embeds_lm_head_splits": true + }, + "mb": { + "type": "ModelBuilder", + "precision": "int4", + "int4_block_size": 32, + "int4_accuracy_level": 4, + 
"int4_op_types_to_quantize": [ + "MatMul", + "Gather" + ], + "save_as_external_data": true + }, + "mq": { + "type": "MatMulNBitsToQDQ", + "use_int4": true, + "add_zero_point": true, + "nodes_to_exclude": [ + "/lm_head/MatMul_Q4" + ], + "save_as_external_data": true + }, + "gs": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "RemoveRopeMultiCache" + }, + { + "surgeon": "AttentionMaskToSequenceLengths" + }, + { + "surgeon": "SimplifiedLayerNormToL2Norm" + } + ], + "save_as_external_data": true + }, + "sq": { + "type": "OnnxStaticQuantization", + "data_config": "wikitext2_train", + "activation_type": "uint16", + "precision": "uint8", + "calibration_providers": [ + "CUDAExecutionProvider" + ], + "quant_preprocess": true, + "op_types_to_exclude": [ + "GatherBlockQuantized", + "GroupQueryAttention", + "MatMulNBits" + ], + "save_as_external_data": true + }, + "addmetadata": { + "type": "VitisAIAddMetaData", + "config_meta_data_keys": [ + "architectures", + "model_type" + ], + "activation_type": "uint16", + "weight_type": "int4", + "quant_type": "QuaRot" + }, + "sp": { + "type": "SplitModel" + }, + "st": { + "type": "StaticLLM", + "batch_size": 1, + "context_length": 64, + "group_session_options": { + "log_id": "onnxruntime-genai", + "provider_options": [ + { + "VitisAI": {} + } + ], + "graph_optimization_level": "ORT_ENABLE_ALL" + } + } + }, + "target": "local_system", + "log_severity_level": 1, + "output_dir": "model/llama3_2", + "cache_dir": "cache", + "no_artifacts": true, + "evaluate_input_model": false +} diff --git a/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_vitis_ai_config.json.config b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_vitis_ai_config.json.config new file mode 100644 index 00000000..f6624c83 --- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/llama3_2_vitis_ai_config.json.config @@ -0,0 +1,191 @@ +{ + "name": "Convert to AMD NPU", + "oliveFile": "phi3_5/qdq_config_vitis_ai.json", + "isLLM": true, + "evalRuntime": "AMDNPU", 
+ "debugInfo": { + "autoGenerated": true, + "useModelBuilder": "mb" + }, + "isGPURequired": true, + "runtimeOverwrite": { + "executeEp": "CUDAExecutionProvider" + }, + "executeRuntimeFeatures": [ + "AutoGptq" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "CPU" + ], + "path": "systems.local_system.accelerators.0.execution_providers.0", + "values": [ + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Subset", + "tags": [ + "QuantizationDatasetSubset", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": { + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": "QuantizationDatasetSubset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + 
"name": "Quantize model", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/meta-llama-Llama-3.2-1B-Instruct/aitk/model_project.config b/meta-llama-Llama-3.2-1B-Instruct/aitk/model_project.config new file mode 100644 index 00000000..f5a73299 --- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/model_project.config @@ -0,0 +1,24 @@ +{ + "workflows": [ + { + "file": "llama3_2_qnn_config.json", + "templateName": "llama3_2_qnn_config" + }, + { + "file": "llama3_2_vitis_ai_config.json", + "templateName": "llama3_2_vitis_ai_config" + }, + { + "file": "llama3_2_ov_config.json", + "templateName": "llama3_2_ov_config" + }, + { + "file": "llama3_2_dml_config.json", + "templateName": "llama3_2_dml_config" + } + ], + "modelInfo": { + "id": "huggingface/meta-llama/Llama-3.2-1B-Instruct", + "version": 1 + } +} diff --git a/meta-llama-Llama-3.2-1B-Instruct/aitk/requirements.txt b/meta-llama-Llama-3.2-1B-Instruct/aitk/requirements.txt new file mode 100644 index 00000000..03275c3e --- /dev/null +++ b/meta-llama-Llama-3.2-1B-Instruct/aitk/requirements.txt @@ -0,0 +1,2 @@ +datasets +optimum diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/.gitignore b/microsoft-Phi-3.5-mini-instruct/aitk/.gitignore new file mode 100644 index 00000000..48c03882 --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/.gitignore @@ -0,0 +1,5 @@ +__pycache__ +/cache +/history/*/* +!/history/*/history.config +!/history/*/olive_config.json diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/README.md b/microsoft-Phi-3.5-mini-instruct/aitk/README.md new file mode 100644 index 00000000..b8df9630 --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/README.md @@ -0,0 +1,160 @@ +# Phi-3.5 Model Optimization + +This repository demonstrates the optimization of the [Microsoft Phi-3.5 Mini Instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct) model using **post-training quantization (PTQ)** techniques. 
The optimization process is divided into three main workflows: + +- QDQ for AMD NPU +- PTQ + AOT for QNN NPU + + This process extends the QDQ flow and compiles the model specifically for **Qualcomm NPUs** +- OpenVINO for Intel NPU + + This process uses OpenVINO-specific passes such as `OpenVINOOptimumConversion`, `OpenVINOIoUpdate` and `OpenVINOEncapsulation` + +## **QDQ Model with 4-bit Weights & 16-bit Activations** + +This workflow produces an ONNX QDQ model that is agnostic to the target hardware and accelerator, making it suitable for general inference. + +### **Optimization Process** + +The model is optimized using **weight-only quantization** and **activation quantization** for efficient deployment. The process includes: + +1. **Weight Rotation ([QuaRot](https://arxiv.org/abs/2404.00456))** + - Reduces outliers in weights and hidden states to improve quantization accuracy. + +2. **4-bit Per-Channel Symmetric Quantization ([GPTQ](https://arxiv.org/abs/2210.17323))** + - Reduces transformer layer size while preserving accuracy. + +3. **ONNX Graph Capture** + - Exports the model to ONNX for further optimization. + +4. **4-bit Block-wise Quantization** + - Applies weight-only quantization to the **embedding layer** and **language modeling head**. + +5. **16-bit Activation Quantization** + - Uses 16-bit activations to balance precision and efficiency. + +The final output is a **QDQ model** with **4-bit weights** and **16-bit activations**. This model also leverages [GroupQueryAttention (GQA)](https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#com.microsoft.GroupQueryAttention) for efficient long-context processing and long-sequence generation. + +### **Handling Dynamic and Static Input Shapes** + +NPUs require **precompiled graphs**, meaning the model must use **static input shapes**. However, **text generation** involves two distinct processing stages: + +- **Prefill (Prompt Processing)**: Processes multiple tokens simultaneously. 
+- **Token Generation (Iteration)**: Processes one token at a time. + +To support both efficiently, we create **two model instances**: +1. **Prefill model**: Optimized for batch processing. +2. **Token generation model**: Optimized for one-token-at-a-time inference. + +## **PTQ + AOT Compilation for Qualcomm NPUs using QNN EP** + +This process extends the [**QDQ Model with 4-bit Weights & 16-bit Activations**](#qdq-model-with-4-bit-weights--16-bit-activations) by compiling it specifically for **Qualcomm NPUs** using the **QNN Execution Provider**. + +### **Resource Optimization Strategy** + +To maximize efficiency while supporting dynamic input handling: + +- **Embedding Layer & Language Model Head** → Executed on CPU (handles dynamic input). +- **Transformer Layers** → Executed on NPU (requires static input shapes). +- **Weight Sharing** → Prefill & token generation models reuse weights to minimize memory usage. + +> ⚠️ **Note:** GQA is an ONNX Runtime *contrib operator* and must be executed on the CPU. The model graph is partitioned into **CPU (GQA nodes)** and **NPU (other nodes)** for execution. + +### **Compilation for Qualcomm NPU Deployment** + +Once optimized, the model is compiled for Qualcomm NPUs using **ONNX Runtime QNNExecutionProvider**. The steps include: + +1. **Split the Quantized Model** → Divide into three parts: + - **Embedding Layer** + - **Transformer Layers** + - **Language Model Head** +2. **Set Static Input Shapes**: + - **(1, 64)** for prefill (batch size, sequence length). + - **(1, 1)** for token generation. +3. **Compile using QNNExecutionProvider**: + - Leverages **weight sharing** across the prefill and token generation models. + +### **Usage** + +This workflow is configured using the `qnn_config.json` file. It contains all of the quantization and compilation steps. It requires two separate Python environments described below. 
+ +#### A workable version + +- python=3.10 +- CUDA=12.1 +- cudnn=9.2.0 + +#### Quantization Python Environment Setup + +Quantization is resource-intensive and requires GPU acceleration. In an [x64 Python environment with Olive installed](https://github.com/microsoft/Olive/blob/main/examples/README.md#important), install the required packages: + +```bash +# Install common dependencies +pip install -r requirements.txt + +# Install ONNX Runtime GPU packages +pip install "onnxruntime-gpu>=1.21.0" "onnxruntime-genai-cuda>=0.6.0" + +# AutoGPTQ: Install from source (stable package may be slow for weight packing) +# Disable CUDA extension build (not required) +# Linux +export BUILD_CUDA_EXT=0 +# Windows +# set BUILD_CUDA_EXT=0 + +# Install AutoGPTQ from source +pip install --no-build-isolation git+https://github.com/PanQiWei/AutoGPTQ.git +``` + +> ⚠️ Only set up the environment and install the packages. Do not run the `olive run` command at this point. + +#### AOT Compilation Python Environment Setup + +Model compilation using QNN Execution Provider requires a Python environment with onnxruntime-qnn installed. In a separate Python environment with Olive installed, install the required packages: + +```bash +# Install ONNX Runtime QNN +pip install -r https://raw.githubusercontent.com/microsoft/onnxruntime/refs/heads/main/requirements.txt +pip install -U --pre --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple onnxruntime-qnn --no-deps +``` + +Replace `/path/to/qnn/env/bin` in `qnn_config.json` with the path to the directory containing your QNN environment's Python executable. This path can be found by running the following command in the environment: + +```bash +# Linux +command -v python +# Windows +# where python +``` + +This command will return the path to the Python executable. Set the parent directory of the executable as the `/path/to/qnn/env/bin` in the config file. 
+ +#### **Run the Quantization + Compilation Config** + +Activate the **Quantization Python Environment** and run the workflow: + +```bash +olive run --config qnn_config.json +``` + +Olive will run the AOT compilation step in the **AOT Compilation Python Environment** specified in the config file using a subprocess. All other steps will run in the **Quantization Python Environment** natively. + +✅ Optimized model saved in: `./model` + +> ⚠️ If optimization fails with an out-of-memory error, remove `calibration_providers` from the config file. + +> ⚠️ If optimization fails during context binary generation, rerun the command. The process will resume from the last completed step. + +### **Inference** + +The optimized model can be used for inference with the ONNX Runtime QNNExecutionProvider and ONNX Runtime GenAI. **Inference must be run on a Windows Copilot+ PC with a Qualcomm NPU.** + +#### **Install Required Packages (arm64 Python)** +```bash +pip install -r https://raw.githubusercontent.com/microsoft/onnxruntime/refs/heads/main/requirements.txt +pip install -U --pre --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple onnxruntime-qnn --no-deps +pip install "onnxruntime-genai>=0.7.0rc2" +``` + +#### **Run Console-Based Chat Interface** +Execute the provided `inference_sample.ipynb` notebook. 
+ +> ⚠️ If you get a 6033 error, replace `genai_config.json` in the `./model` folder diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/_copy.json.config b/microsoft-Phi-3.5-mini-instruct/aitk/_copy.json.config new file mode 100644 index 00000000..cfda4ffc --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/_copy.json.config @@ -0,0 +1,140 @@ +{ + "copies": [ + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/model_project.config", + "dst": "model_project.config", + "replacements": [ + { + "find": "deepseek_qnn_config", + "replace": "phi3_5_qnn_config" + }, + { + "find": "deepseek_vitis_ai_config", + "replace": "phi3_5_vitis_ai_config" + }, + { + "find": "deepseek_ov_config", + "replace": "phi3_5_ov_config" + }, + { + "find": "deepseek_dml_config", + "replace": "phi3_5_dml_config" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_qnn_config.json", + "dst": "phi3_5_qnn_config.json", + "replacements": [ + { + "find": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", + "replace": "microsoft/Phi-3.5-mini-instruct" + }, + { + "find": "model/deepseek", + "replace": "model/phi3_5" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_qnn_config.json.config", + "dst": "phi3_5_qnn_config.json.config", + "replacements": [ + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_vitis_ai_config.json", + "dst": "phi3_5_vitis_ai_config.json", + "replacements": [ + { + "find": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", + "replace": "microsoft/Phi-3.5-mini-instruct" + }, + { + "find": "model/deepseek", + "replace": "model/phi3_5" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_vitis_ai_config.json.config", + "dst": "phi3_5_vitis_ai_config.json.config", + "replacements": [ + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_ov_config.json", + "dst": "phi3_5_ov_config.json", + "replacements": [ + { + 
"find": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", + "replace": "microsoft/Phi-3.5-mini-instruct" + }, + { + "find": "model/deepseek", + "replace": "model/phi3_5" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_ov_config.json.config", + "dst": "phi3_5_ov_config.json.config", + "replacements": [ + { + "find": "deepseek/openvino/DeepSeek-R1-Distill-Qwen-1.5B_context_ov_dynamic_sym_gs128_bkp_int8_sym_r1.json", + "replace": "phi3_5/openvino/Phi-3.5-mini-instruct_context_ov_dynamic_sym_gs128_bkp_int8_sym.json" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_dml_config.json", + "dst": "phi3_5_dml_config.json", + "replacements": [ + { + "find": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", + "replace": "microsoft/Phi-3.5-mini-instruct" + }, + { + "find": "model/deepseek", + "replace": "model/phi3_5" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_dml_config.json.config", + "dst": "phi3_5_dml_config.json.config", + "replacements": [ + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/README.md", + "dst": "README.md", + "replacements": [ + { + "find": "# DeepSeek-R1-Distill-Qwen-1.5B Model Optimization", + "replace": "# Phi-3.5 Model Optimization" + }, + { + "find": "[DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)", + "replace": "[Microsoft Phi-3.5 Mini Instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)" + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/requirements.txt", + "dst": "requirements.txt", + "replacements": [ + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/inference_sample.ipynb", + "dst": "inference_sample.ipynb", + "replacements": [ + { + "find": "<|User|>{input}<|Assistant|>", + "replace": "<|user|>\\\\n{input} <|end|>\\\\n<|assistant|>" + } + ] + } + ] +} diff --git 
a/microsoft-Phi-3.5-mini-instruct/aitk/inference_model.json b/microsoft-Phi-3.5-mini-instruct/aitk/inference_model.json new file mode 100644 index 00000000..319c2d42 --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/inference_model.json @@ -0,0 +1,31 @@ +{ + "Name": "Phi-3.5-mini-instruct-onnx", + "PromptTemplate": { + "assistant": "{Content}", + "prompt":"<|user|>\n{Content} <|end|>\n<|assistant|>" + }, + "ParameterSchema": { + "enabled": [ + { + "name": "max_tokens", + "default": 512 + }, + { + "name": "temperature", + "default": 0.6 + }, + { + "name": "top_p", + "default": 0.9 + }, + { + "name": "top_k", + "default": 5 + }, + { + "name": "random_seed", + "default": 57894 + } + ] + } +} diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/inference_sample.ipynb b/microsoft-Phi-3.5-mini-instruct/aitk/inference_sample.ipynb new file mode 100644 index 00000000..a47cdc58 --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/inference_sample.ipynb @@ -0,0 +1,131 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "text = 'Who is Isaac Newton?'\n", + "ExecutionProvider=\"QNNExecutionProvider\"\n", + "model_folder = \"./model\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime_genai as og\n", + "import json\n", + "import time\n", + "from pathlib import Path\n", + "\n", + "def get_session_options(obj):\n", + " if type(obj) is dict:\n", + " for k, v in obj.items():\n", + " if k == \"session_options\":\n", + " yield v\n", + " else:\n", + " for x in get_session_options(v):\n", + " yield x\n", + " elif type(obj) is list:\n", + " for v in obj:\n", + " for x in get_session_options(v):\n", + " yield x\n", + "\n", + "\n", + "def remove_provider_options(model_path):\n", + " genai_config_path = Path(model_path) / \"genai_config.json\"\n", + " data = json.loads(genai_config_path.read_text())\n", + " for session_option in 
get_session_options(data):\n", + " if 'provider_options' in session_option:\n", + " session_option['provider_options'] = [{k: dict() for k in opts.keys()} for opts in session_option['provider_options']]\n", + "\n", + " json.dump(data, genai_config_path.open(\"w\"), indent=4)\n", + "\n", + "if ExecutionProvider == \"QNNExecutionProvider\":\n", + " remove_provider_options(model_folder)\n", + "\n", + "# Load the base model and tokenizer\n", + "model = og.Model(model_folder)\n", + "tokenizer = og.Tokenizer(model)\n", + "tokenizer_stream = tokenizer.create_stream()\n", + "\n", + "# Set the max length to something sensible by default,\n", + "# since otherwise it will be set to the entire context length\n", + "search_options = {}\n", + "search_options[\"max_length\"] = 200\n", + "\n", + "chat_template = \"<|user|>\\n{input} <|end|>\\n<|assistant|>\"\n", + "\n", + "# Generate prompt (prompt template + input)\n", + "prompt = f\"{chat_template.format(input=text)}\"\n", + "\n", + "# Encode the prompt using the tokenizer\n", + "input_tokens = tokenizer.encode(prompt)\n", + "\n", + "# Create params and generator\n", + "params = og.GeneratorParams(model)\n", + "params.set_search_options(**search_options)\n", + "generator = og.Generator(model, params)\n", + "\n", + "# Append input tokens to the generator\n", + "generator.append_tokens(input_tokens)\n", + "\n", + "print(\"\")\n", + "print(\"Output: \", end=\"\", flush=True)\n", + "\n", + "token_times = []\n", + "\n", + "# Stream the output\n", + "while not generator.is_done():\n", + " start_time = time.time()\n", + " generator.generate_next_token()\n", + " end_time = time.time()\n", + " \n", + " # Record the time for this token generation\n", + " token_time = end_time - start_time\n", + " token_times.append(token_time)\n", + "\n", + " new_token = generator.get_next_tokens()[0]\n", + " print(tokenizer_stream.decode(new_token), end=\"\", flush=True)\n", + "\n", + "print()\n", + "\n", + "# Calculate and display timing statistics\n", 
+ "if token_times:\n", + " total_tokens = len(token_times)\n", + " avg_time = sum(token_times) / total_tokens\n", + " \n", + " print(f\"Total tokens generated: {total_tokens}\")\n", + " print(f\"Average time per token: {avg_time:.4f} seconds\")\n", + " print(f\"Tokens per second: {total_tokens / sum(token_times):.2f}\")\n", + "\n", + "del generator\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/info.yml b/microsoft-Phi-3.5-mini-instruct/aitk/info.yml new file mode 100644 index 00000000..d0332445 --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/info.yml @@ -0,0 +1,20 @@ +keywords: + aitk +arch: phi +recipes: + - file: "phi3_5_qnn_config.json" + device: npu + ep: QNNExecutionProvider + - file: "phi3_5_vitis_ai_config.json" + device: npu + ep: VitisAIExecutionProvider + - file: "phi3_5_ov_config.json" + device: npu + ep: OpenVINOExecutionProvider + - file: "phi3_5_dml_config.json" + device: gpu + ep: DmlExecutionProvider +aitk: + modelInfo: + id: "huggingface/microsoft/Phi-3.5-mini-instruct" + version: 1 diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/model_project.config b/microsoft-Phi-3.5-mini-instruct/aitk/model_project.config new file mode 100644 index 00000000..a5f764fe --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/model_project.config @@ -0,0 +1,24 @@ +{ + "workflows": [ + { + "file": "phi3_5_qnn_config.json", + "templateName": "phi3_5_qnn_config" + }, + { + "file": "phi3_5_vitis_ai_config.json", + "templateName": "phi3_5_vitis_ai_config" + }, + { + "file": "phi3_5_ov_config.json", + "templateName": 
"phi3_5_ov_config" + }, + { + "file": "phi3_5_dml_config.json", + "templateName": "phi3_5_dml_config" + } + ], + "modelInfo": { + "id": "huggingface/microsoft/Phi-3.5-mini-instruct", + "version": 1 + } +} diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_dml_config.json b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_dml_config.json new file mode 100644 index 00000000..9e401bf4 --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_dml_config.json @@ -0,0 +1,46 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "microsoft/Phi-3.5-mini-instruct" + }, + "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device":"cpu", + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + }, + "target_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device":"gpu", + "execution_providers": [ + "DmlExecutionProvider" + ] + } + ] + } + }, + "passes": { + "q": { + "type": "AutoAWQQuantizer" + }, + "mb": { + "type": "ModelBuilder", + "precision": "int4" + } + }, + "host": "host_system", + "target": "target_system", + "log_severity_level": 1, + "output_dir": "model/phi3_5", + "cache_dir": "cache", + "no_artifacts": true, + "evaluate_input_model": false +} diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_dml_config.json.config b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_dml_config.json.config new file mode 100644 index 00000000..5778ef75 --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_dml_config.json.config @@ -0,0 +1,48 @@ +{ + "name": "Convert to DirectML", + "isLLM": true, + "debugInfo": { + "autoGenerated": true, + "useModelBuilder": "mb" + }, + "isGPURequired": true, + "executeRuntimeFeatures": [ + "AutoAwq" + ], + "evaluationRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "DirectML" + ], + "path": "systems.target_system.accelerators.0.execution_providers.0", + 
"values": [ + "DmlExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_ov_config.json b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_ov_config.json new file mode 100644 index 00000000..904638ab --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_ov_config.json @@ -0,0 +1,56 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "microsoft/Phi-3.5-mini-instruct" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "OpenVINOExecutionProvider" + ] + } + ] + } + }, + "passes": { + "optimum_convert": { + "type": "OpenVINOOptimumConversion", + "extra_args": { + "device": "npu" + }, + "ov_quant_config": { + "weight_format": "int4", + "group_size": 128, + "dataset": "wikitext2", + "ratio": 1, + "sym": true, + "trust_remote_code": true, + "awq": false, + "scale_estimation": false, + "sensitivity_metric": "weight_quantization_error", + "backup_precision": "int8_asym" + } + }, + "io_update": { + "type": "OpenVINOIoUpdate", + "static": false, + "reuse_cache": true + }, + "encapsulation": { + "type": "OpenVINOEncapsulation", + "target_device": "npu", + "keep_ov_dynamic_dims": true, + "ov_version": "2025.1", + "reuse_cache": true + } + }, + "search_strategy": false, + "host": "local_system", + "target": "local_system", + "cache_dir": "cache", + "evaluate_input_model": false, + "output_dir": "model/phi3_5" +} diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_ov_config.json.config b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_ov_config.json.config new file mode 100644 index 00000000..768b5505 --- /dev/null +++ 
b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_ov_config.json.config @@ -0,0 +1,153 @@ +{ + "name": "Convert to Intel CPU/NPU/GPU", + "oliveFile": "phi3_5/openvino/Phi-3.5-mini-instruct_context_ov_dynamic_sym_gs128_bkp_int8_sym.json", + "isLLM": true, + "isIntel": true, + "debugInfo": { + "autoGenerated": true, + "useOpenVINOOptimumConversion": "optimum_convert" + }, + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "systems.local_system.accelerators.0.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ], + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ], + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ] + ], + "readOnly": false + }, + "runtimeInConversion": { + "autoGenerated": true, + "name": "Convert/Quantize to", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "passes.optimum_convert.extra_args.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "cpu" + } + ], + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "gpu" + } + ], + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "npu" + } + ] + ] + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + 
[], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "passes.optimum_convert.ov_quant_config.dataset", + "values": [ + "wikitext2" + ], + "template": { + "path": "passes.optimum_convert.ov_quant_config.dataset", + "values": [ + "wikitext2" + ], + "template": "QuantizationDataset" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_qnn_config.json b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_qnn_config.json new file mode 100644 index 00000000..1e8648d4 --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_qnn_config.json @@ -0,0 +1,132 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "microsoft/Phi-3.5-mini-instruct" + }, + "systems": { + "qnn_system": { + "type": "PythonEnvironment", + "python_environment_path": "/path/to/qnn/env/bin", + "accelerators": [ + { + "execution_providers": [ + "QNNExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "wikitext2_train", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "wikitext", + "subset": "wikitext-2-raw-v1", + "split": "train" + }, + "pre_process_data_config": { + "strategy": "line-by-line", + "add_special_tokens": false, + "max_samples": 128, + "max_seq_len": 512 + } + } + ], + "passes": { + "q": { + "type": "QuaRot" + }, + "g": { + "type": "GptqQuantizer", + "sym": true, + "group_size": -1 + }, + "cs": { + "type": "CaptureSplitInfo", + "num_splits": 4, + "unique_embeds_lm_head_splits": true + }, + "mb": { + "type": "ModelBuilder", + "precision": "int4", + "int4_block_size": 32, + "int4_accuracy_level": 4, + "int4_op_types_to_quantize": [ + "MatMul", + "Gather" + ], + 
"save_as_external_data": true + }, + "mq": { + "type": "MatMulNBitsToQDQ", + "use_int4": true, + "add_zero_point": true, + "nodes_to_exclude": [ + "/lm_head/MatMul_Q4" + ], + "save_as_external_data": true + }, + "gs": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "RemoveRopeMultiCache" + }, + { + "surgeon": "AttentionMaskToSequenceLengths" + }, + { + "surgeon": "SimplifiedLayerNormToL2Norm" + } + ], + "save_as_external_data": true + }, + "sq": { + "type": "OnnxStaticQuantization", + "data_config": "wikitext2_train", + "activation_type": "uint16", + "precision": "uint8", + "calibration_providers": [ + "CUDAExecutionProvider" + ], + "quant_preprocess": true, + "op_types_to_exclude": [ + "GatherBlockQuantized", + "GroupQueryAttention", + "MatMulNBits" + ], + "save_as_external_data": true + }, + "sp": { + "type": "SplitModel" + }, + "st": { + "type": "StaticLLM", + "batch_size": 1, + "context_length": 64 + }, + "cb": { + "type": "EPContextBinaryGenerator", + "provider_options": { + "htp_performance_mode": "burst", + "htp_graph_finalization_optimization_mode": "3", + "soc_model": "60" + }, + "session_options": { + "intra_op_num_threads": 2, + "inter_op_num_threads": 1 + }, + "weight_sharing": true + }, + "cp": { + "type": "ComposeOnnxModels" + } + }, + "target": "qnn_system", + "log_severity_level": 1, + "output_dir": "model/phi3_5", + "cache_dir": "cache", + "no_artifacts": true, + "evaluate_input_model": false +} diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_qnn_config.json.config b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_qnn_config.json.config new file mode 100644 index 00000000..032429d1 --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_qnn_config.json.config @@ -0,0 +1,197 @@ +{ + "name": "Convert to Qualcomm NPU", + "oliveFile": "phi3_5/qnn_config.json", + "isLLM": true, + "debugInfo": { + "autoGenerated": true, + "useModelBuilder": "mb" + }, + "isQNNLLM": true, + "isGPURequired": true, + "runtimeOverwrite": { + 
"autoGenerated": true, + "pyEnvPath": "systems.qnn_system.python_environment_path", + "executeEp": "CUDAExecutionProvider", + "evaluateUsedInExecute": true + }, + "executeRuntimeFeatures": [ + "AutoGptq" + ], + "pyEnvRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Qualcomm NPU" + ], + "path": "systems.qnn_system.accelerators.0.execution_providers.0", + "values": [ + "QNNExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Subset", + "tags": [ + "QuantizationDatasetSubset", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": { + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": "QuantizationDatasetSubset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + 
"name": "Quantize model", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_vitis_ai_config.json b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_vitis_ai_config.json new file mode 100644 index 00000000..889e4d82 --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_vitis_ai_config.json @@ -0,0 +1,134 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "microsoft/Phi-3.5-mini-instruct" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "wikitext2_train", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "wikitext", + "subset": "wikitext-2-raw-v1", + "split": "train" + }, + "pre_process_data_config": { + "strategy": "line-by-line", + "add_special_tokens": false, + "max_samples": 128, + "max_seq_len": 512 + } + } + ], + "passes": { + "q": { + "type": "QuaRot" + }, + "g": { + "type": "GptqQuantizer", + "sym": true, + "group_size": -1 + }, + "cs": { + "type": "CaptureSplitInfo", + "num_splits": 1, + "unique_embeds_lm_head_splits": true + }, + "mb": { + "type": "ModelBuilder", + "precision": "int4", + "int4_block_size": 32, + "int4_accuracy_level": 4, + "int4_op_types_to_quantize": [ + "MatMul", + "Gather" + ], + "save_as_external_data": true + }, + "mq": { + "type": "MatMulNBitsToQDQ", + "use_int4": true, + "add_zero_point": true, + "nodes_to_exclude": [ + "/lm_head/MatMul_Q4" + ], + "save_as_external_data": true + }, + "gs": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "RemoveRopeMultiCache" + }, + { + "surgeon": "AttentionMaskToSequenceLengths" + }, + { + "surgeon": "SimplifiedLayerNormToL2Norm" + } + ], + "save_as_external_data": true + }, + "sq": { + "type": "OnnxStaticQuantization", + "data_config": "wikitext2_train", + "activation_type": "uint16", + 
"precision": "uint8", + "calibration_providers": [ + "CUDAExecutionProvider" + ], + "quant_preprocess": true, + "op_types_to_exclude": [ + "GatherBlockQuantized", + "GroupQueryAttention", + "MatMulNBits" + ], + "save_as_external_data": true + }, + "addmetadata": { + "type": "VitisAIAddMetaData", + "config_meta_data_keys": [ + "architectures", + "model_type" + ], + "activation_type": "uint16", + "weight_type": "int4", + "quant_type": "QuaRot" + }, + "sp": { + "type": "SplitModel" + }, + "st": { + "type": "StaticLLM", + "batch_size": 1, + "context_length": 64, + "group_session_options": { + "log_id": "onnxruntime-genai", + "provider_options": [ + { + "VitisAI": {} + } + ], + "graph_optimization_level": "ORT_ENABLE_ALL" + } + } + }, + "target": "local_system", + "log_severity_level": 1, + "output_dir": "model/phi3_5", + "cache_dir": "cache", + "no_artifacts": true, + "evaluate_input_model": false +} diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_vitis_ai_config.json.config b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_vitis_ai_config.json.config new file mode 100644 index 00000000..f6624c83 --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/phi3_5_vitis_ai_config.json.config @@ -0,0 +1,191 @@ +{ + "name": "Convert to AMD NPU", + "oliveFile": "phi3_5/qdq_config_vitis_ai.json", + "isLLM": true, + "evalRuntime": "AMDNPU", + "debugInfo": { + "autoGenerated": true, + "useModelBuilder": "mb" + }, + "isGPURequired": true, + "runtimeOverwrite": { + "executeEp": "CUDAExecutionProvider" + }, + "executeRuntimeFeatures": [ + "AutoGptq" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "CPU" + ], + "path": "systems.local_system.accelerators.0.execution_providers.0", + "values": [ + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + 
"name": "Convert to ONNX format", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.sq.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.sq.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "wikitext" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Subset", + "tags": [ + "QuantizationDatasetSubset", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + 
"wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": { + "path": "data_configs[0].load_dataset_config.subset", + "values": [ + "wikitext-103-raw-v1", + "wikitext-103-v1", + "wikitext-2-raw-v1", + "wikitext-2-v1" + ], + "template": "QuantizationDatasetSubset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.mb", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/microsoft-Phi-3.5-mini-instruct/aitk/requirements.txt b/microsoft-Phi-3.5-mini-instruct/aitk/requirements.txt new file mode 100644 index 00000000..03275c3e --- /dev/null +++ b/microsoft-Phi-3.5-mini-instruct/aitk/requirements.txt @@ -0,0 +1,2 @@ +datasets +optimum diff --git a/microsoft-Phi-4-mini-reasoning/aitk/.gitignore b/microsoft-Phi-4-mini-reasoning/aitk/.gitignore new file mode 100644 index 00000000..48c03882 --- /dev/null +++ b/microsoft-Phi-4-mini-reasoning/aitk/.gitignore @@ -0,0 +1,5 @@ +__pycache__ +/cache +/history/*/* +!/history/*/history.config +!/history/*/olive_config.json diff --git a/microsoft-Phi-4-mini-reasoning/aitk/README.md b/microsoft-Phi-4-mini-reasoning/aitk/README.md new file mode 100644 index 00000000..52c59381 --- /dev/null +++ b/microsoft-Phi-4-mini-reasoning/aitk/README.md @@ -0,0 +1,6 @@ +# Phi-4 Model Optimization + +This repository demonstrates the optimization of the 
[Microsoft Phi-4 Mini Reasoning](https://huggingface.co/microsoft/Phi-4-mini-reasoning) model using **post-training quantization (PTQ)** techniques. The optimization process consists of the following workflow: + +- OpenVINO for Intel NPU + + This process uses OpenVINO-specific passes like `OpenVINOOptimumConversion`, `OpenVINOIoUpdate` and `OpenVINOEncapsulation` diff --git a/microsoft-Phi-4-mini-reasoning/aitk/_copy.json.config b/microsoft-Phi-4-mini-reasoning/aitk/_copy.json.config new file mode 100644 index 00000000..1b769d18 --- /dev/null +++ b/microsoft-Phi-4-mini-reasoning/aitk/_copy.json.config @@ -0,0 +1,42 @@ +{ + "copies": [ + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/deepseek_ov_config.json.config", + "dst": "phi4_ov_config.json.config", + "replacements": [ + { + "find": "deepseek/openvino/DeepSeek-R1-Distill-Qwen-1.5B_context_ov_dynamic_sym_gs128_bkp_int8_sym_r1.json", + "replace": "phi4/openvino/phi_4_mini_reasoning/Phi-4-mini-reasoning_context_ov_dynamic_sym_gs128_bkp_int8_sym.json" + }, + { + "find": "\"addCpu\": false,", + "replace": "\"executeRuntimeFeatures\": [\"Nightly\"],\"addCpu\": false," + } + ] + }, + { + "src": "../../../deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B/1/inference_sample.ipynb", + "dst": "inference_sample.ipynb", + "replacements": [ + { + "find": "<|User|>{input}<|Assistant|>", + "replace": "<|user|>\\\\n{input} <|end|>\\\\n<|assistant|>" + }, + { + "find": "ExecutionProvider=\\\"QNNExecutionProvider\\\"", + "replace": "ExecutionProvider=\\\"OpenVINOExecutionProvider\\\"" + } + ] + }, + { + "src": "../../Phi-3.5-mini-instruct/1/inference_model.json", + "dst": "inference_model.json", + "replacements": [ + { + "find": "Phi-3.5-mini-instruct-onnx", + "replace": "Phi-4-mini-reasoning-onnx" + } + ] + } + ] +} diff --git a/microsoft-Phi-4-mini-reasoning/aitk/inference_model.json b/microsoft-Phi-4-mini-reasoning/aitk/inference_model.json new file mode 100644 index 00000000..c86373cf --- /dev/null +++ 
b/microsoft-Phi-4-mini-reasoning/aitk/inference_model.json @@ -0,0 +1,31 @@ +{ + "Name": "Phi-4-mini-reasoning-onnx", + "PromptTemplate": { + "assistant": "{Content}", + "prompt":"<|user|>\n{Content} <|end|>\n<|assistant|>" + }, + "ParameterSchema": { + "enabled": [ + { + "name": "max_tokens", + "default": 512 + }, + { + "name": "temperature", + "default": 0.6 + }, + { + "name": "top_p", + "default": 0.9 + }, + { + "name": "top_k", + "default": 5 + }, + { + "name": "random_seed", + "default": 57894 + } + ] + } +} diff --git a/microsoft-Phi-4-mini-reasoning/aitk/inference_sample.ipynb b/microsoft-Phi-4-mini-reasoning/aitk/inference_sample.ipynb new file mode 100644 index 00000000..70e1b959 --- /dev/null +++ b/microsoft-Phi-4-mini-reasoning/aitk/inference_sample.ipynb @@ -0,0 +1,131 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "text = 'Who is Isaac Newton?'\n", + "ExecutionProvider=\"OpenVINOExecutionProvider\"\n", + "model_folder = \"./model\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime_genai as og\n", + "import json\n", + "import time\n", + "from pathlib import Path\n", + "\n", + "def get_session_options(obj):\n", + " if type(obj) is dict:\n", + " for k, v in obj.items():\n", + " if k == \"session_options\":\n", + " yield v\n", + " else:\n", + " for x in get_session_options(v):\n", + " yield x\n", + " elif type(obj) is list:\n", + " for v in obj:\n", + " for x in get_session_options(v):\n", + " yield x\n", + "\n", + "\n", + "def remove_provider_options(model_path):\n", + " genai_config_path = Path(model_path) / \"genai_config.json\"\n", + " data = json.loads(genai_config_path.read_text())\n", + " for session_option in get_session_options(data):\n", + " if 'provider_options' in session_option:\n", + " session_option['provider_options'] = [{k: dict() for k in opts.keys()} for opts in 
session_option['provider_options']]\n", + "\n", + " json.dump(data, genai_config_path.open(\"w\"), indent=4)\n", + "\n", + "if ExecutionProvider == \"QNNExecutionProvider\":\n", + " remove_provider_options(model_folder)\n", + "\n", + "# Load the base model and tokenizer\n", + "model = og.Model(model_folder)\n", + "tokenizer = og.Tokenizer(model)\n", + "tokenizer_stream = tokenizer.create_stream()\n", + "\n", + "# Set the max length to something sensible by default,\n", + "# since otherwise it will be set to the entire context length\n", + "search_options = {}\n", + "search_options[\"max_length\"] = 200\n", + "\n", + "chat_template = \"<|user|>\\n{input} <|end|>\\n<|assistant|>\"\n", + "\n", + "# Generate prompt (prompt template + input)\n", + "prompt = f\"{chat_template.format(input=text)}\"\n", + "\n", + "# Encode the prompt using the tokenizer\n", + "input_tokens = tokenizer.encode(prompt)\n", + "\n", + "# Create params and generator\n", + "params = og.GeneratorParams(model)\n", + "params.set_search_options(**search_options)\n", + "generator = og.Generator(model, params)\n", + "\n", + "# Append input tokens to the generator\n", + "generator.append_tokens(input_tokens)\n", + "\n", + "print(\"\")\n", + "print(\"Output: \", end=\"\", flush=True)\n", + "\n", + "token_times = []\n", + "\n", + "# Stream the output\n", + "while not generator.is_done():\n", + " start_time = time.time()\n", + " generator.generate_next_token()\n", + " end_time = time.time()\n", + " \n", + " # Record the time for this token generation\n", + " token_time = end_time - start_time\n", + " token_times.append(token_time)\n", + "\n", + " new_token = generator.get_next_tokens()[0]\n", + " print(tokenizer_stream.decode(new_token), end=\"\", flush=True)\n", + "\n", + "print()\n", + "\n", + "# Calculate and display timing statistics\n", + "if token_times:\n", + " total_tokens = len(token_times)\n", + " avg_time = sum(token_times) / total_tokens\n", + " \n", + " print(f\"Total tokens generated: 
{total_tokens}\")\n", + " print(f\"Average time per token: {avg_time:.4f} seconds\")\n", + " print(f\"Tokens per second: {total_tokens / sum(token_times):.2f}\")\n", + "\n", + "del generator\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/microsoft-Phi-4-mini-reasoning/aitk/info.yml b/microsoft-Phi-4-mini-reasoning/aitk/info.yml new file mode 100644 index 00000000..07948e46 --- /dev/null +++ b/microsoft-Phi-4-mini-reasoning/aitk/info.yml @@ -0,0 +1,11 @@ +keywords: + aitk +arch: phi +recipes: + - file: "phi4_ov_config.json" + device: npu + ep: OpenVINOExecutionProvider +aitk: + modelInfo: + id: "huggingface/microsoft/Phi-4-mini-reasoning" + version: 1 diff --git a/microsoft-Phi-4-mini-reasoning/aitk/model_project.config b/microsoft-Phi-4-mini-reasoning/aitk/model_project.config new file mode 100644 index 00000000..13c6e9ab --- /dev/null +++ b/microsoft-Phi-4-mini-reasoning/aitk/model_project.config @@ -0,0 +1,12 @@ +{ + "workflows": [ + { + "file": "phi4_ov_config.json", + "templateName": "phi4_ov_config" + } + ], + "modelInfo": { + "id": "huggingface/microsoft/Phi-4-mini-reasoning", + "version": 1 + } +} diff --git a/microsoft-Phi-4-mini-reasoning/aitk/phi4_ov_config.json b/microsoft-Phi-4-mini-reasoning/aitk/phi4_ov_config.json new file mode 100644 index 00000000..578fc1db --- /dev/null +++ b/microsoft-Phi-4-mini-reasoning/aitk/phi4_ov_config.json @@ -0,0 +1,55 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "microsoft/Phi-4-mini-reasoning" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + 
"OpenVINOExecutionProvider" + ] + } + ] + } + }, + "passes": { + "optimum_convert": { + "type": "OpenVINOOptimumConversion", + "extra_args": { + "device": "npu" + }, + "ov_quant_config": { + "weight_format": "int4", + "group_size": 128, + "dataset": "wikitext2", + "ratio": 1, + "awq": true, + "scale_estimation": true, + "sym": true, + "trust_remote_code": true, + "backup_precision": "int8_sym" + } + }, + "io_update": { + "type": "OpenVINOIoUpdate", + "static": false, + "reuse_cache": true + }, + "encapsulation": { + "type": "OpenVINOEncapsulation", + "target_device": "npu", + "keep_ov_dynamic_dims": true, + "ov_version": "2025.1", + "reuse_cache": true + } + }, + "search_strategy": false, + "host": "local_system", + "cache_dir": "cache", + "evaluate_input_model": false, + "output_dir": "model/Phi-4-mini-reasoning_context_ov_dynamic_sym_gs128_bkp_int8_sym", + "target": "local_system" +} diff --git a/microsoft-Phi-4-mini-reasoning/aitk/phi4_ov_config.json.config b/microsoft-Phi-4-mini-reasoning/aitk/phi4_ov_config.json.config new file mode 100644 index 00000000..0b15f17c --- /dev/null +++ b/microsoft-Phi-4-mini-reasoning/aitk/phi4_ov_config.json.config @@ -0,0 +1,156 @@ +{ + "name": "Convert to Intel CPU/NPU/GPU", + "oliveFile": "phi4/openvino/phi_4_mini_reasoning/Phi-4-mini-reasoning_context_ov_dynamic_sym_gs128_bkp_int8_sym.json", + "isLLM": true, + "isIntel": true, + "debugInfo": { + "autoGenerated": true, + "useOpenVINOOptimumConversion": "optimum_convert" + }, + "executeRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "systems.local_system.accelerators.0.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ], + [ + { + "type": 
"delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ], + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ] + ], + "readOnly": false + }, + "runtimeInConversion": { + "autoGenerated": true, + "name": "Convert/Quantize to", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "passes.optimum_convert.extra_args.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "cpu" + } + ], + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "gpu" + } + ], + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "npu" + } + ] + ] + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "passes.optimum_convert.ov_quant_config.dataset", + "values": [ + "wikitext2" + ], + "template": { + "path": "passes.optimum_convert.ov_quant_config.dataset", + "values": [ + "wikitext2" + ], + "template": "QuantizationDataset" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/microsoft-Phi-4-mini-reasoning/aitk/requirements.txt b/microsoft-Phi-4-mini-reasoning/aitk/requirements.txt new file mode 100644 index 00000000..4a41a3ef --- /dev/null 
+++ b/microsoft-Phi-4-mini-reasoning/aitk/requirements.txt @@ -0,0 +1 @@ +olive-ai diff --git a/microsoft-resnet-50/aitk/.gitignore b/microsoft-resnet-50/aitk/.gitignore new file mode 100644 index 00000000..48c03882 --- /dev/null +++ b/microsoft-resnet-50/aitk/.gitignore @@ -0,0 +1,5 @@ +__pycache__ +/cache +/history/*/* +!/history/*/history.config +!/history/*/olive_config.json diff --git a/microsoft-resnet-50/aitk/README.md b/microsoft-resnet-50/aitk/README.md new file mode 100644 index 00000000..d4d440e6 --- /dev/null +++ b/microsoft-resnet-50/aitk/README.md @@ -0,0 +1,21 @@ +# ResNet optimization + +This folder contains examples of ResNet optimization using different workflows. + +- QDQ for Qualcomm NPU / AMD NPU +- OpenVINO for Intel NPU + +## QDQ for Qualcomm NPU / AMD NPU + +This workflow optimizes ResNet with QDQ quantization in a single run. It performs the following optimization pipeline: + +- *PyTorch Model -> ONNX Model -> Quantized ONNX Model* + +## Evaluation result + +Quantization uses 256 samples from the train split of the imagenet-1k dataset, and evaluation uses 256 samples from the test split of the same dataset. 
+ +| Activation Type | Weight Type | Size | Accuracy | Latency (avg) | +| --------------------- | ----------------- | ---------- | -------------- | ------------------- | +| float32 | float32 | 97.3 MB | - | - | +| QUInt16 | QUInt8 | 24.5 MB | 0.78515625 | 2.53724 ms | diff --git a/microsoft-resnet-50/aitk/_copy.json.config b/microsoft-resnet-50/aitk/_copy.json.config new file mode 100644 index 00000000..953a59db --- /dev/null +++ b/microsoft-resnet-50/aitk/_copy.json.config @@ -0,0 +1,28 @@ +{ + "copies": [ + { + "src": "resnet_qdq_amd.json.config", + "dst": "resnet_qdq_qnn.json.config", + "replacements": [ + { + "find": "resnet/resnet_ptq_qdq_vitis_ai.json", + "replace": "resnet/resnet_ptq_qdq.json" + }, + { + "find": "Convert to AMD NPU", + "replace": "Convert to Qualcomm NPU" + } + ] + }, + { + "src": "resnet_trtrtx_inference_sample.ipynb", + "dst": "resnet_dml_inference_sample.ipynb", + "replacements": [ + { + "find": "NvTensorRTRTXExecutionProvider", + "replace": "DmlExecutionProvider" + } + ] + } + ] +} \ No newline at end of file diff --git a/microsoft-resnet-50/aitk/imagenet.py b/microsoft-resnet-50/aitk/imagenet.py new file mode 100644 index 00000000..41aa142e --- /dev/null +++ b/microsoft-resnet-50/aitk/imagenet.py @@ -0,0 +1,105 @@ +# ------------------------------------------------------------------------- +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. 
+# -------------------------------------------------------------------------- +from logging import getLogger +from pathlib import Path + +import numpy as np +import torchvision.transforms as transforms +import transformers +from torch import from_numpy, permute +from torch.utils.data import Dataset + +from olive.data.registry import Registry + +logger = getLogger(__name__) + +def get_imagenet_label_map(): + import json + cache_file = Path(f"./cache/data/imagenet_class_index.json") + if not cache_file.exists(): + import requests + imagenet_class_index_url = ( + "https://raw.githubusercontent.com/pytorch/vision/main/gallery/assets/imagenet_class_index.json" + ) + response = requests.get(imagenet_class_index_url) + response.raise_for_status() # Ensure the request was successful + content = response.json() + cache_file.parent.resolve().mkdir(parents=True, exist_ok=True) + with open(cache_file, "w") as f: + json.dump(content, f) + else: + with open(cache_file) as f: + content = json.loads(f.read()) + + return {v[0]: int(k) for k, v in content.items()} + +def adapt_label_for_mini_imagenet(labels: list, label_names: list): + label_map = get_imagenet_label_map() + return [label_map[label_names[x]] for x in labels] + +class ImagenetDataset(Dataset): + def __init__(self, data): + self.images = from_numpy(data["images"]) + self.labels = from_numpy(data["labels"]) + + def __len__(self): + return min(len(self.images), len(self.labels)) + + def __getitem__(self, idx): + return {"pixel_values": self.images[idx]}, self.labels[idx] + + +@Registry.register_post_process() +def dataset_post_process(output): + return ( + output.logits.argmax(axis=1) + if isinstance(output, transformers.modeling_outputs.ModelOutput) + else output.argmax(axis=1) + ) + + +from transformers import AutoImageProcessor +processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50", use_fast=True) + +@Registry.register_pre_process() +def dataset_pre_process(output_data, **kwargs): + shuffle = 
kwargs.get("shuffle", True) + if shuffle: + seed = kwargs.get("seed", 42) + output_data = output_data.shuffle(seed=seed) + cache_key = kwargs.get("cache_key") + size = kwargs.get("size", 256) + transpose = kwargs.get("transpose", False) + cache_file = None + if cache_key: + suffix = "nhwc" if transpose else "nchw" + cache_file = Path(f"./cache/data/{cache_key}_{output_data.info.dataset_name}_{size}_{suffix}.npz") + if cache_file.exists(): + with np.load(Path(cache_file)) as data: + return ImagenetDataset(data) + + labels = [] + images = [] + for i, sample in enumerate(output_data): + if i >= size: + break + image = sample["image"] + label = sample["label"] + image = image.convert("RGB") + image = processor(image)["pixel_values"][0] + if transpose: + image = permute(image, (1, 2, 0)) + images.append(image) + labels.append(label) + + if(output_data.info.dataset_name == "mini-imagenet"): + labels = adapt_label_for_mini_imagenet(labels, output_data.features["label"].names) + result_data = ImagenetDataset({"images": np.array(images), "labels": np.array(labels)}) + + if cache_file: + cache_file.parent.resolve().mkdir(parents=True, exist_ok=True) + np.savez(cache_file, images=np.array(images), labels=np.array(labels)) + + return result_data \ No newline at end of file diff --git a/microsoft-resnet-50/aitk/inference_sample.ipynb b/microsoft-resnet-50/aitk/inference_sample.ipynb new file mode 100644 index 00000000..c167ae59 --- /dev/null +++ b/microsoft-resnet-50/aitk/inference_sample.ipynb @@ -0,0 +1,128 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "\n", + "ExecutionProvider=\"QNNExecutionProvider\"\n", + "transpose = False\n", + "if ExecutionProvider == \"OpenVINOExecutionProvider\":\n", + " onnx_model_path = \"./model/ov_model_st_quant.onnx\"\n", + "elif ExecutionProvider == \"VitisAIExecutionProvider\":\n", + " transpose = True" + ] + }, + { + 
"cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "from PIL import Image\n", + "url = \"https://onnxruntime.ai/images/dog.jpeg\"\n", + "response = requests.get(url)\n", + "# Save the image to a file\n", + "with open(\"dog.jpeg\", \"wb\") as file:\n", + " file.write(response.content)\n", + "img = Image.open(\"dog.jpeg\")\n", + "img" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime as ort\n", + "from PIL import Image\n", + "import torch\n", + "import torchvision.transforms as transforms\n", + "from torchvision.models.resnet import ResNet50_Weights\n", + "\n", + "image_file_path = \"dog.jpeg\"\n", + "\n", + "# Create ONNX runtime session\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + "\n", + "\n", + "session_options = ort.SessionOptions()\n", + "\n", + "add_ep_for_device(session_options, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "\n", + "session = ort.InferenceSession(\n", + " onnx_model_path, # a model with QNN EPContext nodes\n", + " sess_options=session_options,\n", + ")\n", + "\n", + "print(\"Available providers:\", session.get_providers())\n", + "print(\"Current provider:\", session.get_provider_options())\n", + "\n", + "# Read and preprocess image\n", + "image = Image.open(image_file_path)\n", + "preprocess = transforms.Compose([\n", + " transforms.Resize(256),\n", + " transforms.CenterCrop(224),\n", + " transforms.ToTensor(),\n", + " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n", + "])\n", + 
"input_tensor = preprocess(image)\n", + "if transpose:\n", + " input_tensor = input_tensor.permute(1, 2, 0)\n", + "input_batch = input_tensor.unsqueeze(0)\n", + "\n", + "# Run inference\n", + "ort_inputs = {session.get_inputs()[0].name: input_batch.numpy()}\n", + "ort_outputs = session.run(None, ort_inputs)\n", + "\n", + "# Postprocess to get softmax vector\n", + "output = ort_outputs[0]\n", + "softmax = torch.nn.functional.softmax(torch.tensor(output), dim=1)\n", + "\n", + "# Extract top 10 predicted classes\n", + "top10 = torch.topk(softmax, 10)\n", + "\n", + "# Get label mapping\n", + "weights = ResNet50_Weights.DEFAULT\n", + "labels = weights.meta[\"categories\"]\n", + "\n", + "# Print results to console\n", + "print(\"Top 10 predictions for ResNet50 v2...\")\n", + "print(\"--------------------------------------------------------------\")\n", + "for i in range(10):\n", + " print(f\"Label: {labels[top10.indices[0][i]]}, Confidence: {top10.values[0][i].item():.4f}\")\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "cpu", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/microsoft-resnet-50/aitk/info.yml b/microsoft-resnet-50/aitk/info.yml new file mode 100644 index 00000000..feadf0e4 --- /dev/null +++ b/microsoft-resnet-50/aitk/info.yml @@ -0,0 +1,23 @@ +keywords: + aitk +arch: resnet +recipes: + - file: "resnet_qdq_qnn.json" + device: npu + ep: QNNExecutionProvider + - file: "resnet_qdq_amd.json" + device: npu + ep: VitisAIExecutionProvider + - file: "resnet_context_ov_static.json" + device: npu + ep: OpenVINOExecutionProvider + - file: "resnet_trtrtx.json" + device: gpu + ep: NvTensorRTRTXExecutionProvider + - file: 
"resnet_dml.json" + device: gpu + ep: DmlExecutionProvider +aitk: + modelInfo: + id: "huggingface/microsoft/resnet-50" + version: 1 diff --git a/microsoft-resnet-50/aitk/model_project.config b/microsoft-resnet-50/aitk/model_project.config new file mode 100644 index 00000000..2a944b44 --- /dev/null +++ b/microsoft-resnet-50/aitk/model_project.config @@ -0,0 +1,28 @@ +{ + "workflows": [ + { + "file": "resnet_qdq_qnn.json", + "templateName": "resnet_qdq_qnn" + }, + { + "file": "resnet_qdq_amd.json", + "templateName": "resnet_qdq_amd" + }, + { + "file": "resnet_context_ov_static.json", + "templateName": "resnet_context_ov_static" + }, + { + "file": "resnet_trtrtx.json", + "templateName": "resnet_trtrtx" + }, + { + "file": "resnet_dml.json", + "templateName": "resnet_dml" + } + ], + "modelInfo": { + "id": "huggingface/microsoft/resnet-50", + "version": 1 + } +} diff --git a/microsoft-resnet-50/aitk/requirements.txt b/microsoft-resnet-50/aitk/requirements.txt new file mode 100644 index 00000000..4598395d --- /dev/null +++ b/microsoft-resnet-50/aitk/requirements.txt @@ -0,0 +1,4 @@ +olive-ai +torchvision +pillow +requests diff --git a/microsoft-resnet-50/aitk/resnet_context_ov_static.json b/microsoft-resnet-50/aitk/resnet_context_ov_static.json new file mode 100644 index 00000000..30ebfd55 --- /dev/null +++ b/microsoft-resnet-50/aitk/resnet_context_ov_static.json @@ -0,0 +1,139 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "microsoft/resnet-50", + "task": "image-classification" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "npu", + "execution_providers": [ + "OpenVINOExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quantize_data_config", + "type": "HuggingfaceContainer", + "user_script": "imagenet.py", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "train", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + 
"type": "dataset_pre_process", + "size": 256, + "cache_key": "imagedata_quantization" + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + }, + { + "name": "evaluation_data_config", + "type": "HuggingfaceContainer", + "user_script": "imagenet.py", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "validation", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 1000, + "cache_key": "imagedata_evaluation" + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "accuracy_score", + "priority": 1, + "metric_config": { + "task": "multiclass", + "num_classes": 1001 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "avg", + "priority": 2 + } + ] + } + ] + } + }, + "passes": { + "ov_convert": { + "type": "OpenVINOConversion", + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "compress_to_fp16": true, + "static": true + }, + "io_update": { + "type": "OpenVINOIoUpdate", + "static": true, + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "reuse_cache": true + }, + "ov_quantize": { + "type": "OpenVINOQuantization", + "target_device": "npu", + "data_config": "quantize_data_config", + "reuse_cache": true + }, + "encapsulation": { + "type": "OpenVINOEncapsulation", + "target_device": "npu", + "ov_version": "2025.1", + "reuse_cache": true + } + }, + "search_strategy": false, + "host": "local_system", + "target": "local_system", + "evaluator": "common_evaluator", + "cache_dir": "cache", + "evaluate_input_model": false, + "output_dir": "model/resnet_context_ov_static" +} diff --git a/microsoft-resnet-50/aitk/resnet_context_ov_static.json.config 
b/microsoft-resnet-50/aitk/resnet_context_ov_static.json.config new file mode 100644 index 00000000..aa4fccdb --- /dev/null +++ b/microsoft-resnet-50/aitk/resnet_context_ov_static.json.config @@ -0,0 +1,261 @@ +{ + "name": "Convert to Intel CPU/NPU/GPU", + "oliveFile": "resnet/openvino/resnet_context_ov_static.json", + "isIntel": true, + "debugInfo": { + "autoGenerated": true, + "useOpenVINOConversion": "ov_convert" + }, + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "systems.local_system.accelerators.0.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.ov_quantize.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ], + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.ov_quantize.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ], + [ + { + "type": "delete", + "path": "passes.io_update.reuse_cache" + }, + { + "type": "delete", + "path": "passes.ov_quantize.reuse_cache" + }, + { + "type": "delete", + "path": "passes.encapsulation.reuse_cache" + } + ] + ], + "readOnly": false + }, + "runtimeInConversion": { + "autoGenerated": true, + "name": "Convert/Quantize to", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "passes.ov_quantize.target_device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "cpu" + } + ], + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "gpu" + } + ], + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "npu" + } + ] + ] 
+ }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.ov_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.size", + "template": { + "path": "data_configs[0].pre_process_data_config.size", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.ov_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "EvaluationDataset" + } + 
}, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[1].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].pre_process_data_config.size", + "template": { + "path": "data_configs[1].pre_process_data_config.size", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/microsoft-resnet-50/aitk/resnet_dml.json b/microsoft-resnet-50/aitk/resnet_dml.json new file mode 100644 index 00000000..95e52a9e --- /dev/null +++ b/microsoft-resnet-50/aitk/resnet_dml.json @@ -0,0 +1,121 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "microsoft/resnet-50", + "task": "image-classification", + "io_config": { + "input_names": [ + "pixel_values" + ], + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "output_names": [ + "logits" + ] + } + }, + "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "cpu", + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + }, + "target_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "gpu", + "execution_providers": [ + "DmlExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "evaluation_data_config", + "type": "HuggingfaceContainer", + "user_script": "imagenet.py", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "validation", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 1000, + 
"cache_key": "imagedata_evaluation" + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "accuracy_score", + "priority": 1, + "metric_config": { + "task": "multiclass", + "num_classes": 1001 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "avg", + "priority": 2 + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "device": "cpu", + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true, + "all_tensors_to_one_file": true, + "dynamic": false, + "use_dynamo_exporter": false + }, + "onnx_float_to_float16": { + "type": "OnnxFloatToFloat16", + "save_as_external_data": true + } + }, + "host": "host_system", + "target": "target_system", + "evaluator": "common_evaluator", + "cache_dir": "cache", + "output_dir": "model/resnet_dml", + "evaluate_input_model": false +} diff --git a/microsoft-resnet-50/aitk/resnet_dml.json.config b/microsoft-resnet-50/aitk/resnet_dml.json.config new file mode 100644 index 00000000..7216c02e --- /dev/null +++ b/microsoft-resnet-50/aitk/resnet_dml.json.config @@ -0,0 +1,107 @@ +{ + "name": "Convert to DirectML", + "evaluationRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "DirectML" + ], + "path": "systems.target_system.accelerators.0.execution_providers.0", + "values": [ + "DmlExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, 
+ { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.size", + "template": { + "path": "data_configs[0].pre_process_data_config.size", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/microsoft-resnet-50/aitk/resnet_dml_inference_sample.ipynb b/microsoft-resnet-50/aitk/resnet_dml_inference_sample.ipynb new file mode 100644 index 00000000..489618e6 --- /dev/null +++ b/microsoft-resnet-50/aitk/resnet_dml_inference_sample.ipynb @@ -0,0 +1,121 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "ExecutionProvider=\"DmlExecutionProvider\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "from PIL import Image\n", + "url = \"https://onnxruntime.ai/images/dog.jpeg\"\n", + "response = 
requests.get(url)\n", + "# Save the image to a file\n", + "with open(\"dog.jpeg\", \"wb\") as file:\n", + " file.write(response.content)\n", + "img = Image.open(\"dog.jpeg\")\n", + "img" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime as ort\n", + "from PIL import Image\n", + "import torch\n", + "import torchvision.transforms as transforms\n", + "from torchvision.models.resnet import ResNet50_Weights\n", + "import numpy as np\n", + "\n", + "image_file_path = \"dog.jpeg\"\n", + "\n", + "# Create ONNX runtime session\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + "\n", + "\n", + "session_options = ort.SessionOptions()\n", + "\n", + "add_ep_for_device(session_options, ExecutionProvider, ort.OrtHardwareDeviceType.GPU)\n", + "\n", + "session = ort.InferenceSession(\n", + " onnx_model_path, # path to the optimized ONNX model\n", + " sess_options=session_options,\n", + ")\n", + "\n", + "print(\"Available providers:\", session.get_providers())\n", + "print(\"Current provider:\", session.get_provider_options())\n", + "\n", + "# Read and preprocess image\n", + "image = Image.open(image_file_path)\n", + "preprocess = transforms.Compose([\n", + " transforms.Resize(256),\n", + " transforms.CenterCrop(224),\n", + " transforms.ToTensor(),\n", + " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n", + "])\n", + "input_tensor = preprocess(image)\n", + "input_batch = input_tensor.unsqueeze(0)\n", + "\n", + "# Run inference\n", + "ort_inputs = {session.get_inputs()[0].name: 
input_batch.numpy().astype(np.float16)}\n", + "ort_outputs = session.run(None, ort_inputs)\n", + "\n", + "# Postprocess to get softmax vector\n", + "output = ort_outputs[0]\n", + "softmax = torch.nn.functional.softmax(torch.tensor(output), dim=1)\n", + "\n", + "# Extract top 10 predicted classes\n", + "top10 = torch.topk(softmax, 10)\n", + "\n", + "# Get label mapping\n", + "weights = ResNet50_Weights.DEFAULT\n", + "labels = weights.meta[\"categories\"]\n", + "\n", + "# Print results to console\n", + "print(\"Top 10 predictions for ResNet50 v2...\")\n", + "print(\"--------------------------------------------------------------\")\n", + "for i in range(10):\n", + " print(f\"Label: {labels[top10.indices[0][i]]}, Confidence: {top10.values[0][i].item():.4f}\")\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "cpu", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/microsoft-resnet-50/aitk/resnet_qdq_amd.json b/microsoft-resnet-50/aitk/resnet_qdq_amd.json new file mode 100644 index 00000000..ea681095 --- /dev/null +++ b/microsoft-resnet-50/aitk/resnet_qdq_amd.json @@ -0,0 +1,147 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "microsoft/resnet-50", + "task": "image-classification", + "io_config": { + "input_names": [ + "pixel_values" + ], + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "output_names": [ + "logits" + ] + } + }, + "systems": { + "qnn_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "VitisAIExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quantize_data_config", + "type": "HuggingfaceContainer", + "user_script": "imagenet.py", + 
"load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "train", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 64, + "cache_key": "imagedata_quantization", + "transpose": true + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + }, + { + "name": "evaluation_data_config", + "type": "HuggingfaceContainer", + "user_script": "imagenet.py", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "validation", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 1000, + "cache_key": "imagedata_evaluation", + "transpose": true + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "accuracy_score", + "priority": 1, + "metric_config": { + "task": "multiclass", + "num_classes": 1001 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "avg", + "priority": 2 + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "device": "cpu", + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true, + "all_tensors_to_one_file": true, + "dynamic": false, + "use_dynamo_exporter": false + }, + "transpose_input": { + "type": "InputNCHWtoNHWC" + }, + "OnnxQuantization": { + "type": "OnnxQuantization", + "data_config": "quantize_data_config", + "activation_type": "uint8", + "precision": "uint8", + "calibrate_method": "MinMax", + "save_as_external_data": true + }, + "addmetadata": { + "type": "VitisAIAddMetaData", + "config_meta_data_keys": [ + "architectures", + "model_type" + ], + "activation_type": "uint8", + "weight_type": "uint8", + "quant_type": 
"OnnxStaticQuantization" + } + }, + "host": "qnn_system", + "target": "qnn_system", + "evaluator": "common_evaluator", + "cache_dir": "cache", + "output_dir": "model/resnet_ptq_qnn", + "evaluate_input_model": false +} diff --git a/microsoft-resnet-50/aitk/resnet_qdq_amd.json.config b/microsoft-resnet-50/aitk/resnet_qdq_amd.json.config new file mode 100644 index 00000000..eabd4ce7 --- /dev/null +++ b/microsoft-resnet-50/aitk/resnet_qdq_amd.json.config @@ -0,0 +1,239 @@ +{ + "name": "Convert to AMD NPU", + "oliveFile": "resnet/resnet_ptq_qdq_vitis_ai.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "AMD NPU", + "CPU" + ], + "path": "systems.qnn_system.accelerators.0.execution_providers.0", + "values": [ + "VitisAIExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.OnnxQuantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.OnnxQuantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.OnnxQuantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.OnnxQuantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.size", + "template": { + "path": "data_configs[0].pre_process_data_config.size", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.OnnxQuantization", + "actions": [ + [], + [ + { + "type": "update", + "path": "passes", + "value": { + "conversion": { + "device": "cpu", + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true, + "all_tensors_to_one_file": true, + "dynamic": false, + "use_dynamo_exporter": false + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + 
"parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[1].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].pre_process_data_config.size", + "template": { + "path": "data_configs[1].pre_process_data_config.size", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/microsoft-resnet-50/aitk/resnet_qdq_qnn.json b/microsoft-resnet-50/aitk/resnet_qdq_qnn.json new file mode 100644 index 00000000..2a8b9c16 --- /dev/null +++ b/microsoft-resnet-50/aitk/resnet_qdq_qnn.json @@ -0,0 +1,132 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "microsoft/resnet-50", + "task": "image-classification", + "io_config": { + "input_names": [ + "pixel_values" + ], + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "output_names": [ + "logits" + ] + } + }, + "systems": { + "qnn_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "QNNExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quantize_data_config", + "type": "HuggingfaceContainer", + "user_script": 
"imagenet.py", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "train", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 64, + "cache_key": "imagedata_quantization" + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + }, + { + "name": "evaluation_data_config", + "type": "HuggingfaceContainer", + "user_script": "imagenet.py", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "validation", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 1000, + "cache_key": "imagedata_evaluation" + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "accuracy_score", + "priority": 1, + "metric_config": { + "task": "multiclass", + "num_classes": 1001 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "evaluation_data_config", + "sub_types": [ + { + "name": "avg", + "priority": 2 + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true, + "all_tensors_to_one_file": true, + "use_dynamo_exporter": false + }, + "OnnxQuantization": { + "type": "OnnxQuantization", + "data_config": "quantize_data_config", + "activation_type": "uint16", + "precision": "uint8", + "calibrate_method": "MinMax", + "quant_preprocess": true, + "prepare_qnn_config": true, + "save_as_external_data": true + } + }, + "host": "qnn_system", + "target": "qnn_system", + "evaluator": "common_evaluator", + "cache_dir": "cache", + "output_dir": "model/resnet_ptq_qnn", + "evaluate_input_model": false +} diff --git a/microsoft-resnet-50/aitk/resnet_qdq_qnn.json.config 
b/microsoft-resnet-50/aitk/resnet_qdq_qnn.json.config new file mode 100644 index 00000000..9dde5538 --- /dev/null +++ b/microsoft-resnet-50/aitk/resnet_qdq_qnn.json.config @@ -0,0 +1,237 @@ +{ + "name": "Convert to Qualcomm NPU", + "oliveFile": "resnet/resnet_ptq_qdq.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Qualcomm NPU", + "CPU" + ], + "path": "systems.qnn_system.accelerators.0.execution_providers.0", + "values": [ + "QNNExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.OnnxQuantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.OnnxQuantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.OnnxQuantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.OnnxQuantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.size", + "template": { + "path": "data_configs[0].pre_process_data_config.size", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.OnnxQuantization", + "actions": [ + [], + [ + { + "type": "update", + "path": "passes", + "value": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true, + "all_tensors_to_one_file": true, + "use_dynamo_exporter": false + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation 
Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[1].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].pre_process_data_config.size", + "template": { + "path": "data_configs[1].pre_process_data_config.size", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/microsoft-resnet-50/aitk/resnet_trtrtx.json b/microsoft-resnet-50/aitk/resnet_trtrtx.json new file mode 100644 index 00000000..ed10f746 --- /dev/null +++ b/microsoft-resnet-50/aitk/resnet_trtrtx.json @@ -0,0 +1,110 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "microsoft/resnet-50", + "task": "image-classification", + "io_config": { + "input_names": [ + "pixel_values" + ], + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "output_names": [ + "logits" + ] + } + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "gpu", + "execution_providers": [ + "NvTensorRTRTXExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "data_config", + "type": "HuggingfaceContainer", + "user_script": "imagenet.py", + 
"load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "train", + "streaming": true, + "trust_remote_code": true + }, + "pre_process_data_config": { + "type": "dataset_pre_process", + "size": 256, + "cache_key": "imagenet" + }, + "post_process_data_config": { + "type": "dataset_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "data_config": "data_config", + "sub_types": [ + { + "name": "accuracy_score", + "priority": 1, + "metric_config": { + "task": "multiclass", + "num_classes": 1001 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "data_config", + "sub_types": [ + { + "name": "avg" + } + ] + } + ] + } + }, + "passes": { + "onnx_conversion": { + "type": "OnnxConversion", + "target_opset": 13, + "save_as_external_data": true + }, + "onnx_float_to_float16": { + "type": "OnnxFloatToFloat16", + "save_as_external_data": true + }, + "session_params_tuning": { + "type": "OrtSessionParamsTuning", + "io_bind": false, + "data_config": "data_config" + } + }, + "host": "local_system", + "target": "local_system", + "evaluator": "common_evaluator", + "cache_dir": "cache", + "output_dir": "model/resnet_trtrtx", + "evaluate_input_model": false +} diff --git a/microsoft-resnet-50/aitk/resnet_trtrtx.json.config b/microsoft-resnet-50/aitk/resnet_trtrtx.json.config new file mode 100644 index 00000000..838f3301 --- /dev/null +++ b/microsoft-resnet-50/aitk/resnet_trtrtx.json.config @@ -0,0 +1,106 @@ +{ + "name": "Convert to NVIDIA TRT for RTX", + "oliveFile": "resnet/resnet_trtrtx.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "NVIDIA TensorRT for RTX", + "CPU" + ], + "path": "systems.local_system.accelerators.0.execution_providers.0", + "values": [ + "NvTensorRTRTXExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + 
"name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.onnx_conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "imagenet-1k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.size", + "template": { + "path": "data_configs[0].pre_process_data_config.size", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/microsoft-resnet-50/aitk/resnet_trtrtx_inference_sample.ipynb b/microsoft-resnet-50/aitk/resnet_trtrtx_inference_sample.ipynb new file mode 100644 index 00000000..25eebee1 --- /dev/null +++ b/microsoft-resnet-50/aitk/resnet_trtrtx_inference_sample.ipynb @@ -0,0 +1,121 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + 
"ExecutionProvider=\"NvTensorRTRTXExecutionProvider\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "from PIL import Image\n", + "url = \"https://onnxruntime.ai/images/dog.jpeg\"\n", + "response = requests.get(url)\n", + "# Save the image to a file\n", + "with open(\"dog.jpeg\", \"wb\") as file:\n", + " file.write(response.content)\n", + "img = Image.open(\"dog.jpeg\")\n", + "img" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime as ort\n", + "from PIL import Image\n", + "import torch\n", + "import torchvision.transforms as transforms\n", + "from torchvision.models.resnet import ResNet50_Weights\n", + "import numpy as np\n", + "\n", + "image_file_path = \"dog.jpeg\"\n", + "\n", + "# Create ONNX runtime session\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + "\n", + "\n", + "session_options = ort.SessionOptions()\n", + "\n", + "# NVIDIA TensorRT for RTX runs on the GPU, so match GPU devices here\n", + "add_ep_for_device(session_options, ExecutionProvider, ort.OrtHardwareDeviceType.GPU)\n", + "\n", + "session = ort.InferenceSession(\n", + " onnx_model_path, # path to the converted ONNX model\n", + " sess_options=session_options,\n", + ")\n", + "\n", + "print(\"Session providers:\", session.get_providers())\n", + "print(\"Provider options:\", session.get_provider_options())\n", + "\n", + "# Read and preprocess image\n", + "image = Image.open(image_file_path)\n", + "preprocess = transforms.Compose([\n", + " transforms.Resize(256),\n", + " transforms.CenterCrop(224),\n", + " transforms.ToTensor(),\n", + " 
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n", + "])\n", + "input_tensor = preprocess(image)\n", + "input_batch = input_tensor.unsqueeze(0)\n", + "\n", + "# Run inference\n", + "ort_inputs = {session.get_inputs()[0].name: input_batch.numpy().astype(np.float16)}\n", + "ort_outputs = session.run(None, ort_inputs)\n", + "\n", + "# Postprocess to get softmax vector\n", + "output = ort_outputs[0]\n", + "softmax = torch.nn.functional.softmax(torch.tensor(output), dim=1)\n", + "\n", + "# Extract top 10 predicted classes\n", + "top10 = torch.topk(softmax, 10)\n", + "\n", + "# Get label mapping\n", + "weights = ResNet50_Weights.DEFAULT\n", + "labels = weights.meta[\"categories\"]\n", + "\n", + "# Print results to console\n", + "print(\"Top 10 predictions for ResNet50 v2...\")\n", + "print(\"--------------------------------------------------------------\")\n", + "for i in range(10):\n", + " print(f\"Label: {labels[top10.indices[0][i]]}, Confidence: {top10.values[0][i].item():.4f}\")\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "cpu", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/mistralai-Mistral-7B-Instruct-v0.3/aitk/.gitignore b/mistralai-Mistral-7B-Instruct-v0.3/aitk/.gitignore new file mode 100644 index 00000000..48c03882 --- /dev/null +++ b/mistralai-Mistral-7B-Instruct-v0.3/aitk/.gitignore @@ -0,0 +1,5 @@ +__pycache__ +/cache +/history/*/* +!/history/*/history.config +!/history/*/olive_config.json diff --git a/mistralai-Mistral-7B-Instruct-v0.3/aitk/README.md b/mistralai-Mistral-7B-Instruct-v0.3/aitk/README.md new file mode 100644 index 00000000..d6c8ba9a --- /dev/null +++ 
b/mistralai-Mistral-7B-Instruct-v0.3/aitk/README.md @@ -0,0 +1,7 @@ +# Mistral-7B-Instruct-v0.3 Optimization + +This repository demonstrates the optimization of the [Mistral 7B Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) model. + +The optimization process is divided into these main workflows: +- OpenVINO for Intel GPU + + This process uses OpenVINO specific passes like `OpenVINOOptimumConversion`, `OpenVINOIoUpdate` and `OpenVINOEncapsulation` diff --git a/mistralai-Mistral-7B-Instruct-v0.3/aitk/inference_sample.ipynb b/mistralai-Mistral-7B-Instruct-v0.3/aitk/inference_sample.ipynb new file mode 100644 index 00000000..cb939cad --- /dev/null +++ b/mistralai-Mistral-7B-Instruct-v0.3/aitk/inference_sample.ipynb @@ -0,0 +1,112 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "text = 'Who is Isaac Newton?'\n", + "ExecutionProvider=\"OpenVINOExecutionProvider\"\n", + "model_folder = \"./model\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime_genai as og\n", + "import json\n", + "from pathlib import Path\n", + "\n", + "def get_session_options(obj):\n", + " if type(obj) is dict:\n", + " for k, v in obj.items():\n", + " if k == \"session_options\":\n", + " yield v\n", + " else:\n", + " for x in get_session_options(v):\n", + " yield x\n", + " elif type(obj) is list:\n", + " for v in obj:\n", + " for x in get_session_options(v):\n", + " yield x\n", + "\n", + "\n", + "def remove_provider_options(model_path):\n", + " genai_config_path = Path(model_path) / \"genai_config.json\"\n", + " data = json.loads(genai_config_path.read_text())\n", + " for session_option in get_session_options(data):\n", + " if 'provider_options' in session_option:\n", + " session_option['provider_options'] = [{k: dict() for k in opts.keys()} for opts in session_option['provider_options']]\n", + "\n", + " 
json.dump(data, genai_config_path.open(\"w\"), indent=4)\n", + "\n", + "if ExecutionProvider == \"QNNExecutionProvider\":\n", + " remove_provider_options(model_folder)\n", + "\n", + "# Load the base model and tokenizer\n", + "model = og.Model(model_folder)\n", + "tokenizer = og.Tokenizer(model)\n", + "tokenizer_stream = tokenizer.create_stream()\n", + "\n", + "# Set the max length to something sensible by default,\n", + "# since otherwise it will be set to the entire context length\n", + "search_options = {}\n", + "search_options[\"max_length\"] = 200\n", + "\n", + "chat_template = \"<|im_start|>user\\n{input}<|im_end|>\\n<|im_start|>assistant\\n\"\n", + "\n", + "# Generate prompt (prompt template + input)\n", + "prompt = f\"{chat_template.format(input=text)}\"\n", + "\n", + "# Encode the prompt using the tokenizer\n", + "input_tokens = tokenizer.encode(prompt)\n", + "\n", + "# Create params and generator\n", + "params = og.GeneratorParams(model)\n", + "params.set_search_options(**search_options)\n", + "generator = og.Generator(model, params)\n", + "\n", + "# Append input tokens to the generator\n", + "generator.append_tokens(input_tokens)\n", + "\n", + "print(\"\")\n", + "print(\"Output: \", end=\"\", flush=True)\n", + "# Stream the output\n", + "while not generator.is_done():\n", + " generator.generate_next_token()\n", + "\n", + " new_token = generator.get_next_tokens()[0]\n", + " print(tokenizer_stream.decode(new_token), end=\"\", flush=True)\n", + "\n", + "print()\n", + "\n", + "del generator\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git 
a/mistralai-Mistral-7B-Instruct-v0.3/aitk/info.yml b/mistralai-Mistral-7B-Instruct-v0.3/aitk/info.yml new file mode 100644 index 00000000..b8708f72 --- /dev/null +++ b/mistralai-Mistral-7B-Instruct-v0.3/aitk/info.yml @@ -0,0 +1,11 @@ +keywords: + aitk +arch: mistral +recipes: + - file: "mistral-7b-instruct-v0.3-ov.json" + device: gpu + ep: OpenVINOExecutionProvider +aitk: + modelInfo: + id: "huggingface/mistralai/Mistral-7B-Instruct-v0.3" + version: 1 diff --git a/mistralai-Mistral-7B-Instruct-v0.3/aitk/mistral-7b-instruct-v0.3-ov.json b/mistralai-Mistral-7B-Instruct-v0.3/aitk/mistral-7b-instruct-v0.3-ov.json new file mode 100644 index 00000000..06b14d17 --- /dev/null +++ b/mistralai-Mistral-7B-Instruct-v0.3/aitk/mistral-7b-instruct-v0.3-ov.json @@ -0,0 +1,34 @@ +{ + "input_model": { "type": "HfModel", "model_path": "mistralai/Mistral-7B-Instruct-v0.3" }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ { "execution_providers": [ "OpenVINOExecutionProvider" ] } ] + } + }, + "passes": { + "optimum_convert": { + "type": "OpenVINOOptimumConversion", + "extra_args": { "device": "gpu" }, + "ov_quant_config": { + "task": "text-generation-with-past", + "weight_format": "int4", + "group_size": 128, + "ratio": 0.8 + } + }, + "io_update": { "type": "OpenVINOIoUpdate", "static": false }, + "encapsulation": { + "type": "OpenVINOEncapsulation", + "target_device": "gpu", + "keep_ov_dynamic_dims": true, + "ov_version": "2025.1" + } + }, + "search_strategy": false, + "host": "local_system", + "target": "local_system", + "cache_dir": "cache", + "evaluate_input_model": false, + "output_dir": "model/mistralai" +} diff --git a/mistralai-Mistral-7B-Instruct-v0.3/aitk/mistral-7b-instruct-v0.3-ov.json.config b/mistralai-Mistral-7B-Instruct-v0.3/aitk/mistral-7b-instruct-v0.3-ov.json.config new file mode 100644 index 00000000..800869a2 --- /dev/null +++ b/mistralai-Mistral-7B-Instruct-v0.3/aitk/mistral-7b-instruct-v0.3-ov.json.config @@ -0,0 +1,67 @@ +{ + 
"name": "Convert to Intel GPU", + "oliveFile": "mistral/openvino/Mistral-7B-Instruct-v0.3-gpu-context-ov-dy.json", + "isLLM": true, + "isIntel": true, + "intelRuntimeValues": [ + "gpu" + ], + "debugInfo": { + "autoGenerated": true, + "useOpenVINOOptimumConversion": "optimum_convert" + }, + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Intel GPU" + ], + "path": "systems.local_system.accelerators.0.device", + "values": [ + "gpu" + ], + "readOnly": false + }, + "runtimeInConversion": { + "autoGenerated": true, + "name": "Convert/Quantize to", + "type": "enum", + "displayNames": [ + "Intel GPU" + ], + "path": "passes.optimum_convert.extra_args.device", + "values": [ + "gpu" + ], + "actions": [ + [ + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "gpu" + } + ] + ] + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + } + ] +} diff --git a/mistralai-Mistral-7B-Instruct-v0.3/aitk/model_project.config b/mistralai-Mistral-7B-Instruct-v0.3/aitk/model_project.config new file mode 100644 index 00000000..40434dc0 --- /dev/null +++ b/mistralai-Mistral-7B-Instruct-v0.3/aitk/model_project.config @@ -0,0 +1,12 @@ +{ + "workflows": [ + { + "file": "mistral-7b-instruct-v0.3-ov.json", + "templateName": "mistral-7b-instruct-v0.3-ov" + } + ], + "modelInfo": { + "id": "huggingface/mistralai/Mistral-7B-Instruct-v0.3", + "version": 1 + } +} diff --git a/openai-clip-vit-base-patch16/aitk/.gitignore b/openai-clip-vit-base-patch16/aitk/.gitignore new file mode 100644 index 00000000..48c03882 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/.gitignore @@ -0,0 +1,5 @@ +__pycache__ +/cache +/history/*/* +!/history/*/history.config 
+!/history/*/olive_config.json diff --git a/openai-clip-vit-base-patch16/aitk/README.md b/openai-clip-vit-base-patch16/aitk/README.md new file mode 100644 index 00000000..35dfb8fe --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/README.md @@ -0,0 +1,48 @@ +# Openai Clip optimization + +This folder contains examples of Openai Clip optimization using different workflows. + +- Text and vision model QDQ for Qualcomm NPU +- QDQ for AMD NPU +- OpenVINO for Intel NPU + +## Openai Clip text optimization with QDQ for Qualcomm NPU + +This example performs Openai Clip optimization with QDQ in one workflow. It performs the optimization pipeline: + +- *PyTorch Model -> Onnx Model -> Quantized Onnx Model* + +### Evaluation result + +Quantization uses 256 samples from the train split of the imagenet-1k dataset, and evaluation uses 256 samples from the test split of the same dataset. + + +| Activation Type | Weight Type | Size | Latency ms (avg) | +| --------------------- | ----------------- | ---------- | ---------------------- | +| QUInt16 | QUInt8 | 100 | 6.53724 | + +## Openai Clip vision optimization with QDQ for Qualcomm NPU + +This example performs Openai Clip optimization with QDQ in one workflow. It performs the optimization pipeline: + +- *PyTorch Model -> Onnx Model -> Quantized Onnx Model* + +### Evaluation result + +Quantization uses 256 samples from the train split of the imagenet-1k dataset, and evaluation uses 256 samples from the test split of the same dataset. + + +| Activation Type | Weight Type | Size | Latency ms (avg) | +| --------------------- | ----------------- | ---------- | ---------------------- | +| QUInt16 | QUInt8 | 100 | 20.13231 | + + +## Openai Clip optimization with QDQ for AMD NPU + +This example performs Openai Clip optimization with QDQ in one workflow. 
It performs the optimization pipeline: + +- *PyTorch Model -> Onnx Model -> Quantized Onnx Model* + +## Openai Clip optimization with OpenVINO + +This example performs Openai Clip optimization with OpenVINO in one workflow for Intel NPU. diff --git a/openai-clip-vit-base-patch16/aitk/_copy.json.config b/openai-clip-vit-base-patch16/aitk/_copy.json.config new file mode 100644 index 00000000..abd20714 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/_copy.json.config @@ -0,0 +1,28 @@ +{ + "copies": [ + { + "src": "openai_clip_ov_inference_sample.ipynb", + "dst": "openai_clip_qdq_amd_inference_sample.ipynb", + "replacements": [ + { + "find": "OpenVINOExecutionProvider", + "replace": "VitisAIExecutionProvider" + }, + { + "find": "./model/openvino_model_quant_st.onnx", + "replace": "./model/model.onnx" + } + ] + }, + { + "src": "openai_clip_trtrtx_inference_sample.ipynb", + "dst": "openai_clip_dml_inference_sample.ipynb", + "replacements": [ + { + "find": "NvTensorRTRTXExecutionProvider", + "replace": "DmlExecutionProvider" + } + ] + } + ] +} diff --git a/openai-clip-vit-base-patch16/aitk/clip_script.py b/openai-clip-vit-base-patch16/aitk/clip_script.py new file mode 100644 index 00000000..6f775697 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/clip_script.py @@ -0,0 +1,151 @@ +from __future__ import annotations + +from collections import OrderedDict +from itertools import chain + +import torch +from transformers import ( + AutoProcessor, + CLIPTextModelWithProjection, + CLIPVisionModelWithProjection, +) + +from olive.data.component.dataset import BaseDataset +from olive.data.registry import Registry + +HF_MODEL_SUBFOLDER_MAPPING = { + "sentence-transformers/clip-ViT-B-32": "0_CLIPModel", +} + + +def load_image_encoder(model_name): + return CLIPVisionModelWithProjection.from_pretrained( + model_name, + subfolder=HF_MODEL_SUBFOLDER_MAPPING.get(model_name, ""), + ).eval() + + +def load_text_encoder(model_name): + if model_name == 
"sentence-transformers/clip-ViT-B-32-multilingual-v1": + from sbert_clip_script import SDistilBertTextEncoder + + return SDistilBertTextEncoder(model_name).eval() + + return CLIPTextModelWithProjection.from_pretrained( + model_name, + subfolder=HF_MODEL_SUBFOLDER_MAPPING.get(model_name, ""), + ).eval() + + +def hfdataset_pre_process_for_clip( + dataset, + processor, + torch_model=None, + image_col: str | None = None, + caption_col: str | None = None, + label_col: str = "label", + max_samples: int | None = None, + max_length: int = 77, + batch_size: int = 32, +): + def generate_inputs(sample, indices): + captions = sample.get(caption_col, None) + images = sample.get(image_col, None) + + kwargs = { + "padding": "max_length", + "max_length": max_length, + "truncation": True, + "add_special_tokens": True, + "return_tensors": "pt", + } + if images: + kwargs["images"] = [img.convert("RGB") for img in images] + if captions: + kwargs["text"] = list(chain([x[0] for x in captions])) + + encoded_input = processor(**kwargs) + + return { + **encoded_input, + label_col: torch_model(**encoded_input)[0] if torch_model else sample.get(label_col, indices), + } + + if max_samples is not None and max_samples < len(dataset): + dataset = dataset.select(range(max_samples)) + + tokenized_datasets = dataset.map( + generate_inputs, + batched=True, + batch_size=batch_size, + with_indices=True, + remove_columns=dataset.column_names, + desc="Processing dataset", + ) + tokenized_datasets.set_format("torch", output_all_columns=True) + + return tokenized_datasets + + +@Registry.register_pre_process() +def pre_process_dataset( + dataset, + model_name: str, + generate_ground_truth: bool = False, + image_col: str | None = None, + caption_col: str | None = None, + label_col: str = "label", + max_samples: int | None = None, + max_length: int = 77, + **kwargs, +): + if image_col is None and caption_col is None: + raise ValueError("Either image_col or caption_col must be provided.") + + if 
generate_ground_truth: + if image_col and caption_col: + raise ValueError("Can not generate two types of embedding at the same time.") + + torch_model = load_image_encoder(model_name) if image_col else load_text_encoder(model_name) + else: + torch_model = None + + processor = AutoProcessor.from_pretrained(model_name) + dataset = hfdataset_pre_process_for_clip( + dataset, + processor, + torch_model=torch_model, + image_col=image_col, + caption_col=caption_col, + label_col=label_col, + max_length=max_length, + max_samples=max_samples, + ) + return BaseDataset(dataset, label_col) + + +@Registry.register_post_process() +def embed_post_process(output): + """Post-processing for CLIP output.""" + match output: + case dict() | OrderedDict() as out: + if "embeds" in out: + return out["embeds"] + elif "text_embeds" in out: + return out["text_embeds"] + elif "image_embeds" in out: + return out["image_embeds"] + case torch.Tensor(): + return output.argmax(dim=-1) + raise ValueError(f"Unsupported output type: {type(output)}") + + +def eval_similarity_degrad(output, targets, batch_size=1024): + import torch.nn.functional as F + + preds = output.preds + scores = [ + F.cosine_similarity(preds[i : i + batch_size], targets[i : i + batch_size]) + for i in range(0, preds.size(0), batch_size) + ] + return {"percentage": f"{100.0 - torch.mean(torch.cat(scores)) * 100.0:.2f}"} diff --git a/openai-clip-vit-base-patch16/aitk/info.yml b/openai-clip-vit-base-patch16/aitk/info.yml new file mode 100644 index 00000000..fd842b36 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/info.yml @@ -0,0 +1,28 @@ +keywords: + aitk +arch: clip +recipes: + - file: "openai_clip_text_qnn.json" + device: npu + ep: QNNExecutionProvider + name: "openai-clip-vit-base-patch16 (Text)" + - file: "openai_clip_vision_qnn.json" + device: npu + ep: QNNExecutionProvider + name: "openai-clip-vit-base-patch16 (Vision)" + - file: "openai_clip_qdq_amd.json" + device: npu + ep: VitisAIExecutionProvider + - file: 
"openai_clip_ov.json" + device: npu + ep: OpenVINOExecutionProvider + - file: "openai_clip_trtrtx.json" + device: gpu + ep: NvTensorRTRTXExecutionProvider + - file: "openai_clip_dml.json" + device: gpu + ep: DmlExecutionProvider +aitk: + modelInfo: + id: "huggingface/openai/clip-vit-base-patch16" + version: 1 diff --git a/openai-clip-vit-base-patch16/aitk/model_project.config b/openai-clip-vit-base-patch16/aitk/model_project.config new file mode 100644 index 00000000..c2d569bd --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/model_project.config @@ -0,0 +1,32 @@ +{ + "workflows": [ + { + "file": "openai_clip_text_qnn.json", + "templateName": "openai_clip_text_qnn" + }, + { + "file": "openai_clip_vision_qnn.json", + "templateName": "openai_clip_vision_qnn" + }, + { + "file": "openai_clip_qdq_amd.json", + "templateName": "openai_clip_qdq_amd" + }, + { + "file": "openai_clip_ov.json", + "templateName": "openai_clip_ov" + }, + { + "file": "openai_clip_trtrtx.json", + "templateName": "openai_clip_trtrtx" + }, + { + "file": "openai_clip_dml.json", + "templateName": "openai_clip_dml" + } + ], + "modelInfo": { + "id": "huggingface/openai/clip-vit-base-patch16", + "version": 1 + } +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_dml.json b/openai-clip-vit-base-patch16/aitk/openai_clip_dml.json new file mode 100644 index 00000000..ee99adaa --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_dml.json @@ -0,0 +1,192 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "openai/clip-vit-base-patch16", + "task": "zero-shot-image-classification", + "load_kwargs": { + "attn_implementation": "eager" + }, + "io_config": { + "input_names": [ + "input_ids", + "pixel_values", + "attention_mask" + ], + "input_shapes": [ + [ + 10, + 77 + ], + [ + 1, + 3, + 224, + 224 + ], + [ + 10, + 77 + ] + ], + "input_types": [ + "int64", + "float32", + "int64" + ], + "output_names": [ + "logits_per_image" + ], + "output_shapes": [ + [ + 1, + 2 + ] + ] + } + }, 
+ "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "cpu", + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + }, + "target_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "gpu", + "execution_providers": [ + "DmlExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "metric_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "openai/clip-vit-base-patch16", + "dataset_name": "nlphuji/flickr30k", + "start": 0, + "end": 10 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + }, + "post_process_data_config": { + "type": "clip_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "backend": "huggingface_metrics", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "accuracy", + "priority": 1, + "goal": { + "type": "max-degradation", + "value": 0.05 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg", + "goal": { + "type": "percent-min-improvement", + "value": 0.1 + } + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + }, + { + "name": "throughput", + "type": "throughput", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg" + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + }, + "transformer_optimizer": { + "type": "orttransformersoptimization", + "model_type": "clip", + "opt_level": 0, + "float16": true, + "use_gpu": true, + "keep_io_types": false, + "optimization_options": { + "enable_gelu": true, + "enable_layer_norm": true, + "enable_attention": true, + "use_multi_head_attention": true, + "enable_skip_layer_norm": false, + 
"enable_embed_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_bias_gelu": false, + "enable_gelu_approximation": false, + "enable_qordered_matmul": false, + "enable_shape_inference": true, + "enable_gemm_fast_gelu": false, + "enable_nhwc_conv": false, + "enable_group_norm": false, + "enable_bias_splitgelu": false, + "enable_packed_qkv": true, + "enable_packed_kv": true, + "enable_bias_add": false, + "enable_rotary_embeddings": true + }, + "save_as_external_data": true + } + }, + "search_strategy": false, + "host": "host_system", + "target": "target_system", + "cache_dir": "cache", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "output_dir": "model/clip" +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_dml.json.config b/openai-clip-vit-base-patch16/aitk/openai_clip_dml.json.config new file mode 100644 index 00000000..ed09dcf4 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_dml.json.config @@ -0,0 +1,87 @@ +{ + "name": "Convert to DirectML", + "evaluationRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "DirectML" + ], + "path": "systems.target_system.accelerators.0.execution_providers.0", + "values": [ + "DmlExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.dataset_name", + 
"values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[0].load_dataset_config.end", + "template": { + "path": "data_configs[0].load_dataset_config.end", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_dml_inference_sample.ipynb b/openai-clip-vit-base-patch16/aitk/openai_clip_dml_inference_sample.ipynb new file mode 100644 index 00000000..19f4bc70 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_dml_inference_sample.ipynb @@ -0,0 +1,90 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "aeb33f1a", + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "ExecutionProvider=\"DmlExecutionProvider\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "307fcca8", + "metadata": {}, + "outputs": [], + "source": [ + "from PIL import Image\n", + "import requests\n", + " \n", + "from transformers import CLIPProcessor\n", + "import onnxruntime as ort\n", + "import numpy as np\n", + "import torch\n", + " \n", + "processor = CLIPProcessor.from_pretrained(\"openai/clip-vit-base-patch16\", use_fast=False)\n", + " \n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + " \n", + "inputs = processor(text=[\"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\"],\n", + " images=image, return_tensors=\"np\", padding=\"max_length\",\n", + " max_length= 
77, truncation=True)\n", + " \n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + " \n", + "opts = ort.SessionOptions()\n", + " \n", + "add_ep_for_device(opts, ExecutionProvider, ort.OrtHardwareDeviceType.GPU)\n", + "assert opts.has_providers()\n", + "\n", + "# options = ort.SessionOptions()\n", + "session = ort.InferenceSession(onnx_model_path,\n", + " sess_options=opts,\n", + " # providers=[ExecutionProvider],\n", + " # provider_options=[provider_options]\n", + ")\n", + "logits_per_image = session.run([\"logits_per_image\"],\n", + " {\n", + " \"input_ids\": inputs['input_ids'].astype(np.int64),\n", + " \"attention_mask\": inputs['attention_mask'].astype(np.int64),\n", + " \"pixel_values\": inputs['pixel_values'].astype(np.float16)\n", + " })\n", + " \n", + "probs = torch.tensor(logits_per_image[0]).softmax(dim=1)\n", + "print(\"Label probs:\", probs)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "winml", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_ov.json b/openai-clip-vit-base-patch16/aitk/openai_clip_ov.json new file mode 100644 index 00000000..e368f2fb --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_ov.json @@ -0,0 +1,125 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": 
"openai/clip-vit-base-patch16" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "npu", + "execution_providers": [ + "OpenVINOExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quantize_data_config", + "user_script": "openai_clip_ov.py", + "load_dataset_config": { + "type": "conceptual_captions_dataset", + "data_name": "google-research-datasets/conceptual_captions", + "model_path": "openai/clip-vit-base-patch16" + }, + "dataloader_config": { + "batch_size": 1, + "drop_last": true + } + }, + { + "name": "metric_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "openai/clip-vit-base-patch16", + "dataset_name": "nlphuji/flickr30k", + "start": 10, + "end": 20 + }, + "dataloader_config": { "type": "no_auto_batch_dataloader" }, + "post_process_data_config": { "type": "clip_post_process" } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "backend": "huggingface_metrics", + "data_config": "metric_data_config", + "sub_types": [ + { "name": "accuracy", "priority": 1, "goal": { "type": "max-degradation", "value": 0.05 } } + ] + }, + { + "name": "latency", + "type": "latency", + "sub_types": [ + { "name": "avg", "priority": 2, "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } }, + { "name": "p90", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } } + ] + } + ] + } + }, + "passes": { + "optimum_convert": { + "type": "OpenVINOOptimumConversion", + "extra_args": { + "device": "npu" + } + }, + "ov_quantize": { + "type": "OpenVINOQuantization", + "target_device": "npu", + "data_config": "quantize_data_config", + "model_type": "TRANSFORMER", + "user_script": "openai_clip_ov.py", + "transform_fn": "custom_transform_func", + "extra_configs": [ + { + "advanced_quantization_parameters": { + "smooth_quant_alpha": 0.6 + } + } + ] + }, + "io_update": { + 
"type": "OpenVINOIoUpdate", + "input_shapes": [ + [ + 10, + 77 + ], + [ + 1, + 3, + 224, + 224 + ], + [ + 10, + 77 + ] + ], + "static": true + }, + "encapsulation": { + "type": "OpenVINOEncapsulation", + "target_device": "npu", + "ov_version": "2025.1" + } + }, + "search_strategy": false, + "host": "local_system", + "target": "local_system", + "cache_dir": "cache", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "output_dir": "model/clip_vit_base_patch16_context_ov_static" +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_ov.json.config b/openai-clip-vit-base-patch16/aitk/openai_clip_ov.json.config new file mode 100644 index 00000000..67325a6e --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_ov.json.config @@ -0,0 +1,174 @@ +{ + "name": "Convert to Intel CPU/NPU/GPU", + "oliveFile": "clip/openvino/clip_vit_base_patch16_context_ov_static.json", + "isIntel": true, + "debugInfo": { + "autoGenerated": true, + "useOpenVINOOptimumConversion": "optimum_convert" + }, + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "systems.local_system.accelerators.0.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "readOnly": false + }, + "runtimeInConversion": { + "autoGenerated": true, + "name": "Convert/Quantize to", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "passes.optimum_convert.extra_args.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "update", + "path": "passes.ov_quantize.target_device", + "value": "cpu" + }, + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "cpu" + } + ], + [ + { + "type": "update", + "path": "passes.ov_quantize.target_device", + "value": "gpu" + }, + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "gpu" + } + ], + [ 
+ { + "type": "update", + "path": "passes.ov_quantize.target_device", + "value": "npu" + }, + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "npu" + } + ] + ] + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "google-research-datasets/conceptual_captions" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "google-research-datasets/conceptual_captions" + ], + "template": "QuantizationDataset" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_ov.py b/openai-clip-vit-base-patch16/aitk/openai_clip_ov.py new file mode 100644 index 00000000..d1971b50 --- /dev/null +++ 
b/openai-clip-vit-base-patch16/aitk/openai_clip_ov.py @@ -0,0 +1,124 @@ +from io import BytesIO + +import requests +import torch +from datasets import load_dataset +from PIL import Image +from requests.packages.urllib3.exceptions import InsecureRequestWarning +from tqdm import tqdm +from transformers import CLIPModel, CLIPProcessor + +from olive.data.registry import Registry + +requests.packages.urllib3.disable_warnings(InsecureRequestWarning) + +# ------------------------------------------------------------------------- +# Common Dataset +# ------------------------------------------------------------------------- + +seed = 0 +# seed everything to 0 for reproducibility, https://pytorch.org/docs/stable/notes/randomness.html +# do not set random seed and np.random.seed for aml test, since it will cause aml job name conflict +torch.manual_seed(seed) +# the following are needed only for GPU +torch.cuda.manual_seed(seed) +torch.backends.cudnn.deterministic = True +torch.backends.cudnn.benchmark = False + + +def check_text_data(data): + """Check if the given data is text-based.""" + if isinstance(data, str): + return True + if isinstance(data, list): + return all(isinstance(x, str) for x in data) + return False + + +def get_pil_from_url(url): + """Download and convert an image from a URL to a PIL Image object.""" + response = requests.get(url, verify=True, timeout=20) + image = Image.open(BytesIO(response.content)) + return image.convert("RGB") + + +def wrap_collate_fn(processor, max_length): + def collate_fn(example, image_column="image_url", text_column="caption"): + """Preprocess an example by loading and transforming image and text data. + + Check if the text data in the example is valid by calling the `check_text_data` function. + Download the image specified by the URL in the image_column by calling the `get_pil_from_url` function. + If there is any error during the download process, return None. 
+ Return the preprocessed inputs with transformed image and text data. + """ + if len(example) != 1: + raise ValueError(f"Expected 'example' to have exactly one element, but got {len(example)}.") + example = example[0] + + if not check_text_data(example[text_column]): + raise ValueError("Text data is not valid") + + url = example[image_column] + try: + image = get_pil_from_url(url) + w, h = image.size + if h == 1 or w == 1: + return None + except Exception: + return None + + inputs = processor(text=example[text_column], images=[image], return_tensors="pt", padding=True) + if inputs["input_ids"].shape[1] > max_length: + return None + return inputs + + return collate_fn + + +def prepare_calibration_data(dataloader, init_steps): + """Prepare calibration data from a dataloader for a specified number of initialization steps. + + Iterate over the dataloader, fetching batches and storing the relevant data. + """ + data = [] + with tqdm(total=init_steps) as pbar: + for batch in dataloader: + if len(data) == init_steps: + break + if batch: + pbar.update(1) + with torch.no_grad(): + data.append( + { + "input_ids": batch["input_ids"].to("cpu"), + "pixel_values": batch["pixel_values"].to("cpu"), + "attention_mask": batch["attention_mask"].to("cpu"), + } + ) + return data + + +@Registry.register_dataset() +def conceptual_captions_dataset(data_name, opt_init_steps=200, max_train_samples=1000, **kwargs): + """Prepare a vision-text dataset for quantization.""" + dataset = load_dataset(data_name, trust_remote_code=True) + model_path = kwargs.get("model_path") + if not model_path: + raise ValueError( + "The 'model_path' parameter is required in data_configs.load_dataset_config but was not provided." 
+ ) + model = CLIPModel.from_pretrained(model_path) + processor = CLIPProcessor.from_pretrained(model_path) + max_length = model.config.text_config.max_position_embeddings + train_dataset = dataset["train"].shuffle(seed=seed) + collate_fn = wrap_collate_fn(processor, max_length) + dataloader = torch.utils.data.DataLoader(train_dataset, collate_fn=collate_fn, batch_size=1) + return prepare_calibration_data(dataloader, opt_init_steps) + + +def custom_transform_func(data_item): + np_inputs = {} + for inp in data_item: + # Drop the first dimension using slicing + np_inputs[inp] = data_item[inp].numpy()[0, ...] + return np_inputs diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_ov_inference_sample.ipynb b/openai-clip-vit-base-patch16/aitk/openai_clip_ov_inference_sample.ipynb new file mode 100644 index 00000000..18a7aa58 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_ov_inference_sample.ipynb @@ -0,0 +1,84 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "aeb33f1a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/openvino_model_quant_st.onnx\"\n", + "ExecutionProvider=\"OpenVINOExecutionProvider\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "307fcca8", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "from PIL import Image\n", + "import requests\n", + " \n", + "from transformers import CLIPProcessor\n", + "import onnxruntime as ort\n", + "import numpy as np\n", + "import torch\n", + " \n", + "processor = CLIPProcessor.from_pretrained(\"openai/clip-vit-base-patch16\", use_fast=False)\n", + " \n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + " \n", + "inputs = processor(text=[\"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a 
photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\"],\n", + " images=image, return_tensors=\"np\", padding=\"max_length\",\n", + " max_length= 77, truncation=True)\n", + " \n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + " \n", + "opts = ort.SessionOptions()\n", + " \n", + "add_ep_for_device(opts, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "assert opts.has_providers()\n", + "\n", + "# options = ort.SessionOptions()\n", + "session = ort.InferenceSession(onnx_model_path,\n", + " sess_options=opts,\n", + " # providers=[ExecutionProvider],\n", + " # provider_options=[provider_options]\n", + ")\n", + "logits_per_image = session.run([\"logits_per_image\"],\n", + " {\n", + " \"input_ids\": inputs['input_ids'].astype(np.int64),\n", + " \"attention_mask\": inputs['attention_mask'].astype(np.int64),\n", + " \"pixel_values\": inputs['pixel_values']\n", + " })\n", + " \n", + "probs = torch.tensor(logits_per_image[0]).softmax(dim=1)\n", + "print(\"Label probs:\", probs)" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_qdq_amd.json b/openai-clip-vit-base-patch16/aitk/openai_clip_qdq_amd.json new file mode 100644 index 00000000..25b4782c --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_qdq_amd.json @@ -0,0 +1,209 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "openai/clip-vit-base-patch16", + "task": "zero-shot-image-classification", + "load_kwargs": { + 
"attn_implementation": "eager" + }, + "io_config": { + "input_names": [ + "input_ids", + "pixel_values", + "attention_mask" + ], + "input_shapes": [ + [ + 10, + 77 + ], + [ + 1, + 3, + 224, + 224 + ], + [ + 10, + 77 + ] + ], + "input_types": [ + "int64", + "float32", + "int64" + ], + "output_names": [ + "logits_per_image" + ], + "output_shapes": [ + [ + 1, + 2 + ] + ] + } + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "npu", + "execution_providers": [ + "VitisAIExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quant_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "openai/clip-vit-base-patch16", + "dataset_name": "nlphuji/flickr30k", + "start": 0, + "end": 10 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + } + }, + { + "name": "metric_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "openai/clip-vit-base-patch16", + "dataset_name": "nlphuji/flickr30k", + "start": 0, + "end": 10 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + }, + "post_process_data_config": { + "type": "clip_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "backend": "huggingface_metrics", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "accuracy", + "priority": 1, + "goal": { + "type": "max-degradation", + "value": 0.05 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg", + "goal": { + "type": "percent-min-improvement", + "value": 0.1 + } + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + }, + { + "name": "throughput", + "type": "throughput", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg" + }, + { + "name": "max" + 
}, + { + "name": "min" + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + }, + "transformer_optimizer": { + "type": "orttransformersoptimization", + "model_type": "clip", + "opt_level": 1, + "optimization_options": { + "enable_gelu": true, + "enable_bias_gelu": false, + "enable_layer_norm": true, + "enable_skip_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_attention": false + }, + "save_as_external_data": true + }, + "surgery": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "ReplaceAttentionMaskValue" + }, + { + "surgeon": "PowReduceSumPowDiv2LpNorm" + } + ] + }, + "quantization": { + "type": "OnnxStaticQuantization", + "quant_preprocess": true, + "data_config": "quant_data_config", + "activation_type": "uint16", + "precision": "uint8", + "calibrate_method": "MinMax", + "save_as_external_data": true + }, + "addmetadata": { + "type": "VitisAIAddMetaData", + "config_meta_data_keys": [ + "architectures", + "model_type" + ], + "activation_type": "uint16", + "weight_type": "uint8", + "quant_type": "OnnxStaticQuantization" + } + }, + "search_strategy": false, + "host": "local_system", + "target": "local_system", + "cache_dir": "cache", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "output_dir": "model/clip_vit_base_patch16" +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_qdq_amd.json.config b/openai-clip-vit-base-patch16/aitk/openai_clip_qdq_amd.json.config new file mode 100644 index 00000000..e86474b6 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_qdq_amd.json.config @@ -0,0 +1,195 @@ +{ + "name": "Convert to AMD NPU", + "oliveFile": "clip/openai_clip-vit-base-patch16_ptq_qdq_vitis_ai.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "AMD NPU", + "CPU" + ], + "path": 
"systems.local_system.accelerators.0.execution_providers.0", + "values": [ + "VitisAIExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].load_dataset_config.end", + "template": { + "path": "data_configs[0].load_dataset_config.end", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.quantization", + "actions": [ + [], + [ + { + "type": "update", + "path": "passes", + "value": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].load_dataset_config.end", + "template": { + "path": 
"data_configs[1].load_dataset_config.end", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_qdq_amd_inference_sample.ipynb b/openai-clip-vit-base-patch16/aitk/openai_clip_qdq_amd_inference_sample.ipynb new file mode 100644 index 00000000..a4cb3eb3 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_qdq_amd_inference_sample.ipynb @@ -0,0 +1,84 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "aeb33f1a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "ExecutionProvider=\"VitisAIExecutionProvider\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "307fcca8", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "from PIL import Image\n", + "import requests\n", + " \n", + "from transformers import CLIPProcessor\n", + "import onnxruntime as ort\n", + "import numpy as np\n", + "import torch\n", + " \n", + "processor = CLIPProcessor.from_pretrained(\"openai/clip-vit-base-patch16\", use_fast=False)\n", + " \n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + " \n", + "inputs = processor(text=[\"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\"],\n", + " images=image, return_tensors=\"np\", padding=\"max_length\",\n", + " max_length= 77, truncation=True)\n", + " \n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, 
ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + " \n", + "opts = ort.SessionOptions()\n", + " \n", + "add_ep_for_device(opts, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "assert opts.has_providers()\n", + "\n", + "# options = ort.SessionOptions()\n", + "session = ort.InferenceSession(onnx_model_path,\n", + " sess_options=opts,\n", + " # providers=[ExecutionProvider],\n", + " # provider_options=[provider_options]\n", + ")\n", + "logits_per_image = session.run([\"logits_per_image\"],\n", + " {\n", + " \"input_ids\": inputs['input_ids'].astype(np.int64),\n", + " \"attention_mask\": inputs['attention_mask'].astype(np.int64),\n", + " \"pixel_values\": inputs['pixel_values']\n", + " })\n", + " \n", + "probs = torch.tensor(logits_per_image[0]).softmax(dim=1)\n", + "print(\"Label probs:\", probs)" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_text_qnn.json b/openai-clip-vit-base-patch16/aitk/openai_clip_text_qnn.json new file mode 100644 index 00000000..f1821df2 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_text_qnn.json @@ -0,0 +1,193 @@ +{ + "input_model": { + "type": "PytorchModel", + "model_path": "openai/clip-vit-base-patch16", + "generative": false, + "io_config": { + "input_names": [ + "input_ids", + "attention_mask" + ], + "input_shapes": [ + [ + 1, + 77 + ], + [ + 1, + 77 + ] + ], + "input_types": [ + "int32", + "int32" + ], + "output_names": [ + "embeds", + "last_hidden_state" + ] + }, + "model_loader": "load_text_encoder", + "model_script": "clip_script.py" + }, + "systems": { + "host_system": { + "type": 
"LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "QNNExecutionProvider" + ] + } + ] + } + }, + "host": "host_system", + "target": "host_system", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "log_to_file": false, + "data_configs": [ + { + "name": "calib_data", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "nlphuji/flickr30k", + "split": "test" + }, + "pre_process_data_config": { + "type": "pre_process_dataset", + "model_name": "openai/clip-vit-base-patch16", + "caption_col": "caption", + "max_length": 77, + "max_samples": 12 + }, + "dataloader_config": { + "batch_size": 1 + }, + "user_script": "clip_script.py" + }, + { + "name": "eval_data", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "nlphuji/flickr30k", + "split": "test" + }, + "pre_process_data_config": { + "type": "pre_process_dataset", + "model_name": "openai/clip-vit-base-patch16", + "generate_ground_truth": true, + "caption_col": "caption", + "max_length": 77, + "max_samples": 100 + }, + "post_process_data_config": { + "type": "embed_post_process" + }, + "dataloader_config": { + "batch_size": 1 + }, + "user_script": "clip_script.py" + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "degrad", + "type": "custom", + "data_config": "eval_data", + "sub_types": [ + { + "name": "percentage", + "priority": 1, + "higher_is_better": false + } + ], + "user_config": { + "user_script": "clip_script.py", + "metric_func": "eval_similarity_degrad" + } + }, + { + "name": "latency", + "type": "latency", + "sub_types": [ + { + "name": "avg", + "priority": 2, + "metric_config": { + "warmup_num": 20, + "repeat_test_num": 100 + } + }, + { + "name": "p90", + "metric_config": { + "warmup_num": 20, + "repeat_test_num": 100 + } + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "dynamic": true, + "use_dynamo_exporter": false, + 
"save_as_external_data": true + }, + "to_fixed_shape": { + "type": "DynamicToFixedShape", + "dim_param": [ + "batch_size", + "sequence_length" + ], + "dim_value": [ + 1, + 77 + ] + }, + "surgery": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "ReplaceAttentionMaskValue", + "replacement": -100.0 + }, + { + "surgeon": "MatMulAddToGemm" + } + ] + }, + "transformer_optimizer": { + "type": "OrtTransformersOptimization", + "model_type": "bert", + "opt_level": 1, + "optimization_options": { + "enable_gelu": false, + "enable_bias_gelu": false, + "enable_layer_norm": true, + "enable_skip_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_attention": false + }, + "save_as_external_data": true + }, + "quantization": { + "type": "OnnxStaticQuantization", + "data_config": "calib_data", + "quant_preprocess": true, + "activation_type": "uint16", + "precision": "uint8", + "save_as_external_data": true + } + }, + "cache_dir": "cache", + "output_dir": "model/clip_text" +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_text_qnn.json.config b/openai-clip-vit-base-patch16/aitk/openai_clip_text_qnn.json.config new file mode 100644 index 00000000..0904f12d --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_text_qnn.json.config @@ -0,0 +1,235 @@ +{ + "name": "Convert Text Model to Qualcomm NPU", + "oliveFile": "clip/qdq/openai_clip_text_b16_qdq.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Qualcomm NPU", + "CPU" + ], + "path": "systems.host_system.accelerators.0.execution_providers.0", + "values": [ + "QNNExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + 
} + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "test" + ], + 
"template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.quantization", + "actions": [ + [], + [ + { + "type": "update", + "path": "passes", + "value": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "dynamic": true, + "use_dynamo_exporter": false, + "save_as_external_data": true + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "test" + ], + "template": { + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "test" + ], + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git 
a/openai-clip-vit-base-patch16/aitk/openai_clip_text_qnn_inference_sample.ipynb b/openai-clip-vit-base-patch16/aitk/openai_clip_text_qnn_inference_sample.ipynb new file mode 100644 index 00000000..9f0a36b2 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_text_qnn_inference_sample.ipynb @@ -0,0 +1,141 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "43751a72", + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "ExecutionProvider=\"QNNExecutionProvider\"" + ] + }, + { + "cell_type": "markdown", + "id": "897ffb42-3569-4d78-b99d-355a38fdce35", + "metadata": {}, + "source": [ + "### Data Processor" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fa8d84cd-4853-4746-bce3-b281bfc23d8b", + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import CLIPProcessor\n", + "\n", + "processor = CLIPProcessor.from_pretrained(\"openai/clip-vit-base-patch16\")" + ] + }, + { + "cell_type": "markdown", + "id": "5568eb71-5812-4c74-989c-c12271d33b12", + "metadata": {}, + "source": [ + "### Model Inference with ORT-QNN" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "02bad4ec-f477-4659-8584-00735f6ed5a9", + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime as ort\n", + "import torch\n", + "import numpy as np\n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + "\n", + "\n", + "session_options = ort.SessionOptions()\n", + "\n", + "add_ep_for_device(session_options, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "\n", + "text_model = 
ort.InferenceSession(\n", + " onnx_model_path, # a model with QNN EPContext nodes\n", + " sess_options=session_options,\n", + ")\n", + "\n", + "def get_text_embedding(text):\n", + " inputs = processor(\n", + " text=text,\n", + " padding=\"max_length\",\n", + " max_length=77,#text_model.sequence_length,\n", + " truncation=True,\n", + " add_special_tokens=True,\n", + " return_tensors=\"np\",\n", + " )\n", + " output = text_model.run(None, {\n", + " \"input_ids\": inputs[\"input_ids\"].astype(np.int32),\n", + " \"attention_mask\": inputs[\"attention_mask\"].astype(np.int32),\n", + " })\n", + " return torch.from_numpy(output[0])\n", + "\n", + "def calculate_score(emb_1, emb_2):\n", + " emb_1 /= torch.norm(emb_1, dim=-1, keepdim=True)\n", + " emb_2 /= torch.norm(emb_2, dim=-1, keepdim=True)\n", + " return torch.matmul(emb_1, emb_2.T) * 100.0\n", + "\n", + "# Get source embedding and calculate the similarity score for each target\n", + "# Targets are processed one by one because, for static quantization, the batch size is fixed to 1\n", + "def ask(source, targets):\n", + " source_emb = get_text_embedding(source)\n", + " scores = []\n", + " for i, target in enumerate(targets):\n", + " target_emb = get_text_embedding(target)\n", + " score = calculate_score(source_emb, target_emb)\n", + " print(f\"Similarity score of sentence {i}:{score.item()}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "3477e36c-2e72-432b-ae81-602073a3754c", + "metadata": {}, + "source": [ + "### Play with Samples" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8cdc2a6-4c81-4f93-8426-065ee4c2b013", + "metadata": {}, + "outputs": [], + "source": [ + "ask(\"a photo containing two cats\", [\"a photo of tshirt\", \"a photo of two cats\"])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": 
".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_trtrtx.json b/openai-clip-vit-base-patch16/aitk/openai_clip_trtrtx.json new file mode 100644 index 00000000..0d8f7581 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_trtrtx.json @@ -0,0 +1,173 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "openai/clip-vit-base-patch16", + "task": "zero-shot-image-classification", + "load_kwargs": { + "attn_implementation": "eager" + }, + "io_config": { + "input_names": [ + "input_ids", + "pixel_values", + "attention_mask" + ], + "input_shapes": [ + [ + 10, + 77 + ], + [ + 1, + 3, + 224, + 224 + ], + [ + 10, + 77 + ] + ], + "input_types": [ + "int64", + "float32", + "int64" + ], + "output_names": [ + "logits_per_image" + ], + "output_shapes": [ + [ + 1, + 2 + ] + ] + } + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "gpu", + "execution_providers": [ + "NvTensorRTRTXExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quant_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "openai/clip-vit-base-patch16", + "dataset_name": "nlphuji/flickr30k", + "start": 0, + "end": 10 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + } + }, + { + "name": "metric_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "openai/clip-vit-base-patch16", + "dataset_name": "nlphuji/flickr30k", + "start": 10, + "end": 20 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + }, + "post_process_data_config": { + "type": "clip_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + 
"type": "accuracy", + "backend": "huggingface_metrics", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "accuracy", + "priority": 1, + "goal": { + "type": "max-degradation", + "value": 0.05 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg", + "goal": { + "type": "percent-min-improvement", + "value": 0.1 + } + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + }, + { + "name": "throughput", + "type": "throughput", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg" + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + }, + "onnx_float_to_float16": { + "type": "OnnxFloatToFloat16", + "save_as_external_data": true + }, + "session_params_tuning": { + "type": "OrtSessionParamsTuning", + "io_bind": false, + "data_config": "quant_data_config" + } + }, + "host": "local_system", + "target": "local_system", + "evaluator": "common_evaluator", + "cache_dir": "cache", + "output_dir": "model/clip-vit-base-patch16", + "evaluate_input_model": false +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_trtrtx.json.config b/openai-clip-vit-base-patch16/aitk/openai_clip_trtrtx.json.config new file mode 100644 index 00000000..c61c1395 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_trtrtx.json.config @@ -0,0 +1,86 @@ +{ + "name": "Convert to NVIDIA TRT for RTX", + "oliveFile": "clip/openai_clip-vit-base-patch16_trtrtx.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "NVIDIA TensorRT for RTX", + "CPU" + ], + "path": "systems.local_system.accelerators.0.execution_providers.0", + "values": [ + "NvTensorRTRTXExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + 
"name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].load_dataset_config.end", + "template": { + "path": "data_configs[1].load_dataset_config.end", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_trtrtx_inference_sample.ipynb b/openai-clip-vit-base-patch16/aitk/openai_clip_trtrtx_inference_sample.ipynb new file mode 100644 index 00000000..a3c6f084 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_trtrtx_inference_sample.ipynb @@ -0,0 +1,90 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "aeb33f1a", + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "ExecutionProvider=\"NvTensorRTRTXExecutionProvider\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "307fcca8", + "metadata": {}, + "outputs": [], + "source": [ + "from PIL import Image\n", + "import requests\n", + " \n", + "from transformers import CLIPProcessor\n", + "import onnxruntime as ort\n", + "import numpy as np\n", + 
"import torch\n", + " \n", + "processor = CLIPProcessor.from_pretrained(\"openai/clip-vit-base-patch16\", use_fast=False)\n", + " \n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + " \n", + "inputs = processor(text=[\"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\"],\n", + " images=image, return_tensors=\"np\", padding=\"max_length\",\n", + " max_length= 77, truncation=True)\n", + " \n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + " \n", + "opts = ort.SessionOptions()\n", + " \n", + "add_ep_for_device(opts, ExecutionProvider, ort.OrtHardwareDeviceType.GPU)\n", + "assert opts.has_providers()\n", + "\n", + "# options = ort.SessionOptions()\n", + "session = ort.InferenceSession(onnx_model_path,\n", + " sess_options=opts,\n", + " # providers=[ExecutionProvider],\n", + " # provider_options=[provider_options]\n", + ")\n", + "logits_per_image = session.run([\"logits_per_image\"],\n", + " {\n", + " \"input_ids\": inputs['input_ids'].astype(np.int64),\n", + " \"attention_mask\": inputs['attention_mask'].astype(np.int64),\n", + " \"pixel_values\": inputs['pixel_values'].astype(np.float16)\n", + " })\n", + " \n", + "probs = torch.tensor(logits_per_image[0]).softmax(dim=1)\n", + "print(\"Label probs:\", probs)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "winml", + "language": "python", + "name": "python3" + }, + "language_info": { + 
"codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_vision_qnn.json b/openai-clip-vit-base-patch16/aitk/openai_clip_vision_qnn.json new file mode 100644 index 00000000..b58a975f --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_vision_qnn.json @@ -0,0 +1,186 @@ +{ + "input_model": { + "type": "PytorchModel", + "model_path": "openai/clip-vit-base-patch16", + "generative": false, + "io_config": { + "input_names": [ + "pixel_values" + ], + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "output_names": [ + "embeds" + ] + }, + "model_loader": "load_image_encoder", + "model_script": "clip_script.py" + }, + "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "QNNExecutionProvider" + ] + } + ] + } + }, + "host": "host_system", + "target": "host_system", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "log_to_file": false, + "data_configs": [ + { + "name": "calib_data", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "test" + }, + "pre_process_data_config": { + "type": "pre_process_dataset", + "model_name": "openai/clip-vit-base-patch16", + "image_col": "image", + "max_samples": 12 + }, + "dataloader_config": { + "batch_size": 1 + }, + "user_script": "clip_script.py" + }, + { + "name": "eval_data", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "test" + }, + "pre_process_data_config": { + "type": "pre_process_dataset", + "model_name": "openai/clip-vit-base-patch16", + "generate_ground_truth": true, + "image_col": "image", + "max_samples": 100 + }, + 
"post_process_data_config": { + "type": "embed_post_process" + }, + "dataloader_config": { + "batch_size": 1 + }, + "user_script": "clip_script.py" + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "degrad", + "type": "custom", + "data_config": "eval_data", + "sub_types": [ + { + "name": "percentage", + "priority": 1, + "higher_is_better": false + } + ], + "user_config": { + "user_script": "clip_script.py", + "metric_func": "eval_similarity_degrad", + "metric_func_kwargs": { + "batch_size": 32 + } + } + }, + { + "name": "latency", + "type": "latency", + "sub_types": [ + { + "name": "avg", + "priority": 2, + "metric_config": { + "warmup_num": 20, + "repeat_test_num": 100 + } + }, + { + "name": "p90", + "metric_config": { + "warmup_num": 20, + "repeat_test_num": 100 + } + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "dynamic": true, + "use_dynamo_exporter": false, + "save_as_external_data": true + }, + "to_fixed_shape": { + "type": "DynamicToFixedShape", + "dim_param": [ + "batch_size", + "num_channels", + "height", + "width" + ], + "dim_value": [ + 1, + 3, + 224, + 224 + ] + }, + "surgery": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "MatMulAddToGemm" + } + ] + }, + "transformer_optimizer": { + "type": "OrtTransformersOptimization", + "model_type": "vit", + "opt_level": 1, + "optimization_options": { + "enable_gelu": false, + "enable_bias_gelu": false, + "enable_layer_norm": true, + "enable_skip_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_attention": false + }, + "save_as_external_data": true + }, + "quantization": { + "type": "OnnxStaticQuantization", + "data_config": "calib_data", + "quant_preprocess": true, + "activation_type": "uint16", + "precision": "uint8", + "save_as_external_data": true + } + }, + "cache_dir": "cache", + "output_dir": "model/clip_vision" +} diff --git 
a/openai-clip-vit-base-patch16/aitk/openai_clip_vision_qnn.json.config b/openai-clip-vit-base-patch16/aitk/openai_clip_vision_qnn.json.config new file mode 100644 index 00000000..61ec81c9 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_vision_qnn.json.config @@ -0,0 +1,237 @@ +{ + "name": "Convert Vision Model to Qualcomm NPU", + "oliveFile": "clip/qdq/openai_clip_vision_b16_qdq.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Qualcomm NPU", + "CPU" + ], + "path": "systems.host_system.accelerators.0.execution_providers.0", + "values": [ + "QNNExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "nlphuji/flickr30k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.quantization", + "actions": [ + [], + [ + { + "type": "update", + "path": "passes", + "value": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "dynamic": true, + "use_dynamo_exporter": false, + "save_as_external_data": true + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation 
Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[1].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/openai-clip-vit-base-patch16/aitk/openai_clip_vision_qnn_inference_sample.ipynb b/openai-clip-vit-base-patch16/aitk/openai_clip_vision_qnn_inference_sample.ipynb new file mode 100644 index 00000000..f3609ed0 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/openai_clip_vision_qnn_inference_sample.ipynb @@ -0,0 +1,170 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "3c18a7d6", + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "\n", + "ExecutionProvider=\"QNNExecutionProvider\"" + ] + }, + { + "cell_type": "markdown", + "id": "897ffb42-3569-4d78-b99d-355a38fdce35", + "metadata": {}, + "source": [ + "### Data Processor" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"fa8d84cd-4853-4746-bce3-b281bfc23d8b", + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import CLIPProcessor\n", + "\n", + "processor = CLIPProcessor.from_pretrained(\"openai/clip-vit-base-patch16\")" + ] + }, + { + "cell_type": "markdown", + "id": "5568eb71-5812-4c74-989c-c12271d33b12", + "metadata": {}, + "source": [ + "### Model Inference with ORT-QNN" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "02bad4ec-f477-4659-8584-00735f6ed5a9", + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime as ort\n", + "import torch\n", + "import numpy as np\n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + "\n", + "\n", + "session_options = ort.SessionOptions()\n", + "\n", + "add_ep_for_device(session_options, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "\n", + "vision_model = ort.InferenceSession(\n", + " onnx_model_path, # a model with QNN EPContext nodes\n", + " sess_options=session_options,\n", + ")\n", + "\n", + "def get_image_embedding(image):\n", + " inputs = processor(images=image, return_tensors=\"np\")\n", + " output = vision_model.run(None, { \"pixel_values\": inputs[\"pixel_values\"] })\n", + " return torch.from_numpy(output[0])\n", + "\n", + "def calculate_score(emb_1, emb_2):\n", + " emb_1 /= torch.norm(emb_1, dim=-1, keepdim=True)\n", + " emb_2 /= torch.norm(emb_2, dim=-1, keepdim=True)\n", + " return torch.matmul(emb_1, emb_2.T) * 100.0\n", + "\n", + "# Get source embedding and calculate the similarity score for each target\n", + "# We need to process one by one because to static quantization, we fixed the batch size to 
1\n", + "def ask(source, targets):\n", + " source_emb = get_image_embedding(source)\n", + " for i, target in enumerate(targets):\n", + " target_emb = get_image_embedding(target)\n", + " score = calculate_score(source_emb, target_emb)\n", + " print(f\"Similarity score of image {i}:{score.item()}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "3477e36c-2e72-432b-ae81-602073a3754c", + "metadata": {}, + "source": [ + "### Play with Samples" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16868fbd-e447-4866-af7d-eb6e49975bcc", + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "from PIL import Image\n", + "\n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + "image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "07076b9a", + "metadata": {}, + "outputs": [], + "source": [ + "url = \"http://images.cocodataset.org/train2017/000000208833.jpg\"\n", + "image1 = Image.open(requests.get(url, stream=True).raw)\n", + "image1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c10de7cd", + "metadata": {}, + "outputs": [], + "source": [ + "url = \"http://images.cocodataset.org/train2017/000000125690.jpg\"\n", + "image2 = Image.open(requests.get(url, stream=True).raw)\n", + "image2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8cdc2a6-4c81-4f93-8426-065ee4c2b013", + "metadata": {}, + "outputs": [], + "source": [ + "ask(image, [image1, image2])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 
5 +} diff --git a/openai-clip-vit-base-patch16/aitk/requirements.txt b/openai-clip-vit-base-patch16/aitk/requirements.txt new file mode 100644 index 00000000..0cddd58d --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/requirements.txt @@ -0,0 +1,5 @@ +olive-ai +cachetools==5.5.0 +nltk>=3.9.1 +accelerate>=1.4.0 +pillow>=10.0.1 diff --git a/openai-clip-vit-base-patch16/aitk/user_script.py b/openai-clip-vit-base-patch16/aitk/user_script.py new file mode 100644 index 00000000..2d0051f0 --- /dev/null +++ b/openai-clip-vit-base-patch16/aitk/user_script.py @@ -0,0 +1,64 @@ +# ------------------------------------------------------------------------- +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. +# -------------------------------------------------------------------------- +import numpy as np +import torch +from datasets import load_dataset +from torch.utils.data import Dataset +from transformers import CLIPProcessor + +from olive.data.registry import Registry + + +class CLIPDataset(Dataset): + def __init__( + self, + model_name, + dataset_name, + start=0, + end=500, + image_size=(224, 224), + ): + assert 0 <= start < end + self.start = start + self.end = end + self.model_name = model_name + self.dataset_name = dataset_name + self.processor = CLIPProcessor.from_pretrained(self.model_name) + self.length = self.end - self.start + self.image_size = image_size + self.dataset = load_dataset(self.dataset_name, split=f"test[{0}:{self.end + 10}]") + + def __len__(self): + return self.length + + def __getitem__(self, idx): + text_inputs = self.processor( + text=[" ".join(item) for item in self.dataset[idx : idx + 10]["caption"]], + return_tensors="np", + padding="max_length", + truncation=True, + ) + + image_input = self.processor(images=self.dataset[idx]["image"].resize(self.image_size), return_tensors="np") + model_inputs = [ + { + "input_ids": text_inputs["input_ids"].astype(np.int64), + "pixel_values": 
image_input["pixel_values"], + "attention_mask": text_inputs["attention_mask"].astype(np.int64), + } + ] + + target = torch.Tensor([0]).to(torch.int32) + return model_inputs[0], target + + +@Registry.register_dataset() +def clip_dataset(**kwargs): + return CLIPDataset(**kwargs) + + +@Registry.register_post_process() +def clip_post_process(output): + return output["logits_per_image"].argmax(axis=-1) diff --git a/openai-clip-vit-base-patch32/aitk/.gitignore b/openai-clip-vit-base-patch32/aitk/.gitignore new file mode 100644 index 00000000..48c03882 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/.gitignore @@ -0,0 +1,5 @@ +__pycache__ +/cache +/history/*/* +!/history/*/history.config +!/history/*/olive_config.json diff --git a/openai-clip-vit-base-patch32/aitk/README.md b/openai-clip-vit-base-patch32/aitk/README.md new file mode 100644 index 00000000..35dfb8fe --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/README.md @@ -0,0 +1,48 @@ +# Openai Clip optimization + +This folder contains examples of Openai Clip optimization using different workflows. + +- Text and vision model QDQ for Qualcomm NPU +- QDQ for AMD NPU +- OpenVINO for Intel NPU + +## Openai Clip text optimization with QDQ for Qualcomm NPU + +This example performs Openai Clip optimization with QDQ in one workflow. It performs the optimization pipeline: + +- *PyTorch Model -> Onnx Model -> Quantized Onnx Model* + +### Evaluation result + +The quantization uses 256 samples from the train split of the imagenet-1k dataset, and the evaluation uses 256 samples from the test split of the imagenet-1k dataset. + + +| Activation Type  | Weight Type  | Size  | Latency ms (avg)  | +| --------------------- | ----------------- | ---------- | ---------------------- | +| QUInt16 | QUInt8 | 100 | 6.53724 | + +## Openai Clip vision optimization with QDQ for Qualcomm NPU + +This example performs Openai Clip optimization with QDQ in one workflow. 
It performs the optimization pipeline: + +- *PyTorch Model -> Onnx Model -> Quantized Onnx Model* + +### Evaluation result + +The quantization uses 256 samples from the train split of the imagenet-1k dataset, and the evaluation uses 256 samples from the test split of the imagenet-1k dataset. + + +| Activation Type  | Weight Type  | Size  | Latency ms (avg)  | +| --------------------- | ----------------- | ---------- | ---------------------- | +| QUInt16 | QUInt8 | 100 | 20.13231 | + + +## Openai Clip optimization with QDQ for AMD NPU + +This example performs Openai Clip optimization with QDQ in one workflow. It performs the optimization pipeline: + +- *PyTorch Model -> Onnx Model -> Quantized Onnx Model* + +## Openai Clip optimization with OpenVINO + +This example performs Openai Clip optimization with OpenVINO in one workflow for Intel NPU. diff --git a/openai-clip-vit-base-patch32/aitk/_copy.json.config b/openai-clip-vit-base-patch32/aitk/_copy.json.config new file mode 100644 index 00000000..16b1d573 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/_copy.json.config @@ -0,0 +1,206 @@ +{ + "copies": [ + { + "src": "../../clip-vit-base-patch16/1/model_project.config", + "dst": "model_project.config", + "replacements": [ + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_text_qnn_inference_sample.ipynb", + "dst": "openai_clip_text_qnn_inference_sample.ipynb", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "openai/clip-vit-base-patch32" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_text_qnn.json", + "dst": "openai_clip_text_qnn.json", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "openai/clip-vit-base-patch32" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_text_qnn.json.config", + "dst": "openai_clip_text_qnn.json.config", + "replacements": [ + { + "find": "clip/qdq/openai_clip_text_b16_qdq.json", + "replace": "clip/qdq/openai_clip_text_b32_qdq.json" + } 
+ ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_vision_qnn_inference_sample.ipynb", + "dst": "openai_clip_vision_qnn_inference_sample.ipynb", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "openai/clip-vit-base-patch32" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_vision_qnn.json", + "dst": "openai_clip_vision_qnn.json", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "openai/clip-vit-base-patch32" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_vision_qnn.json.config", + "dst": "openai_clip_vision_qnn.json.config", + "replacements": [ + { + "find": "clip/qdq/openai_clip_vision_b16_qdq.json", + "replace": "clip/qdq/openai_clip_vision_b32_qdq.json" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_ov_inference_sample.ipynb", + "dst": "openai_clip_ov_inference_sample.ipynb", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "openai/clip-vit-base-patch32" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_ov.json", + "dst": "openai_clip_ov.json", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "openai/clip-vit-base-patch32" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_ov.json.config", + "dst": "openai_clip_ov.json.config", + "replacements": [ + { + "find": "clip/openvino/clip_vit_base_patch16_context_ov_static.json", + "replace": "clip/openvino/clip_vit_base_patch32_context_ov_static.json" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_qdq_amd_inference_sample.ipynb", + "dst": "openai_clip_qdq_amd_inference_sample.ipynb", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "openai/clip-vit-base-patch32" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_qdq_amd.json", + "dst": "openai_clip_qdq_amd.json", + "replacements": [ + { + "find": 
"openai/clip-vit-base-patch16", + "replace": "openai/clip-vit-base-patch32" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_qdq_amd.json.config", + "dst": "openai_clip_qdq_amd.json.config", + "replacements": [ + { + "find": "clip/openai_clip-vit-base-patch16_ptq_qdq_vitis_ai.json", + "replace": "clip/openai_clip-vit-base-patch32_ptq_qdq_vitis_ai.json" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_trtrtx.json", + "dst": "openai_clip_trtrtx.json", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "openai/clip-vit-base-patch32" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_trtrtx.json.config", + "dst": "openai_clip_trtrtx.json.config", + "replacements": [ + { + "find": "clip/openai_clip-vit-base-patch16_trtrtx.json", + "replace": "clip/openai_clip-vit-base-patch32_trtrtx.json" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_trtrtx_inference_sample.ipynb", + "dst": "openai_clip_trtrtx_inference_sample.ipynb", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "openai/clip-vit-base-patch32" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_dml.json", + "dst": "openai_clip_dml.json", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "openai/clip-vit-base-patch32" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_dml.json.config", + "dst": "openai_clip_dml.json.config", + "replacements": [ + ] + }, + { + "src": "../../clip-vit-base-patch16/1/openai_clip_dml_inference_sample.ipynb", + "dst": "openai_clip_dml_inference_sample.ipynb", + "replacements": [ + { + "find": "openai/clip-vit-base-patch16", + "replace": "openai/clip-vit-base-patch32" + } + ] + }, + { + "src": "../../clip-vit-base-patch16/1/clip_script.py", + "dst": "clip_script.py" + }, + { + "src": "../../clip-vit-base-patch16/1/user_script.py", + "dst": "user_script.py" + }, + { + "src": 
"../../clip-vit-base-patch16/1/openai_clip_ov.py", + "dst": "openai_clip_ov.py" + }, + { + "src": "../../clip-vit-base-patch16/1/README.md", + "dst": "README.md" + }, + { + "src": "../../clip-vit-base-patch16/1/requirements.txt", + "dst": "requirements.txt" + } + ] +} \ No newline at end of file diff --git a/openai-clip-vit-base-patch32/aitk/clip_script.py b/openai-clip-vit-base-patch32/aitk/clip_script.py new file mode 100644 index 00000000..6f775697 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/clip_script.py @@ -0,0 +1,151 @@ +from __future__ import annotations + +from collections import OrderedDict +from itertools import chain + +import torch +from transformers import ( + AutoProcessor, + CLIPTextModelWithProjection, + CLIPVisionModelWithProjection, +) + +from olive.data.component.dataset import BaseDataset +from olive.data.registry import Registry + +HF_MODEL_SUBFOLDER_MAPPING = { + "sentence-transformers/clip-ViT-B-32": "0_CLIPModel", +} + + +def load_image_encoder(model_name): + return CLIPVisionModelWithProjection.from_pretrained( + model_name, + subfolder=HF_MODEL_SUBFOLDER_MAPPING.get(model_name, ""), + ).eval() + + +def load_text_encoder(model_name): + if model_name == "sentence-transformers/clip-ViT-B-32-multilingual-v1": + from sbert_clip_script import SDistilBertTextEncoder + + return SDistilBertTextEncoder(model_name).eval() + + return CLIPTextModelWithProjection.from_pretrained( + model_name, + subfolder=HF_MODEL_SUBFOLDER_MAPPING.get(model_name, ""), + ).eval() + + +def hfdataset_pre_process_for_clip( + dataset, + processor, + torch_model=None, + image_col: str | None = None, + caption_col: str | None = None, + label_col: str = "label", + max_samples: int | None = None, + max_length: int = 77, + batch_size: int = 32, +): + def generate_inputs(sample, indices): + captions = sample.get(caption_col, None) + images = sample.get(image_col, None) + + kwargs = { + "padding": "max_length", + "max_length": max_length, + "truncation": True, + 
"add_special_tokens": True, + "return_tensors": "pt", + } + if images: + kwargs["images"] = [img.convert("RGB") for img in images] + if captions: + kwargs["text"] = list(chain([x[0] for x in captions])) + + encoded_input = processor(**kwargs) + + return { + **encoded_input, + label_col: torch_model(**encoded_input)[0] if torch_model else sample.get(label_col, indices), + } + + if max_samples is not None and max_samples < len(dataset): + dataset = dataset.select(range(max_samples)) + + tokenized_datasets = dataset.map( + generate_inputs, + batched=True, + batch_size=batch_size, + with_indices=True, + remove_columns=dataset.column_names, + desc="Processing dataset", + ) + tokenized_datasets.set_format("torch", output_all_columns=True) + + return tokenized_datasets + + +@Registry.register_pre_process() +def pre_process_dataset( + dataset, + model_name: str, + generate_ground_truth: bool = False, + image_col: str | None = None, + caption_col: str | None = None, + label_col: str = "label", + max_samples: int | None = None, + max_length: int = 77, + **kwargs, +): + if image_col is None and caption_col is None: + raise ValueError("Either image_col or caption_col must be provided.") + + if generate_ground_truth: + if image_col and caption_col: + raise ValueError("Can not generate two types of embedding at the same time.") + + torch_model = load_image_encoder(model_name) if image_col else load_text_encoder(model_name) + else: + torch_model = None + + processor = AutoProcessor.from_pretrained(model_name) + dataset = hfdataset_pre_process_for_clip( + dataset, + processor, + torch_model=torch_model, + image_col=image_col, + caption_col=caption_col, + label_col=label_col, + max_length=max_length, + max_samples=max_samples, + ) + return BaseDataset(dataset, label_col) + + +@Registry.register_post_process() +def embed_post_process(output): + """Post-processing for CLIP output.""" + match output: + case dict() | OrderedDict() as out: + if "embeds" in out: + return out["embeds"] + 
elif "text_embeds" in out: + return out["text_embeds"] + elif "image_embeds" in out: + return out["image_embeds"] + case torch.Tensor(): + return output.argmax(dim=-1) + raise ValueError(f"Unsupported output type: {type(output)}") + + +def eval_similarity_degrad(output, targets, batch_size=1024): + import torch.nn.functional as F + + preds = output.preds + scores = [ + F.cosine_similarity(preds[i : i + batch_size], targets[i : i + batch_size]) + for i in range(0, preds.size(0), batch_size) + ] + return {"percentage": f"{100.0 - torch.mean(torch.cat(scores)) * 100.0:.2f}"} diff --git a/openai-clip-vit-base-patch32/aitk/info.yml b/openai-clip-vit-base-patch32/aitk/info.yml new file mode 100644 index 00000000..c0691592 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/info.yml @@ -0,0 +1,28 @@ +keywords: + aitk +arch: clip +recipes: + - file: "openai_clip_text_qnn.json" + device: npu + ep: QNNExecutionProvider + name: "openai-clip-vit-base-patch32 (Text)" + - file: "openai_clip_vision_qnn.json" + device: npu + ep: QNNExecutionProvider + name: "openai-clip-vit-base-patch32 (Vision)" + - file: "openai_clip_qdq_amd.json" + device: npu + ep: VitisAIExecutionProvider + - file: "openai_clip_ov.json" + device: npu + ep: OpenVINOExecutionProvider + - file: "openai_clip_trtrtx.json" + device: gpu + ep: NvTensorRTRTXExecutionProvider + - file: "openai_clip_dml.json" + device: gpu + ep: DmlExecutionProvider +aitk: + modelInfo: + id: "huggingface/openai/clip-vit-base-patch32" + version: 1 diff --git a/openai-clip-vit-base-patch32/aitk/model_project.config b/openai-clip-vit-base-patch32/aitk/model_project.config new file mode 100644 index 00000000..4f2dd495 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/model_project.config @@ -0,0 +1,32 @@ +{ + "workflows": [ + { + "file": "openai_clip_text_qnn.json", + "templateName": "openai_clip_text_qnn" + }, + { + "file": "openai_clip_vision_qnn.json", + "templateName": "openai_clip_vision_qnn" + }, + { + "file": 
"openai_clip_qdq_amd.json", + "templateName": "openai_clip_qdq_amd" + }, + { + "file": "openai_clip_ov.json", + "templateName": "openai_clip_ov" + }, + { + "file": "openai_clip_trtrtx.json", + "templateName": "openai_clip_trtrtx" + }, + { + "file": "openai_clip_dml.json", + "templateName": "openai_clip_dml" + } + ], + "modelInfo": { + "id": "huggingface/openai/clip-vit-base-patch32", + "version": 1 + } +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_dml.json b/openai-clip-vit-base-patch32/aitk/openai_clip_dml.json new file mode 100644 index 00000000..aa1c716d --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_dml.json @@ -0,0 +1,192 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "openai/clip-vit-base-patch32", + "task": "zero-shot-image-classification", + "load_kwargs": { + "attn_implementation": "eager" + }, + "io_config": { + "input_names": [ + "input_ids", + "pixel_values", + "attention_mask" + ], + "input_shapes": [ + [ + 10, + 77 + ], + [ + 1, + 3, + 224, + 224 + ], + [ + 10, + 77 + ] + ], + "input_types": [ + "int64", + "float32", + "int64" + ], + "output_names": [ + "logits_per_image" + ], + "output_shapes": [ + [ + 1, + 2 + ] + ] + } + }, + "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "cpu", + "execution_providers": [ + "CPUExecutionProvider" + ] + } + ] + }, + "target_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "gpu", + "execution_providers": [ + "DmlExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "metric_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "openai/clip-vit-base-patch32", + "dataset_name": "nlphuji/flickr30k", + "start": 0, + "end": 10 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + }, + "post_process_data_config": { + "type": "clip_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + 
"metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "backend": "huggingface_metrics", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "accuracy", + "priority": 1, + "goal": { + "type": "max-degradation", + "value": 0.05 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg", + "goal": { + "type": "percent-min-improvement", + "value": 0.1 + } + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + }, + { + "name": "throughput", + "type": "throughput", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg" + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + }, + "transformer_optimizer": { + "type": "orttransformersoptimization", + "model_type": "clip", + "opt_level": 0, + "float16": true, + "use_gpu": true, + "keep_io_types": false, + "optimization_options": { + "enable_gelu": true, + "enable_layer_norm": true, + "enable_attention": true, + "use_multi_head_attention": true, + "enable_skip_layer_norm": false, + "enable_embed_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_bias_gelu": false, + "enable_gelu_approximation": false, + "enable_qordered_matmul": false, + "enable_shape_inference": true, + "enable_gemm_fast_gelu": false, + "enable_nhwc_conv": false, + "enable_group_norm": false, + "enable_bias_splitgelu": false, + "enable_packed_qkv": true, + "enable_packed_kv": true, + "enable_bias_add": false, + "enable_rotary_embeddings": true + }, + "save_as_external_data": true + } + }, + "search_strategy": false, + "host": "host_system", + "target": "target_system", + "cache_dir": "cache", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "output_dir": "model/clip" +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_dml.json.config 
b/openai-clip-vit-base-patch32/aitk/openai_clip_dml.json.config new file mode 100644 index 00000000..ed09dcf4 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_dml.json.config @@ -0,0 +1,87 @@ +{ + "name": "Convert to DirectML", + "evaluationRuntimeFeatures": [ + "Nightly" + ], + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "DirectML" + ], + "path": "systems.target_system.accelerators.0.execution_providers.0", + "values": [ + "DmlExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[0].load_dataset_config.end", + "template": { + "path": "data_configs[0].load_dataset_config.end", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_dml_inference_sample.ipynb b/openai-clip-vit-base-patch32/aitk/openai_clip_dml_inference_sample.ipynb new file mode 100644 index 00000000..db21746c --- /dev/null +++ 
b/openai-clip-vit-base-patch32/aitk/openai_clip_dml_inference_sample.ipynb @@ -0,0 +1,90 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "aeb33f1a", + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "ExecutionProvider=\"DmlExecutionProvider\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "307fcca8", + "metadata": {}, + "outputs": [], + "source": [ + "from PIL import Image\n", + "import requests\n", + " \n", + "from transformers import CLIPProcessor\n", + "import onnxruntime as ort\n", + "import numpy as np\n", + "import torch\n", + " \n", + "processor = CLIPProcessor.from_pretrained(\"openai/clip-vit-base-patch32\", use_fast=False)\n", + " \n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + " \n", + "inputs = processor(text=[\"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\"],\n", + " images=image, return_tensors=\"np\", padding=\"max_length\",\n", + " max_length= 77, truncation=True)\n", + " \n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + " \n", + "opts = ort.SessionOptions()\n", + " \n", + "add_ep_for_device(opts, ExecutionProvider, ort.OrtHardwareDeviceType.GPU)\n", + "assert opts.has_providers()\n", + "\n", + "# options = ort.SessionOptions()\n", + "session = ort.InferenceSession(onnx_model_path,\n", + " sess_options=opts,\n", + " # 
providers=[ExecutionProvider],\n", + " # provider_options=[provider_options]\n", + ")\n", + "logits_per_image = session.run([\"logits_per_image\"],\n", + " {\n", + " \"input_ids\": inputs['input_ids'].astype(np.int64),\n", + " \"attention_mask\": inputs['attention_mask'].astype(np.int64),\n", + " \"pixel_values\": inputs['pixel_values'].astype(np.float16)\n", + " })\n", + " \n", + "probs = torch.tensor(logits_per_image[0]).softmax(dim=1)\n", + "print(\"Label probs:\", probs)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "winml", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_ov.json b/openai-clip-vit-base-patch32/aitk/openai_clip_ov.json new file mode 100644 index 00000000..de22a30c --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_ov.json @@ -0,0 +1,125 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "openai/clip-vit-base-patch32" + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "npu", + "execution_providers": [ + "OpenVINOExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quantize_data_config", + "user_script": "openai_clip_ov.py", + "load_dataset_config": { + "type": "conceptual_captions_dataset", + "data_name": "google-research-datasets/conceptual_captions", + "model_path": "openai/clip-vit-base-patch32" + }, + "dataloader_config": { + "batch_size": 1, + "drop_last": true + } + }, + { + "name": "metric_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "openai/clip-vit-base-patch32", + "dataset_name": 
"nlphuji/flickr30k", + "start": 10, + "end": 20 + }, + "dataloader_config": { "type": "no_auto_batch_dataloader" }, + "post_process_data_config": { "type": "clip_post_process" } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "backend": "huggingface_metrics", + "data_config": "metric_data_config", + "sub_types": [ + { "name": "accuracy", "priority": 1, "goal": { "type": "max-degradation", "value": 0.05 } } + ] + }, + { + "name": "latency", + "type": "latency", + "sub_types": [ + { "name": "avg", "priority": 2, "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } }, + { "name": "p90", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } } + ] + } + ] + } + }, + "passes": { + "optimum_convert": { + "type": "OpenVINOOptimumConversion", + "extra_args": { + "device": "npu" + } + }, + "ov_quantize": { + "type": "OpenVINOQuantization", + "target_device": "npu", + "data_config": "quantize_data_config", + "model_type": "TRANSFORMER", + "user_script": "openai_clip_ov.py", + "transform_fn": "custom_transform_func", + "extra_configs": [ + { + "advanced_quantization_parameters": { + "smooth_quant_alpha": 0.6 + } + } + ] + }, + "io_update": { + "type": "OpenVINOIoUpdate", + "input_shapes": [ + [ + 10, + 77 + ], + [ + 1, + 3, + 224, + 224 + ], + [ + 10, + 77 + ] + ], + "static": true + }, + "encapsulation": { + "type": "OpenVINOEncapsulation", + "target_device": "npu", + "ov_version": "2025.1" + } + }, + "search_strategy": false, + "host": "local_system", + "target": "local_system", + "cache_dir": "cache", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "output_dir": "model/clip_vit_base_patch16_context_ov_static" +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_ov.json.config b/openai-clip-vit-base-patch32/aitk/openai_clip_ov.json.config new file mode 100644 index 00000000..25acd717 --- /dev/null +++ 
b/openai-clip-vit-base-patch32/aitk/openai_clip_ov.json.config @@ -0,0 +1,174 @@ +{ + "name": "Convert to Intel CPU/NPU/GPU", + "oliveFile": "clip/openvino/clip_vit_base_patch32_context_ov_static.json", + "isIntel": true, + "debugInfo": { + "autoGenerated": true, + "useOpenVINOOptimumConversion": "optimum_convert" + }, + "addCpu": false, + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "systems.local_system.accelerators.0.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "readOnly": false + }, + "runtimeInConversion": { + "autoGenerated": true, + "name": "Convert/Quantize to", + "type": "enum", + "displayNames": [ + "Intel CPU", + "Intel GPU", + "Intel NPU" + ], + "path": "passes.optimum_convert.extra_args.device", + "values": [ + "cpu", + "gpu", + "npu" + ], + "actions": [ + [ + { + "type": "update", + "path": "passes.ov_quantize.target_device", + "value": "cpu" + }, + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "cpu" + } + ], + [ + { + "type": "update", + "path": "passes.ov_quantize.target_device", + "value": "gpu" + }, + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "gpu" + } + ], + [ + { + "type": "update", + "path": "passes.ov_quantize.target_device", + "value": "npu" + }, + { + "type": "update", + "path": "passes.encapsulation.target_device", + "value": "npu" + } + ] + ] + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": 
"data_configs[0].load_dataset_config.data_name", + "values": [ + "google-research-datasets/conceptual_captions" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "google-research-datasets/conceptual_captions" + ], + "template": "QuantizationDataset" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.optimum_convert", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_ov.py b/openai-clip-vit-base-patch32/aitk/openai_clip_ov.py new file mode 100644 index 00000000..d1971b50 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_ov.py @@ -0,0 +1,124 @@ +from io import BytesIO + +import requests +import torch +from datasets import load_dataset +from PIL import Image +from requests.packages.urllib3.exceptions import InsecureRequestWarning +from tqdm import tqdm +from transformers import CLIPModel, CLIPProcessor + +from olive.data.registry import Registry + +requests.packages.urllib3.disable_warnings(InsecureRequestWarning) + +# ------------------------------------------------------------------------- +# Common Dataset +# ------------------------------------------------------------------------- + +seed = 0 +# seed everything to 0 for reproducibility, 
https://pytorch.org/docs/stable/notes/randomness.html +# do not set random seed and np.random.seed for aml test, since it will cause aml job name conflict +torch.manual_seed(seed) +# the following are needed only for GPU +torch.cuda.manual_seed(seed) +torch.backends.cudnn.deterministic = True +torch.backends.cudnn.benchmark = False + + +def check_text_data(data): + """Check if the given data is text-based.""" + if isinstance(data, str): + return True + if isinstance(data, list): + return all(isinstance(x, str) for x in data) + return False + + +def get_pil_from_url(url): + """Download and convert an image from a URL to a PIL Image object.""" + response = requests.get(url, verify=True, timeout=20) + image = Image.open(BytesIO(response.content)) + return image.convert("RGB") + + +def wrap_collate_fn(processor, max_length): + def collate_fn(example, image_column="image_url", text_column="caption"): + """Preprocess an example by loading and transforming image and text data. + + Check if the text data in the example is valid by calling the `check_text_data` function. + Download the image specified by the URL in the image_column by calling the `get_pil_from_url` function. + If there is any error during the download process, return None. + Return the preprocessed inputs with transformed image and text data. 
+ """ + if len(example) != 1: + raise ValueError(f"Expected 'example' to have exactly one element, but got {len(example)}.") + example = example[0] + + if not check_text_data(example[text_column]): + raise ValueError("Text data is not valid") + + url = example[image_column] + try: + image = get_pil_from_url(url) + w, h = image.size + if h == 1 or w == 1: + return None + except Exception: + return None + + inputs = processor(text=example[text_column], images=[image], return_tensors="pt", padding=True) + if inputs["input_ids"].shape[1] > max_length: + return None + return inputs + + return collate_fn + + +def prepare_calibration_data(dataloader, init_steps): + """Prepare calibration data from a dataloader for a specified number of initialization steps. + + Iterate over the dataloader, fetching batches and storing the relevant data. + """ + data = [] + with tqdm(total=init_steps) as pbar: + for batch in dataloader: + if len(data) == init_steps: + break + if batch: + pbar.update(1) + with torch.no_grad(): + data.append( + { + "input_ids": batch["input_ids"].to("cpu"), + "pixel_values": batch["pixel_values"].to("cpu"), + "attention_mask": batch["attention_mask"].to("cpu"), + } + ) + return data + + +@Registry.register_dataset() +def conceptual_captions_dataset(data_name,opt_init_steps=200, max_train_samples=1000, **kwargs): + """Prepare a vision-text dataset for quantization.""" + dataset = load_dataset(data_name, trust_remote_code=True) + model_path = kwargs.get("model_path") + if not model_path: + raise ValueError( + "The 'model_path' parameter is required in data_configs.load_dataset_config but was not provided." 
+ ) + model = CLIPModel.from_pretrained(model_path) + processor = CLIPProcessor.from_pretrained(model_path) + max_length = model.config.text_config.max_position_embeddings + train_dataset = dataset["train"].shuffle(seed=seed) + collate_fn = wrap_collate_fn(processor, max_length) + dataloader = torch.utils.data.DataLoader(train_dataset, collate_fn=collate_fn, batch_size=1) + return prepare_calibration_data(dataloader, opt_init_steps) + + +def custom_transform_func(data_item): + np_inputs = {} + for inp in data_item: + # Drop the first dimension using slicing + np_inputs[inp] = data_item[inp].numpy()[0, ...] + return np_inputs diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_ov_inference_sample.ipynb b/openai-clip-vit-base-patch32/aitk/openai_clip_ov_inference_sample.ipynb new file mode 100644 index 00000000..ef626f4c --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_ov_inference_sample.ipynb @@ -0,0 +1,84 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "aeb33f1a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/openvino_model_quant_st.onnx\"\n", + "ExecutionProvider=\"OpenVINOExecutionProvider\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "307fcca8", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "from PIL import Image\n", + "import requests\n", + " \n", + "from transformers import CLIPProcessor\n", + "import onnxruntime as ort\n", + "import numpy as np\n", + "import torch\n", + " \n", + "processor = CLIPProcessor.from_pretrained(\"openai/clip-vit-base-patch32\", use_fast=False)\n", + " \n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + " \n", + "inputs = processor(text=[\"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a 
photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\"],\n", + " images=image, return_tensors=\"np\", padding=\"max_length\",\n", + " max_length= 77, truncation=True)\n", + " \n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + " \n", + "opts = ort.SessionOptions()\n", + " \n", + "add_ep_for_device(opts, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "assert opts.has_providers()\n", + "\n", + "# options = ort.SessionOptions()\n", + "session = ort.InferenceSession(onnx_model_path,\n", + " sess_options=opts,\n", + " # providers=[ExecutionProvider],\n", + " # provider_options=[provider_options]\n", + ")\n", + "logits_per_image = session.run([\"logits_per_image\"],\n", + " {\n", + " \"input_ids\": inputs['input_ids'].astype(np.int64),\n", + " \"attention_mask\": inputs['attention_mask'].astype(np.int64),\n", + " \"pixel_values\": inputs['pixel_values']\n", + " })\n", + " \n", + "probs = torch.tensor(logits_per_image[0]).softmax(dim=1)\n", + "print(\"Label probs:\", probs)" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_qdq_amd.json b/openai-clip-vit-base-patch32/aitk/openai_clip_qdq_amd.json new file mode 100644 index 00000000..2283c80c --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_qdq_amd.json @@ -0,0 +1,209 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "openai/clip-vit-base-patch32", + "task": "zero-shot-image-classification", + "load_kwargs": { + 
"attn_implementation": "eager" + }, + "io_config": { + "input_names": [ + "input_ids", + "pixel_values", + "attention_mask" + ], + "input_shapes": [ + [ + 10, + 77 + ], + [ + 1, + 3, + 224, + 224 + ], + [ + 10, + 77 + ] + ], + "input_types": [ + "int64", + "float32", + "int64" + ], + "output_names": [ + "logits_per_image" + ], + "output_shapes": [ + [ + 1, + 2 + ] + ] + } + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "npu", + "execution_providers": [ + "VitisAIExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quant_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "openai/clip-vit-base-patch32", + "dataset_name": "nlphuji/flickr30k", + "start": 0, + "end": 10 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + } + }, + { + "name": "metric_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "openai/clip-vit-base-patch32", + "dataset_name": "nlphuji/flickr30k", + "start": 0, + "end": 10 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + }, + "post_process_data_config": { + "type": "clip_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + "type": "accuracy", + "backend": "huggingface_metrics", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "accuracy", + "priority": 1, + "goal": { + "type": "max-degradation", + "value": 0.05 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg", + "goal": { + "type": "percent-min-improvement", + "value": 0.1 + } + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + }, + { + "name": "throughput", + "type": "throughput", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg" + }, + { + "name": "max" + 
}, + { + "name": "min" + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + }, + "transformer_optimizer": { + "type": "orttransformersoptimization", + "model_type": "clip", + "opt_level": 1, + "optimization_options": { + "enable_gelu": true, + "enable_bias_gelu": false, + "enable_layer_norm": true, + "enable_skip_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_attention": false + }, + "save_as_external_data": true + }, + "surgery": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "ReplaceAttentionMaskValue" + }, + { + "surgeon": "PowReduceSumPowDiv2LpNorm" + } + ] + }, + "quantization": { + "type": "OnnxStaticQuantization", + "quant_preprocess": true, + "data_config": "quant_data_config", + "activation_type": "uint16", + "precision": "uint8", + "calibrate_method": "MinMax", + "save_as_external_data": true + }, + "addmetadata": { + "type": "VitisAIAddMetaData", + "config_meta_data_keys": [ + "architectures", + "model_type" + ], + "activation_type": "uint16", + "weight_type": "uint8", + "quant_type": "OnnxStaticQuantization" + } + }, + "search_strategy": false, + "host": "local_system", + "target": "local_system", + "cache_dir": "cache", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "output_dir": "model/clip_vit_base_patch16" +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_qdq_amd.json.config b/openai-clip-vit-base-patch32/aitk/openai_clip_qdq_amd.json.config new file mode 100644 index 00000000..0e95bfd2 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_qdq_amd.json.config @@ -0,0 +1,195 @@ +{ + "name": "Convert to AMD NPU", + "oliveFile": "clip/openai_clip-vit-base-patch32_ptq_qdq_vitis_ai.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "AMD NPU", + "CPU" + ], + "path": 
"systems.local_system.accelerators.0.execution_providers.0", + "values": [ + "VitisAIExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].load_dataset_config.end", + "template": { + "path": "data_configs[0].load_dataset_config.end", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.quantization", + "actions": [ + [], + [ + { + "type": "update", + "path": "passes", + "value": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].load_dataset_config.end", + "template": { + "path": 
"data_configs[1].load_dataset_config.end", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_qdq_amd_inference_sample.ipynb b/openai-clip-vit-base-patch32/aitk/openai_clip_qdq_amd_inference_sample.ipynb new file mode 100644 index 00000000..95bfb0a4 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_qdq_amd_inference_sample.ipynb @@ -0,0 +1,84 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "aeb33f1a", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "ExecutionProvider=\"VitisAIExecutionProvider\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "307fcca8", + "metadata": { + "vscode": { + "languageId": "plaintext" + } + }, + "outputs": [], + "source": [ + "from PIL import Image\n", + "import requests\n", + " \n", + "from transformers import CLIPProcessor\n", + "import onnxruntime as ort\n", + "import numpy as np\n", + "import torch\n", + " \n", + "processor = CLIPProcessor.from_pretrained(\"openai/clip-vit-base-patch32\", use_fast=False)\n", + " \n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + " \n", + "inputs = processor(text=[\"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\"],\n", + " images=image, return_tensors=\"np\", padding=\"max_length\",\n", + " max_length= 77, truncation=True)\n", + " \n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, 
ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + " \n", + "opts = ort.SessionOptions()\n", + " \n", + "add_ep_for_device(opts, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "assert opts.has_providers()\n", + "\n", + "# options = ort.SessionOptions()\n", + "session = ort.InferenceSession(onnx_model_path,\n", + " sess_options=opts,\n", + " # providers=[ExecutionProvider],\n", + " # provider_options=[provider_options]\n", + ")\n", + "logits_per_image = session.run([\"logits_per_image\"],\n", + " {\n", + " \"input_ids\": inputs['input_ids'].astype(np.int64),\n", + " \"attention_mask\": inputs['attention_mask'].astype(np.int64),\n", + " \"pixel_values\": inputs['pixel_values']\n", + " })\n", + " \n", + "probs = torch.tensor(logits_per_image[0]).softmax(dim=1)\n", + "print(\"Label probs:\", probs)" + ] + } + ], + "metadata": { + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_text_qnn.json b/openai-clip-vit-base-patch32/aitk/openai_clip_text_qnn.json new file mode 100644 index 00000000..469e1cfe --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_text_qnn.json @@ -0,0 +1,193 @@ +{ + "input_model": { + "type": "PytorchModel", + "model_path": "openai/clip-vit-base-patch32", + "generative": false, + "io_config": { + "input_names": [ + "input_ids", + "attention_mask" + ], + "input_shapes": [ + [ + 1, + 77 + ], + [ + 1, + 77 + ] + ], + "input_types": [ + "int32", + "int32" + ], + "output_names": [ + "embeds", + "last_hidden_state" + ] + }, + "model_loader": "load_text_encoder", + "model_script": "clip_script.py" + }, + "systems": { + "host_system": { + "type": 
"LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "QNNExecutionProvider" + ] + } + ] + } + }, + "host": "host_system", + "target": "host_system", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "log_to_file": false, + "data_configs": [ + { + "name": "calib_data", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "nlphuji/flickr30k", + "split": "test" + }, + "pre_process_data_config": { + "type": "pre_process_dataset", + "model_name": "openai/clip-vit-base-patch32", + "caption_col": "caption", + "max_length": 77, + "max_samples": 12 + }, + "dataloader_config": { + "batch_size": 1 + }, + "user_script": "clip_script.py" + }, + { + "name": "eval_data", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "nlphuji/flickr30k", + "split": "test" + }, + "pre_process_data_config": { + "type": "pre_process_dataset", + "model_name": "openai/clip-vit-base-patch32", + "generate_ground_truth": true, + "caption_col": "caption", + "max_length": 77, + "max_samples": 100 + }, + "post_process_data_config": { + "type": "embed_post_process" + }, + "dataloader_config": { + "batch_size": 1 + }, + "user_script": "clip_script.py" + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "degrad", + "type": "custom", + "data_config": "eval_data", + "sub_types": [ + { + "name": "percentage", + "priority": 1, + "higher_is_better": false + } + ], + "user_config": { + "user_script": "clip_script.py", + "metric_func": "eval_similarity_degrad" + } + }, + { + "name": "latency", + "type": "latency", + "sub_types": [ + { + "name": "avg", + "priority": 2, + "metric_config": { + "warmup_num": 20, + "repeat_test_num": 100 + } + }, + { + "name": "p90", + "metric_config": { + "warmup_num": 20, + "repeat_test_num": 100 + } + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "dynamic": true, + "use_dynamo_exporter": false, + 
"save_as_external_data": true + }, + "to_fixed_shape": { + "type": "DynamicToFixedShape", + "dim_param": [ + "batch_size", + "sequence_length" + ], + "dim_value": [ + 1, + 77 + ] + }, + "surgery": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "ReplaceAttentionMaskValue", + "replacement": -100.0 + }, + { + "surgeon": "MatMulAddToGemm" + } + ] + }, + "transformer_optimizer": { + "type": "OrtTransformersOptimization", + "model_type": "bert", + "opt_level": 1, + "optimization_options": { + "enable_gelu": false, + "enable_bias_gelu": false, + "enable_layer_norm": true, + "enable_skip_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_attention": false + }, + "save_as_external_data": true + }, + "quantization": { + "type": "OnnxStaticQuantization", + "data_config": "calib_data", + "quant_preprocess": true, + "activation_type": "uint16", + "precision": "uint8", + "save_as_external_data": true + } + }, + "cache_dir": "cache", + "output_dir": "model/clip_text" +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_text_qnn.json.config b/openai-clip-vit-base-patch32/aitk/openai_clip_text_qnn.json.config new file mode 100644 index 00000000..5d7b93e7 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_text_qnn.json.config @@ -0,0 +1,235 @@ +{ + "name": "Convert Text Model to Qualcomm NPU", + "oliveFile": "clip/qdq/openai_clip_text_b32_qdq.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Qualcomm NPU", + "CPU" + ], + "path": "systems.host_system.accelerators.0.execution_providers.0", + "values": [ + "QNNExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + 
} + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "test" + ], + 
"template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.quantization", + "actions": [ + [], + [ + { + "type": "update", + "path": "passes", + "value": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "dynamic": true, + "use_dynamo_exporter": false, + "save_as_external_data": true + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "test" + ], + "template": { + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "test" + ], + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git 
a/openai-clip-vit-base-patch32/aitk/openai_clip_text_qnn_inference_sample.ipynb b/openai-clip-vit-base-patch32/aitk/openai_clip_text_qnn_inference_sample.ipynb new file mode 100644 index 00000000..0a120030 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_text_qnn_inference_sample.ipynb @@ -0,0 +1,141 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "43751a72", + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "ExecutionProvider=\"QNNExecutionProvider\"" + ] + }, + { + "cell_type": "markdown", + "id": "897ffb42-3569-4d78-b99d-355a38fdce35", + "metadata": {}, + "source": [ + "### Data Processor" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fa8d84cd-4853-4746-bce3-b281bfc23d8b", + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import CLIPProcessor\n", + "\n", + "processor = CLIPProcessor.from_pretrained(\"openai/clip-vit-base-patch32\")" + ] + }, + { + "cell_type": "markdown", + "id": "5568eb71-5812-4c74-989c-c12271d33b12", + "metadata": {}, + "source": [ + "### Model Inference with ORT-QNN" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "02bad4ec-f477-4659-8584-00735f6ed5a9", + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime as ort\n", + "import torch\n", + "import numpy as np\n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + "\n", + "\n", + "session_options = ort.SessionOptions()\n", + "\n", + "add_ep_for_device(session_options, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "\n", + "text_model = 
ort.InferenceSession(\n", + " onnx_model_path, # a model with QNN EPContext nodes\n", + " sess_options=session_options,\n", + ")\n", + "\n", + "def get_text_embedding(text):\n", + " inputs = processor(\n", + " text=text,\n", + " padding=\"max_length\",\n", + " max_length=77,#text_model.sequence_length,\n", + " truncation=True,\n", + " add_special_tokens=True,\n", + " return_tensors=\"np\",\n", + " )\n", + " output = text_model.run(None, {\n", + " \"input_ids\": inputs[\"input_ids\"].astype(np.int32),\n", + " \"attention_mask\": inputs[\"attention_mask\"].astype(np.int32),\n", + " })\n", + " return torch.from_numpy(output[0])\n", + "\n", + "def calculate_score(emb_1, emb_2):\n", + " emb_1 /= torch.norm(emb_1, dim=-1, keepdim=True)\n", + " emb_2 /= torch.norm(emb_2, dim=-1, keepdim=True)\n", + " return torch.matmul(emb_1, emb_2.T) * 100.0\n", + "\n", + "# Get source embedding and calculate the similarity score for each target\n", + "# Targets are processed one by one because static quantization fixes the batch size to 1\n", + "def ask(source, targets):\n", + " source_emb = get_text_embedding(source)\n", + " scores = []\n", + " for i, target in enumerate(targets):\n", + " target_emb = get_text_embedding(target)\n", + " score = calculate_score(source_emb, target_emb)\n", + " print(f\"Similarity score of sentence {i}:{score.item()}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "3477e36c-2e72-432b-ae81-602073a3754c", + "metadata": {}, + "source": [ + "### Play with Samples" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8cdc2a6-4c81-4f93-8426-065ee4c2b013", + "metadata": {}, + "outputs": [], + "source": [ + "ask(\"a photo containing two cats\", [\"a photo of tshirt\", \"a photo of two cats\"])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": 
".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_trtrtx.json b/openai-clip-vit-base-patch32/aitk/openai_clip_trtrtx.json new file mode 100644 index 00000000..f6cd9515 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_trtrtx.json @@ -0,0 +1,173 @@ +{ + "input_model": { + "type": "HfModel", + "model_path": "openai/clip-vit-base-patch32", + "task": "zero-shot-image-classification", + "load_kwargs": { + "attn_implementation": "eager" + }, + "io_config": { + "input_names": [ + "input_ids", + "pixel_values", + "attention_mask" + ], + "input_shapes": [ + [ + 10, + 77 + ], + [ + 1, + 3, + 224, + 224 + ], + [ + 10, + 77 + ] + ], + "input_types": [ + "int64", + "float32", + "int64" + ], + "output_names": [ + "logits_per_image" + ], + "output_shapes": [ + [ + 1, + 2 + ] + ] + } + }, + "systems": { + "local_system": { + "type": "LocalSystem", + "accelerators": [ + { + "device": "gpu", + "execution_providers": [ + "NvTensorRTRTXExecutionProvider" + ] + } + ] + } + }, + "data_configs": [ + { + "name": "quant_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "openai/clip-vit-base-patch32", + "dataset_name": "nlphuji/flickr30k", + "start": 0, + "end": 10 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + } + }, + { + "name": "metric_data_config", + "user_script": "user_script.py", + "load_dataset_config": { + "type": "clip_dataset", + "model_name": "openai/clip-vit-base-patch32", + "dataset_name": "nlphuji/flickr30k", + "start": 10, + "end": 20 + }, + "dataloader_config": { + "type": "no_auto_batch_dataloader" + }, + "post_process_data_config": { + "type": "clip_post_process" + } + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "accuracy", + 
"type": "accuracy", + "backend": "huggingface_metrics", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "accuracy", + "priority": 1, + "goal": { + "type": "max-degradation", + "value": 0.05 + } + } + ] + }, + { + "name": "latency", + "type": "latency", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg", + "goal": { + "type": "percent-min-improvement", + "value": 0.1 + } + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + }, + { + "name": "throughput", + "type": "throughput", + "data_config": "metric_data_config", + "sub_types": [ + { + "name": "avg" + }, + { + "name": "max" + }, + { + "name": "min" + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 17, + "save_as_external_data": true + }, + "onnx_float_to_float16": { + "type": "OnnxFloatToFloat16", + "save_as_external_data": true + }, + "session_params_tuning": { + "type": "OrtSessionParamsTuning", + "io_bind": false, + "data_config": "quant_data_config" + } + }, + "host": "local_system", + "target": "local_system", + "evaluator": "common_evaluator", + "cache_dir": "cache", + "output_dir": "model/clip-vit-base-patch16", + "evaluate_input_model": false +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_trtrtx.json.config b/openai-clip-vit-base-patch32/aitk/openai_clip_trtrtx.json.config new file mode 100644 index 00000000..6b98187a --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_trtrtx.json.config @@ -0,0 +1,86 @@ +{ + "name": "Convert to NVIDIA TRT for RTX", + "oliveFile": "clip/openai_clip-vit-base-patch32_trtrtx.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "NVIDIA TensorRT for RTX", + "CPU" + ], + "path": "systems.local_system.accelerators.0.execution_providers.0", + "values": [ + "NvTensorRTRTXExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + 
"name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.dataset_name", + "values": [ + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].load_dataset_config.end", + "template": { + "path": "data_configs[1].load_dataset_config.end", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_trtrtx_inference_sample.ipynb b/openai-clip-vit-base-patch32/aitk/openai_clip_trtrtx_inference_sample.ipynb new file mode 100644 index 00000000..ee2b42fd --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_trtrtx_inference_sample.ipynb @@ -0,0 +1,90 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "aeb33f1a", + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "ExecutionProvider=\"NvTensorRTRTXExecutionProvider\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "307fcca8", + "metadata": {}, + "outputs": [], + "source": [ + "from PIL import Image\n", + "import requests\n", + " \n", + "from transformers import CLIPProcessor\n", + "import onnxruntime as ort\n", + "import numpy as np\n", + 
"import torch\n", + " \n", + "processor = CLIPProcessor.from_pretrained(\"openai/clip-vit-base-patch32\", use_fast=False)\n", + " \n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + " \n", + "inputs = processor(text=[\"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\", \"a photo of a cat\", \"a photo of a dog\"],\n", + " images=image, return_tensors=\"np\", padding=\"max_length\",\n", + " max_length= 77, truncation=True)\n", + " \n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + " \n", + "opts = ort.SessionOptions()\n", + " \n", + "add_ep_for_device(opts, ExecutionProvider, ort.OrtHardwareDeviceType.GPU)\n", + "assert opts.has_providers()\n", + "\n", + "# options = ort.SessionOptions()\n", + "session = ort.InferenceSession(onnx_model_path,\n", + " sess_options=opts,\n", + " # providers=[ExecutionProvider],\n", + " # provider_options=[provider_options]\n", + ")\n", + "logits_per_image = session.run([\"logits_per_image\"],\n", + " {\n", + " \"input_ids\": inputs['input_ids'].astype(np.int64),\n", + " \"attention_mask\": inputs['attention_mask'].astype(np.int64),\n", + " \"pixel_values\": inputs['pixel_values'].astype(np.float16)\n", + " })\n", + " \n", + "probs = torch.tensor(logits_per_image[0]).softmax(dim=1)\n", + "print(\"Label probs:\", probs)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "winml", + "language": "python", + "name": "python3" + }, + "language_info": { + 
"codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_vision_qnn.json b/openai-clip-vit-base-patch32/aitk/openai_clip_vision_qnn.json new file mode 100644 index 00000000..a12522a0 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_vision_qnn.json @@ -0,0 +1,186 @@ +{ + "input_model": { + "type": "PytorchModel", + "model_path": "openai/clip-vit-base-patch32", + "generative": false, + "io_config": { + "input_names": [ + "pixel_values" + ], + "input_shapes": [ + [ + 1, + 3, + 224, + 224 + ] + ], + "output_names": [ + "embeds" + ] + }, + "model_loader": "load_image_encoder", + "model_script": "clip_script.py" + }, + "systems": { + "host_system": { + "type": "LocalSystem", + "accelerators": [ + { + "execution_providers": [ + "QNNExecutionProvider" + ] + } + ] + } + }, + "host": "host_system", + "target": "host_system", + "evaluator": "common_evaluator", + "evaluate_input_model": false, + "log_to_file": false, + "data_configs": [ + { + "name": "calib_data", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "test" + }, + "pre_process_data_config": { + "type": "pre_process_dataset", + "model_name": "openai/clip-vit-base-patch32", + "image_col": "image", + "max_samples": 12 + }, + "dataloader_config": { + "batch_size": 1 + }, + "user_script": "clip_script.py" + }, + { + "name": "eval_data", + "type": "HuggingfaceContainer", + "load_dataset_config": { + "data_name": "timm/mini-imagenet", + "split": "test" + }, + "pre_process_data_config": { + "type": "pre_process_dataset", + "model_name": "openai/clip-vit-base-patch32", + "generate_ground_truth": true, + "image_col": "image", + "max_samples": 100 + }, + 
"post_process_data_config": { + "type": "embed_post_process" + }, + "dataloader_config": { + "batch_size": 1 + }, + "user_script": "clip_script.py" + } + ], + "evaluators": { + "common_evaluator": { + "metrics": [ + { + "name": "degrad", + "type": "custom", + "data_config": "eval_data", + "sub_types": [ + { + "name": "percentage", + "priority": 1, + "higher_is_better": false + } + ], + "user_config": { + "user_script": "clip_script.py", + "metric_func": "eval_similarity_degrad", + "metric_func_kwargs": { + "batch_size": 32 + } + } + }, + { + "name": "latency", + "type": "latency", + "sub_types": [ + { + "name": "avg", + "priority": 2, + "metric_config": { + "warmup_num": 20, + "repeat_test_num": 100 + } + }, + { + "name": "p90", + "metric_config": { + "warmup_num": 20, + "repeat_test_num": 100 + } + } + ] + } + ] + } + }, + "passes": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "dynamic": true, + "use_dynamo_exporter": false, + "save_as_external_data": true + }, + "to_fixed_shape": { + "type": "DynamicToFixedShape", + "dim_param": [ + "batch_size", + "num_channels", + "height", + "width" + ], + "dim_value": [ + 1, + 3, + 224, + 224 + ] + }, + "surgery": { + "type": "GraphSurgeries", + "surgeries": [ + { + "surgeon": "MatMulAddToGemm" + } + ] + }, + "transformer_optimizer": { + "type": "OrtTransformersOptimization", + "model_type": "vit", + "opt_level": 1, + "optimization_options": { + "enable_gelu": false, + "enable_bias_gelu": false, + "enable_layer_norm": true, + "enable_skip_layer_norm": false, + "enable_bias_skip_layer_norm": false, + "enable_attention": false + }, + "save_as_external_data": true + }, + "quantization": { + "type": "OnnxStaticQuantization", + "data_config": "calib_data", + "quant_preprocess": true, + "activation_type": "uint16", + "precision": "uint8", + "save_as_external_data": true + } + }, + "cache_dir": "cache", + "output_dir": "model/clip_vision" +} diff --git 
a/openai-clip-vit-base-patch32/aitk/openai_clip_vision_qnn.json.config b/openai-clip-vit-base-patch32/aitk/openai_clip_vision_qnn.json.config new file mode 100644 index 00000000..66db16db --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_vision_qnn.json.config @@ -0,0 +1,237 @@ +{ + "name": "Convert Vision Model to Qualcomm NPU", + "oliveFile": "clip/qdq/openai_clip_vision_b32_qdq.json", + "runtime": { + "autoGenerated": true, + "name": "Evaluate on", + "type": "enum", + "displayNames": [ + "Qualcomm NPU", + "CPU" + ], + "path": "systems.host_system.accelerators.0.execution_providers.0", + "values": [ + "QNNExecutionProvider", + "CPUExecutionProvider" + ], + "readOnly": false + }, + "sections": [ + { + "autoGenerated": true, + "name": "Convert", + "phase": "Conversion", + "parameters": [], + "toggle": { + "autoGenerated": true, + "name": "Convert to ONNX format", + "type": "bool", + "path": "passes.conversion", + "actions": [ + [], + [] + ], + "readOnly": true + } + }, + { + "name": "Quantize", + "phase": "Quantization", + "parameters": [ + { + "name": "Activation Type", + "tags": [ + "ActivationType" + ], + "description": "Quantization data type of activation. ‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.activation_type", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.activation_type", + "template": "ActivationType" + } + }, + { + "name": "Weight Type", + "tags": [ + "WeightType" + ], + "description": "Data type for quantizing weights. 
‘Int8’ for signed 8-bit integer, ‘UInt8’ for unsigned 8-bit integer etc.", + "descriptionLink": "https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html", + "type": "enum", + "displayNames": [ + "Int8", + "UInt8", + "Int16", + "UInt16" + ], + "displayType": "RadioGroup", + "path": "passes.quantization.precision", + "values": [ + "int8", + "uint8", + "int16", + "uint16" + ], + "template": { + "path": "passes.quantization.precision", + "template": "WeightType" + } + }, + { + "name": "Quantization Dataset", + "tags": [ + "QuantizationDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[0].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "nlphuji/flickr30k" + ], + "template": "QuantizationDataset" + } + }, + { + "name": "Quantization Dataset Split", + "tags": [ + "QuantizationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[0].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[0].load_dataset_config.split", + "template": "QuantizationDatasetSplit" + } + }, + { + "name": "Quantization Dataset Size", + "type": "int", + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[0].pre_process_data_config.max_samples", + "template": "QuantizationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Quantize model", + "type": "bool", + "path": "passes.quantization", + "actions": [ + [], + [ + { + "type": "update", + "path": "passes", + "value": { + "conversion": { + "type": "OnnxConversion", + "target_opset": 20, + "dynamic": true, + "use_dynamo_exporter": false, + "save_as_external_data": true + } + } + } + ] + ] + } + }, + { + "name": "Evaluate", + "phase": "Evaluation", + "parameters": [ + { + "name": "Evaluation 
Dataset", + "tags": [ + "EvaluationDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "nlphuji/flickr30k" + ], + "template": { + "path": "data_configs[1].load_dataset_config.data_name", + "values": [ + "timm/mini-imagenet", + "nlphuji/flickr30k" + ], + "template": "EvaluationDataset" + } + }, + { + "name": "Evaluation Dataset Split", + "tags": [ + "EvaluationDatasetSplit", + "DependsOnDataset" + ], + "type": "enum", + "path": "data_configs[1].load_dataset_config.split", + "values": [ + "train", + "validation", + "test" + ], + "template": { + "path": "data_configs[1].load_dataset_config.split", + "template": "EvaluationDatasetSplit" + } + }, + { + "name": "Evaluation Dataset Size", + "type": "int", + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": { + "path": "data_configs[1].pre_process_data_config.max_samples", + "template": "EvaluationDatasetSize" + } + } + ], + "toggle": { + "autoGenerated": true, + "name": "Evaluate model performance", + "type": "bool", + "path": "evaluator", + "actions": [ + [], + [ + { + "type": "delete", + "path": "evaluator" + } + ] + ] + } + } + ] +} diff --git a/openai-clip-vit-base-patch32/aitk/openai_clip_vision_qnn_inference_sample.ipynb b/openai-clip-vit-base-patch32/aitk/openai_clip_vision_qnn_inference_sample.ipynb new file mode 100644 index 00000000..518a97c7 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/openai_clip_vision_qnn_inference_sample.ipynb @@ -0,0 +1,170 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "3c18a7d6", + "metadata": {}, + "outputs": [], + "source": [ + "onnx_model_path = \"./model/model.onnx\"\n", + "\n", + "ExecutionProvider=\"QNNExecutionProvider\"" + ] + }, + { + "cell_type": "markdown", + "id": "897ffb42-3569-4d78-b99d-355a38fdce35", + "metadata": {}, + "source": [ + "### Data Processor" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": 
"fa8d84cd-4853-4746-bce3-b281bfc23d8b", + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import CLIPProcessor\n", + "\n", + "processor = CLIPProcessor.from_pretrained(\"openai/clip-vit-base-patch32\")" + ] + }, + { + "cell_type": "markdown", + "id": "5568eb71-5812-4c74-989c-c12271d33b12", + "metadata": {}, + "source": [ + "### Model Inference with ORT-QNN" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "02bad4ec-f477-4659-8584-00735f6ed5a9", + "metadata": {}, + "outputs": [], + "source": [ + "import onnxruntime as ort\n", + "import torch\n", + "import numpy as np\n", + "\n", + "def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):\n", + " ep_devices = ort.get_ep_devices()\n", + " for ep_device in ep_devices:\n", + " if ep_device.ep_name == ep_name and ep_device.device.type == device_type:\n", + " print(f\"Adding {ep_name} for {device_type}\")\n", + " session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)\n", + "\n", + "\n", + "session_options = ort.SessionOptions()\n", + "\n", + "add_ep_for_device(session_options, ExecutionProvider, ort.OrtHardwareDeviceType.NPU)\n", + "\n", + "vision_model = ort.InferenceSession(\n", + " onnx_model_path, # a model with QNN EPContext nodes\n", + " sess_options=session_options,\n", + ")\n", + "\n", + "def get_image_embedding(image):\n", + " inputs = processor(images=image, return_tensors=\"np\")\n", + " output = vision_model.run(None, { \"pixel_values\": inputs[\"pixel_values\"] })\n", + " return torch.from_numpy(output[0])\n", + "\n", + "def calculate_score(emb_1, emb_2):\n", + " emb_1 /= torch.norm(emb_1, dim=-1, keepdim=True)\n", + " emb_2 /= torch.norm(emb_2, dim=-1, keepdim=True)\n", + " return torch.matmul(emb_1, emb_2.T) * 100.0\n", + "\n", + "# Get source embedding and calculate the similarity score for each target\n", + "# We need to process one by one because to static quantization, we fixed the batch size to 
1\n", + "def ask(source, targets):\n", + " source_emb = get_image_embedding(source)\n", + " for i, target in enumerate(targets):\n", + " target_emb = get_image_embedding(target)\n", + " score = calculate_score(source_emb, target_emb)\n", + " print(f\"Similarity score of image {i}:{score.item()}\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "3477e36c-2e72-432b-ae81-602073a3754c", + "metadata": {}, + "source": [ + "### Play with Samples" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "16868fbd-e447-4866-af7d-eb6e49975bcc", + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "from PIL import Image\n", + "\n", + "url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n", + "image = Image.open(requests.get(url, stream=True).raw)\n", + "image" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "07076b9a", + "metadata": {}, + "outputs": [], + "source": [ + "url = \"http://images.cocodataset.org/train2017/000000208833.jpg\"\n", + "image1 = Image.open(requests.get(url, stream=True).raw)\n", + "image1" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c10de7cd", + "metadata": {}, + "outputs": [], + "source": [ + "url = \"http://images.cocodataset.org/train2017/000000125690.jpg\"\n", + "image2 = Image.open(requests.get(url, stream=True).raw)\n", + "image2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8cdc2a6-4c81-4f93-8426-065ee4c2b013", + "metadata": {}, + "outputs": [], + "source": [ + "ask(image, [image1, image2])" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 
5 +} diff --git a/openai-clip-vit-base-patch32/aitk/requirements.txt b/openai-clip-vit-base-patch32/aitk/requirements.txt new file mode 100644 index 00000000..0cddd58d --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/requirements.txt @@ -0,0 +1,5 @@ +olive-ai +cachetools==5.5.0 +nltk>=3.9.1 +accelerate>=1.4.0 +pillow>=10.0.1 diff --git a/openai-clip-vit-base-patch32/aitk/user_script.py b/openai-clip-vit-base-patch32/aitk/user_script.py new file mode 100644 index 00000000..2d0051f0 --- /dev/null +++ b/openai-clip-vit-base-patch32/aitk/user_script.py @@ -0,0 +1,64 @@ +# ------------------------------------------------------------------------- +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. +# -------------------------------------------------------------------------- +import numpy as np +import torch +from datasets import load_dataset +from torch.utils.data import Dataset +from transformers import CLIPProcessor + +from olive.data.registry import Registry + + +class CLIPDataset(Dataset): + def __init__( + self, + model_name, + dataset_name, + start=0, + end=500, + image_size=(224, 224), + ): + assert 0 <= start < end + self.start = start + self.end = end + self.model_name = model_name + self.dataset_name = dataset_name + self.processor = CLIPProcessor.from_pretrained(self.model_name) + self.length = self.end - self.start + self.image_size = image_size + self.dataset = load_dataset(self.dataset_name, split=f"test[{self.start}:{self.end + 10}]") + + def __len__(self): + return self.length + + def __getitem__(self, idx): + text_inputs = self.processor( + text=[" ".join(item) for item in self.dataset[idx : idx + 10]["caption"]], + return_tensors="np", + padding="max_length", + truncation=True, + ) + + image_input = self.processor(images=self.dataset[idx]["image"].resize(self.image_size), return_tensors="np") + model_inputs = [ + { + "input_ids": text_inputs["input_ids"].astype(np.int64), + "pixel_values":
image_input["pixel_values"], + "attention_mask": text_inputs["attention_mask"].astype(np.int64), + } + ] + + target = torch.Tensor([0]).to(torch.int32) + return model_inputs[0], target + + +@Registry.register_dataset() +def clip_dataset(**kwargs): + return CLIPDataset(**kwargs) + + +@Registry.register_post_process() +def clip_post_process(output): + return output["logits_per_image"].argmax(axis=-1)