speechmatics · giorgosHadji · Jun 18, 2026 · Jun 24, 2026 · Jun 24, 2026
diff --git a/docs-style-guide.md b/docs-style-guide.md
@@ -209,7 +209,7 @@ Help readers find information quickly by organizing content into clear levels of
 
 2.  **Section heading (H2)** -- key steps or concepts, e.g., *Realtime processing*.
 
-3.  **Subheading (H3+)** -- finer details within a section, e.g., *Operating points*.
+3.  **Subheading (H3+)** -- finer details within a section, e.g., *Models*.
 
 4.  **Paragraph** -- up to 3 sentences per paragraph.
 

diff --git a/docs/deployments/container/accessing-images.mdx b/docs/deployments/container/accessing-images.mdx
@@ -64,21 +64,21 @@ See [how to run the Core Speech CPU container here.](/deployments/container/cpu-
 
 The Transcription GPU images are required to use the most accurate models.
 
-### Standard operating point
+### Standard model
 
-There is a single image available that supports all languages for the Standard Operating Point. There are language specific images available that support the Enhanced and Standard Operating Point.
+A single image supports all languages for the Standard model. Language-specific images support both the Enhanced and Standard models.
 
 <CodeBlock>
-  {`# pulling the Standard operating point Transcription GPU inference server which supports all languages with the ${smVariables.latestContainerVersion} tag:
+  {`# pulling the Standard model Transcription GPU inference server which supports all languages with the ${smVariables.latestContainerVersion} tag:
 docker pull speechmaticspublic.azurecr.io/sm-gpu-inference-server-standard-all:${smVariables.latestContainerVersion}
 \0
-# pulling language specific Transcription GPU inference servers available for en, es, de, fr. Supports both Enhanced and Standard operating points with the ${smVariables.latestContainerVersion} tag:
+# pulling language-specific Transcription GPU inference servers available for en, es, de, fr. Supports both Enhanced and Standard models with the ${smVariables.latestContainerVersion} tag:
 docker pull speechmaticspublic.azurecr.io/sm-gpu-inference-server-en:${smVariables.latestContainerVersion}`}
 </CodeBlock>
 
-### Enhanced operating point
+### Enhanced model
 
-Depending on which Enhanced Operating Point languages are required, you can pull specific images.
+Depending on which Enhanced model languages are required, you can pull specific images.
 
 <details open>
   <summary>Language Pack 1</summary>

diff --git a/docs/deployments/container/batch-persistent-worker.mdx b/docs/deployments/container/batch-persistent-worker.mdx
@@ -87,7 +87,7 @@ curl -X POST address.of.container:PORT/v2/jobs \
     "transcription_config": {
       "language": "en",
       "diarization": "speaker",
-      "operating_point": "enhanced"
+      "model": "enhanced"
     }
   }' \
   -F 'data_file=@~/audio_file.mp3'
@@ -201,7 +201,7 @@ curl -X POST address.of.container:PORT/v2/jobs \
     "transcription_config": {
       "language": "en",
       "diarization": "speaker",
-      "operating_point": "enhanced"
+      "model": "enhanced"
     }
   }' \
   -F 'data_file=@~/audio_file.mp3'
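For callers that still build job configs with the old key, the rename in the two hunks above can be sketched as a small migration step. This is only an illustration: `migrateJobConfig` is a hypothetical helper, not part of any Speechmatics SDK, and letting an explicit `model` take precedence over `operating_point` is an assumption rather than documented behaviour.

```javascript
// Hypothetical helper (not a Speechmatics API): renames the legacy
// `operating_point` key to `model` inside `transcription_config`.
function migrateJobConfig(config) {
  const transcriptionConfig = config.transcription_config ?? {};
  // Pull the legacy key out so it is not copied into the result.
  const { operating_point: legacy, ...rest } = transcriptionConfig;
  // Assumption: an explicit `model` wins if both keys are present.
  const model = rest.model ?? legacy;
  const migrated = model === undefined ? rest : { ...rest, model };
  return { ...config, transcription_config: migrated };
}

const oldConfig = {
  transcription_config: {
    language: "en",
    diarization: "speaker",
    operating_point: "enhanced",
  },
};

console.log(JSON.stringify(migrateJobConfig(oldConfig), null, 2));
```

The helper returns a new object and leaves the input untouched, so it can be applied to stored configs without side effects.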

diff --git a/docs/deployments/container/cpu-speech-to-text.mdx b/docs/deployments/container/cpu-speech-to-text.mdx
@@ -257,7 +257,7 @@ The first-session loading time can be reduced down to several hundred millisecon
 You can enable this feature by setting the `SM_PREWARM_ENGINE_MODES` environment variable, with a semicolon separated list describing the required engine modes. For example, to prewarm 1 English GPU Standard and 2 English GPU Enhanced:
 `SM_PREWARM_ENGINE_MODES='en_general_gpu_standard:1;en_general_gpu_enhanced:2'`
 
-In general, the format is: `{language}_{domain}_{processor}_{operating_point}:{prewarm_connections}`.
+In general, the format is: `{language}_{domain}_{processor}_{model}:{prewarm_connections}`.
 
 The parameters are:
 - `language` - One of the supported [language codes](/speech-to-text/languages)
@@ -266,7 +266,7 @@ The parameters are:
 
 - `processor` - One of `cpu` or `gpu`. Note that selecting `gpu` requires a [GPU Inference Container](/deployments/container/gpu-speech-to-text)
 
-- `operating_point` - One of `standard` or `enhanced`. The [operating point](/speech-to-text/models) you want to prewarm
+- `model` - One of `standard` or `enhanced`. The [model](/speech-to-text/models) you want to prewarm
 
 - `prewarm_connections` - Integer. The number of engine instances of the specific mode you want to pre-warm. The total number of `prewarm_connections` cannot be greater than `SM_MAX_CONCURRENT_CONNECTIONS`. After the pre-warming is complete, this parameter does not limit the types of connections the engine can start.
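The format described above can be checked before a container is started. The sketch below is illustrative only: `parsePrewarmModes` is a hypothetical name, the right-to-left split assumes that `domain`, `processor` and `model` never contain underscores, and requiring at least one connection per entry is also an assumption. The total is compared against a caller-supplied limit that stands in for `SM_MAX_CONCURRENT_CONNECTIONS`.

```javascript
// Illustrative parser for the SM_PREWARM_ENGINE_MODES format.
// `parsePrewarmModes` is a hypothetical name, not a Speechmatics API.
function parsePrewarmModes(value, maxConnections) {
  const modes = value
    .split(";")
    .filter(Boolean)
    .map((entry) => {
      const [mode, count] = entry.split(":");
      const parts = mode.split("_");
      if (parts.length < 4) throw new Error(`Malformed mode: ${entry}`);
      // Assumption: the last three segments are domain, processor and model;
      // whatever precedes them is the language code.
      const [domain, processor, model] = parts.slice(-3);
      const language = parts.slice(0, -3).join("_");
      if (!["cpu", "gpu"].includes(processor)) {
        throw new Error(`Unknown processor in: ${entry}`);
      }
      if (!["standard", "enhanced"].includes(model)) {
        throw new Error(`Unknown model in: ${entry}`);
      }
      const prewarmConnections = Number(count);
      // Assumption: each entry prewarms at least one engine instance.
      if (!Number.isInteger(prewarmConnections) || prewarmConnections < 1) {
        throw new Error(`Invalid connection count in: ${entry}`);
      }
      return { language, domain, processor, model, prewarmConnections };
    });

  // The documented constraint: the total must not exceed the connection limit.
  const total = modes.reduce((sum, m) => sum + m.prewarmConnections, 0);
  if (total > maxConnections) {
    throw new Error(`Total prewarm connections ${total} exceeds ${maxConnections}`);
  }
  return modes;
}

console.log(
  parsePrewarmModes("en_general_gpu_standard:1;en_general_gpu_enhanced:2", 3)
);
```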
 

diff --git a/docs/deployments/container/gpu-speech-to-text.mdx b/docs/deployments/container/gpu-speech-to-text.mdx
@@ -105,14 +105,16 @@ The server can only support one of these modes at once.
 
 Once the GPU Server is running, follow the [Instructions for Linking a CPU Container](/deployments/container/cpu-speech-to-text#linking-to-a-gpu-inference-container).
 
-### Running only one operating point
+### Running only one model
 
-[Operating Points](/speech-to-text/models) represent different levels of model complexity.
-To save GPU memory for throughput, you can run the server with only one Operating Point loaded. To do this, pass the
-`SM_OPERATING_POINT` environment variable to the container and set it to either `standard` or `enhanced`.
+[Models](/speech-to-text/models) (previously called Operating Points) represent different levels of model complexity.
+To save GPU memory for throughput, you can run the server with only one model loaded. To do this, pass the
+`SM_MODEL` environment variable to the container and set it to either `standard` or `enhanced`.
+
+`SM_MODEL` replaces the older `SM_OPERATING_POINT` environment variable. `SM_OPERATING_POINT` is deprecated but still works and accepts the same `standard` and `enhanced` values; use `SM_MODEL` going forward.
 
 :::info
-When running the all language standard Operating Point GPU inference server you must set the `SM_OPERATING_POINT` environment variable to `standard`
+When running the all-language Standard model GPU inference server, you must set the `SM_MODEL` environment variable to `standard`.
 :::
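A deployment script that must handle both variable names during the deprecation window could normalise them before launching the container. This is a minimal sketch, not the container's own logic: `resolveModelEnv` is a hypothetical helper, and preferring `SM_MODEL` when both variables are set is an assumption, since the text above does not say how a conflict is resolved.

```javascript
// Hypothetical launcher-side helper; the container itself reads these variables.
function resolveModelEnv(env) {
  // Assumption: SM_MODEL takes precedence over the deprecated name.
  const value = env.SM_MODEL ?? env.SM_OPERATING_POINT;
  // Neither variable set: pass nothing and keep the server's default behaviour.
  if (value === undefined) return {};
  if (!["standard", "enhanced"].includes(value)) {
    throw new Error(`Unsupported model value: ${value}`);
  }
  // Always emit the new name so downstream config only uses SM_MODEL.
  return { SM_MODEL: value };
}

console.log(resolveModelEnv({ SM_OPERATING_POINT: "enhanced" }));
```

Passing `process.env` as the argument would apply the same normalisation to the current environment.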
 
 ### Monitoring the server
@@ -121,7 +123,7 @@ The inference server is based on [Nvidia's Triton architecture](https://develope
 can be monitored using Triton's inbuilt Prometheus metrics, or the GRPC/HTTP APIs. To expose these, configure an external mapping for port
 8002(Prometheus) or 8000(HTTP).
 
-### Operating points in GPU inference
+### Models in GPU inference
 
 When inference is outsourced to a GPU server, alternative GPU-specific models are used, so you should not expect to see identical results compared to CPU-based inference. For convenience, the GPU models are also designated as 'standard' and 'enhanced'.
 

diff --git a/docs/deployments/container/gpu-translation.mdx b/docs/deployments/container/gpu-translation.mdx
@@ -103,7 +103,7 @@ Assuming the following config file:
 {
   "type": "transcription",
   "transcription_config": {
-    "operating_point": "enhanced",
+    "model": "enhanced",
     "language": "en"
   },
   "translation_config": {

diff --git a/docs/deployments/container/performance-and-cost.mdx b/docs/deployments/container/performance-and-cost.mdx
@@ -11,7 +11,7 @@ This is a comparison of the performance and estimated running costs of transcrip
 ### Batch transcription
 
 
-| Operating Point                            | [CPU Standard](./cpu-speech-to-text) | [CPU Enhanced](./cpu-speech-to-text) | [GPU Standard](./gpu-speech-to-text) | [GPU Enhanced](./gpu-speech-to-text) |
+| Model                                      | [CPU Standard](./cpu-speech-to-text) | [CPU Enhanced](./cpu-speech-to-text) | [GPU Standard](./gpu-speech-to-text) | [GPU Enhanced](./gpu-speech-to-text) |
 |--------------------------------------------|--------------|--------------|--------------|--------------|
 | Lowest Processing Cost (US ¢ per hour)     | 1.7          | 3.8          | 0.34         | 1.67         |
 | Cost vs CPU Standard (%)                   | -            | 224%         | 20%          | 98%          |
@@ -31,13 +31,13 @@ The benchmark uses the following configuration:
 | Price Basis       | Azure PAYG East US, Linux, Standard |
 
 :::note
-For GPU Operating Points, transcribers and inference servers were all run on a single VM node.
+For GPU models, transcribers and inference servers were all run on a single VM node.
 :::
 
 
 ### Realtime transcription
 
-| Operating Point                            | [CPU Standard](./cpu-speech-to-text#realtime-transcription) | [CPU Enhanced](./cpu-speech-to-text#realtime-transcription) | [GPU Standard](./gpu-speech-to-text#batch-and-real-time-inference) | [GPU Enhanced](./gpu-speech-to-text#batch-and-real-time-inference) |
+| Model                                      | [CPU Standard](./cpu-speech-to-text#realtime-transcription) | [CPU Enhanced](./cpu-speech-to-text#realtime-transcription) | [GPU Standard](./gpu-speech-to-text#batch-and-real-time-inference) | [GPU Enhanced](./gpu-speech-to-text#batch-and-real-time-inference) |
 |--------------------------------------------|--------------|--------------|--------------|--------------|
 | Lowest Processing Cost (US ¢ per hour)     | 1.97         | 2.95         | 0.86         | 2.51         |
 | Cost vs. CPU Standard (%)                  | -            | 150%         | 44%          | 127%         |
@@ -55,7 +55,7 @@ This benchmark uses the following configuration[^4]:
 | Price Basis       | Azure PAYG East US, Linux, Standard |
 
 :::note
-For GPU Operating Points, the transcribers and inference servers were run on a single VM node.
+For GPU models, the transcribers and inference servers were run on a single VM node.
 
 Each first session, transcriber requires 0.25 cores for both OPs, with 1.2 GB memory (Standard OP) or 3 GB memory (Enhanced OP). Every additional session consumes 0.1 cores and 100 MB of memory.
 :::

diff --git a/docs/speech-to-text/batch/assets/file-transcription-quickstart.example.js b/docs/speech-to-text/batch/assets/file-transcription-quickstart.example.js
@@ -20,7 +20,7 @@ async function transcribeFile() {
     {
       transcription_config: {
         language: "en",
-        operating_point: "enhanced",
+        model: "enhanced",
       },
     },
     "json-v2",

diff --git a/docs/speech-to-text/features/audio-events.mdx b/docs/speech-to-text/features/audio-events.mdx
@@ -293,4 +293,4 @@ An example of a request only for `applause` and `music`
 - Audio Events is supported only in the JSON type API response
 - While the occurrence of music can be detected, richer metadata about the music such as title, artist, genre, etc cannot be identified
 - Only one instance of an event type can be tracked at a point in time. e.g. seamlessly switching consecutive songs will be detected as one single music event 
-- For On-Prem Containers, Audio Events is available only for GPU Operating Points
+- For On-Prem Containers, Audio Events is available only for GPU models