diff --git a/docs-style-guide.md b/docs-style-guide.md
index 7a0312a3..178df109 100644
--- a/docs-style-guide.md
+++ b/docs-style-guide.md
@@ -209,7 +209,7 @@ Help readers find information quickly by organizing content into clear levels of
2. **Section heading (H2)** -- key steps or concepts, e.g., *Realtime processing*.
-3. **Subheading (H3+)** -- finer details within a section, e.g., *Operating points*.
+3. **Subheading (H3+)** -- finer details within a section, e.g., *Models*.
4. **Paragraph** -- up to 3 sentences per paragraph.
diff --git a/docs/deployments/container/accessing-images.mdx b/docs/deployments/container/accessing-images.mdx
index 610763df..b959a718 100644
--- a/docs/deployments/container/accessing-images.mdx
+++ b/docs/deployments/container/accessing-images.mdx
@@ -64,21 +64,21 @@ See [how to run the Core Speech CPU container here.](/deployments/container/cpu-
The Transcription GPU images are required to use the most accurate models.
-### Standard operating point
+### Standard model
-There is a single image available that supports all languages for the Standard Operating Point. There are language specific images available that support the Enhanced and Standard Operating Point.
+A single image supports all languages for the Standard model. Language-specific images support both the Enhanced and Standard models.
- {`# pulling the Standard operating point Transcription GPU inference server which supports all languages with the ${smVariables.latestContainerVersion} tag:
+ {`# pulling the Standard model Transcription GPU inference server which supports all languages with the ${smVariables.latestContainerVersion} tag:
docker pull speechmaticspublic.azurecr.io/sm-gpu-inference-server-standard-all:${smVariables.latestContainerVersion}
\0
-# pulling language specific Transcription GPU inference servers available for en, es, de, fr. Supports both Enhanced and Standard operating points with the ${smVariables.latestContainerVersion} tag:
+# pulling language-specific Transcription GPU inference servers available for en, es, de, fr. Supports both Enhanced and Standard models with the ${smVariables.latestContainerVersion} tag:
docker pull speechmaticspublic.azurecr.io/sm-gpu-inference-server-en:${smVariables.latestContainerVersion}`}
-### Enhanced operating point
+### Enhanced model
-Depending on which Enhanced Operating Point languages are required, you can pull specific images.
+You can pull specific images depending on which languages you need for the Enhanced model.
Language Pack 1
diff --git a/docs/deployments/container/batch-persistent-worker.mdx b/docs/deployments/container/batch-persistent-worker.mdx
index 7c2acdf5..d303a696 100644
--- a/docs/deployments/container/batch-persistent-worker.mdx
+++ b/docs/deployments/container/batch-persistent-worker.mdx
@@ -87,7 +87,7 @@ curl -X POST address.of.container:PORT/v2/jobs \
"transcription_config": {
"language": "en",
"diarization": "speaker",
- "operating_point": "enhanced"
+ "model": "enhanced"
}
}' \
-F 'data_file=@~/audio_file.mp3'
@@ -201,7 +201,7 @@ curl -X POST address.of.container:PORT/v2/jobs \
"transcription_config": {
"language": "en",
"diarization": "speaker",
- "operating_point": "enhanced"
+ "model": "enhanced"
}
}' \
-F 'data_file=@~/audio_file.mp3'
diff --git a/docs/deployments/container/cpu-speech-to-text.mdx b/docs/deployments/container/cpu-speech-to-text.mdx
index 7c9c79a0..8aefcd85 100644
--- a/docs/deployments/container/cpu-speech-to-text.mdx
+++ b/docs/deployments/container/cpu-speech-to-text.mdx
@@ -257,7 +257,7 @@ The first-session loading time can be reduced down to several hundred millisecon
You can enable this feature by setting the `SM_PREWARM_ENGINE_MODES` environment variable, with a semicolon separated list describing the required engine modes. For example, to prewarm 1 English GPU Standard and 2 English GPU Enhanced:
`SM_PREWARM_ENGINE_MODES='en_general_gpu_standard:1;en_general_gpu_enhanced:2'`
-In general, the format is: `{language}_{domain}_{processor}_{operating_point}:{prewarm_connections}`.
+In general, the format is: `{language}_{domain}_{processor}_{model}:{prewarm_connections}`.
The parameters are:
- `language` - One of the supported [language codes](/speech-to-text/languages)
@@ -266,7 +266,7 @@ The parameters are:
- `processor` - One of `cpu` or `gpu`. Note that selecting `gpu` requires a [GPU Inference Container](/deployments/container/gpu-speech-to-text)
-- `operating_point` - One of `standard` or `enhanced`. The [operating point](/speech-to-text/models) you want to prewarm
+- `model` - One of `standard` or `enhanced`. The [model](/speech-to-text/models) you want to prewarm
- `prewarm_connections` - Integer. The number of engine instances of the specific mode you want to pre-warm. The total number of `prewarm_connections` cannot be greater than `SM_MAX_CONCURRENT_CONNECTIONS`. After the pre-warming is complete, this parameter does not limit the types of connections the engine can start.
diff --git a/docs/deployments/container/gpu-speech-to-text.mdx b/docs/deployments/container/gpu-speech-to-text.mdx
index 46d278f6..4e70f2a3 100644
--- a/docs/deployments/container/gpu-speech-to-text.mdx
+++ b/docs/deployments/container/gpu-speech-to-text.mdx
@@ -105,14 +105,16 @@ The server can only support one of these modes at once.
Once the GPU Server is running, follow the [Instructions for Linking a CPU Container](/deployments/container/cpu-speech-to-text#linking-to-a-gpu-inference-container).
-### Running only one operating point
+### Running only one model
-[Operating Points](/speech-to-text/models) represent different levels of model complexity.
-To save GPU memory for throughput, you can run the server with only one Operating Point loaded. To do this, pass the
-`SM_OPERATING_POINT` environment variable to the container and set it to either `standard` or `enhanced`.
+[Models](/speech-to-text/models) (previously called Operating Points) differ in their level of complexity.
+To save GPU memory for throughput, you can run the server with only one model loaded. To do this, pass the
+`SM_MODEL` environment variable to the container and set it to either `standard` or `enhanced`.
+
+`SM_MODEL` replaces the older `SM_OPERATING_POINT` environment variable. `SM_OPERATING_POINT` is deprecated but still works and accepts the same `standard` and `enhanced` values; use `SM_MODEL` going forward.
:::info
-When running the all language standard Operating Point GPU inference server you must set the `SM_OPERATING_POINT` environment variable to `standard`
+When running the all-language Standard model GPU inference server, you must set the `SM_MODEL` environment variable to `standard`.
:::
### Monitoring the server
@@ -121,7 +123,7 @@ The inference server is based on [Nvidia's Triton architecture](https://develope
can be monitored using Triton's inbuilt Prometheus metrics, or the GRPC/HTTP APIs. To expose these, configure an external mapping for port
8002(Prometheus) or 8000(HTTP).
-### Operating points in GPU inference
+### Models in GPU inference
When inference is outsourced to a GPU server, alternative GPU-specific models are used, so you should not expect to see identical results compared to CPU-based inference. For convenience, the GPU models are also designated as 'standard' and 'enhanced'.
diff --git a/docs/deployments/container/gpu-translation.mdx b/docs/deployments/container/gpu-translation.mdx
index ed4e6fc2..80fe5740 100644
--- a/docs/deployments/container/gpu-translation.mdx
+++ b/docs/deployments/container/gpu-translation.mdx
@@ -103,7 +103,7 @@ Assuming the following config file:
{
"type": "transcription",
"transcription_config": {
- "operating_point": "enhanced",
+ "model": "enhanced",
"language": "en"
},
"translation_config": {
diff --git a/docs/deployments/container/performance-and-cost.mdx b/docs/deployments/container/performance-and-cost.mdx
index 87ec4ca9..bfc6d5b4 100644
--- a/docs/deployments/container/performance-and-cost.mdx
+++ b/docs/deployments/container/performance-and-cost.mdx
@@ -11,7 +11,7 @@ This is a comparison of the performance and estimated running costs of transcrip
### Batch transcription
-| Operating Point | [CPU Standard](./cpu-speech-to-text) | [CPU Enhanced](./cpu-speech-to-text) | [GPU Standard](./gpu-speech-to-text) | [GPU Enhanced](./gpu-speech-to-text) |
+| Model | [CPU Standard](./cpu-speech-to-text) | [CPU Enhanced](./cpu-speech-to-text) | [GPU Standard](./gpu-speech-to-text) | [GPU Enhanced](./gpu-speech-to-text) |
|--------------------------------------------|--------------|--------------|--------------|--------------|
| Lowest Processing Cost (US ¢ per hour) | 1.7 | 3.8 | 0.34 | 1.67 |
| Cost vs CPU Standard (%) | - | 224% | 20% | 98% |
@@ -31,13 +31,13 @@ The benchmark uses the following configuration:
| Price Basis | Azure PAYG East US, Linux, Standard |
:::note
-For GPU Operating Points, transcribers and inference servers were all run on a single VM node.
+For GPU models, transcribers and inference servers were all run on a single VM node.
:::
### Realtime transcription
-| Operating Point | [CPU Standard](./cpu-speech-to-text#realtime-transcription) | [CPU Enhanced](./cpu-speech-to-text#realtime-transcription) | [GPU Standard](./gpu-speech-to-text#batch-and-real-time-inference) | [GPU Enhanced](./gpu-speech-to-text#batch-and-real-time-inference) |
+| Model | [CPU Standard](./cpu-speech-to-text#realtime-transcription) | [CPU Enhanced](./cpu-speech-to-text#realtime-transcription) | [GPU Standard](./gpu-speech-to-text#batch-and-real-time-inference) | [GPU Enhanced](./gpu-speech-to-text#batch-and-real-time-inference) |
|--------------------------------------------|--------------|--------------|--------------|--------------|
| Lowest Processing Cost (US ¢ per hour) | 1.97 | 2.95 | 0.86 | 2.51 |
| Cost vs. CPU Standard (%) | - | 150% | 44% | 127% |
@@ -55,7 +55,7 @@ This benchmark uses the following configuration[^4]:
| Price Basis | Azure PAYG East US, Linux, Standard |
:::note
-For GPU Operating Points, the transcribers and inference servers were run on a single VM node.
+For GPU models, the transcribers and inference servers were run on a single VM node.
Each first session, transcriber requires 0.25 cores for both OPs, with 1.2 GB memory (Standard OP) or 3 GB memory (Enhanced OP). Every additional session consumes 0.1 cores and 100 MB of memory.
:::
diff --git a/docs/speech-to-text/batch/assets/file-transcription-quickstart.example.js b/docs/speech-to-text/batch/assets/file-transcription-quickstart.example.js
index bde709ce..c349ad10 100644
--- a/docs/speech-to-text/batch/assets/file-transcription-quickstart.example.js
+++ b/docs/speech-to-text/batch/assets/file-transcription-quickstart.example.js
@@ -20,7 +20,7 @@ async function transcribeFile() {
{
transcription_config: {
language: "en",
- operating_point: "enhanced",
+ model: "enhanced",
},
},
"json-v2",
diff --git a/docs/speech-to-text/features/audio-events.mdx b/docs/speech-to-text/features/audio-events.mdx
index 1457d317..d7f1a3f5 100644
--- a/docs/speech-to-text/features/audio-events.mdx
+++ b/docs/speech-to-text/features/audio-events.mdx
@@ -293,4 +293,4 @@ An example of a request only for `applause` and `music`
- Audio Events is supported only in the JSON type API response
- While the occurrence of music can be detected, richer metadata about the music such as title, artist, genre, etc cannot be identified
- Only one instance of an event type can be tracked at a point in time. e.g. seamlessly switching consecutive songs will be detected as one single music event
-- For On-Prem Containers, Audio Events is available only for GPU Operating Points
+- For On-Prem Containers, Audio Events is available only for GPU models