diff --git a/docs-style-guide.md b/docs-style-guide.md
index 7a0312a3..178df109 100644
--- a/docs-style-guide.md
+++ b/docs-style-guide.md
@@ -209,7 +209,7 @@ Help readers find information quickly by organizing content into clear levels of
 2. **Section heading (H2)** -- key steps or concepts, e.g., *Realtime processing*.
-3. **Subheading (H3+)** -- finer details within a section, e.g., *Operating points*.
+3. **Subheading (H3+)** -- finer details within a section, e.g., *Models*.
 4. **Paragraph** -- up to 3 sentences per paragraph.
diff --git a/docs/deployments/container/accessing-images.mdx b/docs/deployments/container/accessing-images.mdx
index 610763df..b959a718 100644
--- a/docs/deployments/container/accessing-images.mdx
+++ b/docs/deployments/container/accessing-images.mdx
@@ -64,21 +64,21 @@ See [how to run the Core Speech CPU container here.](/deployments/container/cpu-
 The Transcription GPU images are required to use the most accurate models.
-### Standard operating point
+### Standard model
-There is a single image available that supports all languages for the Standard Operating Point. There are language specific images available that support the Enhanced and Standard Operating Point.
+There is a single image available that supports all languages for the Standard model. There are language specific images available that support the Enhanced and Standard models.
- {`# pulling the Standard operating point Transcription GPU inference server which supports all languages with the ${smVariables.latestContainerVersion} tag:
+ {`# pulling the Standard model Transcription GPU inference server which supports all languages with the ${smVariables.latestContainerVersion} tag:
 docker pull speechmaticspublic.azurecr.io/sm-gpu-inference-server-standard-all:${smVariables.latestContainerVersion} \0
-# pulling language specific Transcription GPU inference servers available for en, es, de, fr. Supports both Enhanced and Standard operating points with the ${smVariables.latestContainerVersion} tag:
+# pulling language specific Transcription GPU inference servers available for en, es, de, fr. Supports both Enhanced and Standard models with the ${smVariables.latestContainerVersion} tag:
 docker pull speechmaticspublic.azurecr.io/sm-gpu-inference-server-en:${smVariables.latestContainerVersion}`}
-### Enhanced operating point
+### Enhanced model
-Depending on which Enhanced Operating Point languages are required, you can pull specific images.
+Depending on which Enhanced model languages are required, you can pull specific images.
 Language Pack 1
diff --git a/docs/deployments/container/batch-persistent-worker.mdx b/docs/deployments/container/batch-persistent-worker.mdx
index 7c2acdf5..d303a696 100644
--- a/docs/deployments/container/batch-persistent-worker.mdx
+++ b/docs/deployments/container/batch-persistent-worker.mdx
@@ -87,7 +87,7 @@ curl -X POST address.of.container:PORT/v2/jobs \
 "transcription_config": {
 "language": "en",
 "diarization": "speaker",
- "operating_point": "enhanced"
+ "model": "enhanced"
 }
 }' \
 -F 'data_file=@~/audio_file.mp3'
@@ -201,7 +201,7 @@ curl -X POST address.of.container:PORT/v2/jobs \
 "transcription_config": {
 "language": "en",
 "diarization": "speaker",
- "operating_point": "enhanced"
+ "model": "enhanced"
 }
 }' \
 -F 'data_file=@~/audio_file.mp3'
diff --git a/docs/deployments/container/cpu-speech-to-text.mdx b/docs/deployments/container/cpu-speech-to-text.mdx
index 7c9c79a0..8aefcd85 100644
--- a/docs/deployments/container/cpu-speech-to-text.mdx
+++ b/docs/deployments/container/cpu-speech-to-text.mdx
@@ -257,7 +257,7 @@ The first-session loading time can be reduced down to several hundred millisecon
 You can enable this feature by setting the `SM_PREWARM_ENGINE_MODES` environment variable, with a semicolon separated list describing the required engine modes. For example, to prewarm 1 English GPU Standard and 2 English GPU Enhanced: `SM_PREWARM_ENGINE_MODES='en_general_gpu_standard:1;en_general_gpu_enhanced:2'`
-In general, the format is: `{language}_{domain}_{processor}_{operating_point}:{prewarm_connections}`.
+In general, the format is: `{language}_{domain}_{processor}_{model}:{prewarm_connections}`.
 The parameters are:
 - `language` - One of the supported [language codes](/speech-to-text/languages)
@@ -266,7 +266,7 @@ The parameters are:
 - `processor` - One of `cpu` or `gpu`. Note that selecting `gpu` requires a [GPU Inference Container](/deployments/container/gpu-speech-to-text)
-- `operating_point` - One of `standard` or `enhanced`. The [operating point](/speech-to-text/models) you want to prewarm
+- `model` - One of `standard` or `enhanced`. The [model](/speech-to-text/models) you want to prewarm
 - `prewarm_connections` - Integer. The number of engine instances of the specific mode you want to pre-warm. The total number of `prewarm_connections` cannot be greater than `SM_MAX_CONCURRENT_CONNECTIONS`. After the pre-warming is complete, this parameter does not limit the types of connections the engine can start.
diff --git a/docs/deployments/container/gpu-speech-to-text.mdx b/docs/deployments/container/gpu-speech-to-text.mdx
index 46d278f6..4e70f2a3 100644
--- a/docs/deployments/container/gpu-speech-to-text.mdx
+++ b/docs/deployments/container/gpu-speech-to-text.mdx
@@ -105,14 +105,16 @@ The server can only support one of these modes at once.
 Once the GPU Server is running, follow the [Instructions for Linking a CPU Container](/deployments/container/cpu-speech-to-text#linking-to-a-gpu-inference-container).
-### Running only one operating point
+### Running only one model
-[Operating Points](/speech-to-text/models) represent different levels of model complexity.
-To save GPU memory for throughput, you can run the server with only one Operating Point loaded. To do this, pass the
-`SM_OPERATING_POINT` environment variable to the container and set it to either `standard` or `enhanced`.
+[Models](/speech-to-text/models) (previously called Operating Points) represent different levels of model complexity.
+To save GPU memory for throughput, you can run the server with only one model loaded. To do this, pass the
+`SM_MODEL` environment variable to the container and set it to either `standard` or `enhanced`.
+
+`SM_MODEL` replaces the older `SM_OPERATING_POINT` environment variable. `SM_OPERATING_POINT` is deprecated but still works and accepts the same `standard` and `enhanced` values; use `SM_MODEL` going forward.
 :::info
-When running the all language standard Operating Point GPU inference server you must set the `SM_OPERATING_POINT` environment variable to `standard`
+When running the all language standard model GPU inference server you must set the `SM_MODEL` environment variable to `standard`
 :::
 ### Monitoring the server
@@ -121,7 +123,7 @@ The inference server is based on [Nvidia's Triton architecture](https://develope
 can be monitored using Triton's inbuilt Prometheus metrics, or the GRPC/HTTP APIs. To expose these, configure an external mapping for port 8002(Prometheus) or 8000(HTTP).
-### Operating points in GPU inference
+### Models in GPU inference
 When inference is outsourced to a GPU server, alternative GPU-specific models are used, so you should not expect to see identical results compared to CPU-based inference. For convenience, the GPU models are also designated as 'standard' and 'enhanced'.
diff --git a/docs/deployments/container/gpu-translation.mdx b/docs/deployments/container/gpu-translation.mdx
index ed4e6fc2..80fe5740 100644
--- a/docs/deployments/container/gpu-translation.mdx
+++ b/docs/deployments/container/gpu-translation.mdx
@@ -103,7 +103,7 @@ Assuming the following config file:
 {
 "type": "transcription",
 "transcription_config": {
- "operating_point": "enhanced",
+ "model": "enhanced",
 "language": "en"
 },
 "translation_config": {
diff --git a/docs/deployments/container/performance-and-cost.mdx b/docs/deployments/container/performance-and-cost.mdx
index 87ec4ca9..bfc6d5b4 100644
--- a/docs/deployments/container/performance-and-cost.mdx
+++ b/docs/deployments/container/performance-and-cost.mdx
@@ -11,7 +11,7 @@ This is a comparison of the performance and estimated running costs of transcrip
 ### Batch transcription
-| Operating Point | [CPU Standard](./cpu-speech-to-text) | [CPU Enhanced](./cpu-speech-to-text) | [GPU Standard](./gpu-speech-to-text) | [GPU Enhanced](./gpu-speech-to-text) |
+| Models | [CPU Standard](./cpu-speech-to-text) | [CPU Enhanced](./cpu-speech-to-text) | [GPU Standard](./gpu-speech-to-text) | [GPU Enhanced](./gpu-speech-to-text) |
 |--------------------------------------------|--------------|--------------|--------------|--------------|
 | Lowest Processing Cost (US ¢ per hour) | 1.7 | 3.8 | 0.34 | 1.67 |
 | Cost vs CPU Standard (%) | - | 224% | 20% | 98% |
@@ -31,13 +31,13 @@ The benchmark uses the following configuration:
 | Price Basis | Azure PAYG East US, Linux, Standard |
 :::note
-For GPU Operating Points, transcribers and inference servers were all run on a single VM node.
+For GPU Models, transcribers and inference servers were all run on a single VM node.
 :::
 ### Realtime transcription
-| Operating Point | [CPU Standard](./cpu-speech-to-text#realtime-transcription) | [CPU Enhanced](./cpu-speech-to-text#realtime-transcription) | [GPU Standard](./gpu-speech-to-text#batch-and-real-time-inference) | [GPU Enhanced](./gpu-speech-to-text#batch-and-real-time-inference) |
+| Models | [CPU Standard](./cpu-speech-to-text#realtime-transcription) | [CPU Enhanced](./cpu-speech-to-text#realtime-transcription) | [GPU Standard](./gpu-speech-to-text#batch-and-real-time-inference) | [GPU Enhanced](./gpu-speech-to-text#batch-and-real-time-inference) |
 |--------------------------------------------|--------------|--------------|--------------|--------------|
 | Lowest Processing Cost (US ¢ per hour) | 1.97 | 2.95 | 0.86 | 2.51 |
 | Cost vs. CPU Standard (%) | - | 150% | 44% | 127% |
@@ -55,7 +55,7 @@ This benchmark uses the following configuration[^4]:
 | Price Basis | Azure PAYG East US, Linux, Standard |
 :::note
-For GPU Operating Points, the transcribers and inference servers were run on a single VM node.
+For GPU Models, the transcribers and inference servers were run on a single VM node.
 Each first session, transcriber requires 0.25 cores for both OPs, with 1.2 GB memory (Standard OP) or 3 GB memory (Enhanced OP). Every additional session consumes 0.1 cores and 100 MB of memory.
 :::
diff --git a/docs/speech-to-text/batch/assets/file-transcription-quickstart.example.js b/docs/speech-to-text/batch/assets/file-transcription-quickstart.example.js
index bde709ce..c349ad10 100644
--- a/docs/speech-to-text/batch/assets/file-transcription-quickstart.example.js
+++ b/docs/speech-to-text/batch/assets/file-transcription-quickstart.example.js
@@ -20,7 +20,7 @@ async function transcribeFile() {
 {
 transcription_config: {
 language: "en",
- operating_point: "enhanced",
+ model: "enhanced",
 },
 },
 "json-v2",
diff --git a/docs/speech-to-text/features/audio-events.mdx b/docs/speech-to-text/features/audio-events.mdx
index 1457d317..d7f1a3f5 100644
--- a/docs/speech-to-text/features/audio-events.mdx
+++ b/docs/speech-to-text/features/audio-events.mdx
@@ -293,4 +293,4 @@ An example of a request only for `applause` and `music`
 - Audio Events is supported only in the JSON type API response
 - While the occurrence of music can be detected, richer metadata about the music such as title, artist, genre, etc cannot be identified
 - Only one instance of an event type can be tracked at a point in time. e.g. seamlessly switching consecutive songs will be detected as one single music event
- For On-Prem Containers, Audio Events is available only for GPU Operating Points
+- For On-Prem Containers, Audio Events is available only for GPU Models
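
Note on the rename: the hunks above change the `operating_point` key to `model` inside `transcription_config` in every request example. Below is a minimal sketch of how a client might apply the same rename to job configs it already has. `migrate_transcription_config` is a hypothetical helper, not part of any Speechmatics SDK, and it assumes the config is a plain dictionary shaped like the JSON in the examples. Whether the API continues to accept the old key is not stated in this diff.

```python
import copy


def migrate_transcription_config(job_config: dict) -> dict:
    """Return a copy of job_config with operating_point renamed to model.

    Hypothetical helper for illustration only. The input is not modified.
    """
    migrated = copy.deepcopy(job_config)
    transcription = migrated.get("transcription_config")
    if isinstance(transcription, dict) and "operating_point" in transcription:
        old_value = transcription.pop("operating_point")
        # If the caller already set "model", keep that value and drop the old key.
        transcription.setdefault("model", old_value)
    return migrated


if __name__ == "__main__":
    old_config = {
        "type": "transcription",
        "transcription_config": {
            "language": "en",
            "diarization": "speaker",
            "operating_point": "enhanced",
        },
    }
    print(migrate_transcription_config(old_config))
```

The helper works on a deep copy so that a config shared between callers is not changed underneath them.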