diff --git a/microsoft-edge/toc.yml b/microsoft-edge/toc.yml index c48cf27696..d8b64454ed 100644 --- a/microsoft-edge/toc.yml +++ b/microsoft-edge/toc.yml @@ -135,6 +135,9 @@ - name: Detect languages with the Language Detector API href: ./web-platform/languagedetector-api.md + + - name: Convert speech to text with the SpeechRecognition API + href: ./web-platform/speech-recognition-api.md # /AI for the web # ============================================================================= # ----------------------------------------------------------------------------- diff --git a/microsoft-edge/web-platform/prompt-api-images/api-and-model-ready.png b/microsoft-edge/web-platform/prompt-api-images/api-and-model-ready.png index a0bb080cbe..0971c94b7b 100644 Binary files a/microsoft-edge/web-platform/prompt-api-images/api-and-model-ready.png and b/microsoft-edge/web-platform/prompt-api-images/api-and-model-ready.png differ diff --git a/microsoft-edge/web-platform/prompt-api-images/flags-prompt-api.png b/microsoft-edge/web-platform/prompt-api-images/flags-prompt-api.png index 3aa6ee5f03..3690d6e906 100644 Binary files a/microsoft-edge/web-platform/prompt-api-images/flags-prompt-api.png and b/microsoft-edge/web-platform/prompt-api-images/flags-prompt-api.png differ diff --git a/microsoft-edge/web-platform/prompt-api-images/model-downloading.png b/microsoft-edge/web-platform/prompt-api-images/model-downloading.png index b99ee11ffa..f9f43a40e0 100644 Binary files a/microsoft-edge/web-platform/prompt-api-images/model-downloading.png and b/microsoft-edge/web-platform/prompt-api-images/model-downloading.png differ diff --git a/microsoft-edge/web-platform/prompt-api-images/prerelease-model-flag-for-prompt-api.png b/microsoft-edge/web-platform/prompt-api-images/prerelease-model-flag-for-prompt-api.png new file mode 100644 index 0000000000..bdd8c665ec Binary files /dev/null and b/microsoft-edge/web-platform/prompt-api-images/prerelease-model-flag-for-prompt-api.png differ diff --git 
a/microsoft-edge/web-platform/prompt-api-images/prompting.png b/microsoft-edge/web-platform/prompt-api-images/prompting.png index 6f22e288a2..75e61a473c 100644 Binary files a/microsoft-edge/web-platform/prompt-api-images/prompting.png and b/microsoft-edge/web-platform/prompt-api-images/prompting.png differ diff --git a/microsoft-edge/web-platform/prompt-api.md b/microsoft-edge/web-platform/prompt-api.md index 4fa46ead89..10068f4677 100644 --- a/microsoft-edge/web-platform/prompt-api.md +++ b/microsoft-edge/web-platform/prompt-api.md @@ -5,22 +5,27 @@ author: MSEdgeTeam ms.author: msedgedevrel ms.topic: article ms.service: microsoft-edge -ms.date: 05/19/2025 +ms.date: 06/02/2025 --- # Prompt a built-in language model with the Prompt API -The [Prompt API](https://github.com/webmachinelearning/prompt-api) is an experimental web API that allows you to prompt a small language model (SLM) that is built into Microsoft Edge, from your website's or browser extension's JavaScript code. Use the Prompt API to generate and analyze text or create application logic based on user input, and discover innovative ways to integrate prompt engineering capabilities into your web application. +The [Prompt API](https://webmachinelearning.github.io/prompt-api/) is an experimental web API that allows you to prompt a small language model (SLM) that is built into Microsoft Edge, from your website's or browser extension's JavaScript code. Use the Prompt API to generate and analyze text or create application logic based on user input, and discover innovative ways to integrate prompt engineering capabilities into your web application. 
**Detailed contents:** * [Availability of the Prompt API](#availability-of-the-prompt-api) * [Alternatives to and benefits of the Prompt API](#alternatives-to-and-benefits-of-the-prompt-api) +* [Small language models built into Microsoft Edge](#small-language-models-built-into-microsoft-edge) * [The Phi-4-mini model](#the-phi-4-mini-model) * [Disclaimer](#disclaimer) * [Hardware requirements](#hardware-requirements) - * [Model availability](#model-availability) + * [Availability of the Phi-4-mini model](#availability-of-the-phi-4-mini-model) +* [The Aion-1.0-Instruct model](#the-aion-10-instruct-model) + * [Enable Aion-1.0-Instruct for the Prompt API](#enable-aion-10-instruct-for-the-prompt-api) + * [Disclaimer](#disclaimer-1) + * [Availability of the Aion-1.0-Instruct model](#availability-of-the-aion-10-instruct-model) * [Enable the Prompt API](#enable-the-prompt-api) * [See a working example](#see-a-working-example) * [Use the Prompt API](#use-the-prompt-api) @@ -30,7 +35,6 @@ The [Prompt API](https://github.com/webmachinelearning/prompt-api) is an experim * [Monitor the progress of the model download](#monitor-the-progress-of-the-model-download) * [Provide the model with a system prompt](#provide-the-model-with-a-system-prompt) * [N-shot prompting with initialPrompts](#n-shot-prompting-with-initialprompts) - * [Set topK and temperature](#set-topk-and-temperature) * [Clone a session to start the conversation again with the same options](#clone-a-session-to-start-the-conversation-again-with-the-same-options) * [Prompt the model](#prompt-the-model) * [Wait for the final response](#wait-for-the-final-response) @@ -48,13 +52,13 @@ The [Prompt API](https://github.com/webmachinelearning/prompt-api) is an experim ## Availability of the Prompt API -The Prompt API is available as a developer preview in Microsoft Edge Canary or Dev channels, starting with version 138.0.3309.2. 
+The Prompt API is available as a developer preview in the Microsoft Edge Canary and Edge Dev channels, starting with version 138.0.3309.2. The Prompt API is intended to help discover use cases and understand challenges for built-in SLMs. This API is expected to be succeeded by other experimental APIs for specific AI-powered tasks such as writing assistance and text translation. To learn more about these other APIs, see: * [Summarize, write, and rewrite text with the writing assistance APIs](./writing-assistance-apis.md) - -* The [webmachinelearning / translation-api](https://github.com/webmachinelearning/translation-api) repo. +* [Translate text with the Translator API](./translator-api.md) +* [Detect languages with the Language Detector API](./languagedetector-api.md) @@ -72,7 +76,7 @@ The Prompt API uses an SLM that runs on the same device where the inputs to and * **Network independence:** Beyond the initial model download, there's no network latency when prompting the model, and may also be used when the device is offline. -* **Improved privacy:** The data input to the model never leaves the device and is not collected to train AI models. +* **Improved privacy:** The data input to the model never leaves the device, and isn't collected to train AI models. The Prompt API uses a model that's provided by Microsoft Edge and built into the browser, which comes with the additional benefits over custom local solutions such as those based on WebGPU, WebNN, or WebAssembly: @@ -81,10 +85,20 @@ The Prompt API uses a model that's provided by Microsoft Edge and built into the * **Simplified usage for web developers:** The built-in model can be run by using straightforward web APIs and doesn't require AI/ML expertise or using third-party frameworks. + +## Small language models built into Microsoft Edge + +In the Microsoft Edge Canary and Dev channels, starting with version 138.0.3309.2, the Prompt API uses the Phi-4-mini model, which is built into Microsoft Edge. 
+ +Starting with version 150.0.4070, the Prompt API can also be used with the prerelease Aion-1.0-Instruct model, which is also built into Microsoft Edge. Aion-1.0-Instruct is a smaller, faster, and more efficient model than Phi-4-mini, and is supported on devices with less capable GPUs or no GPU, via CPU-inferencing. If the performance class of your device isn't high enough to support Phi-4-mini, you can test the prerelease Aion-1.0-Instruct model. + +To learn more about both models, and how to enable Aion-1.0-Instruct, read the sections below. + + ## The Phi-4-mini model -The Prompt API allows you to prompt Phi-4-mini — a powerful small language model that excels at text-based tasks — built into Microsoft Edge. To learn more about Phi-4-mini and its capabilities, see the model card at [microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct). +The Prompt API allows you to prompt Phi-4-mini, which is built into Microsoft Edge. Phi-4-mini is a powerful small language model that excels at text-based tasks. To learn more about Phi-4-mini and its capabilities, see the model card at [microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct). @@ -104,7 +118,7 @@ The Prompt API developer preview is intended to work on devices with hardware ca * **GPU:** 5.5 GB of VRAM or more. -* **Network:** Unlimited data plan or unmetered connection. The model is not downloaded if using a metered connection. +* **Network:** Unlimited data plan or unmetered connection. The model isn't downloaded if using a metered connection. To check if your device supports the Prompt API developer preview, see [Enable the Prompt API](#enable-the-prompt-api) below and check your device performance class. 
@@ -112,9 +126,49 @@ Due to the experimental nature of the Prompt API, you might observe issues on sp -#### Model availability +#### Availability of the Phi-4-mini model + +An initial download of the Phi-4-mini model is required the first time that a website calls an API that requires an on-device model. You can monitor the downloading of the Phi-4-mini model by using the monitor option when creating a new Prompt API session. To learn more, see [Monitor the progress of the model download](#monitor-the-progress-of-the-model-download), below. + + + +## The Aion-1.0-Instruct model + +In Microsoft Edge Canary or Edge Dev, starting with version 150.0.4070, the Prompt API can also be used with the prerelease Aion-1.0-Instruct model, which is built into Microsoft Edge. + +This Aion-1.0-Instruct model is significantly smaller, faster, and more efficient than Phi-4-mini, and is supported on devices with less capable GPUs or no GPU, via CPU-inferencing. + +Aion-1.0-Instruct is expected to be made available as an open-source model in July 2026. + + + +#### Enable Aion-1.0-Instruct for the Prompt API + +By default, the Prompt API uses the Phi-4-mini model. To use Aion-1.0-Instruct in Microsoft Edge Canary or Edge Dev, enable the **Enable prerelease on-device language model** flag, as described in the steps below. When this flag is enabled, Aion-1.0-Instruct overrides Phi-4-mini as the default model for the Prompt API. + +1. Make sure you're using the latest version of Edge Canary or Edge Dev (version 150.0.4070 or later). See [Become a Microsoft Edge Insider](https://www.microsoft.com/edge/download/insider). + +1. In Edge Canary or Edge Dev, open a new tab or window and go to `edge://flags`. + +1. In the search box at the top of the page, enter **Enable prerelease on-device language model**. + +1. 
In the **Enable prerelease on-device language model** drop-down list, select **Enabled**, and then click the **Restart** button: + + ![Flags page showing the prerelease on-device language model flag](./prompt-api-images/prerelease-model-flag-for-prompt-api.png) + +1. To check that Aion-1.0-Instruct is being used as the on-device language model, go to `edge://on-device-internals`, click **Model Status**, and check that **Model Name** is set to **Aion-1.0-Instruct**. + + + +#### Disclaimer + +The Aion-1.0-Instruct model is made available in Microsoft Edge 150.0.4070 for early developer testing and feedback. In addition to the Responsible AI considerations listed above, note that, given its prerelease state, model behaviors and capabilities are subject to change. + + + +#### Availability of the Aion-1.0-Instruct model -An initial download of the model will be required the first time a website calls a built-in AI API. You can monitor the model download by using the monitor option when creating a new Prompt API session. To learn more, see [Monitor the progress of the model download](#monitor-the-progress-of-the-model-download), below. +An initial download of the Aion-1.0-Instruct model is required the first time that a website calls an API that requires an on-device model. You can monitor the downloading of the Aion-1.0-Instruct model by using the monitor option when creating a new Prompt API session. To learn more, see [Monitor the progress of the model download](#monitor-the-progress-of-the-model-download), below. @@ -122,25 +176,29 @@ An initial download of the model will be required the first time a website calls To use the Prompt API in Microsoft Edge: -1. Make sure you're using the latest version of Microsoft Edge Canary or Dev (version 138.0.3309.2 or newer). See [Become a Microsoft Edge Insider](https://www.microsoft.com/edge/download/insider). +1. Make sure you're using the latest version of Microsoft Edge Canary or Edge Dev (version 138.0.3309.2 or later). 
See [Become a Microsoft Edge Insider](https://www.microsoft.com/edge/download/insider). -1. In Microsoft Edge Canary or Dev, open a new tab or window and go to `edge://flags/`. +1. In Edge Canary or Edge Dev, open a new tab or window and go to `edge://flags/`. -1. In the search box, at the top of the page, enter **Prompt API for Phi mini**. +1. In the search box, at the top of the page, enter **Prompt API for on-device language model**. The page is filtered to show the matching flag. -1. Under **Prompt API for Phi mini**, select **Enabled**: +1. Under **Prompt API for on-device language model**, select **Enabled**: ![Flags page of browser](./prompt-api-images/flags-prompt-api.png) 1. Optionally, to log information locally that may be useful for debugging issues, also enable the **Enable on device AI model debug logs** flag. -1. Restart Microsoft Edge Canary or Dev. +1. Restart Edge Canary or Edge Dev. 1. To check if your device meets the hardware requirements for the Prompt API developer preview, open a new tab, go to `edge://on-device-internals`, and check the **Device performance class** value. - If your device performance class is **High** or greater, the Prompt API should be supported on your device. If you continue to notice issues, please [Create a new issue](https://github.com/MicrosoftEdge/MSEdgeExplainers/issues/new?template=prompt-api.md) in the MSEdgeExplainers repo. + If your device performance class is **High** or greater, the Prompt API should be supported on your device. + + If your device performance class is **Medium** or **Low**, the Prompt API is only supported through the prerelease Aion-1.0-Instruct model, which is available starting with Edge version 150.0.4070. To test the Aion-1.0-Instruct model, see [Enable Aion-1.0-Instruct for the Prompt API](#enable-aion-10-instruct-for-the-prompt-api), above. 
+ + If you notice issues with these models, please [Create a new issue](https://github.com/MicrosoftEdge/MSEdgeExplainers/issues/new?template=prompt-api.md) in the MSEdgeExplainers repo. @@ -150,7 +208,7 @@ To see the Prompt API in action, and review existing code that uses the API: 1. [Enable the Prompt API](#enable-the-prompt-api), as described above. -1. In Microsoft Edge Canary or Dev browser, open a tab or window and go to the [Prompt API playground](https://microsoftedge.github.io/Demos/built-in-ai/playgrounds/prompt-api/). +1. In Edge Canary or Edge Dev, open a tab or window and go to the [Prompt API playground](https://microsoftedge.github.io/Demos/built-in-ai/playgrounds/prompt-api/). In the **Built-in AI playgrounds** navigation on the left, **Prompt** is selected. @@ -171,8 +229,6 @@ To see the Prompt API in action, and review existing code that uses the API: * **System prompt** * **Response constraint schema** * **More settings** > **N-shot prompt instructions** - * **TopK** - * **Temperature** 1. Click the **Prompt** button, at the bottom of the page. @@ -251,8 +307,6 @@ The available options are: * `initialPrompts`, to give the model context about the prompts which will be sent to the model, and to establish a pattern of user/assistant interactions that the model should follow for future prompts. -* `topK` and `temperature`, to adjust the coherence and determinism of the model output. - These options are documented below. @@ -326,26 +380,6 @@ const session = await LanguageModel.create({ ``` - -###### Set topK and temperature - -`topK` and `temperature` are known as _sampling parameters_ and are used by the model to influence the generation of text. - -* TopK sampling limits the number of words considered for each subsequent word in the generated text, which can speed up the generation process and lead to more coherent outputs but also reduce diversity. - -* Temperature sampling controls the randomness of the output. 
A lower temperature results in less random outputs, favoring higher probability words, and thus producing more deterministic text. - -Set the `topK` and `temperature` options, to configure the model's sampling parameters: - -```javascript -// Create a LanguageModel session and setting the topK and temperature options. -const session = await LanguageModel.create({ - topK: 10, - temperature: 0.7 -}); -``` - - #### Clone a session to start the conversation again with the same options @@ -359,9 +393,7 @@ const firstSession = await LanguageModel.create({ initialPrompts: [ role: "system", content: "You are a helpful assistant." - ], - topK: 10, - temperature: 0.7 + ] }); // Later, create a new session by cloning the first session to start a new @@ -487,8 +519,6 @@ console.log(`Sentiment: ${sentiment}`); console.log(`Confidence: ${confidence}`); ``` -To learn more, see [Structured output with JSON schema or RegExp constraints](https://github.com/webmachinelearning/prompt-api?tab=readme-ov-file#structured-output-with-json-schema-or-regexp-constraints). - ###### Send multiple messages per prompt @@ -576,19 +606,26 @@ controller.abort(); ## Send feedback -The Prompt API developer preview is intended to help discover use-cases for browser-provided language models. We're very interested in learning about the range of scenarios for which you intend to use the Prompt API, any issues with the API or language models, and whether new task-specific APIs, such as for proofreading or translation, would be useful. +The Prompt API developer preview is intended to help discover use-cases for browser-provided language models. + +We're interested in learning about: +* The range of scenarios for which you intend to use the Prompt API. +* Any issues with the Prompt API. +* Any issues with the language models. +* Whether new task-specific APIs would be useful. 
To send feedback about your scenarios and the tasks you want to achieve, please add a comment to [the Prompt API feedback issue](https://github.com/MicrosoftEdge/MSEdgeExplainers/issues/1012). If you notice any issues when using the API instead, please [report it on the repo](https://github.com/MicrosoftEdge/MSEdgeExplainers/issues/new?template=prompt-api.md). -You can also contribute to the discussion about the design of the Prompt API at the [W3C Web Machine Learning Working Group repository](https://github.com/webmachinelearning/prompt-api). +You can also contribute to the discussion about the design of the Prompt API at the [W3C Web Machine Learning Working Group repository](https://github.com/webmachinelearning/prompt-api/). ## See also -* [Explainer for the Prompt API](https://github.com/webmachinelearning/prompt-api), on the Web Machine Learning GitHub repo. +* [Prompt API draft specification](https://webmachinelearning.github.io/prompt-api/) +* [webmachinelearning/prompt-api GitHub repo](https://github.com/webmachinelearning/prompt-api) * [Write, rewrite, and summarize text with the Writing Assistance APIs](./writing-assistance-apis.md) * [Correct grammar, spelling, and punctuation errors in text with the Proofreader API](./proofreader-api.md) * [Translate text with the Translator API](./translator-api.md) diff --git a/microsoft-edge/web-platform/speech-recognition-api-images/flag.png b/microsoft-edge/web-platform/speech-recognition-api-images/flag.png new file mode 100644 index 0000000000..0309c53cb8 Binary files /dev/null and b/microsoft-edge/web-platform/speech-recognition-api-images/flag.png differ diff --git a/microsoft-edge/web-platform/speech-recognition-api-images/installing.png b/microsoft-edge/web-platform/speech-recognition-api-images/installing.png new file mode 100644 index 0000000000..7b30e1ab90 Binary files /dev/null and b/microsoft-edge/web-platform/speech-recognition-api-images/installing.png differ diff --git 
a/microsoft-edge/web-platform/speech-recognition-api-images/speaking.png b/microsoft-edge/web-platform/speech-recognition-api-images/speaking.png new file mode 100644 index 0000000000..a7a216f632 Binary files /dev/null and b/microsoft-edge/web-platform/speech-recognition-api-images/speaking.png differ diff --git a/microsoft-edge/web-platform/speech-recognition-api.md b/microsoft-edge/web-platform/speech-recognition-api.md new file mode 100644 index 0000000000..0f0fe9d315 --- /dev/null +++ b/microsoft-edge/web-platform/speech-recognition-api.md @@ -0,0 +1,299 @@ +--- +title: Convert speech to text with the SpeechRecognition API +description: Convert speech to text with the SpeechRecognition API. +author: MSEdgeTeam +ms.author: msedgedevrel +ms.topic: article +ms.service: microsoft-edge +ms.date: 06/01/2026 +--- +# Convert speech to text with the SpeechRecognition API + +The SpeechRecognition API is a standard web API that converts speech from an audio source, such as a media file or a device microphone, into text, directly from a website's or browser extension's JavaScript code. This article focuses on using the SpeechRecognition API with the on-device (or local) speech recognition model that is built into Microsoft Edge. + +For more information about the API, see [Web Speech API](https://developer.mozilla.org/docs/Web/API/Web_Speech_API), at MDN.
+ +**Detailed contents:** +* [Availability of the local speech recognition model](#availability-of-the-local-speech-recognition-model) +* [Benefits of the local speech recognition model](#benefits-of-the-local-speech-recognition-model) + * [Model availability](#model-availability) +* [Enable local speech recognition in Microsoft Edge](#enable-local-speech-recognition-in-microsoft-edge) +* [See a working example](#see-a-working-example) +* [Use the SpeechRecognition API with local recognition in your website](#use-the-speechrecognition-api-with-local-recognition-in-your-website) + * [Check if the API is supported and instantiate a SpeechRecognition object](#check-if-the-api-is-supported-and-instantiate-a-speechrecognition-object) + * [Choose an input language and opt-in to local recognition](#choose-an-input-language-and-opt-in-to-local-recognition) + * [Check whether the local model is already installed](#check-whether-the-local-model-is-already-installed) + * [Start speech recognition](#start-speech-recognition) + * [Stop recognition explicitly and on media end](#stop-recognition-explicitly-and-on-media-end) +* [Send feedback](#send-feedback) +* [See also](#see-also) + + + +## Availability of the local speech recognition model + +The local speech recognition model is available in Microsoft Edge Canary or Dev (version 150.0.4076 or later). See [Become a Microsoft Edge Insider](https://www.microsoft.com/edge/download/insider). + + + +## Benefits of the local speech recognition model + +When using the SpeechRecognition API with the local model in Microsoft Edge, speech recognition happens on the same device where the speech is captured. This approach has the following benefits compared to cloud-based solutions: + +* **Reduced cost:** There's no cost associated with using a cloud recognition service. 
+ +* **Network independence:** Beyond the initial model download, there's no network latency when using this API to convert speech, and the API can also be used when the device is offline. + +* **Improved privacy:** The speech input into the model never leaves the device, and isn't collected to train AI models. + + + +#### Model availability + +An initial download of the model is required the first time that a website uses the local speech recognition model with the SpeechRecognition API. + +You can monitor the model download by using the promise that's returned by the SpeechRecognition API `install()` method. See [Check whether the local model is already installed](#check-whether-the-local-model-is-already-installed), below. + + + +## Enable local speech recognition in Microsoft Edge + +To use the local speech recognition model with the SpeechRecognition API, you need to enable the feature in Microsoft Edge Canary or Dev. To enable speech recognition using the on-device model: + +1. Make sure you're using Microsoft Edge Canary or Dev (version 150.0.4076 or newer). See [Become a Microsoft Edge Insider](https://www.microsoft.com/edge/download/insider). + +1. In Microsoft Edge Canary or Dev, open a new tab or window and go to `edge://flags`. + +1. In the search box at the top of the page, enter **Speech Recognition with on-device model**. + +1. In the **Speech Recognition with on-device model** drop-down list, select **Enabled**, and then click the **Restart** button in the lower right: + + ![Flags page of browser](./speech-recognition-api-images/flag.png) + + + +## See a working example + +To see the SpeechRecognition API in action and view the demo code: + +1. [Enable local speech recognition in Microsoft Edge](#enable-local-speech-recognition-in-microsoft-edge), as described above. + +1. 
In Microsoft Edge Canary or Dev, open a tab or window and go to the [SpeechRecognition API playground](https://microsoftedge.github.io/Demos/built-in-ai/playgrounds/speechrecognition-api/). + +1. In the information banner at the top, check the status: it initially reads **SpeechRecognition API ready. Click Start to begin.** + +1. In the **Input language** drop-down list, select the language that you want to use for speech recognition. + +1. In the **Audio source** drop-down list, select an audio source for speech recognition: + + * Select **Microphone** to use your device microphone as the audio source. + * Select **File** to use an audio or video file from your device as the audio source. + +1. If you selected **File** as the audio source, a **Media file** section is displayed. Click the **Choose File** button, and then select an audio or video file from your device. + +1. Click the **Start** button. + + If you haven't already downloaded the local speech recognition model for the selected language, the download starts and the information banner reads **Installing on-device model for en-US...**: + + ![Installation of the on-device speech recognition model](./speech-recognition-api-images/installing.png) + + After the model is installed, the text transcription is displayed on the page: + + ![Conversion of speech to text](./speech-recognition-api-images/speaking.png) + +1. To stop converting speech to text at any time, click the **Stop** button. + + The transcription might also stop automatically after a long period of silence in the input audio. + +See also: +* [/built-in-ai/playgrounds/speechrecognition-api/](https://github.com/MicrosoftEdge/Demos/tree/main/built-in-ai/playgrounds/speechrecognition-api) - Source code for the SpeechRecognition API playground demo. + + + +## Use the SpeechRecognition API with local recognition in your website + +The following sections describe how to use the SpeechRecognition API with local speech recognition in your website's code.
For more details about the API itself, see [Web Speech API](https://developer.mozilla.org/docs/Web/API/Web_Speech_API), at MDN. + + + +#### Check if the API is supported and instantiate a SpeechRecognition object + +To check whether the SpeechRecognition API is supported in the browser, test whether the `SpeechRecognition` object is available: + +```js +if (!window.SpeechRecognition) { + console.log("The SpeechRecognition API is not available in this browser."); +} else { + console.log("The SpeechRecognition API is available."); +} +``` + +If the API is supported, create a new `SpeechRecognition` instance to start using the API: + +```js +const recognition = new SpeechRecognition(); +``` + +See also: +* [SpeechRecognition](https://developer.mozilla.org/docs/Web/API/SpeechRecognition), at MDN. + + + +#### Choose an input language and opt-in to local recognition + +To configure speech recognition to use a local model, specify an input language and set the `processLocally` property to `true`: + +```js +recognition.lang = "en-US"; +recognition.processLocally = true; +``` + +As of Microsoft Edge 150.0.4076, the following input languages are supported for local speech recognition: +* English (en-US) +* German (de-DE) +* Italian (it-IT) +* Portuguese (pt-PT) +* Spanish (es-ES) +* Korean (ko-KR) + +Language support is expected to expand in future versions. + +To transcribe long audio sessions without stopping, and to receive interim (not yet final) results while audio is still being processed, also set the `continuous` and `interimResults` properties to `true`: + +```js +recognition.continuous = true; +recognition.interimResults = true; +``` + +See also: +* [SpeechRecognition: lang property](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/lang), at MDN. +* [SpeechRecognition: processLocally property](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/processLocally), at MDN. 
+* [SpeechRecognition: continuous property](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/continuous), at MDN.
+* [SpeechRecognition: interimResults property](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/interimResults), at MDN. + + + +#### Check whether the local model is already installed + +Before starting recognition, check whether the local model is available for your selected language by using the `SpeechRecognition.available()` method. + +If the model isn't installed yet, trigger the installation by using the `SpeechRecognition.install()` method, and wait for the installation to complete before starting recognition: + +```js +async function ensureModelReady(lang) { + // Check if the model is already available. + const availability = await SpeechRecognition.available({ + langs: [lang], + processLocally: true, + }); + + // If the model is already available, proceed to recognition. + if (availability === "available") { + return true; + } + + // If the model is not available but can be downloaded, + // trigger the installation and wait for it to complete + // before proceeding to recognition. + if (availability === "downloadable" || availability === "downloading") { + const installed = await SpeechRecognition.install({ + langs: [lang], + processLocally: true, + }); + + if (!installed) { + throw new Error(`Failed to install local model for ${lang}.`); + } + + return true; + } + + // Otherwise ("unavailable"), local recognition isn't supported for this language. + return false; +} +``` + +The promise returned by `SpeechRecognition.install()` resolves to `true` when the installation succeeds, and to `false` when it fails. + +See also: +* [SpeechRecognition: available() static method](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/available_static), at MDN. +* [SpeechRecognition: install() static method](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/install_static), at MDN. 
+ + + +#### Start speech recognition + +After you've made sure that the API and the model are both ready, use the `start()` method to start recognition.
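To read the transcription, listen for the `result` event. The following minimal sketch combines the steps in this article: it checks for the API, installs the local model if needed, configures local recognition, starts recognizing audio from the microphone, and passes the text to a callback. It's a sketch under assumptions, not a definitive implementation: the `startLocalRecognition()` function and its `onTranscript` callback are hypothetical names, and the `available()` and `install()` calls assume the same `{ langs, processLocally }` options that are used in the `ensureModelReady()` example.

```javascript
// Minimal sketch; startLocalRecognition() and onTranscript are hypothetical names.
async function startLocalRecognition(lang, onTranscript) {
  const Recognition = globalThis.SpeechRecognition;
  if (!Recognition) {
    throw new Error("The SpeechRecognition API is not available in this browser.");
  }

  // Make sure that the local model for this language is installed.
  const options = { langs: [lang], processLocally: true };
  const availability = await Recognition.available(options);
  if (availability === "unavailable") {
    throw new Error(`Local recognition isn't available for ${lang}.`);
  }
  if (availability !== "available" && !(await Recognition.install(options))) {
    throw new Error(`Failed to install local model for ${lang}.`);
  }

  // Configure on-device, continuous recognition with interim results.
  const recognition = new Recognition();
  recognition.lang = lang;
  recognition.processLocally = true;
  recognition.continuous = true;
  recognition.interimResults = true;

  // event.results holds every result of the session so far; join the
  // top alternative of each result, and report whether the latest is final.
  recognition.onresult = (event) => {
    let transcript = "";
    for (let i = 0; i < event.results.length; i++) {
      transcript += event.results[i][0].transcript;
    }
    const isFinal = event.results[event.results.length - 1].isFinal;
    onTranscript(transcript, isFinal);
  };

  // Without an argument, start() recognizes audio from the microphone.
  recognition.start();
  return recognition;
}
```

For example, `const recognition = await startLocalRecognition("en-US", (text, isFinal) => console.log(isFinal ? "Final:" : "Interim:", text));` starts transcribing, and `recognition.stop()` ends the session.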
+ +When called without a parameter, the `start()` method recognizes audio from the user's microphone: + +```js +recognition.start(); +``` + +To recognize audio from a media file instead of from the user's microphone, pass a `MediaStreamTrack` instance as an argument to the `start()` method. For example, you can use the Web Audio API to route the audio of an `<audio>` or `<video>` element into a `MediaStreamAudioDestinationNode`, and then pass that node's audio track to `start()`: + +```js +// Assumes mediaElement is an existing <audio> or <video> element. +const audioContext = new AudioContext(); +const source = audioContext.createMediaElementSource(mediaElement); +const mediaStreamDestination = audioContext.createMediaStreamDestination(); +source.connect(mediaStreamDestination); +recognition.start(mediaStreamDestination.stream.getAudioTracks()[0]); +``` + +After `createMediaElementSource()` is called, the element's audio is routed through the audio context and no longer plays through the speakers, unless you also connect `source` to `audioContext.destination`. + +See also: +* [SpeechRecognition: start() method](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/start), at MDN. +* [Web Audio API](https://developer.mozilla.org/docs/Web/API/Web_Audio_API), at MDN. + + + +#### Stop recognition explicitly and on media end + +To stop recognition, use the `stop()` method: + +```js +recognition.stop(); +``` + +You can also stop recognition when the media input ends, by using the `onended` event handler of the media element that you're using as input. For example, if you're using an `HTMLAudioElement` or `HTMLVideoElement` as the audio source, you can set up the event handler as follows: + +```js +mediaElement.onended = () => recognition.stop(); +``` + +See also: +* [SpeechRecognition: stop() method](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/stop), at MDN. + + + +## Send feedback + +We're interested in hearing your feedback about: +* The local speech recognition model. 
+* The performance of the local speech recognition model. +* Any other improvements you'd like to see for your use cases. + +Please send feedback by adding a comment to the [SpeechRecognition API feedback issue](https://github.com/MicrosoftEdge/MSEdgeExplainers/issues/1333). + + + +## See also + + +Microsoft: +* [Become a Microsoft Edge Insider](https://www.microsoft.com/edge/download/insider).
+
+MDN:
+* [SpeechRecognition](https://developer.mozilla.org/docs/Web/API/SpeechRecognition)
+  * [SpeechRecognition: available() static method](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/available_static)
+  * [SpeechRecognition: continuous property](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/continuous)
+  * [SpeechRecognition: install() static method](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/install_static)
+  * [SpeechRecognition: interimResults property](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/interimResults)
+  * [SpeechRecognition: lang property](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/lang)
+  * [SpeechRecognition: processLocally property](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/processLocally)
+  * [SpeechRecognition: start() method](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/start)
+  * [SpeechRecognition: stop() method](https://developer.mozilla.org/docs/Web/API/SpeechRecognition/stop)
+* [Web Audio API](https://developer.mozilla.org/docs/Web/API/Web_Audio_API)
+* [Web Speech API](https://developer.mozilla.org/docs/Web/API/Web_Speech_API)
+
+GitHub:
+* [SpeechRecognition API playground](https://microsoftedge.github.io/Demos/built-in-ai/playgrounds/speechrecognition-api/)
+  * [/built-in-ai/playgrounds/speechrecognition-api/](https://github.com/MicrosoftEdge/Demos/tree/main/built-in-ai/playgrounds/speechrecognition-api) - Source code.
+* [SpeechRecognition API feedback issue](https://github.com/MicrosoftEdge/MSEdgeExplainers/issues/1333)
diff --git a/microsoft-edge/web-platform/writing-assistance-apis-images/prerelease-model-flag-for-wa-apis.png b/microsoft-edge/web-platform/writing-assistance-apis-images/prerelease-model-flag-for-wa-apis.png
new file mode 100644
index 0000000000..bdd8c665ec
Binary files /dev/null and b/microsoft-edge/web-platform/writing-assistance-apis-images/prerelease-model-flag-for-wa-apis.png differ
diff --git a/microsoft-edge/web-platform/writing-assistance-apis.md b/microsoft-edge/web-platform/writing-assistance-apis.md
index 714e5e671b..43ff2cb5e0 100644
--- a/microsoft-edge/web-platform/writing-assistance-apis.md
+++ b/microsoft-edge/web-platform/writing-assistance-apis.md
@@ -5,7 +5,7 @@ author: MSEdgeTeam
 ms.author: msedgedevrel
 ms.topic: article
 ms.service: microsoft-edge
-ms.date: 05/19/2025
+ms.date: 06/02/2025
 ---
 
 # Summarize, write, and rewrite text with the Writing Assistance APIs
@@ -19,10 +19,15 @@ For introductory information about the Summarizer API, Writer API, and Rewriter
 
 * [Availability of the Writing Assistance APIs](#availability-of-the-writing-assistance-apis)
 * [Alternatives to and benefits of the Writing Assistance APIs](#alternatives-to-and-benefits-of-the-writing-assistance-apis)
+* [Small language models built into Microsoft Edge](#small-language-models-built-into-microsoft-edge)
 * [The Phi-4-mini model](#the-phi-4-mini-model)
   * [Disclaimer](#disclaimer)
   * [Hardware requirements](#hardware-requirements)
-  * [Model availability](#model-availability)
+  * [Availability of the Phi-4-mini model](#availability-of-the-phi-4-mini-model)
+* [The Aion-1.0-Instruct model](#the-aion-10-instruct-model)
+  * [Enable Aion-1.0-Instruct for the Writing Assistance APIs](#enable-aion-10-instruct-for-the-writing-assistance-apis)
+  * [Disclaimer](#disclaimer-1)
+  * [Availability of the Aion-1.0-Instruct model](#availability-of-the-aion-10-instruct-model)
 * [Enable the Writing Assistance APIs](#enable-the-writing-assistance-apis)
 * [See working examples](#see-working-examples)
 * [Use the Writing Assistance APIs](#use-the-writing-assistance-apis)
@@ -48,7 +53,7 @@ For introductory information about the Summarizer API, Writer API, and Rewriter
 
 ## Availability of the Writing Assistance APIs
 
-The Writer API and the Rewriter APIs are available as a developer preview in Microsoft Edge Canary or Dev channels, starting with version 138.0.3309.2.
+The Writer API and the Rewriter API are available as a developer preview in the Microsoft Edge Canary and Edge Dev channels, starting with version 138.0.3309.2.
 
 The Summarizer API has been enabled by default since Microsoft Edge 138.
 
@@ -71,7 +76,7 @@ The Writing Assistance APIs use a small language model (SLM) that runs on the sa
 
 * **Network independence:** Beyond the initial model download, there's no network latency when prompting the model, and may also be used when the device is offline.
 
-* **Improved privacy:** The data input to the model never leaves the device and is not collected to train AI models.
+* **Improved privacy:** The data input to the model never leaves the device and isn't collected to train AI models.
 
 The Writing Assistance APIs use a model that's provided by Microsoft Edge and built into the browser, which comes with the additional benefits over custom local solutions such as those based on WebGPU, WebNN, or WebAssembly:
 
@@ -80,6 +85,16 @@ The Writing Assistance APIs use a model that's provided by Microsoft Edge and bu
 
 * **Simplified usage for web developers:** The built-in model can be run by using straightforward web APIs and doesn't require AI/ML expertise or using third-party frameworks.
 
+
+## Small language models built into Microsoft Edge
+
+In the Microsoft Edge Canary and Edge Dev channels, starting with version 138.0.3309.2, the Writing Assistance APIs use the Phi-4-mini model, which is built into Microsoft Edge.
+
+Starting with version 150.0.4070, the Writing Assistance APIs can also be used with the prerelease Aion-1.0-Instruct model, which is also built into Microsoft Edge. Aion-1.0-Instruct is smaller, faster, and more efficient than Phi-4-mini, and is supported on devices with less capable GPUs, or with no GPU by using CPU inference. If the performance class of your device isn't high enough to support Phi-4-mini, you can test the prerelease Aion-1.0-Instruct model instead.
+
+To learn more about both models, and about how to enable Aion-1.0-Instruct, see the sections below.
+
+
 
 ## The Phi-4-mini model
 
@@ -105,7 +120,7 @@ The Writing Assistance APIs are currently limited to:
 
 * **GPU:** 5.5 GB of VRAM or more.
 
-* **Network:** Unlimited data plan or unmetered connection. The model is not downloaded if using a metered connection.
+* **Network:** Unlimited data plan or unmetered connection. The model isn't downloaded if the device is using a metered connection.
 
 To check if your device supports the Writing Assistance APIs developer preview, see [Enable the Writing Assistance APIs](#enable-the-writing-assistance-apis) below and check your device performance class.
 
@@ -113,9 +128,49 @@ Due to the experimental nature of the Writing Assistance APIs, you might observe
 
 
 
-#### Model availability
+#### Availability of the Phi-4-mini model
 
-An initial download of the model will be required the first time a website calls a built-in AI API. You can monitor the model download by using the monitor option when creating a new Summarizer, Writer, or Rewriter API session. To learn more, see [Monitor the progress of the model download](#monitor-the-progress-of-the-model-download), below.
+An initial download of the Phi-4-mini model is required the first time that a website calls a built-in AI API. You can monitor the download of the Phi-4-mini model by using the `monitor` option when creating a new Summarizer, Writer, or Rewriter API session. To learn more, see [Monitor the progress of the model download](#monitor-the-progress-of-the-model-download), below.
+
+
+
+## The Aion-1.0-Instruct model
+
+In Microsoft Edge Canary or Edge Dev, starting with version 150.0.4070, the Writing Assistance APIs can also be used with the prerelease Aion-1.0-Instruct model, which is also built into Microsoft Edge.
+
+Aion-1.0-Instruct is significantly smaller, faster, and more efficient than Phi-4-mini, and is supported on devices with less capable GPUs, or with no GPU by using CPU inference.
+
+Aion-1.0-Instruct is expected to be made available as an open source model in July 2026.
+
+
+
+#### Enable Aion-1.0-Instruct for the Writing Assistance APIs
+
+By default, the Writing Assistance APIs use the Phi-4-mini model. To use Aion-1.0-Instruct in Microsoft Edge Canary or Edge Dev, enable the **Enable prerelease on-device language model** flag, as follows. When this flag is enabled, Aion-1.0-Instruct overrides Phi-4-mini as the default model for the Writing Assistance APIs.
+
+1. Make sure you're using the latest version of Edge Canary or Edge Dev (version 150.0.4070 or later). See [Become a Microsoft Edge Insider](https://www.microsoft.com/edge/download/insider).
+
+1. In Edge Canary or Edge Dev, open a new tab or window and go to `edge://flags`.
+
+1. In the search box at the top of the page, enter **Enable prerelease on-device language model**.
+
+1. In the **Enable prerelease on-device language model** drop-down list, select **Enabled**, and then click the **Restart** button:
+
+   ![Flags page showing the prerelease on-device language model flag](./writing-assistance-apis-images/prerelease-model-flag-for-wa-apis.png)
+
+1. To check that Aion-1.0-Instruct is being used as the on-device language model, go to `edge://on-device-internals`, click **Model Status**, and check that **Model Name** is set to **Aion-1.0-Instruct**.
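As a complementary check from a web page, the following sketch reports whether each Writing Assistance API can currently be used. It assumes the static `availability()` method of `Summarizer`, `Writer`, and `Rewriter`, which resolves to `"unavailable"`, `"downloadable"`, `"downloading"`, or `"available"`. The function name and the `"unsupported"` label (used when the API object isn't exposed, for example because its flag is disabled) are hypothetical. The result doesn't tell you which model (Phi-4-mini or Aion-1.0-Instruct) is in use; for that, use `edge://on-device-internals`.

```javascript
// Sketch: report whether each Writing Assistance API can be used in this browser.
// Assumes the static availability() method on Summarizer, Writer, and Rewriter.
async function getWritingAssistanceAvailability() {
  const apis = {
    Summarizer: globalThis.Summarizer,
    Writer: globalThis.Writer,
    Rewriter: globalThis.Rewriter,
  };

  const result = {};
  for (const [name, api] of Object.entries(apis)) {
    // The global is undefined when the API isn't exposed (for example, flag disabled).
    // Otherwise, availability() resolves to "unavailable", "downloadable",
    // "downloading", or "available".
    result[name] = api ? await api.availability() : "unsupported";
  }
  return result;
}
```

For example, run `console.log(await getWritingAssistanceAvailability());` in the DevTools Console after restarting the browser with the flags enabled.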
+
+
+
+#### Disclaimer
+
+The Aion-1.0-Instruct model is made available in Microsoft Edge 150.0.4070 for early developer testing and feedback. In addition to the Responsible AI considerations listed above, note that because Aion-1.0-Instruct is a prerelease model, its behaviors and capabilities are subject to change.
+
+
+
+#### Availability of the Aion-1.0-Instruct model
+
+An initial download of the Aion-1.0-Instruct model is required the first time that a website calls an API that requires an on-device model. You can monitor the download of the Aion-1.0-Instruct model by using the `monitor` option when creating a new Summarizer, Writer, or Rewriter API session. To learn more, see [Monitor the progress of the model download](#monitor-the-progress-of-the-model-download), below.
 
 
 
@@ -123,9 +178,9 @@ An initial download of the model will be required the first time a website calls
 
 To use the Writer API or the Rewriter API in Microsoft Edge:
 
-1. Make sure you're using the latest version of Microsoft Edge Canary or Dev (version 138.0.3309.2 or newer). See [Become a Microsoft Edge Insider](https://www.microsoft.com/edge/download/insider).
+1. Make sure you're using the latest version of Microsoft Edge Canary or Edge Dev (version 138.0.3309.2 or later). See [Become a Microsoft Edge Insider](https://www.microsoft.com/edge/download/insider).
 
-1. In Microsoft Edge Canary or Dev, open a new tab or window and go to `edge://flags/`.
+1. In Edge Canary or Edge Dev, open a new tab or window and go to `edge://flags/`.
 
 1. In the search box, at the top of the page:
 
@@ -142,11 +197,15 @@ To use the Writer API or the Rewriter API in Microsoft Edge:
 
 1. Optionally, to log information locally that may be useful for debugging issues, also enable the **Enable on device AI model debug logs** flag.
 
-1. Restart Microsoft Edge Canary or Dev.
+1. Restart Edge Canary or Edge Dev.
 
 1. To check if your device meets the hardware requirements for the Writing Assistance APIs developer preview, open a new tab, go to `edge://on-device-internals`, and check the **Device performance class** value.
 
-   If your device performance class is **High** or greater, the Writing Assistance APIs should be supported on your device. If you continue to notice issues, please [file a new issue](https://github.com/MicrosoftEdge/MSEdgeExplainers/issues/new?template=writing-assistance-api.md).
+   If your device performance class is **High** or greater, the Writing Assistance APIs should be supported on your device.
+
+   If your device performance class is **Medium** or **Low**, the Writing Assistance APIs are supported only through the prerelease Aion-1.0-Instruct model, which is available starting with Edge version 150.0.4070. To test the Aion-1.0-Instruct model, see [Enable Aion-1.0-Instruct for the Writing Assistance APIs](#enable-aion-10-instruct-for-the-writing-assistance-apis), above.
+
+   If you notice issues with these models, please [file a new issue](https://github.com/MicrosoftEdge/MSEdgeExplainers/issues/new?template=writing-assistance-api.md) in the MSEdgeExplainers repo.
 
 
 
@@ -156,7 +215,7 @@ To see the Writing Assistance APIs in action, and review existing code that uses
 
 1. [Enable the Writing Assistance APIs](#enable-the-writing-assistance-apis), as described above.
 
-1. In Microsoft Edge Canary or Dev browser, open a tab or window and go to the [Summarizer API playground](https://microsoftedge.github.io/Demos/built-in-ai/playgrounds/summarizer-api/).
+1. In Microsoft Edge Canary or Edge Dev, open a tab or window and go to the [Summarizer API playground](https://microsoftedge.github.io/Demos/built-in-ai/playgrounds/summarizer-api/).
 
 1. In the **Built-in AI playgrounds** navigation on the left:
 
@@ -540,7 +599,11 @@ controller.abort();
 
 ## Send feedback
 
-We're very interested in learning about the range of scenarios for which you intend to use the Writing Assistance APIs, any issues with the APIs or language models, and whether new task-specific APIs, such as for translation, would be useful.
+We're interested in learning about:
+* The range of scenarios for which you intend to use the Writing Assistance APIs.
+* Any issues with the Writing Assistance APIs.
+* Any issues with the language models.
+* Whether new task-specific APIs would be useful.
 
 To send feedback about your scenarios and the tasks you want to achieve, please add a comment to [the Writing Assistance APIs feedback issue](https://github.com/MicrosoftEdge/MSEdgeExplainers/issues/1031).
 