38 changes: 34 additions & 4 deletions sdk/voicelive/azure-ai-voicelive/CHANGELOG.md
@@ -1,9 +1,38 @@
# Release History

## 1.0.0 (Unreleased)
## 1.0.0 (2026-05-25)

This is the first General Availability (GA) release of the Azure VoiceLive client library for Java.

### Breaking Changes

- Narrowed `VoiceLiveAsyncClient` session startup to three overloads:
  - `startSession()`
  - `startSession(String, VoiceLiveRequestOptions)`
  - `startSession(AgentSessionConfig, VoiceLiveRequestOptions)`
- Renamed token-count accessors on token statistic models (JSON wire format unchanged):
  - `CachedTokenDetails.getTextTokens()` / `getAudioTokens()` / `getImageTokens()` → `getTextTokenCount()` / `getAudioTokenCount()` / `getImageTokenCount()`
  - `InputTokenDetails.getCachedTokens()` / `getTextTokens()` / `getAudioTokens()` / `getImageTokens()` → `getCachedTokenCount()` / `getTextTokenCount()` / `getAudioTokenCount()` / `getImageTokenCount()`
  - `OutputTokenDetails.getTextTokens()` / `getAudioTokens()` / `getReasoningTokens()` → `getTextTokenCount()` / `getAudioTokenCount()` / `getReasoningTokenCount()`
  - `ResponseTokenStatistics.getTotalTokens()` / `getInputTokens()` / `getOutputTokens()` → `getTotalTokenCount()` / `getInputTokenCount()` / `getOutputTokenCount()`
- `RequestImageContentPart` URL accessor renamed and JSON field changed:
  - `getUrl()` / `setUrl(String)` → `getImageUrl()` / `setImageUrl(String)`
  - JSON property `url` → `image_url`
- Renamed base event types for client↔server symmetry:
  - `ClientEvent` (base for outbound events) → `SessionClientEvent`
  - `SessionUpdate` (base for inbound events) → `SessionServerEvent`
  - `VoiceLiveSessionAsyncClient.receiveEvents()` now returns `Flux<SessionServerEvent>`
  - `VoiceLiveSessionAsyncClient.sendEvent(...)` now accepts `SessionClientEvent`
- Renamed MCP-related model types to Pascal case (`MCP*` → `Mcp*`): `McpApprovalType`, `McpServer`, `McpTool`, `McpApprovalResponseRequestItem`, `ResponseMcpApprovalRequestItem`, `ResponseMcpApprovalResponseItem`, `ResponseMcpCallItem`, `ResponseMcpListToolItem`.
- `VoiceLiveSessionAsyncClient.truncateConversation(String, int, int)` is replaced by an overload that takes a `java.time.Duration` for the audio-end position instead of raw milliseconds. The two-argument overload (`itemId`, `contentIndex`) is preserved and defaults to `Duration.ZERO`.
- Removed `sendInputAudio(byte[])`; use `sendInputAudio(BinaryData)` (wrap raw bytes with `BinaryData.fromBytes(...)`).
- `AgentSessionConfig.toQueryParameters()` is no longer part of the public API; the conversion is handled internally by `VoiceLiveAsyncClient`.
- `VoiceLiveSessionOptions.setAnimation(...)` renamed to `setAnimationOptions(...)`.
- `AnimationOptions.setOutputs(...)` / `getOutputs()` renamed to `setOutputTypes(...)` / `getOutputTypes()`.
- `LogProbProperties.getLogprob()` renamed to `getLogProb()`.
- `SessionUpdateConversationItemInputAudioTranscriptionCompleted.getLogprobs()` renamed to `getLogProbs()`.
- Removed preview service versions from `VoiceLiveServiceVersion`; only GA versions remain (`V2025_10_01`, `V2026_04_10`). The latest version is now `V2026_04_10`.
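
For callers affected by the `truncateConversation` change listed above, an existing millisecond offset can be converted with `Duration.ofMillis`. The following is a minimal sketch that uses only `java.time`; the class and method names (`AudioEndMigration`, `toAudioEnd`) are hypothetical and not part of the SDK, and clamping non-positive values to `Duration.ZERO` is an assumption made for illustration, not documented SDK behavior.

```java
import java.time.Duration;

public class AudioEndMigration {

    // Hypothetical helper (not an SDK method): converts a legacy millisecond
    // offset into the Duration form expected by the new signature.
    // Assumption: non-positive offsets are treated as Duration.ZERO.
    static Duration toAudioEnd(int legacyAudioEndMs) {
        if (legacyAudioEndMs <= 0) {
            return Duration.ZERO;
        }
        return Duration.ofMillis(legacyAudioEndMs);
    }

    public static void main(String[] args) {
        Duration audioEnd = toAudioEnd(1500);
        System.out.println(audioEnd);            // ISO-8601 form: PT1.5S
        System.out.println(audioEnd.toMillis()); // 1500
    }
}
```

The resulting `Duration` would then be passed as the third argument in place of the old `int`.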

### Features Added

- **Avatar voice synchronization** for video avatars:
@@ -18,19 +47,20 @@ This is the first General Availability (GA) release of the Azure VoiceLive clien
- **Transcription enhancements**:
  - New transcription models on `AudioInputTranscriptionOptionsModel`: `GPT_4O_TRANSCRIBE_DIARIZE`, `MAI_TRANSCRIBE_1`
  - New `TranscriptionPhrase` and `TranscriptionWord` types with timing/confidence information
  - `SessionUpdateConversationItemInputAudioTranscriptionCompleted` now exposes `getLogprobs()` and `getPhrases()`
  - `SessionUpdateConversationItemInputAudioTranscriptionCompleted` now exposes `getLogProbs()` and `getPhrases()`
  - New `ServerEventResponseAudioTranscriptAnnotationAdded` event
- **Session include options and metadata**:
  - New `SessionIncludeOption` expandable enum for opting into additional response payloads (e.g. logprobs, phrases, file-search results)
  - `VoiceLiveSessionOptions` and `VoiceLiveSessionResponse` now expose `include` (`List<SessionIncludeOption>`) and `metadata` (`Map<String,String>`, up to 16 entries)
- **Personal voice models**: added `PersonalVoiceModels.DRAGON_HDOMNI_LATEST_NEURAL` and `MAI_VOICE_1`
- **Reasoning token usage**: `OutputTokenDetails.getReasoningTokens()` exposes reasoning token counts
- **Reasoning token usage**: `OutputTokenDetails.getReasoningTokenCount()` exposes reasoning token counts
- **Interim response on response.create**: `ResponseCreateParams.setInterimResponse(BinaryData)` lets callers attach interim response config to a single response request
- Restored no-arg `VoiceLiveAsyncClient.startSession()` overload (uses the deployment's default model).
- Significantly improved Javadoc for `ServerVadTurnDetection`, `AzureCustomVoice`, `AzurePersonalVoice`, `AzureStandardVoice`, `AzureSemanticVadTurnDetection*`, and other model types

### Other Changes

- Updated default service API version to track the latest TypeSpec spec.
- Updated default service API version to `2026-04-10` (GA).

## 1.0.0-beta.6 (2026-05-01)

38 changes: 24 additions & 14 deletions sdk/voicelive/azure-ai-voicelive/README.md
@@ -193,13 +193,23 @@ For easier learning, explore these focused samples in order:
> ```
> These samples use `javax.sound.sampled` for audio I/O.

### Session startup overloads

`VoiceLiveAsyncClient` exposes three session-start overloads:

- `startSession()`
- `startSession(String model, VoiceLiveRequestOptions options)`
- `startSession(AgentSessionConfig agentConfig, VoiceLiveRequestOptions options)`

Pass `null` for `VoiceLiveRequestOptions` in the samples below when you do not need to provide one.
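
The two-argument overloads differ only in their first parameter, so a `null` second argument stays unambiguous as long as the first argument has a definite type. The sketch below illustrates that Java overload-resolution rule with stand-in types. `OverloadSketch`, `RequestOptions`, and `AgentConfig` are hypothetical placeholders rather than SDK classes, and the returned strings exist only to show which overload the compiler selected.

```java
public class OverloadSketch {

    // Hypothetical stand-ins for VoiceLiveRequestOptions and AgentSessionConfig.
    static final class RequestOptions { }

    static final class AgentConfig {
        final String agentName;

        AgentConfig(String agentName) {
            this.agentName = agentName;
        }
    }

    // Three overloads mirroring the shapes listed above.
    static String start() {
        return "default";
    }

    static String start(String model, RequestOptions options) {
        return "model:" + model;
    }

    static String start(AgentConfig config, RequestOptions options) {
        return "agent:" + config.agentName;
    }

    public static void main(String[] args) {
        System.out.println(start());                                  // default
        System.out.println(start("gpt-realtime", null));              // model:gpt-realtime
        System.out.println(start(new AgentConfig("my-agent"), null)); // agent:my-agent
        // start(null, null) would not compile: both two-argument overloads
        // accept a null first argument and neither is more specific.
    }
}
```

In practice this means a `null` first argument needs a cast or a typed variable, while a `null` options argument does not.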

### Simple voice assistant

Create a basic voice assistant session:

```java com.azure.ai.voicelive.simple.session
// Start session with default options
client.startSession("gpt-realtime")
// Start session with a specific model; pass null when no VoiceLiveRequestOptions value is needed
client.startSession("gpt-realtime", null)
.flatMap(session -> {
System.out.println("Session started");

@@ -243,8 +253,8 @@ VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
.setInputAudioTranscription(transcription)
.setTurnDetection(turnDetection);

// Start session with options
client.startSession("gpt-realtime")
// Start session and then send session configuration
client.startSession("gpt-realtime", null)
.flatMap(session -> {
// Send session configuration
ClientEventSessionUpdate updateEvent = new ClientEventSessionUpdate(options);
@@ -370,7 +380,7 @@ VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
.setInstructions("You have access to weather information. Use get_current_weather when asked about weather.");

// 3. Handle function call events
client.startSession("gpt-realtime")
client.startSession("gpt-realtime", null)
.flatMap(session -> {
return session.receiveEvents()
.doOnNext(event -> {
@@ -422,8 +432,8 @@ Use [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) servers to

```java com.azure.ai.voicelive.mcp
// Configure MCP servers as tools
MCPServer mcpServer = new MCPServer("deepwiki", "https://mcp.deepwiki.com/mcp")
.setRequireApproval(BinaryData.fromObject(MCPApprovalType.ALWAYS));
McpServer mcpServer = new McpServer("deepwiki", "https://mcp.deepwiki.com/mcp")
.setRequireApproval(BinaryData.fromObject(McpApprovalType.ALWAYS));

VoiceLiveSessionOptions options = new VoiceLiveSessionOptions()
.setTools(Arrays.asList(mcpServer))
@@ -436,10 +446,10 @@ session.receiveEvents()
SessionUpdateResponseOutputItemDone itemDone = (SessionUpdateResponseOutputItemDone) event;
SessionResponseItem item = itemDone.getItem();

if (item instanceof ResponseMCPApprovalRequestItem) {
if (item instanceof ResponseMcpApprovalRequestItem) {
// Approve the tool call
ResponseMCPApprovalRequestItem approvalRequest = (ResponseMCPApprovalRequestItem) item;
MCPApprovalResponseRequestItem approval = new MCPApprovalResponseRequestItem(
ResponseMcpApprovalRequestItem approvalRequest = (ResponseMcpApprovalRequestItem) item;
McpApprovalResponseRequestItem approval = new McpApprovalResponseRequestItem(
approvalRequest.getId(), true);
ClientEventConversationItemCreate createItem = new ClientEventConversationItemCreate()
.setItem(approval);
@@ -464,13 +474,13 @@ Connect directly to an Azure AI Foundry agent using `AgentSessionConfig`. The ag
AgentSessionConfig agentConfig = new AgentSessionConfig("my-agent", "my-project")
.setAgentVersion("1.0");

// Start session with agent config (uses DefaultAzureCredential)
// Start session with agent config; pass null when no VoiceLiveRequestOptions value is needed
VoiceLiveAsyncClient client = new VoiceLiveClientBuilder()
.endpoint(endpoint)
.credential(new DefaultAzureCredentialBuilder().build())
.buildAsyncClient();

client.startSession(agentConfig)
client.startSession(agentConfig, null)
.flatMap(session -> {
return session.receiveEvents()
.doOnNext(event -> handleEvent(event))
@@ -596,8 +606,8 @@ VoiceLiveSessionOptions sessionOptions = new VoiceLiveSessionOptions()
.setInputAudioTranscription(transcriptionOptions)
.setTurnDetection(turnDetection);

// Start session and handle events
client.startSession("gpt-4o-realtime-preview")
// Start session (null VoiceLiveRequestOptions), then handle events
client.startSession("gpt-4o-realtime-preview", null)
.flatMap(session -> {
// Subscribe to receive server events
session.receiveEvents()