Extend Activity Schema to Support Multimodal Interactions with Streaming by gurubhg · Pull Request #423 · microsoft/Agents

gurubhg · 2026-02-05T06:25:47Z

This PR implements the approved proposal from issue #416 to extend the Activity Protocol schema for multimodal interactions with streaming support for voice/audio.

Changes:

- Added Reserved Events for Media Streaming (Media.Start, Media.Chunk, Media.End, Voice.Message)
- Extended streaminfo entity to support media streaming with streamState property
- Added Session Lifecycle Commands (session.init, session.update, session.end) for multimodal interactions
- Bumped version to Provisional 3.4
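The session lifecycle commands reuse the existing command and commandResult activity types. The following is a minimal Python sketch that models activities as plain dicts. It assumes that a commandResult repeats the command's name and links back to it through replyToId. The helper names and the payload keys ("modalities", "status") are hypothetical and not defined by this PR.

```python
from typing import Any, Dict, Optional

# Reserved session lifecycle command names, as listed in this PR.
SESSION_COMMANDS = ("session.init", "session.update", "session.end")


def make_session_command(name: str, value: Dict[str, Any], activity_id: str) -> Dict[str, Any]:
    """Build a command activity that carries its payload in the existing value field."""
    if name not in SESSION_COMMANDS:
        raise ValueError(f"not a reserved session command: {name}")
    return {"type": "command", "id": activity_id, "name": name, "value": value}


def make_command_result(command: Dict[str, Any], result: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Answer a command with a commandResult (assumed: echoes name, links via replyToId)."""
    return {
        "type": "commandResult",
        "name": command["name"],
        "replyToId": command["id"],
        "value": result if result is not None else {},
    }


# Hypothetical payloads: "modalities" and "status" are illustrative keys only.
init = make_session_command("session.init", {"modalities": ["audio"]}, "act-1")
ready = make_command_result(init, {"status": "ready"})
```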

Key design decisions (per AP Core Committee):

- No new activity types - uses existing event, command, commandResult
- No new schema fields - uses existing value, valueType, entities
- 100% backward compatible
- Uses streamInfo entity for stream metadata and sequencing
- Uses Media.* prefix for media streaming events
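These design decisions can be checked mechanically. The following Python sketch applies only to the multimodal activities this PR defines. It flags an activity type outside event/command/commandResult and a Media.* event name that is not reserved. It also applies the PR's later note that non-reserved command names fall under application/* (A6301), which this sketch treats as an assumption. The function name is hypothetical.

```python
from typing import Any, Dict, List

# Activity types the proposal reuses instead of adding new ones.
REUSED_TYPES = {"event", "command", "commandResult"}
# Reserved names introduced by this PR.
RESERVED_EVENTS = {"Media.Start", "Media.Chunk", "Media.End", "Voice.Message"}
RESERVED_COMMANDS = {"session.init", "session.update", "session.end"}


def check_multimodal_activity(activity: Dict[str, Any]) -> List[str]:
    """Return design-rule violations for a multimodal activity (empty list when it conforms)."""
    problems: List[str] = []
    kind = activity.get("type")
    name = activity.get("name", "")
    if kind not in REUSED_TYPES:
        problems.append(f"unexpected activity type: {kind}")
    if kind == "event" and name.startswith("Media.") and name not in RESERVED_EVENTS:
        problems.append(f"unreserved Media.* event name: {name}")
    # Assumption: any command that is not a reserved session.* command uses application/*.
    if kind == "command" and name not in RESERVED_COMMANDS and not name.startswith("application/"):
        problems.append(f"custom command outside application/*: {name}")
    return problems
```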

Related: #416


… changes)

Per discussion on #416, the existing streaminfo entity properties are sufficient for media streaming:

- streamType uses the existing values 'streaming' and 'final' (not new 'audio'/'video' values)
- valueType on the event activity identifies the media type
- No new streamState property is needed

This keeps the streaminfo entity schema unchanged while supporting multimodal media streaming.
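The following Python sketch shows one media stream expressed with only the existing streaminfo properties. It rests on several assumptions. The streamId and streamSequence property names are borrowed from the existing text-streaming use of streaminfo. Media.Start and each Media.Chunk carry streamType "streaming" and Media.End carries "final". Binary audio travels base64-encoded in value. The "audio/pcm" valueType and the function name are hypothetical.

```python
import base64
from typing import Any, Dict, List, Optional


def build_media_stream(stream_id: str, value_type: str, chunks: List[bytes]) -> List[Dict[str, Any]]:
    """Build Media.Start, Media.Chunk..., Media.End events that share one streaminfo streamId."""

    def media_event(name: str, seq: int, stream_type: str, value: Optional[str] = None) -> Dict[str, Any]:
        activity: Dict[str, Any] = {
            "type": "event",
            "name": name,
            "valueType": value_type,  # identifies the media type
            "entities": [{
                "type": "streaminfo",
                "streamId": stream_id,
                "streamSequence": seq,
                "streamType": stream_type,  # only the existing values "streaming" / "final"
            }],
        }
        if value is not None:
            activity["value"] = value
        return activity

    events = [media_event("Media.Start", 1, "streaming")]
    for seq, chunk in enumerate(chunks, start=2):
        # Assumption: binary audio is base64-encoded into the value field.
        events.append(media_event("Media.Chunk", seq, "streaming", base64.b64encode(chunk).decode("ascii")))
    events.append(media_event("Media.End", len(chunks) + 2, "final"))
    return events
```

Because the media type lives in valueType, the same builder works for any modality without touching the streaminfo schema.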

Added separate examples in the streaminfo section:

- Text Streaming: existing example using typing/message activities
- Voice/Media Streaming: new example showing Media.Start, Media.Chunk, Media.End, and Voice.Message events with streaminfo entities

Both examples use the same streamType values (streaming, final); the activity type and valueType distinguish the modality.
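Because both modalities share the same streamType values, a receiver has to look at the activity type and valueType to tell them apart. The following Python sketch shows one such classifier. The function name and its return labels are hypothetical.

```python
from typing import Any, Dict


def stream_modality(activity: Dict[str, Any]) -> str:
    """Classify a streamed activity by type and valueType; streamType alone cannot tell."""
    has_streaminfo = any(e.get("type") == "streaminfo" for e in activity.get("entities", []))
    if not has_streaminfo:
        return "not-streamed"
    if activity.get("type") in ("typing", "message"):
        return "text"
    if activity.get("type") == "event" and activity.get("name", "").startswith(("Media.", "Voice.")):
        return "media:" + activity.get("valueType", "unknown")
    return "unknown"
```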

Based on a comprehensive review of proposal #416:

1. Added an Implementation Note for Voice.Message explaining:
   - Why event is used instead of message (SDK validation limitation)
   - That the protocol does allow value/valueType on message (A2005)
   - A reference to the future APv4 vision (#377)
2. Added an Error Handling section (A5260-A5262):
   - Handling Media.Chunk without Media.Start
   - Stream error signaling via streamResult
   - Resilience requirements for missing chunks
3. Added a Note clarifying that session.* commands are reserved protocol commands (not subject to the application/* namespace requirement per A6301)

These additions address gaps identified during the review and capture the open discussion points from the proposal.
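The error-handling cases above can be sketched as a tolerant receiver. This Python class is hypothetical and does not reproduce the wording of A5260-A5262. It drops a Media.Chunk or Media.End that has no preceding Media.Start and records skipped sequence numbers instead of aborting the stream. It does not model streamResult error signaling. It assumes every media event carries a streaminfo entity with a streamId and an integer streamSequence.

```python
from typing import Any, Dict, List


class MediaStreamReceiver:
    """Hypothetical receiver that records protocol errors instead of failing the session."""

    def __init__(self) -> None:
        self.open_streams: Dict[str, int] = {}  # streamId -> highest streamSequence seen
        self.gaps: Dict[str, List[int]] = {}    # streamId -> sequence numbers not yet received
        self.ignored: List[str] = []            # reasons for dropped events

    def handle(self, activity: Dict[str, Any]) -> None:
        name = activity.get("name")
        info = next((e for e in activity.get("entities", []) if e.get("type") == "streaminfo"), {})
        stream_id, seq = info.get("streamId"), info.get("streamSequence")
        if name == "Media.Start":
            self.open_streams[stream_id] = seq
            self.gaps[stream_id] = []
        elif name in ("Media.Chunk", "Media.End"):
            if stream_id not in self.open_streams:
                # Chunk or end without a preceding Media.Start: drop it and keep going.
                self.ignored.append(f"{name} for unknown stream {stream_id}")
                return
            last = self.open_streams[stream_id]
            if seq > last:
                # Remember skipped sequence numbers rather than aborting the stream.
                self.gaps[stream_id].extend(range(last + 1, seq))
                self.open_streams[stream_id] = seq
            elif seq in self.gaps[stream_id]:
                self.gaps[stream_id].remove(seq)  # a late chunk fills an earlier gap
            if name == "Media.End":
                del self.open_streams[stream_id]
```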

Added the detailed client-server interaction example from proposal #416:

- Session handshake with embedded readiness state
- Media streaming events (start, chunk, end)
- Optional state updates (thinking, speaking) with threshold notes
- Final Voice.Message delivery
- Explanatory notes about optional steps

This provides a complete reference for implementers to understand the end-to-end flow of a voice streaming session.
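The end-to-end flow can be expressed as an ordering check over (activity type, name) pairs. This Python sketch assumes the order handshake (session.init command, then its commandResult), Media.Start, any number of Media.Chunk events, Media.End, and finally Voice.Message. The PR does not give names for the optional state updates (thinking, speaking). In this sketch, any other step counts as optional and is ignored. The function name is hypothetical.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (activity type, name)

# Assumed order of the required steps in a voice streaming session.
REQUIRED: List[Step] = [
    ("command", "session.init"),
    ("commandResult", "session.init"),
    ("event", "Media.Start"),
    ("event", "Media.End"),
    ("event", "Voice.Message"),
]
CHUNK: Step = ("event", "Media.Chunk")


def follows_flow(steps: List[Step]) -> bool:
    """Check that required steps appear once each, in order, with every chunk between start and end."""
    required = [s for s in steps if s in REQUIRED]
    if required != REQUIRED:
        return False
    start = steps.index(("event", "Media.Start"))
    end = steps.index(("event", "Media.End"))
    return all(start < i < end for i, s in enumerate(steps) if s == CHUNK)
```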

- Split large JSON code blocks with comments into separate blocks
- Added descriptive headers before each example
- Used proper code block language specifiers (json, text)
- Reorganized multimodal interaction flow into numbered steps
- Added blockquotes for explanatory notes
- Removed invalid JSON comments (JSON doesn't support //)

This improves readability when the spec is rendered in GitHub, documentation sites, and other markdown viewers.
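Any strict JSON parser rejects `//` comments, so a commented example cannot be copied straight into code. Python's standard `json` module shows this. The sample strings below are illustrative.

```python
import json

with_comment = '{\n  // stream metadata\n  "type": "event"\n}'
without_comment = '{\n  "type": "event"\n}'


def is_valid_json(text: str) -> bool:
    """Return True if text parses as strict JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(with_comment))     # False
print(is_valid_json(without_comment))  # True
```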

github-actions bot added the Specs label ("This is related to Activity Protocol Specification") on Feb 5, 2026

gurubhg added 6 commits February 5, 2026 12:07

fix: remove duplicate notes in Multimodal Interaction Flow

b22e163

gurubhg wants to merge 7 commits into main from users/guhiriya/extend-activity-schema-multimodal-streaming

gurubhg commented Feb 5, 2026
