Extend Activity Schema to Support Multimodal Interactions with Streaming#423
Draft
This PR implements the approved proposal from issue #416 to extend the Activity Protocol schema for multimodal interactions with streaming support for voice/audio.

Changes:
- Added Reserved Events for Media Streaming (Media.Start, Media.Chunk, Media.End, Voice.Message)
- Extended streaminfo entity to support media streaming with streamState property
- Added Session Lifecycle Commands (session.init, session.update, session.end) for multimodal interactions
- Bumped version to Provisional 3.4

Key design decisions (per AP Core Committee):
- No new activity types: uses existing event, command, commandResult
- No new schema fields: uses existing value, valueType, entities
- 100% backward compatible
- Uses streamInfo entity for stream metadata and sequencing
- Uses Media.* prefix for media streaming events

Related: #416
… changes)

Per discussion on #416, the existing streaminfo entity properties are sufficient for media streaming:
- streamType uses existing values: 'streaming', 'final' (not new 'audio'/'video')
- valueType on the event activity identifies the media type
- No need for new streamState property

This ensures zero schema changes to the streaminfo entity while supporting multimodal media streaming.
Added separate examples in the streaminfo section:
- Text Streaming: Existing example using typing/message activities
- Voice/Media Streaming: New example showing Media.Start, Media.Chunk, Media.End, and Voice.Message events with streaminfo entities

Both examples demonstrate consistent use of streamType values (streaming, final), while different activity types and valueType distinguish the modality.
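To make the voice/media example concrete, here is a minimal Python sketch of how such an event sequence might be assembled. It is an illustration under stated assumptions, not the spec's own example: the helper names `media_event` and `voice_stream`, the entity fields `streamId` and `streamSequence`, both `valueType` strings, the base64 encoding of chunk payloads, and the choice to mark only Voice.Message as `final` (by analogy with text streaming) are all hypothetical.

```python
import base64
from typing import Any, Dict, List, Optional


def media_event(name: str, stream_id: str, sequence: int, stream_type: str,
                value: Any = None, value_type: Optional[str] = None) -> Dict[str, Any]:
    """Build an event activity that carries a streaminfo entity.

    streamId and streamSequence are assumed field names for this sketch.
    """
    activity: Dict[str, Any] = {
        "type": "event",  # no new activity type: media events reuse "event"
        "name": name,
        "entities": [{
            "type": "streaminfo",
            "streamId": stream_id,
            "streamType": stream_type,  # only 'streaming' or 'final'
            "streamSequence": sequence,
        }],
    }
    if value is not None:
        activity["value"] = value
    if value_type is not None:
        activity["valueType"] = value_type  # identifies the media type
    return activity


def voice_stream(stream_id: str, chunks: List[bytes], transcript: str) -> List[Dict[str, Any]]:
    """Return Media.Start, one Media.Chunk per chunk, Media.End, then Voice.Message."""
    audio_type = "application/vnd.example.audio-chunk"  # hypothetical valueType
    events = [media_event("Media.Start", stream_id, 1, "streaming")]
    for offset, chunk in enumerate(chunks):
        events.append(media_event(
            "Media.Chunk", stream_id, 2 + offset, "streaming",
            value=base64.b64encode(chunk).decode("ascii"),  # assumed encoding
            value_type=audio_type,
        ))
    events.append(media_event("Media.End", stream_id, 2 + len(chunks), "streaming"))
    # Assumption: the final Voice.Message carries streamType 'final'.
    events.append(media_event(
        "Voice.Message", stream_id, 3 + len(chunks), "final",
        value={"text": transcript},
        value_type="application/vnd.example.voice-message",  # hypothetical
    ))
    return events
```

Calling `voice_stream("s1", [b"ab", b"cd"], "hello")` yields five event activities whose streaminfo entities share one stream identifier and carry increasing sequence numbers; the modality is visible only through `name` and `valueType`, not through `streamType`.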
Based on a comprehensive review of proposal #416:

1. Added Implementation Note for Voice.Message explaining:
   - Why event is used instead of message (SDK validation limitation)
   - Protocol does allow value/valueType on message (A2005)
   - Reference to future APv4 vision (#377)
2. Added Error Handling section (A5260-A5262):
   - Handling Media.Chunk without Media.Start
   - Stream error signaling via streamResult
   - Resilience requirements for missing chunks
3. Added Note clarifying that session.* commands are reserved protocol commands (not subject to the application/* namespace requirement per A6301)

These additions address gaps identified during the comprehensive review and capture the open discussion points from the proposal.
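The error-handling cases listed above can be pictured with a small receiver sketch. The normative behavior lives in A5260-A5262, which is not reproduced here, so the policy below is an assumption: a Media.Chunk for an unknown stream is rejected and logged, and a gap in sequence numbers is recorded but tolerated. The class name `MediaStreamReceiver`, the return strings, and the entity fields `streamId` and `streamSequence` (assumed to be integers) are hypothetical.

```python
from typing import Any, Dict, List, Tuple


class MediaStreamReceiver:
    """Tracks open media streams and records protocol anomalies (sketch only)."""

    def __init__(self) -> None:
        self.open_streams: Dict[str, Dict[str, Any]] = {}
        self.finished: Dict[str, Dict[str, Any]] = {}
        self.errors: List[Tuple[str, str]] = []

    def handle(self, activity: Dict[str, Any]) -> str:
        # Locate the streaminfo entity; activities without one are not ours.
        info = next((e for e in activity.get("entities", [])
                     if e.get("type") == "streaminfo"), None)
        if activity.get("type") != "event" or info is None:
            return "ignored"

        name = activity.get("name")
        stream_id = info.get("streamId")
        seq = info.get("streamSequence")

        if name == "Media.Start":
            self.open_streams[stream_id] = {"chunks": [], "last_seq": seq, "gaps": []}
            return "started"

        if name not in ("Media.Chunk", "Media.End"):
            return "ignored"

        state = self.open_streams.get(stream_id)
        if state is None:
            # Chunk or end without a preceding Media.Start (assumed policy: reject).
            self.errors.append((stream_id, name + " without Media.Start"))
            return "rejected"

        if seq > state["last_seq"] + 1:
            # Missing sequence numbers: note the gap and keep going.
            state["gaps"].append((state["last_seq"] + 1, seq - 1))
        state["last_seq"] = max(state["last_seq"], seq)

        if name == "Media.Chunk":
            state["chunks"].append(activity.get("value"))
            return "buffered"

        # Media.End: move the stream out of the open set.
        self.finished[stream_id] = self.open_streams.pop(stream_id)
        return "ended"
```

A sender-side error would be signaled through streamResult on the streaminfo entity; that path is left out because the source does not list its values.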
Added the detailed client-server interaction example from proposal #416:
- Session handshake with embedded readiness state
- Media streaming events (start, chunk, end)
- Optional state updates (thinking, speaking) with threshold notes
- Final Voice.Message delivery
- Explanatory notes about optional steps

This provides a complete reference for implementers to understand the end-to-end flow of a voice streaming session.
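One way to reason about the end-to-end flow is as an ordering check over activity names. The sketch below assumes a single voice turn ordered as session.init, Media.Start, zero or more Media.Chunk, Media.End, Voice.Message, session.end, with optional session.update steps allowed between phases. That ordering, the `STEP_CODES` table, and the `follows_flow` helper are assumptions for illustration; the example in the spec may order or repeat steps differently.

```python
import re
from typing import List

# Map each reserved name to a one-letter code so the flow can be matched as a string.
STEP_CODES = {
    "session.init": "I",
    "session.update": "U",
    "session.end": "X",
    "Media.Start": "S",
    "Media.Chunk": "C",
    "Media.End": "E",
    "Voice.Message": "V",
}

# Assumed single-turn ordering; U* marks where optional state updates may appear.
FLOW = re.compile(r"IU*SC*EU*VU*X")


def follows_flow(names: List[str]) -> bool:
    """Return True if the activity names match the assumed single-turn ordering."""
    try:
        codes = "".join(STEP_CODES[name] for name in names)
    except KeyError:
        return False  # a name outside the reserved set
    return FLOW.fullmatch(codes) is not None
```

For instance, a trace that opens with session.init and sends Media.Chunk before any Media.Start fails the check, which mirrors the error case handled in the Error Handling section.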
- Split large JSON code blocks with comments into separate blocks
- Added descriptive headers before each example
- Used proper code block language specifiers (json, text)
- Reorganized multimodal interaction flow into numbered steps
- Added blockquotes for explanatory notes
- Removed invalid JSON comments (JSON doesn't support //)

This improves readability when the spec is rendered in GitHub, documentation sites, and other markdown viewers.