diff --git a/specs/activity/protocol-activity.md b/specs/activity/protocol-activity.md index 2ed2e701..c665009e 100644 --- a/specs/activity/protocol-activity.md +++ b/specs/activity/protocol-activity.md @@ -1,6 +1,6 @@ # Activity Protocol -- Activity -Version: Provisional 3.3 +Version: Provisional 3.4 ## Abstract @@ -638,6 +638,196 @@ Possible values for `contentType` are audio, video, text, screen, all or any oth } ``` +### Reserved Events for Media Streaming + +Media streaming events facilitate real-time multimodal interactions, particularly voice and audio streaming. These events use the `Media.*` prefix and work in conjunction with the [`streaminfo`](#streaminfo) entity for stream metadata and sequencing. + +`A5210`: Media streaming events MUST use the `Media.*` prefix for their `name` field. + +`A5211`: Media streaming events SHOULD include a [`streaminfo`](#streaminfo) entity to convey stream metadata. + +`A5212`: Media streaming events MAY use the `value` and `valueType` fields to carry modality-specific content. + +#### Media.Start + +The `Media.Start` event initiates a media streaming session. It establishes the stream context and media type that will be transmitted. + +| Field | Type | Required | Description | +|-------------|--------|----------|--------------------------------------------------| +| `type` | string | Yes | Must be `"event"` | +| `name` | string | Yes | Must be `"Media.Start"` | +| `valueType` | string | No | Identifies the schema of the `value` object, e.g., `"application/vnd.microsoft.activity.mediastart+json"` | +| `value` | object | No | Contains media type and content type information | +| `entities` | array | Yes | Must include a [`streaminfo`](#streaminfo) entity with `streamType` of `"streaming"` | + +Example: +```json +{ + "type": "event", + "name": "Media.Start", + "valueType": "application/vnd.microsoft.activity.mediastart+json", + "value": { + "mediaType": "audio", + "contentType": "audio/webm" + }, + "entities": [ + { + "type": "streaminfo", + "streamId": "abc123", + "streamType": "streaming", + "streamSequence": 1 + } + ] +} +``` + +`A5220`: Senders MUST include a [`streaminfo`](#streaminfo) entity in `Media.Start` events with a valid `streamId`. + +`A5221`: The `streamSequence` in `Media.Start` SHOULD be `1`, as it initiates the stream. + +#### Media.Chunk + +The `Media.Chunk` event sends a chunk of media data during an active streaming session. Chunks are sequenced using the `streamSequence` field in the [`streaminfo`](#streaminfo) entity.
+ +| Field | Type | Required | Description | +|-------------|--------|----------|--------------------------------------------------| +| `type` | string | Yes | Must be `"event"` | +| `name` | string | Yes | Must be `"Media.Chunk"` | +| `valueType` | string | No | Identifies the schema of the `value` object, e.g., `"application/vnd.microsoft.activity.audiochunk+json"` | +| `value` | object | Yes | Contains the media chunk data | +| `entities` | array | Yes | Must include a [`streaminfo`](#streaminfo) entity | + +The `value` object for audio chunks typically includes: + +| Property | Type | Required | Description | +|----------------|---------|----------|------------------------------------------------| +| `contentType` | string | Yes | MIME type of the media, e.g., `"audio/webm"` | +| `contentUrl` | string | Yes | Data URI containing Base64-encoded media data | +| `durationMs` | integer | No | Duration of the chunk in milliseconds | +| `timestamp` | string | No | ISO 8601 timestamp of the chunk | +| `transcription`| string | No | Optional real-time transcription of audio | + +Example: +```json +{ + "type": "event", + "name": "Media.Chunk", + "valueType": "application/vnd.microsoft.activity.audiochunk+json", + "value": { + "contentType": "audio/webm", + "contentUrl": "data:audio/webm;base64,...", + "durationMs": 2500, + "timestamp": "2025-10-07T10:30:05Z", + "transcription": "Your destination?" + }, + "entities": [ + { + "type": "streaminfo", + "streamId": "abc123", + "streamType": "streaming", + "streamSequence": 2 + } + ] +} +``` + +`A5230`: Senders MUST include a [`streaminfo`](#streaminfo) entity in `Media.Chunk` events with the same `streamId` as the corresponding `Media.Start`. + +`A5231`: The `streamSequence` MUST increase monotonically for each chunk within the same stream. + +`A5232`: Receivers SHOULD use `streamSequence` to order chunks and detect missing chunks. + +#### Media.End + +The `Media.End` event signals the end of a media streaming session. + +| Field | Type | Required | Description | +|-------------|--------|----------|--------------------------------------------------| +| `type` | string | Yes | Must be `"event"` | +| `name` | string | Yes | Must be `"Media.End"` | +| `valueType` | string | No | Identifies the schema, e.g., `"application/vnd.microsoft.activity.mediaend+json"` | +| `entities` | array | Yes | Must include a [`streaminfo`](#streaminfo) entity with `streamType` of `"final"` | + +Example: +```json +{ + "type": "event", + "name": "Media.End", + "valueType": "application/vnd.microsoft.activity.mediaend+json", + "entities": [ + { + "type": "streaminfo", + "streamId": "abc123", + "streamType": "final", + "streamSequence": 3 + } + ] +} +``` + +`A5240`: Senders MUST include a [`streaminfo`](#streaminfo) entity in `Media.End` events with `streamType` set to `"final"`. + +`A5241`: Receivers SHOULD clean up stream resources upon receiving `Media.End`. + +#### Voice.Message + +The `Voice.Message` event delivers a complete voice message, either as the final response after streaming or as a standalone message. + +> **Implementation Note:** +> +> The Activity Protocol schema permits `value` and `valueType` on `message` activities (per A2005). However, current SDK implementations may not fully support this combination for validation purposes. For GA compatibility, `Voice.Message` is defined as an `event` activity. This ensures consistent behavior across all existing Bot Framework, Azure Bot Service, and Teams clients.
+> +> Future versions (APv4+) may unify voice messages under the `message` activity type for consistency with text messages. See [#377](https://github.com/microsoft/Agents/issues/377) for the longer-term vision. + +| Field | Type | Required | Description | +|-------------|--------|----------|--------------------------------------------------| +| `type` | string | Yes | Must be `"event"` | +| `name` | string | Yes | Must be `"Voice.Message"` | +| `valueType` | string | Yes | Must be `"application/vnd.microsoft.activity.voice+json"` | +| `value` | object | Yes | Contains the voice message content | + +The `value` object for voice messages includes: + +| Property | Type | Required | Description | +|----------------|---------|----------|------------------------------------------------| +| `contentType` | string | Yes | MIME type of the audio, e.g., `"audio/webm"` | +| `contentUrl` | string | Yes | Data URI or URL containing the audio data | +| `transcription`| string | No | Text transcription of the audio | +| `durationMs` | integer | No | Duration in milliseconds | +| `timestamp` | string | No | ISO 8601 timestamp | +| `locale` | string | No | Language/locale of the audio, e.g., `"en-US"` | + +Example: +```json +{ + "type": "event", + "name": "Voice.Message", + "valueType": "application/vnd.microsoft.activity.voice+json", + "value": { + "contentType": "audio/webm", + "contentUrl": "data:audio/webm;base64,...", + "transcription": "Book a flight to Paris", + "durationMs": 3400, + "timestamp": "2025-10-07T10:30:00Z", + "locale": "en-US" + } +} +``` + +`A5250`: `Voice.Message` events MUST include a `valueType` of `"application/vnd.microsoft.activity.voice+json"`. + +`A5251`: The `value` object MUST include `contentType` and `contentUrl` fields. + +`A5252`: Senders SHOULD include a `transcription` field to support accessibility and text-based processing. + +#### Error Handling + +`A5260`: If a `Media.Chunk` event is received without a corresponding `Media.Start`, receivers MAY ignore it or MAY process it if the `streamId` is known from a prior session. + +`A5261`: If a stream error occurs, senders SHOULD send a `Media.End` event with `streamResult` set to `"error"` in the `streaminfo` entity. + +`A5262`: Receivers SHOULD be resilient to missing chunks and SHOULD use `streamSequence` to detect gaps. + ## Invoke activity @@ -1594,6 +1784,14 @@ The `error` field contains the reason the original [command activity](#command-a # Appendix I - Changes +# 2025-02-05 - guhiriya@microsoft.com +* Added Reserved Events for Media Streaming (`Media.Start`, `Media.Chunk`, `Media.End`, `Voice.Message`) +* Documented usage of existing `streaminfo` entity for media streaming (no schema changes) +* Added Session Lifecycle Commands (`session.init`, `session.update`, `session.end`) for multimodal interactions +* Added normative requirements A5210-A5262 for media streaming events +* Added normative requirements A9260-A9262 for media streaming in streaminfo +* Added normative requirements A9400-A9442 for session lifecycle commands + # 2025-09-30 - mattb-msft * Updated Channel Account definition to reflect current rules and usages. @@ -1764,16 +1962,20 @@ Note that on channels with a persistent chat feed, `platform` is typically usefu ### streaminfo -The `streaminfo` entity conveys metadata supporting chunked streaming of text messages, typically sent as a sequence of `typing` Activities, followed by a final `message` Activity containing the complete text.
+The `streaminfo` entity conveys metadata supporting chunked streaming of messages. It is used for: +- **Text streaming**: Sent as a sequence of `typing` Activities, followed by a final `message` Activity containing the complete text. +- **Media streaming**: Used with [Media.* events](#reserved-events-for-media-streaming) (`Media.Start`, `Media.Chunk`, `Media.End`) for real-time voice/audio streaming. | Property | Type | Required | Description | |------------------|---------|----------|---------------------------------------------------------------------------------| | `type` | string | Yes | Must be `"streaminfo"` | | `streamId` | string | Yes | Unique identifier for the streaming session | | `streamSequence` | integer | Yes | Incrementing sequence number for each chunk for non-final messages | -| `streamType` | string | No | One of `"informative"`, `"streaming"`, or `"final"`. Defaults to `"streaming"`` | +| `streamType` | string | No | One of `"informative"`, `"streaming"`, or `"final"`. Defaults to `"streaming"` | | `streamResult` | string | No | Present only on final message; one of `"success"`, `"timeout"`, or `"error"` | +#### Text Streaming + `A9240`: Streaming text is sent via a sequence of `typing` Activities containing `streaminfo` entities. `A9241`: The final message is sent as a `message` Activity with `streamType` set to `"final"`. @@ -1790,11 +1992,24 @@ The `streaminfo` entity conveys metadata supporting chunked streaming of text me `A9247`: Channels that do not support streaming SHOULD buffer all chunks and deliver a single `message` when complete. +#### Media Streaming + +When used with [Media.* events](#reserved-events-for-media-streaming), the `streaminfo` entity serves as the single place for stream identification and sequencing, independent of the activity type. The existing `streamType` values (`"streaming"`, `"final"`) are used to indicate stream lifecycle, while the `valueType` field on the event activity identifies the media type. + +`A9260`: For media streaming, the `streamType` field uses existing values: `"streaming"` for active chunks, `"final"` for stream end. + +`A9261`: The `streamId` MUST be consistent across all activities in a streaming session (`Media.Start`, `Media.Chunk`, `Media.End`). + +`A9262`: Receivers SHOULD use `streamSequence` to detect out-of-order or missing chunks in media streams. + --- -Example: +#### Example: Text Streaming + +Text streaming uses `typing` activities for incremental chunks, followed by a final `message` activity: + +**Informative message** - Show processing status: ```json -// Sending an informative message chunk { "type": "typing", "text": "Getting the answer...", @@ -1808,8 +2023,10 @@ Example: } ] } +``` -// Sending a streaming text chunk +**Streaming text chunk** - Incremental content: +```json { "type": "typing", "text": "A quick brown fox jumped over the", @@ -1822,8 +2039,10 @@ Example: } ] } +``` -// Sending the final complete message +**Final complete message** - Full response: +```json { "type": "message", "text": "A quick brown fox jumped over the lazy dog.", @@ -1838,6 +2057,113 @@ Example: } ``` +#### Example: Voice/Media Streaming + +Voice streaming uses `event` activities with [Media.* events](#reserved-events-for-media-streaming). 
The `valueType` identifies the media type, while `streaminfo` handles sequencing: + +**Media.Start** - Initiate audio streaming session: +```json +{ + "type": "event", + "name": "Media.Start", + "valueType": "application/vnd.microsoft.activity.mediastart+json", + "value": { + "mediaType": "audio", + "contentType": "audio/webm" + }, + "entities": [ + { + "type": "streaminfo", + "streamId": "v-00001", + "streamType": "streaming", + "streamSequence": 1 + } + ] +} +``` + +**Media.Chunk** - Send audio chunk with optional transcription: +```json +{ + "type": "event", + "name": "Media.Chunk", + "valueType": "application/vnd.microsoft.activity.audiochunk+json", + "value": { + "contentType": "audio/webm", + "contentUrl": "data:audio/webm;base64,GkXfo59ChoEBQveBAU...", + "durationMs": 2500, + "timestamp": "2025-10-07T10:30:05Z", + "transcription": "Book a flight to" + }, + "entities": [ + { + "type": "streaminfo", + "streamId": "v-00001", + "streamType": "streaming", + "streamSequence": 2 + } + ] +} +``` + +**Media.Chunk** - Continue streaming (additional chunks): +```json +{ + "type": "event", + "name": "Media.Chunk", + "valueType": "application/vnd.microsoft.activity.audiochunk+json", + "value": { + "contentType": "audio/webm", + "contentUrl": "data:audio/webm;base64,R0lGODlhAQABAIAA...", + "durationMs": 1800, + "timestamp": "2025-10-07T10:30:07Z", + "transcription": "Paris please" + }, + "entities": [ + { + "type": "streaminfo", + "streamId": "v-00001", + "streamType": "streaming", + "streamSequence": 3 + } + ] +} +``` + +**Media.End** - Signal end of audio stream: +```json +{ + "type": "event", + "name": "Media.End", + "valueType": "application/vnd.microsoft.activity.mediaend+json", + "entities": [ + { + "type": "streaminfo", + "streamId": "v-00001", + "streamType": "final", + "streamSequence": 4 + } + ] +} +``` + +**Voice.Message** - Final complete voice response (Server to Client): +```json +{ + "type": "event", + "name": "Voice.Message", + "valueType": "application/vnd.microsoft.activity.voice+json", + "value": { + "contentType": "audio/webm", + "contentUrl": "data:audio/webm;base64,UklGRiQAAABXQVZF...", + "transcription": "I found flights to Paris. The next available is tomorrow at 8:05am.", + "durationMs": 4200, + "timestamp": "2025-10-07T10:30:12Z", + "locale": "en-US" + } +} +``` + # Appendix III - Protocols using the Invoke activity The [invoke activity](#invoke-activity) is designed for use only within protocols supported by Activity Protocol channels (i.e., it is not a generic extensibility mechanism). This appendix contains a list of all protocols using this activity. @@ -1923,6 +2249,208 @@ The authenticity of a call from an Agent can be established by inspecting its JS The Microsoft Telephony channel defines channel command activities in the namespace `channel/vnd.microsoft.telephony.`. +## Session Lifecycle Commands + +Session lifecycle commands are used to manage multimodal streaming sessions, particularly for voice interactions. These commands follow request/response semantics with acknowledgments via `commandResult` activities. + +> **Note:** The `session.*` command names are reserved Activity Protocol commands for multimodal session management. Unlike application-defined commands (which must use the `application/*` namespace per A6301), these are protocol-level commands similar to other reserved event names. + +### session.init + +The `session.init` command initializes a new streaming session. 
It establishes the session context and is acknowledged with a `commandResult` containing the session state. + +**Request:** +```json +{ + "type": "command", + "id": "cmd1", + "name": "session.init", + "value": { + "sessionId": "sess_123" + } +} +``` + +**Response (commandResult):** +```json +{ + "type": "commandResult", + "replyToId": "cmd1", + "value": { + "status": "success", + "sessionId": "sess_123", + "state": "listening" + } +} +``` + +`A9400`: The `session.init` command MUST include a `sessionId` in the `value` object. + +`A9401`: Receivers MUST respond with a `commandResult` activity indicating success or failure. + +`A9402`: A successful `session.init` response MAY include an initial `state` (e.g., `"listening"`), eliminating the need for a separate `session.update`. + +### session.update + +The `session.update` command updates the state of an active session. It is used to signal state transitions during multimodal interactions. + +**Request:** +```json +{ + "type": "command", + "id": "cmd2", + "name": "session.update", + "value": { + "state": "speaking" + } +} +``` + +**Response (commandResult):** +```json +{ + "type": "commandResult", + "replyToId": "cmd2", + "value": { + "status": "acknowledged" + } +} +``` + +Defined session states: + +| State | Description | +|-------------|------------------------------------------------------------| +| `listening` | Bot is awaiting user input (input.expected) | +| `thinking` | Bot is processing the input | +| `speaking` | Bot is generating or delivering output (output.generating) | +| `idle` | Bot is not currently in an active state | +| `error` | An error has occurred during the interaction | + +`A9410`: The `session.update` command SHOULD include a `state` field in the `value` object. + +`A9411`: Receivers SHOULD respond with a `commandResult` activity acknowledging the state change. + +`A9412`: Session state updates are optional and threshold-based; clients may safely ignore them. + +### session.update (Barge-In) + +The `session.update` command can also signal a barge-in event, where the user or system interrupts the current output. + +```json +{ + "type": "command", + "name": "session.update", + "value": { + "signal": "bargeIn", + "origin": "user" + } +} +``` + +`A9420`: A barge-in signal SHOULD include `origin` indicating whether it was triggered by `"user"` or `"system"`. + +`A9421`: Upon receiving a barge-in, the server SHOULD return to the `"listening"` state. + +### session.end + +The `session.end` command terminates an active session. + +```json +{ + "type": "command", + "name": "session.end", + "value": { + "reason": "completed" + } +} +``` + +Defined end reasons: + +| Reason | Description | +|-------------|------------------------------------------| +| `completed` | Session ended normally | +| `cancelled` | Session was cancelled | +| `error` | Session ended due to an error | +| `timeout` | Session ended due to inactivity timeout | + +`A9430`: The `session.end` command SHOULD include a `reason` field in the `value` object. + +`A9431`: Receivers SHOULD clean up session resources upon receiving `session.end`. 
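+
+**Illustrative client flow (non-normative):**
+
+The TypeScript sketch below shows how a client might assemble the activities defined in this section into a single voice turn. It is a non-normative illustration: the `sendActivity` callback and the `runVoiceSession` helper are assumptions made for this example and are not defined by this protocol or by any SDK; the identifiers reuse the `sess_123` and `v-00001` values from the examples above.
+
+```typescript
+// Non-normative sketch. `sendActivity` is a hypothetical transport callback
+// supplied by the host application; it is not part of this specification.
+type Activity = Record<string, unknown>;
+type SendActivity = (activity: Activity) => Promise<void>;
+
+async function runVoiceSession(sendActivity: SendActivity, audioChunks: string[]): Promise<void> {
+  const sessionId = "sess_123"; // illustrative identifiers only
+  const streamId = "v-00001";
+
+  // session.init command: the receiver replies with a commandResult (A9400, A9401).
+  await sendActivity({ type: "command", id: "cmd1", name: "session.init", value: { sessionId } });
+
+  // Media.Start event: streamSequence starts at 1 (A5221).
+  let seq = 1;
+  await sendActivity({
+    type: "event",
+    name: "Media.Start",
+    valueType: "application/vnd.microsoft.activity.mediastart+json",
+    value: { mediaType: "audio", contentType: "audio/webm" },
+    entities: [{ type: "streaminfo", streamId, streamType: "streaming", streamSequence: seq }],
+  });
+
+  // Media.Chunk events: one per base64-encoded chunk, sequence strictly increasing (A5231).
+  for (const chunk of audioChunks) {
+    seq += 1;
+    await sendActivity({
+      type: "event",
+      name: "Media.Chunk",
+      valueType: "application/vnd.microsoft.activity.audiochunk+json",
+      value: { contentType: "audio/webm", contentUrl: `data:audio/webm;base64,${chunk}` },
+      entities: [{ type: "streaminfo", streamId, streamType: "streaming", streamSequence: seq }],
+    });
+  }
+
+  // Media.End event: streamType "final" closes the stream (A5240).
+  seq += 1;
+  await sendActivity({
+    type: "event",
+    name: "Media.End",
+    valueType: "application/vnd.microsoft.activity.mediaend+json",
+    entities: [{ type: "streaminfo", streamId, streamType: "final", streamSequence: seq }],
+  });
+
+  // session.end command with a defined end reason (A9430).
+  await sendActivity({ type: "command", id: "cmd2", name: "session.end", value: { reason: "completed" } });
+}
+```
+
+A receiver-side counterpart would track `streamSequence` per `streamId` to detect gaps or out-of-order chunks (A5232, A9262).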
+ +### Multimodal Interaction Flow + +The typical flow for a voice streaming session: + +```text +Client → Server: + session.init → Media.Start → Media.Chunk x N → Media.End → session.update (bargeIn, optional) + +Server → Client: + commandResult (listening) → optional session.update (thinking) → optional session.update (speaking) → Voice.Message + +Barge-In: + Client sends bargeIn → Server returns to listening +``` + +#### Round-Trip Flow Example: Client and Server Interaction + +The following example illustrates a complete voice streaming interaction: + +**Step 1: Session Handshake** +```text +client → command: session.init +server → commandResult: { "status": "success", "sessionId": "SESS-123", "state": "listening" } +``` +> Because readiness (`listening`) is embedded in the response above, a separate `session.update(state="listening")` call is NOT required. + +**Step 2: Readiness Signal (Optional)** + +This step is needed only if the channel or runtime explicitly requires a readiness signal: +```text +server → command: session.update { "state": "listening", "sessionId": "SESS-123" } +client → commandResult: { "status": "acknowledged" } +``` + +**Step 3: Stream Media (Fire-and-Forget Events)** +```text +client → event: Media.Start { streamId: "STR-1", seq: 1, contentType: "audio/webm" } +client → event: Media.Chunk { streamId: "STR-1", seq: 2, ... } +client → event: Media.Chunk { streamId: "STR-1", seq: 3, ... } + ... (more Media.Chunk events) +client → event: Media.End { streamId: "STR-1" } +``` + +**Step 4: Processing State Updates (Optional)** + +These updates are optional and rate-limited. Clients may safely ignore them. They fire only when thresholds are crossed (e.g., >200ms of "thinking"): +```text +server → command: session.update { "state": "thinking", "sessionId": "SESS-123" } +client → commandResult: { "status": "acknowledged" } + +server → command: session.update { "state": "speaking", "sessionId": "SESS-123" } +client → commandResult: { "status": "acknowledged" } +``` + +**Step 5: Final Voice Response** +```text +server → event: Voice.Message + valueType: "application/vnd.microsoft.activity.voice+json" + value: { "contentType": "audio/webm", "contentUrl": "...", "transcription": "..." } +``` + +> **Notes:** +> - `listening` is NOT needed as a separate step if included in the `session.init` commandResult. +> - `thinking` and `speaking` session.update messages are optional and threshold-based. +> - Media streaming events are fire-and-forget (no acknowledgment required). + +`A9440`: Session lifecycle commands follow request/response semantics; receivers SHOULD send acknowledgments via `commandResult`. + +`A9441`: Session lifecycle commands are required only for real-time streaming modalities (voice, video). + +`A9442`: The `listening` state MAY be embedded in the `session.init` response, making a separate `session.update(listening)` optional. + ## Patterns for rejecting commands ### General pattern for rejecting commands