feat(core): Support Gemini 2.5 and 3.x Live Models in ADK JS by AmaadMartin · Pull Request #409 · google/adk-js

AmaadMartin · 2026-06-04T21:08:14Z

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

Link to an existing issue (if applicable):

Closes: #362
Related: #362
2. Or, if no issue exists, describe the change:

If applicable, please follow the issue templates to provide as much detail as possible.

Problem: The JS ADK lacked native support for Gemini 2.5 and 3.x Live Models (specifically the bidirectional real-time WebSocket API), resulting in feature parity gaps with the Python ADK.

Solution:

Ported the AsyncQueue utility class to cleanly serialize and bridge incoming WebSocket events to async iterators.
Implemented GeminiLlmConnection.receive() to cleanly map and translate Gemini Live WebSocket events (usageMetadata, serverContent containing model text/thought/audio/transcriptions, toolCall, sessionResumptionUpdate, goAway).
Integrated immediate yielding of tool calls in the connection handler to prevent Gemini 3.x tool call deadlocks.
Modified google_llm.ts's liveApiClient initializer to override the location to 'global' when utilizing Vertex AI, ensuring successful connections without immediately hanging or dropping sockets.
Implemented unmocked end-to-end tests at tests/e2e/live_model_test.ts connecting to real Vertex AI Live endpoints.
Wrote unit tests achieving 100% branch and line coverage.

Testing Plan
Please describe the tests that you ran to verify your changes. This is required for all PRs that are not small documentation or typo fixes.

Unit Tests:

I have added or updated unit tests for my change.
All unit tests pass locally.

Manual End-to-End (E2E) Tests:
Please provide instructions on how to manually test your changes, including any necessary setup or configuration.

Established a WebSocket session to Vertex AI's Bidirectional Live API using gemini-2.5-flash-preview-native-audio.
Sent streamed content and verified that the returned text streams were correct.

Checklist

I have read the CONTRIBUTING.md document.
I have performed a self-review of my own code.
I have commented my code, particularly in hard-to-understand areas.
I have added tests that prove my fix is effective or that my feature works.
New and existing unit tests pass locally with my changes.


kalenkevich · 2026-06-08T22:07:17Z

+/**
+ * A generic, async-safe queue that implements AsyncIterable.
+ */
+export class AsyncQueue<T> implements AsyncIterable<T> {


What about /adk-js/core/src/agents/live_request_queue.ts?

The AsyncQueue<T> implemented here is a general utility to bridge a push-based stream (callbacks, like the WebSocket onmessage event) to a pull-based JS AsyncIterable<T> (so the caller can consume it as an async generator).

In contrast, live_request_queue.ts is a specialized request queuing mechanism designed to schedule/serialize outgoing request calls made by the agent runner. They work in opposite directions (one handles incoming streaming messages, the other manages outgoing worker calls).
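The push-to-pull bridge described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual AsyncQueue: the class name `SketchAsyncQueue`, the `Waiter` type, and the `close()` method are assumptions made for the example (only `push` and `error` appear in the reviewed diff).

```typescript
// Hypothetical sketch; not the PR's actual AsyncQueue implementation.
type Waiter<T> = {
  resolve: (result: IteratorResult<T>) => void;
  reject: (reason: unknown) => void;
};

class SketchAsyncQueue<T> implements AsyncIterable<T> {
  private readonly items: T[] = [];
  private readonly waiters: Array<Waiter<T>> = [];
  private closed = false;
  private failure: {reason: unknown} | undefined;

  /** Producer side: hands the item to a waiting reader, or buffers it. */
  push(item: T): void {
    if (this.closed || this.failure) return;
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter.resolve({value: item, done: false});
    } else {
      this.items.push(item);
    }
  }

  /** Producer side: rejects waiting readers; later reads reject once the buffer is drained. */
  error(reason: unknown): void {
    if (this.closed || this.failure) return;
    this.failure = {reason};
    for (const waiter of this.waiters.splice(0)) waiter.reject(reason);
  }

  /** Producer side (assumed method): ends iteration once buffered items are consumed. */
  close(): void {
    if (this.closed) return;
    this.closed = true;
    for (const waiter of this.waiters.splice(0)) {
      waiter.resolve({value: undefined, done: true});
    }
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: (): Promise<IteratorResult<T>> => {
        if (this.items.length > 0) {
          return Promise.resolve({value: this.items.shift() as T, done: false});
        }
        if (this.failure) return Promise.reject(this.failure.reason);
        if (this.closed) return Promise.resolve({value: undefined, done: true});
        // Nothing buffered yet: park the reader until the producer acts.
        return new Promise((resolve, reject) => {
          this.waiters.push({resolve, reject});
        });
      },
    };
  }
}
```

In this sketch a WebSocket `onmessage` callback would call `push`, `onerror` would call `error`, and `onclose` would call `close`, while the consumer reads with `for await (const message of queue)`.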

kalenkevich · 2026-06-08T22:07:55Z

-        // TODO - b/425992518: GenAI SDK inconsistent API, missing methods.
-        onmessage: () => {},
+        onmessage: (message) => {
+          console.log('E2E Debug: onmessage', JSON.stringify(message));


remove please

Done. Removed the debug logs.

kalenkevich · 2026-06-08T22:08:05Z

+          messageQueue.push(message);
+        },
+        onerror: (error) => {
+          console.error('E2E Debug: onerror', error);


use logger instead

Done. Removed the debug logs and forwarded the error directly to the queue.

kalenkevich · 2026-06-08T22:08:11Z

+          messageQueue.error(error);
+        },
+        onclose: () => {
+          console.log('E2E Debug: onclose');


please remove

Done. Removed the debug logs.

kalenkevich · 2026-06-08T22:11:18Z

+      logger.debug('Got LLM Live message:', message);
+
+      if (message.usageMetadata) {
+        yield {
+          usageMetadata: message.usageMetadata,
+          ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
+        };
+      }
+
+      if (message.serverContent) {
+        const serverContent = message.serverContent;
+        const content = serverContent.modelTurn;
+
+        if (serverContent.groundingMetadata) {
+          pendingGroundingMetadata = serverContent.groundingMetadata;
+        }
+
+        // Standalone groundingMetadata event (when content is empty)
+        if (
+          !(content && content.parts) &&
+          serverContent.groundingMetadata &&
+          !serverContent.turnComplete
+        ) {
+          yield {
+            groundingMetadata: serverContent.groundingMetadata,
+            ...(serverContent.interrupted !== undefined
+              ? {interrupted: serverContent.interrupted}
+              : {}),
+            ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
+          };
+        }
+
+        if (content && content.parts) {
+          const llmResponse: LlmResponse = {
+            content: content as Content,
+            ...(serverContent.interrupted !== undefined
+              ? {interrupted: serverContent.interrupted}
+              : {}),
+            ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
+          };
+
+          if (!serverContent.turnComplete && serverContent.groundingMetadata) {
+            llmResponse.groundingMetadata = serverContent.groundingMetadata;
+          }
+
+          const hasInlineData = content.parts.some((p) => p.inlineData);
+          for (const part of content.parts) {
+            if (part.text) {
+              const currentIsThought = !!part.thought;
+              if (text && currentIsThought !== isThought) {
+                yield this.buildFullTextResponse(text, isThought);
+                text = '';
+                isThought = false;
+              }
+              text += part.text;
+              isThought = currentIsThought;
+              llmResponse.partial = true;
+            }
+          }
+
+          // don't yield the merged text event when receiving audio data
+          if (text && !content.parts.some((p) => p.text) && !hasInlineData) {
+            yield this.buildFullTextResponse(text, isThought);
+            text = '';
+            isThought = false;
+          }
+
+          yield llmResponse;
+        }
+
+        if (serverContent.inputTranscription) {
+          if (serverContent.inputTranscription.text) {
+            this._inputTranscriptionText +=
+              serverContent.inputTranscription.text;
+            yield {
+              inputTranscription: {
+                text: serverContent.inputTranscription.text,
+                finished: false,
+              },
+              partial: true,
+              ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
+            };
+          }
+          if (serverContent.inputTranscription.finished) {
+            yield {
+              inputTranscription: {
+                text: this._inputTranscriptionText,
+                finished: true,
+              },
+              partial: false,
+              ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
+            };
+            this._inputTranscriptionText = '';
+          }
+        }
+
+        if (serverContent.outputTranscription) {
+          if (serverContent.outputTranscription.text) {
+            this._outputTranscriptionText +=
+              serverContent.outputTranscription.text;
+            yield {
+              outputTranscription: {
+                text: serverContent.outputTranscription.text,
+                finished: false,
+              },
+              partial: true,
+              ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
+            };
+          }
+          if (serverContent.outputTranscription.finished) {
+            yield {
+              outputTranscription: {
+                text: this._outputTranscriptionText,
+                finished: true,
+              },
+              partial: false,
+              ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
+            };
+            this._outputTranscriptionText = '';
+          }
+        }
+
+        if (
+          serverContent.interrupted ||
+          serverContent.turnComplete ||
+          serverContent.generationComplete
+        ) {
+          if (this._inputTranscriptionText) {
+            yield {
+              inputTranscription: {
+                text: this._inputTranscriptionText,
+                finished: true,
+              },
+              partial: false,
+              ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
+            };
+            this._inputTranscriptionText = '';
+          }
+          if (this._outputTranscriptionText) {
+            yield {
+              outputTranscription: {
+                text: this._outputTranscriptionText,
+                finished: true,
+              },
+              partial: false,
+              ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
+            };
+            this._outputTranscriptionText = '';
+          }
+        }
+
+        if (serverContent.turnComplete) {
+          let gMetadataToYield = pendingGroundingMetadata;
+          if (text) {
+            yield this.buildFullTextResponse(text, isThought, gMetadataToYield);
+            text = '';
+            isThought = false;
+            gMetadataToYield = undefined;
+          }
+          if (toolCallParts.length > 0) {
+            logger.debug('Returning aggregated toolCallParts');
+            yield {
+              content: {role: 'model', parts: toolCallParts},
+              ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
+            };
+            toolCallParts = [];
+          }
+          const finalResponse: LlmResponse = {
+            turnComplete: true,
+            ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
+          };
+          if (serverContent.interrupted !== undefined) {
+            finalResponse.interrupted = serverContent.interrupted;
+          }
+          const finalGrounding =
+            serverContent.groundingMetadata || gMetadataToYield;
+          if (finalGrounding !== undefined && finalGrounding !== null) {
+            finalResponse.groundingMetadata = finalGrounding;
+          }
+          yield finalResponse;
+          break;
+        }
+
+        if (serverContent.interrupted) {
+          if (text) {
+            yield this.buildFullTextResponse(text, isThought);
+            text = '';
+            isThought = false;
+          } else {
+            yield {
+              interrupted: serverContent.interrupted,


Please move all of that message transformation logic to separate util functions + tests. Thanks

in the same manner as core/src/utils/streaming_utils.ts

Done! Extracted the entire message transformation and accumulation logic into a new utility class LiveResponseAggregator at core/src/utils/live_connection_utils.ts.

Also added comprehensive unit tests for all branches and states in core/test/utils/live_connection_utils_test.ts to ensure 100% coverage, making GeminiLlmConnection a clean wrapper that delegates the transformation state.
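One piece of the extracted logic, merging consecutive text parts and starting a new run whenever the thought flag flips, can be illustrated in isolation. The sketch below is a simplified stand-in, not the LiveResponseAggregator itself: `TextPart`, `MergedText`, and `mergeTextRuns` are hypothetical names, and the real class also tracks state across messages.

```typescript
// Simplified, hypothetical shapes; the real part types come from the GenAI SDK.
interface TextPart {
  text?: string;
  thought?: boolean;
}

interface MergedText {
  text: string;
  thought: boolean;
}

/** Merges consecutive text parts, starting a new run whenever `thought` flips. */
function mergeTextRuns(parts: readonly TextPart[]): MergedText[] {
  const runs: MergedText[] = [];
  for (const part of parts) {
    if (!part.text) continue; // skip parts without text, such as audio data
    const thought = !!part.thought;
    const last = runs[runs.length - 1];
    if (last && last.thought === thought) {
      last.text += part.text; // same kind of text: extend the current run
    } else {
      runs.push({text: part.text, thought}); // flag flipped: start a new run
    }
  }
  return runs;
}
```

Keeping this step a pure function over an array of parts makes each branch easy to unit test without a live connection.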

kalenkevich · 2026-06-08T22:12:08Z

+      const isGemini3x = isGemini3xFlashLive(this.modelVersion);
+      if (isGemini3x && content.parts.length === 1 && content.parts[0].text) {
+        logger.debug('Using sendRealtimeInput for Gemini 3.x text input');
+        this.geminiSession.sendRealtimeInput({text: content.parts[0].text});


Does this work only for the lite model? Not flash, not pro?

Yes, the Live API (bidirectional realtime WebSocket interface) is only supported on low-latency Flash/Lite live variants (like gemini-3.1-flash-live-preview). There are no Pro models supported by the Live API gateway currently. This helper targets model names matching the gemini-3.x-flash-live naming structure.
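A name check of the kind described could look roughly like this. It is a sketch under assumptions: the regular expression, the function name `looksLikeGemini3xFlashLive`, and the handling of resource-path prefixes are illustrative and are not taken from the PR's isGemini3xFlashLive helper.

```typescript
// Assumption: Live-capable 3.x names look like "gemini-3.<minor>-flash-live...".
const GEMINI_3X_FLASH_LIVE = /^gemini-3(\.\d+)?-flash-live(-|$)/;

/** Returns true when the model name matches the assumed 3.x flash-live pattern. */
function looksLikeGemini3xFlashLive(model: string | undefined): boolean {
  if (!model) return false;
  // Assumption: resource-style names end in ".../models/<name>"; keep the last segment.
  const name = model.slice(model.lastIndexOf('/') + 1);
  return GEMINI_3X_FLASH_LIVE.test(name);
}
```

Matching on a pattern rather than a fixed list means a later 3.x flash-live release would be routed the same way without a code change, at the cost of also matching names that turn out to behave differently.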


tSte · 2026-06-10T15:12:03Z

@AmaadMartin thanks a lot for this.

May I ask whether this is only the first part of addressing #362 or the complete implementation? What seems to be missing is:

The feature is still unreachable through the public API. Issue #362 ("[Feature Request] Gemini 2.5 / 3.1 realtime (Live API) support") is about running Live models through Runner.runLive(), correct? This PR only touches GeminiLlmConnection / google_llm.ts.

core/src/runner/runner.ts still has // TODO - b/425992518: Implement runLive and related methods.
LlmAgent.runLiveFlow still throws 'LlmAgent.runLiveFlow not implemented'
BaseAgent.runLive is still not implemented

Please, correct me if I am wrong, but after this merges, the only way to use the new code is to call llm.connect() directly and hand-roll the send/receive loops, bypassing agents, tools, sessions, plugins, and LiveRequestQueue. #325 implemented that missing layer (Runner.runLive with plugin lifecycle and event-persistence rules for audio inlineData, BaseAgent.runLive, LlmAgent.runLiveFlow with the queue-draining send loop, function-response ferrying back over the open socket, and transfer_to_agent recursion). Is this a planned follow-up?

receive() terminates on the first turnComplete

LiveResponseAggregator sets isDone = true on turnComplete, and GeminiLlmConnection.receive() breaks out of its loop. #362 explicitly lists "keep receive() open across turns" as one of the two fork patches this should replace, and adk-python's receive() treats turn_complete as an in-stream signal and keeps iterating until the websocket closes. With this design a future runLiveFlow must re-invoke receive() per turn. Or am I off?
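The difference can be sketched with a receive loop that treats turnComplete as an in-stream signal instead of a reason to stop. The `LiveMessage` and `TurnEvent` shapes and the `receiveAcrossTurns` name are hypothetical and heavily simplified; this is not the code in GeminiLlmConnection.

```typescript
// Hypothetical, simplified message shape; real server messages carry more fields.
interface LiveMessage {
  text?: string;
  turnComplete?: boolean;
}

interface TurnEvent {
  turn: number; // zero-based index of the turn this message belongs to
  message: LiveMessage;
}

/** Relays every message and keeps reading after turnComplete until the source ends. */
async function* receiveAcrossTurns(
  source: AsyncIterable<LiveMessage>,
): AsyncGenerator<TurnEvent> {
  let turn = 0;
  for await (const message of source) {
    yield {turn, message};
    if (message.turnComplete) {
      // No `break` here: the next message simply belongs to the next turn.
      turn += 1;
    }
  }
}
```

With this shape the generator ends only when the underlying source ends (for example, when the socket closes), so a caller does not need to re-invoke it per turn.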

Activity signals are not bridged

LiveRequestQueue already exposes sendActivityStart() / sendActivityEnd(), but the connection has no counterpart (sendRealtimeInput({activityStart: {}}) / ({activityEnd: {}})). Manual-VAD voice agents (automaticActivityDetection disabled) can't signal speech boundaries. #325 added optional sendActivityStart/sendActivityEnd to BaseLlmConnection + GeminiLlmConnection.
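A connection-side counterpart might be as small as the sketch below. The `RealtimeSession` interface is a minimal stand-in that models only the `sendRealtimeInput` call named above, and `ActivitySignalBridge` is a hypothetical class name; neither is the SDK's or the PR's actual type.

```typescript
// Minimal stand-in for the live session; only the call named above is modeled.
interface RealtimeInput {
  activityStart?: Record<string, never>;
  activityEnd?: Record<string, never>;
}

interface RealtimeSession {
  sendRealtimeInput(input: RealtimeInput): void;
}

/** Forwards manual voice-activity boundaries to the session (hypothetical class). */
class ActivitySignalBridge {
  private readonly session: RealtimeSession;

  constructor(session: RealtimeSession) {
    this.session = session;
  }

  /** Signals that the user started speaking. */
  sendActivityStart(): void {
    this.session.sendRealtimeInput({activityStart: {}});
  }

  /** Signals that the user stopped speaking. */
  sendActivityEnd(): void {
    this.session.sendRealtimeInput({activityEnd: {}});
  }
}
```

These calls only matter when automatic activity detection is disabled; with it enabled, the server infers speech boundaries on its own.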

Vertex location is hardcoded to 'global'

Correct me if I am wrong, but for deployments with data-residency requirements (EU regions), routing live audio through global is not acceptable as an unconditional override. Could this fall back to 'global' only when no location is configured, or be opt-in?
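The fallback being asked for amounts to a one-line rule. In this sketch the function name `resolveLiveLocation` is hypothetical, and treating a whitespace-only value as unset is an assumption of the example:

```typescript
/**
 * Uses the configured Vertex AI location when one is set and falls back to
 * 'global' only when none is configured.
 */
function resolveLiveLocation(configured?: string): string {
  const trimmed = configured?.trim();
  return trimmed ? trimmed : 'global';
}
```

For example, `resolveLiveLocation('europe-west4')` keeps the regional value, while `resolveLiveLocation()` yields `'global'`.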

E2E coverage doesn't seem to exercise the models from the issue

tests/e2e/live_model_test.ts connects with gemini-2.5-flash and responseModalities: ['text'], single turn. The issue is specifically about gemini-2.5-flash-preview-native-audio and gemini-3.1-flash-live-preview with audio; neither the audio path, the native-audio mime routing, nor multi-turn behaviour is covered end-to-end. Should there be some sort of tests for this?


AmaadMartin · 2026-06-10T21:10:52Z

Thanks for the feedback! We have updated the PR to address these points directly:

1. Public API / Orchestration Layer (Runner.runLive):
Yes, the high-level runner and agent orchestration flow (Runner.runLive, BaseAgent.runLive, etc.) is planned as an immediate follow-up to keep this PR focused and isolated to the connection/model layer. We will integrate the runner orchestration layer in a subsequent PR (using your PR #325, "feat: add bidirectional live streaming to Runner and LlmAgent").
2. WebSocket receive loop lifetime:
We have updated the receiving loop in GeminiLlmConnection.receive() in gemini_llm_connection.ts to stay open and continue consuming server messages from the queue across multiple turns, rather than exiting on the first turnComplete event.
3. Activity signals bridging:
Added sendActivityStart() and sendActivityEnd() methods to the connection layer in base_llm_connection.ts and gemini_llm_connection.ts to support manually signaling user activity boundaries.
4. Vertex AI location configuration:
Updated the client initialization in google_llm.ts to respect regional location configurations (falling back to 'global' only if no region is provided).
5. Audio and multi-turn E2E coverage (including Gemini 3.1):
- Implemented unmocked E2E tests in live_model_test.ts verifying PCM audio streaming, audio transcriptions, and multi-turn conversation persistence over Vertex AI using gemini-live-2.5-flash-native-audio.
- Added a dedicated test case for gemini-3.1-flash-live-preview-04-2026 using the new input routing (sendRealtimeInput). Since this model is in private preview and requires explicit GCP project allowlisting on Vertex AI (yielding 1008 Policy Violation otherwise), we marked the test as skipped by default so that the test suite passes on non-allowlisted projects, while remaining runnable for allowlisted clients.
- Unit test coverage in gemini_llm_connection_test.ts fully verifies the Gemini 3.x payload formatting and routing logic.


Amaad Martin and others added 5 commits June 2, 2026 13:52

feat(core): Support Gemini 2.5 and 3.1 Live Models in ADK JS

f4a54c4

feat(core): support general Gemini 3.x Live models and skip E2E in CI

954f286

test(e2e): remove hardcoded personal GCP project ID

4edea34

feat(core): generalize Gemini 3.1 checks to general 3.x series and rename connection helpers

1d7e720

Merge branch 'main' into feat/live-model-support-gemini-2.5-3.1

c6a4c89

kalenkevich reviewed Jun 8, 2026


refactor(core): Extract live message transformation logic into LiveResponseAggregator utility and add tests

f6d7a0f

feat: live model and connection layer support for Gemini 2.5 and 3.1 with E2E tests

a05b389

fix: resolve duplicate goAway property and add missing contents property to E2E requests

11106e6
