
feat(core): Support Gemini 2.5 and 3.x Live Models in ADK JS #409

Open
AmaadMartin wants to merge 8 commits into google:main from AmaadMartin:feat/live-model-support-gemini-2.5-3.1

AmaadMartin commented Jun 4, 2026

Collaborator

Please ensure you have read the contribution guide before creating a pull request.

Link to Issue or Description of Change

  1. Link to an existing issue (if applicable):

Closes: #362
Related: #362
2. Or, if no issue exists, describe the change:

If applicable, please follow the issue templates to provide as much detail as possible.

Problem: The JS ADK lacked native support for Gemini 2.5 and 3.x Live Models (specifically the bidirectional real-time WebSocket API), resulting in feature parity gaps with the Python ADK.

Solution:

  • Ported the AsyncQueue utility class to cleanly serialize and bridge incoming WebSocket events to async iterators.
  • Implemented GeminiLlmConnection.receive() to cleanly map and translate Gemini Live WebSocket events (usageMetadata, serverContent containing model text/thought/audio/transcriptions, toolCall, sessionResumptionUpdate, goAway).
  • Integrated immediate yielding of tool calls in the connection handler to prevent Gemini 3.x tool call deadlocks.
  • Modified the liveApiClient initializer in google_llm.ts to override the location to 'global' when using Vertex AI, so that connections succeed instead of hanging or dropping the socket immediately.
  • Implemented unmocked end-to-end tests at tests/e2e/live_model_test.ts connecting to real Vertex AI Live endpoints.
  • Wrote unit tests achieving 100% branch and line coverage.
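The tool-call change above can be illustrated with a minimal sketch. The message and response shapes (SketchServerMessage, SketchResponse) and the mapMessages generator are hypothetical simplifications, not the @google/genai types or the PR's code. The sketch reflects one plausible reading of the deadlock: if the client buffers function calls until turnComplete while the model waits for the function responses before completing the turn, neither side can make progress.

```typescript
// Simplified, hypothetical shapes; the real Live API message types are richer.
interface SketchFunctionCall {
  name: string;
  args: Record<string, unknown>;
}

interface SketchServerMessage {
  toolCall?: {functionCalls: SketchFunctionCall[]};
  serverContent?: {turnComplete?: boolean};
}

type SketchResponse =
  | {kind: 'functionCalls'; calls: SketchFunctionCall[]}
  | {kind: 'turnComplete'};

/**
 * Maps each incoming message to responses as soon as it arrives.
 * Function calls are emitted immediately instead of being buffered
 * until turnComplete, so the caller can send function responses
 * while the model is still waiting for them.
 */
async function* mapMessages(
  messages: AsyncIterable<SketchServerMessage>,
): AsyncGenerator<SketchResponse> {
  for await (const message of messages) {
    const calls = message.toolCall?.functionCalls ?? [];
    if (calls.length > 0) {
      yield {kind: 'functionCalls', calls};
    }
    if (message.serverContent?.turnComplete) {
      yield {kind: 'turnComplete'};
    }
  }
}
```

The design choice being illustrated is that nothing waits on turnComplete before a function call is surfaced to the caller.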

Testing Plan
Please describe the tests that you ran to verify your changes. This is required for all PRs that are not small documentation or typo fixes.

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

Manual End-to-End (E2E) Tests:
Please provide instructions on how to manually test your changes, including any necessary setup or configuration.

  • Established a WebSocket session to Vertex AI's bidirectional Live API using gemini-2.5-flash-preview-native-audio.
  • Successfully sent/streamed contents and verified that the text stream was returned correctly.

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.

```ts
/**
 * A generic, async-safe queue that implements AsyncIterable.
 */
export class AsyncQueue<T> implements AsyncIterable<T> {
```

Collaborator

What about /adk-js/core/src/agents/live_request_queue.ts?

Collaborator Author

The AsyncQueue<T> implemented here is a general utility to bridge a push-based stream (callbacks, like the WebSocket onmessage event) to a pull-based JS AsyncIterable<T> (so the caller can consume it as an async generator).

In contrast, live_request_queue.ts is a specialized request queuing mechanism designed to schedule/serialize outgoing request calls made by the agent runner. They work in opposite directions (one handles incoming streaming messages, the other manages outgoing worker calls).
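The push-to-pull bridging described above can be sketched as follows. SketchAsyncQueue is a hypothetical, minimal version written for illustration; the PR's AsyncQueue may differ in API and edge-case behavior (for example, how it treats pushes after close). Callbacks such as a WebSocket onmessage/onerror/onclose would call push(), error() and close(), while a single consumer drains the queue with for await...of.

```typescript
/**
 * Minimal push-to-pull bridge: producers call push()/error()/close()
 * (e.g. from WebSocket callbacks) and one consumer iterates with for await.
 */
class SketchAsyncQueue<T> implements AsyncIterable<T> {
  private readonly items: T[] = [];
  private readonly waiters: Array<{
    resolve: (result: IteratorResult<T>) => void;
    reject: (err: unknown) => void;
  }> = [];
  private closed = false;
  private failure: {error: unknown} | undefined;

  push(item: T): void {
    if (this.closed) return; // items pushed after close/error are dropped
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter.resolve({value: item, done: false}); // hand off to a pending next()
    } else {
      this.items.push(item); // buffer until the consumer asks
    }
  }

  error(err: unknown): void {
    if (this.closed) return;
    this.failure = {error: err};
    this.closed = true;
    for (const w of this.waiters.splice(0)) w.reject(err);
  }

  close(): void {
    if (this.closed) return;
    this.closed = true;
    for (const w of this.waiters.splice(0)) w.resolve({value: undefined, done: true});
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: (): Promise<IteratorResult<T>> => {
        // Buffered items are always delivered before an error or end-of-stream.
        if (this.items.length > 0) {
          return Promise.resolve({value: this.items.shift() as T, done: false});
        }
        if (this.failure) return Promise.reject(this.failure.error);
        if (this.closed) return Promise.resolve({value: undefined, done: true});
        return new Promise((resolve, reject) => {
          this.waiters.push({resolve, reject});
        });
      },
    };
  }
}
```

Because buffered items are delivered before a pending error is surfaced, a consumer sees every message received before the socket failed.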

Comment thread core/src/models/google_llm.ts Outdated
```diff
 // TODO - b/425992518: GenAI SDK inconsistent API, missing methods.
-onmessage: () => {},
+onmessage: (message) => {
+  console.log('E2E Debug: onmessage', JSON.stringify(message));
```

Collaborator

Please remove this.

Collaborator Author

Done. Removed the debug logs.

Comment thread core/src/models/google_llm.ts Outdated
```ts
  messageQueue.push(message);
},
onerror: (error) => {
  console.error('E2E Debug: onerror', error);
```

Collaborator

Use logger instead.

Collaborator Author

Done. Removed the debug logs and forwarded the error directly to the queue.

Comment thread core/src/models/google_llm.ts Outdated
```ts
  messageQueue.error(error);
},
onclose: () => {
  console.log('E2E Debug: onclose');
```

Collaborator

Please remove this.

Collaborator Author

Done. Removed the debug logs.

Comment on lines +172 to +362
```ts
logger.debug('Got LLM Live message:', message);

if (message.usageMetadata) {
  yield {
    usageMetadata: message.usageMetadata,
    ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
  };
}

if (message.serverContent) {
  const serverContent = message.serverContent;
  const content = serverContent.modelTurn;

  if (serverContent.groundingMetadata) {
    pendingGroundingMetadata = serverContent.groundingMetadata;
  }

  // Standalone groundingMetadata event (when content is empty)
  if (
    !(content && content.parts) &&
    serverContent.groundingMetadata &&
    !serverContent.turnComplete
  ) {
    yield {
      groundingMetadata: serverContent.groundingMetadata,
      ...(serverContent.interrupted !== undefined
        ? {interrupted: serverContent.interrupted}
        : {}),
      ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
    };
  }

  if (content && content.parts) {
    const llmResponse: LlmResponse = {
      content: content as Content,
      ...(serverContent.interrupted !== undefined
        ? {interrupted: serverContent.interrupted}
        : {}),
      ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
    };

    if (!serverContent.turnComplete && serverContent.groundingMetadata) {
      llmResponse.groundingMetadata = serverContent.groundingMetadata;
    }

    const hasInlineData = content.parts.some((p) => p.inlineData);
    for (const part of content.parts) {
      if (part.text) {
        const currentIsThought = !!part.thought;
        if (text && currentIsThought !== isThought) {
          yield this.buildFullTextResponse(text, isThought);
          text = '';
          isThought = false;
        }
        text += part.text;
        isThought = currentIsThought;
        llmResponse.partial = true;
      }
    }

    // don't yield the merged text event when receiving audio data
    if (text && !content.parts.some((p) => p.text) && !hasInlineData) {
      yield this.buildFullTextResponse(text, isThought);
      text = '';
      isThought = false;
    }

    yield llmResponse;
  }

  if (serverContent.inputTranscription) {
    if (serverContent.inputTranscription.text) {
      this._inputTranscriptionText +=
        serverContent.inputTranscription.text;
      yield {
        inputTranscription: {
          text: serverContent.inputTranscription.text,
          finished: false,
        },
        partial: true,
        ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
      };
    }
    if (serverContent.inputTranscription.finished) {
      yield {
        inputTranscription: {
          text: this._inputTranscriptionText,
          finished: true,
        },
        partial: false,
        ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
      };
      this._inputTranscriptionText = '';
    }
  }

  if (serverContent.outputTranscription) {
    if (serverContent.outputTranscription.text) {
      this._outputTranscriptionText +=
        serverContent.outputTranscription.text;
      yield {
        outputTranscription: {
          text: serverContent.outputTranscription.text,
          finished: false,
        },
        partial: true,
        ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
      };
    }
    if (serverContent.outputTranscription.finished) {
      yield {
        outputTranscription: {
          text: this._outputTranscriptionText,
          finished: true,
        },
        partial: false,
        ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
      };
      this._outputTranscriptionText = '';
    }
  }

  if (
    serverContent.interrupted ||
    serverContent.turnComplete ||
    serverContent.generationComplete
  ) {
    if (this._inputTranscriptionText) {
      yield {
        inputTranscription: {
          text: this._inputTranscriptionText,
          finished: true,
        },
        partial: false,
        ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
      };
      this._inputTranscriptionText = '';
    }
    if (this._outputTranscriptionText) {
      yield {
        outputTranscription: {
          text: this._outputTranscriptionText,
          finished: true,
        },
        partial: false,
        ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
      };
      this._outputTranscriptionText = '';
    }
  }

  if (serverContent.turnComplete) {
    let gMetadataToYield = pendingGroundingMetadata;
    if (text) {
      yield this.buildFullTextResponse(text, isThought, gMetadataToYield);
      text = '';
      isThought = false;
      gMetadataToYield = undefined;
    }
    if (toolCallParts.length > 0) {
      logger.debug('Returning aggregated toolCallParts');
      yield {
        content: {role: 'model', parts: toolCallParts},
        ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
      };
      toolCallParts = [];
    }
    const finalResponse: LlmResponse = {
      turnComplete: true,
      ...(this.modelVersion ? {modelVersion: this.modelVersion} : {}),
    };
    if (serverContent.interrupted !== undefined) {
      finalResponse.interrupted = serverContent.interrupted;
    }
    const finalGrounding =
      serverContent.groundingMetadata || gMetadataToYield;
    if (finalGrounding !== undefined && finalGrounding !== null) {
      finalResponse.groundingMetadata = finalGrounding;
    }
    yield finalResponse;
    break;
  }

  if (serverContent.interrupted) {
    if (text) {
      yield this.buildFullTextResponse(text, isThought);
      text = '';
      isThought = false;
    } else {
      yield {
        interrupted: serverContent.interrupted,
```

Collaborator

Please move all of this message transformation logic into separate util functions, with tests. Thanks!

Collaborator

Following the same pattern as core/src/utils/streaming_utils.ts.

Collaborator Author

Done! Extracted the entire message transformation and accumulation logic into a new utility class LiveResponseAggregator at core/src/utils/live_connection_utils.ts.

Also added comprehensive unit tests for all branches and states in core/test/utils/live_connection_utils_test.ts to ensure 100% coverage, making GeminiLlmConnection a clean wrapper that delegates the transformation state.
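As a rough illustration of the kind of state such an aggregator owns, the transcription handling from the excerpt above (partial chunks emitted as they arrive, the accumulated text emitted once as finished) can be isolated into a small class. TranscriptionAccumulator is hypothetical and is not the PR's LiveResponseAggregator; it only mirrors the input/output transcription branches shown earlier.

```typescript
interface SketchTranscription {
  text: string;
  finished: boolean;
}

/**
 * Hypothetical helper mirroring the transcription branches in the excerpt:
 * each chunk is emitted as a partial event and appended to a buffer; the
 * buffered text is emitted once as a final event when the server marks the
 * transcription finished or the turn ends.
 */
class TranscriptionAccumulator {
  private buffer = '';

  /** Handles one server transcription chunk and returns the events to emit. */
  onChunk(chunk: {text?: string; finished?: boolean}): SketchTranscription[] {
    const events: SketchTranscription[] = [];
    if (chunk.text) {
      this.buffer += chunk.text;
      events.push({text: chunk.text, finished: false});
    }
    if (chunk.finished) {
      events.push({text: this.buffer, finished: true});
      this.buffer = '';
    }
    return events;
  }

  /** Flushes leftover text on interrupted / turnComplete / generationComplete. */
  onTurnBoundary(): SketchTranscription[] {
    if (!this.buffer) return [];
    const flushed = {text: this.buffer, finished: true};
    this.buffer = '';
    return [flushed];
  }
}
```

Keeping this state in a plain class with no I/O is what makes every branch reachable from unit tests without a live socket.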

```ts
const isGemini3x = isGemini3xFlashLive(this.modelVersion);
if (isGemini3x && content.parts.length === 1 && content.parts[0].text) {
  logger.debug('Using sendRealtimeInput for Gemini 3.x text input');
  this.geminiSession.sendRealtimeInput({text: content.parts[0].text});
```

Collaborator

Does this work only for the lite model? Not Flash, not Pro?

Collaborator Author

Yes, the Live API (bidirectional realtime WebSocket interface) is only supported on low-latency Flash/Lite live variants (like gemini-3.1-flash-live-preview). There are no Pro models supported by the Live API gateway currently. This helper targets model names matching the gemini-3.x-flash-live naming structure.
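A sketch of the kind of name check and routing discussed here, assuming a simple prefix match on the model name. isGemini3xFlashLiveSketch and chooseSendRoute are hypothetical; the PR's isGemini3xFlashLive may use a different pattern, and the excerpt does not show the non-3.x branch, so treating it as sendClientContent is an assumption.

```typescript
/** Hypothetical re-implementation; the PR's isGemini3xFlashLive may differ. */
function isGemini3xFlashLiveSketch(model: string | undefined): boolean {
  if (!model) return false;
  // Accept bare names as well as resource paths such as "models/<name>".
  const name = model.slice(model.lastIndexOf('/') + 1);
  return /^gemini-3(\.\d+)?-flash-live/.test(name);
}

type SendRoute = 'sendRealtimeInput' | 'sendClientContent';

/**
 * Mirrors the branch in the excerpt: a single text part sent to a
 * Gemini 3.x Flash Live model goes through sendRealtimeInput; everything
 * else is assumed to keep using sendClientContent.
 */
function chooseSendRoute(
  model: string | undefined,
  parts: Array<{text?: string}>,
): SendRoute {
  const singleText = parts.length === 1 && !!parts[0].text;
  return isGemini3xFlashLiveSketch(model) && singleText
    ? 'sendRealtimeInput'
    : 'sendClientContent';
}
```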


tSte commented Jun 10, 2026


@AmaadMartin thanks a lot for this.

May I ask whether this is only the first part of addressing #362, or the complete implementation? What seems to be missing is:

  1. The feature is still unreachable through the public API. #362 ([Feature Request] Gemini 2.5 / 3.1 realtime (Live API) support) is about running Live models through Runner.runLive(), correct? This PR only touches GeminiLlmConnection / google_llm.ts.
  • core/src/runner/runner.ts still has // TODO - b/425992518: Implement runLive and related methods.
  • LlmAgent.runLiveFlow still throws 'LlmAgent.runLiveFlow not implemented'
  • BaseAgent.runLive is still not implemented

Please correct me if I'm wrong, but after this merges, the only way to use the new code is to call llm.connect() directly and hand-roll the send/receive loops, bypassing agents, tools, sessions, plugins, and LiveRequestQueue. #325 implemented that missing layer (Runner.runLive with plugin lifecycle and event-persistence rules for audio inlineData, BaseAgent.runLive, LlmAgent.runLiveFlow with the queue-draining send loop, function-response ferrying back over the open socket, and transfer_to_agent recursion). Is this a planned follow-up?

  2. receive() terminates on the first turnComplete

LiveResponseAggregator sets isDone = true on turnComplete, and GeminiLlmConnection.receive() breaks out of its loop. #362 explicitly lists "keep receive() open across turns" as one of the two fork patches this should replace, and adk-python's receive() treats turn_complete as an in-stream signal and keeps iterating until the websocket closes. With this design a future runLiveFlow must re-invoke receive() per turn. Or am I off?
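The behavior described above can be sketched with simplified, hypothetical types: turnComplete is forwarded as an event, but the loop only ends when the underlying message source ends, for example when the socket closes. receiveAcrossTurns, SketchLiveMessage and SketchEvent are illustrative names, not ADK API.

```typescript
interface SketchLiveMessage {
  text?: string;
  turnComplete?: boolean;
}

type SketchEvent = {text: string} | {turnComplete: true};

/**
 * Treats turnComplete as an in-stream signal: it is forwarded to the
 * caller, but the loop keeps consuming until the message source ends
 * (e.g. the WebSocket closes), so one call spans many turns.
 */
async function* receiveAcrossTurns(
  messages: AsyncIterable<SketchLiveMessage>,
): AsyncGenerator<SketchEvent> {
  for await (const message of messages) {
    if (message.text) yield {text: message.text};
    if (message.turnComplete) {
      yield {turnComplete: true};
      // Intentionally no `break`: the next turn arrives on the same stream.
    }
  }
}
```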

  3. Activity signals are not bridged

LiveRequestQueue already exposes sendActivityStart() / sendActivityEnd(), but the connection has no counterpart (sendRealtimeInput({activityStart: {}}) / sendRealtimeInput({activityEnd: {}})). Manual-VAD voice agents (automaticActivityDetection disabled) can't signal speech boundaries. #325 added optional sendActivityStart/sendActivityEnd to BaseLlmConnection + GeminiLlmConnection.
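A minimal sketch of the missing counterpart, assuming the session exposes sendRealtimeInput accepting activityStart/activityEnd objects as described in the comment above. RealtimeInputSession and ActivitySignalBridge are hypothetical stand-ins, not the @google/genai or ADK types.

```typescript
/** Minimal stand-in for the session's realtime-input method (assumed shape). */
interface RealtimeInputSession {
  sendRealtimeInput(params: {activityStart?: object; activityEnd?: object}): void;
}

/**
 * Hypothetical connection methods bridging LiveRequestQueue activity signals
 * to the Live API, for use when automatic activity detection is disabled and
 * the client marks speech boundaries itself.
 */
class ActivitySignalBridge {
  private readonly session: RealtimeInputSession;

  constructor(session: RealtimeInputSession) {
    this.session = session;
  }

  /** Marks the start of user speech. */
  sendActivityStart(): void {
    this.session.sendRealtimeInput({activityStart: {}});
  }

  /** Marks the end of user speech so the model can respond. */
  sendActivityEnd(): void {
    this.session.sendRealtimeInput({activityEnd: {}});
  }
}
```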

  4. Vertex location is hardcoded to 'global'

Correct me if I am wrong, but for deployments with data-residency requirements (EU regions), routing live audio through global is not acceptable as an unconditional override. Could this fall back to 'global' only when no location is configured, or be opt-in?
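The fallback being asked for amounts to a small resolution rule. resolveLiveLocation is a hypothetical helper, sketched under the assumption that the configured location arrives as an optional string:

```typescript
/**
 * Hypothetical helper: use the configured Vertex AI location when one is
 * set, and fall back to 'global' only when none (or only whitespace) is given.
 */
function resolveLiveLocation(configured?: string): string {
  const trimmed = configured?.trim();
  return trimmed ? trimmed : 'global';
}
```

The resolved value would then be passed as the location when constructing the Vertex AI client; a blank or missing value is the only case that routes through 'global'.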

  5. E2E coverage doesn't seem to exercise the models from the issue

tests/e2e/live_model_test.ts connects with gemini-2.5-flash and responseModalities: ['text'], single turn. The issue is specifically about gemini-2.5-flash-preview-native-audio and gemini-3.1-flash-live-preview with audio; neither the audio path, the native-audio mime routing, nor multi-turn behaviour is covered end-to-end. Should there be some sort of tests for this?
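For the audio path, a test or client needs to stream raw PCM in small pieces. The sketch below assumes 16-bit little-endian mono PCM with the sample rate carried in a MIME type of the form audio/pcm;rate=16000, which is how the Gemini Live API documentation describes input audio. chunkPcm and the 100 ms chunk size are hypothetical choices for illustration; base64 encoding and the actual send call are left out.

```typescript
/** Bytes per second of 16-bit mono PCM at the given sample rate. */
function pcmBytesPerSecond(sampleRateHz: number): number {
  return sampleRateHz * 2; // 2 bytes per 16-bit sample, 1 channel
}

/**
 * Splits raw 16-bit mono PCM into fixed-duration chunks suitable for
 * streaming as realtime input, tagging each with an audio/pcm MIME type.
 */
function chunkPcm(
  pcm: Uint8Array,
  sampleRateHz = 16000,
  chunkMs = 100,
): Array<{data: Uint8Array; mimeType: string}> {
  const chunkBytes = Math.floor((pcmBytesPerSecond(sampleRateHz) * chunkMs) / 1000);
  // Keep chunk boundaries on whole 16-bit samples.
  const aligned = chunkBytes - (chunkBytes % 2);
  if (aligned <= 0) throw new RangeError('chunkMs is too small for this sample rate');
  const mimeType = `audio/pcm;rate=${sampleRateHz}`;
  const chunks: Array<{data: Uint8Array; mimeType: string}> = [];
  for (let offset = 0; offset < pcm.length; offset += aligned) {
    // subarray creates a view, so no audio bytes are copied.
    chunks.push({data: pcm.subarray(offset, offset + aligned), mimeType});
  }
  return chunks;
}
```

With the defaults, one second of audio (32,000 bytes) becomes ten 3,200-byte chunks, and a trailing partial chunk is sent as-is.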

@AmaadMartin

Collaborator Author

Thanks for the feedback! We have updated the PR to address these points directly:

  • 1. Public API / Orchestration Layer (Runner.runLive):
    Yes, the high-level runner and agent orchestration flow (Runner.runLive, BaseAgent.runLive, etc.) is planned as an immediate follow-up, to keep this PR focused on the connection/model layer. We will integrate the runner orchestration layer in a subsequent PR, building on your PR #325 (feat: add bidirectional live streaming to Runner and LlmAgent).

  • 2. WebSocket receive loop lifetime:
    We have updated the receiving loop in GeminiLlmConnection.receive() in gemini_llm_connection.ts to stay open and continue consuming server messages from the queue across multiple turns, rather than exiting on the first turnComplete event.

  • 3. Activity signals bridging:
    Added sendActivityStart() and sendActivityEnd() methods to the connection layer in base_llm_connection.ts and gemini_llm_connection.ts to support manually signaling user activity boundaries.

  • 4. Vertex AI location configuration:
    Updated the client initialization in google_llm.ts to respect regional location configurations (falling back to 'global' only if no region is provided).

  • 5. Audio and multi-turn E2E coverage (including Gemini 3.1):

    • Implemented unmocked E2E tests in live_model_test.ts verifying PCM audio streaming, audio transcriptions, and multi-turn conversation persistence over Vertex AI using gemini-live-2.5-flash-native-audio.
    • Added a dedicated test case for gemini-3.1-flash-live-preview-04-2026 using the new input routing (sendRealtimeInput). Since this model is in private preview and requires explicit GCP project allowlisting on Vertex AI (connections from non-allowlisted projects fail with a 1008 Policy Violation), we marked the test as skipped by default so that the suite passes on non-allowlisted projects while remaining runnable for allowlisted ones.
    • Unit test coverage in gemini_llm_connection_test.ts fully verifies the Gemini 3.x payload formatting and routing logic.


Labels: None yet

Projects: None yet

Development: Successfully merging this pull request may close these issues:
[Feature Request] Gemini 2.5 / 3.1 realtime (Live API) support

3 participants