Feat: Implement Gemini Interaction API in adk-js #364
…olve CI E401 error
…DK 2.0 type inheritance checks in CI
  text?: string;
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  functionCall?: any;
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  functionResponse?: any;
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  inlineData?: any;
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  fileData?: any;
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  thoughtSignature?: any;
  thought?: boolean;
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  codeExecutionResult?: any;
  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  executableCode?: any;
please remove, use Part type from @google/genai
I audited the SDK type definitions and confirmed that the Part type from @google/genai already contains all of these fields. I removed the ExtendedPart interface entirely and refactored the utility functions to parse standard Part objects directly.
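As a rough illustration of parsing standard Part objects directly, the sketch below classifies a part by inspecting its optional fields. PartLike is a local stand-in that mirrors only a few of the fields listed in the removed diff, so that the snippet is self-contained; real code would import the Part type from @google/genai. The classifyPart helper, the PartKind union, and the nested field shapes are hypothetical and not taken from the PR.

```typescript
// Local stand-in for a subset of the SDK's Part fields (assumption: shapes
// are simplified). Real code would import the type from '@google/genai'.
interface PartLike {
  text?: string;
  thought?: boolean;
  functionCall?: {id?: string; name?: string; args?: Record<string, unknown>};
  inlineData?: {mimeType?: string; data?: string};
}

type PartKind = 'function_call' | 'inline_data' | 'thought' | 'text' | 'unknown';

// Hypothetical helper: decide what kind of content a part carries by
// checking which optional field is populated, with no casts to `any`.
function classifyPart(part: PartLike): PartKind {
  if (part.functionCall) return 'function_call';
  if (part.inlineData) return 'inline_data';
  if (part.thought) return 'thought';
  if (typeof part.text === 'string') return 'text';
  return 'unknown';
}
```

Because every field is optional on the shared type, the checks above rely on ordinary narrowing rather than a widened local interface.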
interface ExtendedTool {
  functionDeclarations?: Array<{
    name: string;
    description?: string;
    parameters?: {
      properties?: Record<string, unknown>;
      required?: string[];
    };
    parametersJsonSchema?: unknown;
  }>;
  googleSearch?: unknown;
  codeExecution?: unknown;
  urlContext?: unknown;
}

interface InteractionTextContent {
  type: 'text';
  text: string;
}

interface InteractionFunctionCall {
  type: 'function_call';
  id: string;
  name: string;
  arguments: Record<string, unknown>;
  thought_signature?: string;
}

interface InteractionFunctionResult {
  type: 'function_result';
  name: string;
  call_id: string;
  result: unknown;
}

interface InteractionMediaContent {
  type: 'image' | 'audio' | 'video' | 'document';
  data?: string;
  uri?: string;
  mime_type: string;
}

interface InteractionThought {
  type: 'thought';
  signature?: string;
}

interface InteractionCodeExecutionCall {
  type: 'code_execution_call';
  id: string;
  arguments: {
    code: string;
    language: string;
  };
}

interface InteractionCodeExecutionResult {
  type: 'code_execution_result';
  call_id: string;
  result: string;
  is_error: boolean;
}

type InteractionContent =
  | InteractionTextContent
  | InteractionFunctionCall
  | InteractionFunctionResult
  | InteractionMediaContent
  | InteractionThought
  | InteractionCodeExecutionCall
  | InteractionCodeExecutionResult;

interface InteractionTurn {
  role: string;
  content: InteractionContent[];
}

interface InteractionTool {
  type: 'function' | 'google_search' | 'code_execution' | 'url_context';
  name?: string;
  description?: string;
  parameters?: unknown;
}

interface InteractionResponse {
  id: string;
  status: 'completed' | 'requires_action' | 'failed' | string;
  error?: {
    code: string;
    message: string;
  };
  outputs?: Record<string, unknown>[];
  usage?: {
    total_input_tokens?: number;
    total_output_tokens?: number;
  };
}

interface InteractionSSEEvent {
  event_type?: string;
  eventType?: string;
  delta?: {
    type: string;
    text?: string;
    name?: string;
    id?: string;
    arguments?: Record<string, unknown>;
    thought_signature?: string;
    data?: string;
    uri?: string;
    mime_type: string;
  };
  status?: string;
  error?: {
    code: string;
    message: string;
  };
  code?: string;
  message?: string;
  interaction_id?: string;
  interactionId?: string;
  interaction?: {
    id: string;
  };
  id?: string;
}

interface GoogleGenAIWithInteractions {
  interactions: {
    create(params: {
      model?: string;
      input: InteractionTurn[];
      stream: boolean;
      systemInstruction?: string;
      tools?: InteractionTool[];
      generationConfig?: Record<string, unknown>;
      previousInteractionId?: string;
      // eslint-disable-next-line @typescript-eslint/no-explicit-any
    }): Promise<any>; // We keep 'any' here as the SDK return type is complex (stream vs non-stream)
  };
}
Can we get those types from @google/genai?
I have replaced all custom interaction types (InteractionContent, InteractionTurn, InteractionTool, InteractionResponse, InteractionSSEEvent, etc.) with native counterparts provided by @google/genai (like Interactions.Content, Interactions.Turn, Interactions.Tool, and Interactions.Interaction).
To handle runtime/SDK discrepancies in a type-safe way without casting to any:
- Defined local ExtendedInteraction and ExtendedInteractionStatusUpdate interfaces extending the SDK types to cleanly declare the runtime error fields.
- Adjusted the stream event parser to fall back to delta.signature || delta.thought_signature and correctly map nested error objects for standard ErrorEvents.
- Removed the obsolete GoogleGenAIWithInteractions client wrapper since GoogleGenAI has a native interactions getter.
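The two typing techniques mentioned in this reply can be sketched as follows. SdkInteraction and ThoughtDelta are local stand-ins, not the real @google/genai types, and the field shapes on RuntimeError are assumptions; only the names ExtendedInteraction, delta.signature, and delta.thought_signature come from the comment above. The helper functions are hypothetical.

```typescript
// Stand-in for an SDK type (assumption: the real one comes from @google/genai).
interface SdkInteraction {
  id: string;
  status: string;
}

// Assumed shape of the error payload observed at runtime.
interface RuntimeError {
  code?: string;
  message?: string;
}

// Extend the SDK type locally to declare the runtime-only `error` field,
// instead of casting the response to `any`.
interface ExtendedInteraction extends SdkInteraction {
  error?: RuntimeError;
}

// Stand-in for a streamed thought delta that may use either field name.
interface ThoughtDelta {
  type: 'thought';
  signature?: string;
  thought_signature?: string;
}

// Prefer `signature`, falling back to `thought_signature` when it is
// missing or empty (the `||` operator treats '' as absent).
function readThoughtSignature(delta: ThoughtDelta): string | undefined {
  return delta.signature || delta.thought_signature;
}

// Hypothetical helper: render a failed interaction's nested error, if any.
function describeFailure(interaction: ExtendedInteraction): string | undefined {
  if (interaction.status !== 'failed') return undefined;
  const code = interaction.error?.code ?? 'UNKNOWN';
  const message = interaction.error?.message ?? 'no message';
  return `${code}: ${message}`;
}
```

Extending the SDK type keeps the extra field visible to the compiler, so a later SDK release that adds its own error field would surface as a type conflict rather than a silent mismatch.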
  if (mimeType.startsWith('image/')) {
    return 'image';
  } else if (mimeType.startsWith('audio/')) {
    return 'audio';
  } else if (mimeType.startsWith('video/')) {
    return 'video';
  } else {
    return 'document';
  }
please use switch case instead
Refactored getInteractionMediaType to split mimeType by / and use a clean switch statement on the primary media type prefix (e.g., image, audio).
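A minimal sketch of what such a switch-based version might look like is shown below. The function name and the four return values come from the thread; the InteractionMediaType alias and the exact body are assumptions, not the code that was merged.

```typescript
// Assumed alias for the four media categories seen in the original diff.
type InteractionMediaType = 'image' | 'audio' | 'video' | 'document';

// Map a MIME type such as 'image/png' to a media category by switching on
// the primary type (the part before the first '/').
function getInteractionMediaType(mimeType: string): InteractionMediaType {
  const [primaryType] = mimeType.split('/');
  switch (primaryType) {
    case 'image':
      return 'image';
    case 'audio':
      return 'audio';
    case 'video':
      return 'video';
    default:
      // Everything else (application/pdf, text/plain, ...) is a document.
      return 'document';
  }
}
```

One behavioral difference from the startsWith chain is worth noting: a bare string such as 'image' (with no slash) maps to 'image' here, whereas the original chain would have returned 'document'.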
fdad857 to c4561d4
Link to Issue or Description of Change
N/A
2. Or, if no issue exists, describe the change:
This PR implements the next-generation stateful Gemini Interaction API integration in adk-js, mirroring the design and functionality already present in adk-python. This enables stateful, multi-turn conversations by tracking interaction history server-side using interactionId, reducing payload sizes across progressive turns.
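To make the stateful flow concrete, here is a simplified sketch of its two halves: looking up the most recent interactionId for an agent and branch by walking the event history in reverse, then sending only the trailing user turns once such an ID is known. EventLike, ContentLike, and both function names are hypothetical; the field names author and branch, the exact-equality branch match, and the omission of function-call handling are assumptions made for illustration, not the PR's implementation.

```typescript
// Hypothetical, simplified shapes for session events and conversation turns.
interface EventLike {
  author: string;
  branch?: string;
  interactionId?: string;
}

interface ContentLike {
  role: 'user' | 'model';
  text: string;
}

// Walk the history from newest to oldest and return the first interactionId
// recorded by the given agent on the given branch (assumed exact match).
function findPreviousInteractionId(
  events: readonly EventLike[],
  agentName: string,
  branch?: string,
): string | undefined {
  for (let i = events.length - 1; i >= 0; i--) {
    const event = events[i];
    if (event === undefined || !event.interactionId) continue;
    if (event.author !== agentName || event.branch !== branch) continue;
    return event.interactionId;
  }
  return undefined;
}

// Without a previous interaction, send the full history (stateless mode).
// With one, send only the trailing run of user turns, since the server
// already holds the earlier context.
function trimHistory(
  history: readonly ContentLike[],
  previousInteractionId?: string,
): ContentLike[] {
  if (!previousInteractionId) return [...history];
  let start = history.length;
  while (start > 0 && history[start - 1]?.role === 'user') start--;
  return history.slice(start);
}
```

In a two-turn exchange, the second request would then carry one user message plus the previous interaction ID instead of the whole transcript.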
Problem:
The current adk-js core only supports stateless execution via the standard generateContent API, which requires sending the entire conversation history on every turn. This increases payload sizes, adds overhead, and prevents the use of server-side interaction history tracking.
Solution:
- Added previousInteractionId?: string to LlmRequest in core/src/models/llm_request.ts.
- Added interactionId?: string to LlmResponse in core/src/models/llm_response.ts.
- Added InteractionsRequestProcessor under core/src/agents/processors/interactions_request_processor.ts. It automatically traverses the session event history in reverse to find the latest valid interactionId for the current branch and sub-agent name, injecting it as previousInteractionId into the outgoing request.
- Registered INTERACTIONS_REQUEST_PROCESSOR in the LlmAgent request processors, immediately following the CONTENT_REQUEST_PROCESSOR.
- Added core/src/models/interactions_utils.ts containing:
  - getLatestUserContents to trim the outgoing conversation history, sending only the latest continuous user turn when previousInteractionId is present (with special handling to retain the preceding model turn's function call if the user turn contains a function response).
  - Converters to the @google/genai Interactions REST schemas (and vice-versa).
  - generateContentViaInteractions wrapping @google/genai interactions resource calls.
- Updated the Gemini class (core/src/models/google_llm.ts) to accept a useInteractionsApi?: boolean parameter, toggling the flow to delegate to generateContentViaInteractions when enabled.
Testing Plan
Unit Tests:
We implemented extensive unit tests targeting the stateful request processor and the payload converters:
- core/test/agents/processors/interactions_request_processor_test.ts (6 tests)
- core/test/models/interactions_utils_test.ts (89 tests)
Summary of passed npm test results:
We achieved 100% statement, branch, function, and line coverage for both new source files in adk-js/core:
- core/src/agents/processors/interactions_request_processor.ts: 100% coverage
- core/src/models/interactions_utils.ts: 100% coverage
Manual End-to-End (E2E) Tests:
We created a verification script, verify_interactions.ts, in the root of the workspace. It tests a two-turn conversation:
- … interactionId).
- … previousInteractionId set (verifies history is trimmed and the model correctly recalls "blue" from the server-side state).
To execute manual verification:
Checklist
Additional context
TAG=agy
CONV=8a91ed6a-f4db-4160-83d9-68d5e80e066c