
feat(provider): add native Google Vertex AI Gemini provider #1428

Open
gitdevine wants to merge 6 commits into Gitlawb:main from gitdevine:feat/native-gemini-vertex

@gitdevine
Contributor

What

Adds a native Google Vertex AI Gemini provider that calls Vertex's
publishers/google/models/{model}:generateContent directly (ADC auth), with a
full Anthropic-shaped request/response + streaming contract, tool calling, and
support for Gemini 3 "thinking" models.
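
As a rough illustration of the endpoint this provider targets, the sketch below builds the `generateContent` URL for a regional or global location. The helper name `buildGenerateContentUrl` is hypothetical and not taken from this PR; the host pattern is assumed from Vertex AI's public REST convention, where the `global` location uses the unprefixed host.

```typescript
// Hypothetical helper (not the PR's code): builds the Vertex generateContent URL.
function buildGenerateContentUrl(
  project: string,
  location: string,
  model: string,
): string {
  // Assumption: the global endpoint uses the bare host, while regional
  // endpoints prefix the host with the location.
  const host =
    location === "global"
      ? "aiplatform.googleapis.com"
      : `${location}-aiplatform.googleapis.com`;
  return (
    `https://${host}/v1/projects/${encodeURIComponent(project)}` +
    `/locations/${encodeURIComponent(location)}` +
    `/publishers/google/models/${encodeURIComponent(model)}:generateContent`
  );
}
```

With ADC, the request would additionally carry an OAuth bearer token; token acquisition is left out of this sketch.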

Why

Routing Gemini through the Anthropic Vertex path (publishers/anthropic) only
works for Claude models. A native transport lets OpenClaude drive Gemini on
Vertex end-to-end (text + tools), including the gemini-3.x / gemini-2.5-pro
thinking models.

Highlights / impact

  • Provider wizard step for project + location (uses ADC — no API key).
  • Native client: Anthropic ⇄ Vertex translation (contents,
    systemInstruction, tools, toolConfig, generationConfig) + streaming.
  • Tool schemas translated via an allowlist into Vertex's OpenAPI subset
    (UPPERCASE types; const→enum; exclusiveMinimum→minimum; drops
    propertyNames/$schema/additionalProperties/…).
  • Gemini "thinking" handling: thoughtSignature round-trip, temperature
    floored to 1.0 (Google warns <1.0 degrades/loops), maxOutputTokens floored
    so the model can think and answer.
  • functionResponse turns kept pure (trailing reminder text split into its
    own turn) — otherwise Vertex returns an empty STOP response after a tool
    result.
  • Explicit, diagnosable empty-response errors (no silent failures).

Providers affected

Only the new gemini-vertex first-party path. Existing providers (Anthropic,
OpenAI, Gemini-via-shim, MiniMax, Ollama, …) are untouched; the Anthropic
Vertex (Claude) path is explicitly guarded so Gemini routing can't hijack it.

Tested

  • Real run: gemini-3.5-flash on Vertex (global endpoint) — greeting + Skill
    tool-call round-trip working end-to-end.
  • bun test src/services/api/geminiVertexClient.test.ts (12 pass)
  • bun test src/services/api/client.test.ts (20 pass)
  • bun run smoke

Follow-up / limitations

  • thoughtSignature is smuggled through the tool_use id to survive the
    Anthropic message round-trip; a dedicated field would be cleaner if the
    message schema gains one.
  • Streaming currently emits the aggregated response as Anthropic stream events
    rather than incremental SSE from Vertex.
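
To make the first limitation concrete, here is a minimal sketch of carrying a `thoughtSignature` inside a `tool_use` id and recovering it later. The marker string and both function names are hypothetical; the PR's actual encoding is not shown on this page. The sketch treats the id as an opaque string and uses `encodeURIComponent` only so that any signature value round-trips losslessly.

```typescript
// Hypothetical marker separating the base id from the packed signature.
const SIG_MARKER = "__sig__";

// Append the signature to the id when one is present.
function packToolUseId(baseId: string, thoughtSignature?: string): string {
  return thoughtSignature
    ? `${baseId}${SIG_MARKER}${encodeURIComponent(thoughtSignature)}`
    : baseId;
}

// Split an id back into its base id and (optional) signature.
function unpackToolUseId(id: string): {
  baseId: string;
  thoughtSignature?: string;
} {
  const at = id.indexOf(SIG_MARKER);
  if (at < 0) return { baseId: id };
  return {
    baseId: id.slice(0, at),
    thoughtSignature: decodeURIComponent(id.slice(at + SIG_MARKER.length)),
  };
}
```

A dedicated message field would remove the need for this kind of packing, which is the cleanup the limitation above refers to.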

🤖 Generated with Claude Code

gitdevine and others added 6 commits May 29, 2026 14:32
Adds a `gemini-vertex` transport that routes requests directly to the
Google Vertex AI Gemini API (`generateContent`) instead of going through
an OpenAI-compatible shim.

Key additions:
- `src/integrations/vendors/gemini-vertex.ts` — vendor descriptor with
  ADC auth, regional/global endpoint routing, model catalog
- `src/services/api/geminiVertexClient.ts` — native Vertex client with
  Anthropic-compatible surface (messages.create, beta.messages.create,
  withResponse streaming contract); extracts usageMetadata token counts
- `src/utils/geminiAuth.ts` — getGeminiVertexLocation/Model/ProjectId
  helpers and DEFAULT_GEMINI_VERTEX_MODEL constant
- `src/utils/providerProfile.ts` — buildGeminiVertexProfileEnv(),
  gemini-vertex branch in buildLaunchEnv()
- `src/utils/providerProfiles.ts` — resolveProfileCompatibility,
  applyProviderProfileToProcessEnv, isProcessEnvAlignedWithProfile
  all handle the gemini-vertex case
- `src/services/api/client.ts` — routes to GeminiVertexClient when
  CLAUDE_CODE_USE_GEMINI_VERTEX=1 is set
- `src/services/api/providerConfig.ts` — exposes vertexProject and
  vertexLocation in ResolvedProviderRequest
- `src/integrations/generated/integrationArtifacts.generated.ts` —
  registers gemini-vertex vendor and preset
- UI: provider wizard, ProviderManager, buildCurrentProviderSummary
  all show/handle Google Vertex AI Gemini option
- 193 unit tests pass; real Vertex AI call verified (Paris, tokens 15/1)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…point only)

gemini-3.5-flash is available on the global Vertex AI endpoint but not
regional ones. Verified live: works with location=global. It is a
thinking model that needs roughly 1000 or more output tokens.

Also added contextWindow/maxOutputTokens metadata to gemini-2.5-pro.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Adds gemini-vertex to recognized profiles in provider-launch.ts
- Validates that GEMINI_VERTEX_PROJECT is set before launching
- Adds printSummary message for gemini-vertex
- Adds dev:gemini-vertex npm script in package.json

Usage: bun run dev:gemini-vertex
Requires: GEMINI_VERTEX_PROJECT set in env or saved profile

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Build on the native gemini-vertex provider with the user-facing and
defensive fixes needed to actually use it end-to-end.

ProviderManager.tsx
- New preset-gemini-vertex-project screen: ask for the Google Cloud
  project ID before the model step (pre-fills from
  GOOGLE_CLOUD_PROJECT/GCLOUD_PROJECT/GOOGLE_PROJECT_ID).
- Skip preset-api-key for gemini-vertex (it uses ADC, not an API key).
- Esc navigation flows: choose -> project -> model -> choose.
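
The project pre-fill could look roughly like the sketch below. The function name and the lookup order are assumptions; the commit message only lists the three variables. The environment is passed in as a plain object so the snippet stays self-contained.

```typescript
// Hypothetical helper: first non-empty project variable wins.
// Assumption: the variables are checked in the order listed in the commit message.
function defaultProjectId(
  env: Record<string, string | undefined>,
): string | undefined {
  const keys = ["GOOGLE_CLOUD_PROJECT", "GCLOUD_PROJECT", "GOOGLE_PROJECT_ID"];
  for (const key of keys) {
    const value = env[key]?.trim();
    if (value) return value;
  }
  return undefined;
}
```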

providerProfiles.ts
- applyProviderProfileToProcessEnv now uses profile.baseUrl as the
  GEMINI_VERTEX_PROJECT when activating a gemini-vertex profile
  mid-session (matches how startup applies it).

client.ts (intent guard)
- Read the active provider profile and force the Vertex Gemini branch
  when its provider is 'gemini-vertex', even when CLAUDE_CODE_USE_VERTEX
  is set in the shell. Conversely, the Anthropic Vertex branch refuses
  to fire when the active profile is gemini-vertex. This prevents a stale
  CLAUDE_CODE_USE_VERTEX=1 left in the shell (e.g. via setx) from routing
  Gemini requests to publishers/anthropic/models/* (404).
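
A minimal sketch of that guard, assuming a simplified profile shape: `ActiveProfile`, `Transport`, and `resolveTransport` are illustrative names, and the relative precedence of the two environment flags is an assumption. Only the rule that a gemini-vertex profile outranks `CLAUDE_CODE_USE_VERTEX` comes from the description above.

```typescript
type Transport = "gemini-vertex" | "anthropic-vertex" | "default";

// Hypothetical shape; the real profile type in the repo may differ.
interface ActiveProfile {
  provider: string;
}

function resolveTransport(
  env: Record<string, string | undefined>,
  profile: ActiveProfile | undefined,
): Transport {
  // The saved profile expresses user intent, so it outranks shell flags.
  if (profile?.provider === "gemini-vertex") return "gemini-vertex";
  if (env["CLAUDE_CODE_USE_GEMINI_VERTEX"] === "1") return "gemini-vertex";
  // Only reached when the active profile is not gemini-vertex.
  if (env["CLAUDE_CODE_USE_VERTEX"] === "1") return "anthropic-vertex";
  return "default";
}
```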

providerProfile.ts
- Add CLOUD_ML_REGION and ANTHROPIC_VERTEX_PROJECT_ID to
  PROFILE_ENV_KEYS so they get cleared on every profile switch, never
  letting a stale region (e.g. typoed 'glogal') leak across profiles.
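
The clearing behaviour can be sketched as follows. The key list here is an illustrative subset: `CLOUD_ML_REGION` and `ANTHROPIC_VERTEX_PROJECT_ID` are named above, while including `GEMINI_VERTEX_PROJECT` and the function name `switchProfileEnvSketch` are assumptions.

```typescript
// Illustrative subset of profile-owned keys (the real list is longer).
const PROFILE_ENV_KEYS_SKETCH = [
  "CLOUD_ML_REGION",
  "ANTHROPIC_VERTEX_PROJECT_ID",
  "GEMINI_VERTEX_PROJECT", // assumption: also treated as profile-owned
] as const;

function switchProfileEnvSketch(
  env: Record<string, string | undefined>,
  next: Record<string, string>,
): void {
  // Clear every profile-owned key first so nothing leaks from the old profile.
  for (const key of PROFILE_ENV_KEYS_SKETCH) delete env[key];
  // Then apply the values of the profile being activated.
  Object.assign(env, next);
}
```

Clearing before applying means a key the new profile does not set ends up absent rather than inherited.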

geminiVertexClient.ts
- Pass openclaude's system prompt to Vertex as systemInstruction
  instead of dropping it.
- Raise maxOutputTokens floor to 8192 for thinking-capable models
  (gemini-3.x, gemini-2.5-pro) so internal thinking doesn't eat the
  entire budget on short turns.
- Surface every silent-empty-response path as a descriptive error:
  prompt-level block, MAX_TOKENS with thinking-budget hint,
  SAFETY/RECITATION/BLOCKLIST refusals, and the empty-text fallback.
  Without this the empty assistant message is filtered downstream and
  the user sees nothing.
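
A sketch of the two behaviours described for geminiVertexClient.ts: the output-token floor and the mapping from an empty response to a descriptive error. Function names, the prefix-based model check, and the message wording are all assumptions; only the 8192 floor and the listed finish reasons come from the commit message.

```typescript
const THINKING_MIN_OUTPUT_TOKENS = 8192;

// Assumption: thinking-capable models are recognised by name prefix.
function isThinkingModel(model: string): boolean {
  return model.startsWith("gemini-3") || model.startsWith("gemini-2.5-pro");
}

// Raise the requested budget to the floor for thinking models only.
function flooredMaxOutputTokens(model: string, requested: number): number {
  return isThinkingModel(model)
    ? Math.max(requested, THINKING_MIN_OUTPUT_TOKENS)
    : requested;
}

// Turn an empty Vertex response into a message the user can act on.
function describeEmptyResponse(
  finishReason: string | undefined,
  promptBlockReason: string | undefined,
): string {
  if (promptBlockReason) {
    return `Vertex blocked the prompt (${promptBlockReason}).`;
  }
  switch (finishReason) {
    case "MAX_TOKENS":
      return "Vertex hit MAX_TOKENS before producing text; thinking may have used the whole budget, so try a larger max_tokens.";
    case "SAFETY":
    case "RECITATION":
    case "BLOCKLIST":
      return `Vertex refused to answer (finishReason=${finishReason}).`;
    default:
      return `Vertex returned no text (finishReason=${finishReason ?? "unknown"}).`;
  }
}
```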

model.ts
- gemini-vertex cases in getSmallFastModel, getDefaultHaikuModel,
  getDefaultSonnetModel, getDefaultOpusModel,
  getDefaultMainLoopModelSetting, getUserSpecifiedModelSetting. Without
  these the background helpers fell back to claude-* names (which don't
  exist on Vertex Gemini).
- Defensive guard at the top of firstPartyNameToCanonical.

tokenEstimation.ts
- Defensive guard in getTokenizerConfig so an unexpected undefined
  model doesn't take the render layer down.

messages.ts
- default branch in normalizeMessages so unrecognised message types
  no longer leak undefined slots into the result array.
- Defensive guard at the top of isNotEmptyMessage.

provider.tsx
- Legacy /provider wizard kept consistent with the new 3-step flow.
…Calls

Without tools the Vertex Gemini client could chat but every agentic prompt
(which always declares tools) crashed with finishReason=MALFORMED_FUNCTION_CALL:
the model tried to emit a function call, Vertex had no declared tools to
match it against, and finished with malformed output and no text. The user
saw nothing or a useless error.

This adds round-trip tool support so the same client actually works for
openclaude's agent loop.

Request side
- toVertexSchema(): translate Anthropic input_schema (lowercase JSON
  Schema) to Vertex's UPPERCASE OpenAPI-style schema. Recursively walks
  properties/items/anyOf, collapses allOf to anyOf, and drops fields
  Vertex rejects ($schema, $ref, additionalProperties, etc.) instead of
  failing the whole request on one odd field.
- toGeminiTools(): build `tools: [{ functionDeclarations: [...] }]` from
  params.tools. Drops tools without an input_schema so server-side
  connector tools don't pollute the declaration list.
- toGeminiToolConfig(): map Anthropic tool_choice (auto/any/tool/none)
  to Vertex's toolConfig.functionCallingConfig with mode + allowedFunctionNames.
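
The request-side translation can be sketched as below. This is not the PR's implementation: the `*Sketch` names are hypothetical, the passthrough key list is an assumed subset, and `allOf` handling is omitted. It also folds in the `const`→`enum` and `exclusiveMinimum`→`minimum` rewrites mentioned elsewhere in this PR.

```typescript
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };
type JsonObject = { [k: string]: Json };

// Assumed allowlist of keywords copied through unchanged.
const PASSTHROUGH_KEYS = [
  "description", "enum", "required", "format", "nullable", "minimum", "maximum",
];

function isObject(v: Json | undefined): v is JsonObject {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Anything not handled here ($schema, additionalProperties, ...) is dropped.
function toVertexSchemaSketch(schema: JsonObject): JsonObject {
  const out: JsonObject = {};
  const t = schema["type"];
  if (typeof t === "string") out["type"] = t.toUpperCase();
  for (const key of PASSTHROUGH_KEYS) {
    const v = schema[key];
    if (v !== undefined) out[key] = v;
  }
  // const -> single-value enum (string constants only in this sketch).
  const c = schema["const"];
  if (typeof c === "string" && out["enum"] === undefined) out["enum"] = [c];
  // exclusiveMinimum -> minimum (slightly loosens the bound).
  const ex = schema["exclusiveMinimum"];
  if (typeof ex === "number" && out["minimum"] === undefined) out["minimum"] = ex;
  const props = schema["properties"];
  if (isObject(props)) {
    const mapped: JsonObject = {};
    for (const name of Object.keys(props)) {
      const sub = props[name];
      if (isObject(sub)) mapped[name] = toVertexSchemaSketch(sub);
    }
    out["properties"] = mapped;
  }
  const items = schema["items"];
  if (isObject(items)) out["items"] = toVertexSchemaSketch(items);
  const anyOf = schema["anyOf"];
  if (Array.isArray(anyOf)) {
    out["anyOf"] = anyOf.filter(isObject).map((s) => toVertexSchemaSketch(s));
  }
  return out;
}

type ToolChoice =
  | { type: "auto" } | { type: "any" } | { type: "none" }
  | { type: "tool"; name: string };

interface ToolConfig {
  functionCallingConfig: {
    mode: "AUTO" | "ANY" | "NONE";
    allowedFunctionNames?: string[];
  };
}

// Anthropic tool_choice -> Vertex functionCallingConfig.
function toGeminiToolConfigSketch(choice: ToolChoice): ToolConfig {
  switch (choice.type) {
    case "auto": return { functionCallingConfig: { mode: "AUTO" } };
    case "any": return { functionCallingConfig: { mode: "ANY" } };
    case "none": return { functionCallingConfig: { mode: "NONE" } };
    case "tool":
      return {
        functionCallingConfig: { mode: "ANY", allowedFunctionNames: [choice.name] },
      };
  }
}
```

An allowlist fails safe: a keyword the translator has never seen is silently omitted instead of causing Vertex to reject the whole request.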

History side (toGeminiContents)
- assistant tool_use blocks → model functionCall parts. Keeps a
  toolUseId → name map so the matching tool_result wires up correctly.
- user tool_result blocks → user functionResponse parts using the
  remembered function name.
- Stringifies array-shaped tool_result content (each {text} block joined)
  so structured tool output reaches the model.
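
The history mapping can be pictured with the following sketch. Types are trimmed to the three block kinds discussed here, the function name is hypothetical, and both the `{ content }` wrapper for `functionResponse.response` and the newline join are assumptions rather than the PR's exact choices.

```typescript
type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
  | {
      type: "tool_result";
      tool_use_id: string;
      content: string | { type: "text"; text: string }[];
    };

interface AnthropicMessage {
  role: "user" | "assistant";
  content: AnthropicBlock[];
}

type GeminiPart =
  | { text: string }
  | { functionCall: { name: string; args: Record<string, unknown> } }
  | { functionResponse: { name: string; response: { content: string } } };

interface GeminiContent {
  role: "user" | "model";
  parts: GeminiPart[];
}

function toGeminiContentsSketch(messages: AnthropicMessage[]): GeminiContent[] {
  // Remember which function each tool_use id refers to, so the later
  // tool_result can be sent back under the right function name.
  const nameById = new Map<string, string>();
  return messages.map((msg): GeminiContent => {
    const parts = msg.content.map((block): GeminiPart => {
      switch (block.type) {
        case "text":
          return { text: block.text };
        case "tool_use":
          nameById.set(block.id, block.name);
          return { functionCall: { name: block.name, args: block.input } };
        case "tool_result": {
          // Array-shaped results are flattened to one string.
          const text =
            typeof block.content === "string"
              ? block.content
              : block.content.map((c) => c.text).join("\n");
          const name = nameById.get(block.tool_use_id) ?? "unknown_tool";
          return { functionResponse: { name, response: { content: text } } };
        }
      }
    });
    return { role: msg.role === "assistant" ? "model" : "user", parts };
  });
}
```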

Response side
- extractContentBlocks(): build Anthropic-shaped content blocks (text +
  tool_use) preserving order. Synthesises stable toolu_vertex_* ids so
  the agent loop can wire the follow-up tool_result back.
- GeminiVertexMessage.content now allows tool_use blocks; stop_reason is
  'tool_use' on function-call turns so the agent loop runs the tool
  instead of treating it as a final reply.
- toAnthropicStream loops over all blocks and emits one
  start/delta/stop trio per block (text_delta or input_json_delta).
- Empty-response guards no longer fire on function-call-only turns —
  those are valid agent output.
- Added MALFORMED_FUNCTION_CALL to the descriptive error list (it should
  no longer fire now that tools are forwarded, but if a custom tool
  produces an unparseable schema we surface a useful message).
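
A compact sketch of the response side, under stated assumptions: the `*Sketch` names are hypothetical, ids are derived from the part index (the PR only says they are stable and prefixed `toolu_vertex_`), and `end_turn` is assumed as the non-tool stop reason. The event shapes follow Anthropic's published streaming event names.

```typescript
type VertexPart =
  | { text: string }
  | { functionCall: { name: string; args?: Record<string, unknown> } };

type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };

type StreamEvent =
  | { type: "content_block_start"; index: number; content_block: ContentBlock }
  | {
      type: "content_block_delta";
      index: number;
      delta:
        | { type: "text_delta"; text: string }
        | { type: "input_json_delta"; partial_json: string };
    }
  | { type: "content_block_stop"; index: number };

function extractContentBlocksSketch(parts: VertexPart[]): {
  content: ContentBlock[];
  stop_reason: "tool_use" | "end_turn";
} {
  const content: ContentBlock[] = [];
  parts.forEach((part, index) => {
    if ("functionCall" in part) {
      content.push({
        type: "tool_use",
        id: `toolu_vertex_${index}`, // assumed id scheme
        name: part.functionCall.name,
        input: part.functionCall.args ?? {},
      });
    } else if (part.text.length > 0) {
      content.push({ type: "text", text: part.text });
    }
  });
  const hasToolUse = content.some((b) => b.type === "tool_use");
  return { content, stop_reason: hasToolUse ? "tool_use" : "end_turn" };
}

// One start/delta/stop trio per block, built from the aggregated response.
function toBlockEventsSketch(content: ContentBlock[]): StreamEvent[] {
  const events: StreamEvent[] = [];
  content.forEach((block, index) => {
    if (block.type === "text") {
      events.push({ type: "content_block_start", index, content_block: { type: "text", text: "" } });
      events.push({ type: "content_block_delta", index, delta: { type: "text_delta", text: block.text } });
    } else {
      events.push({ type: "content_block_start", index, content_block: { ...block, input: {} } });
      events.push({
        type: "content_block_delta",
        index,
        delta: { type: "input_json_delta", partial_json: JSON.stringify(block.input) },
      });
    }
    events.push({ type: "content_block_stop", index });
  });
  return events;
}
```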

Tests
- New: tools are translated with UPPERCASE types and additionalProperties
  is dropped.
- New: a functionCall-only response becomes a tool_use content block with
  stop_reason='tool_use'.
- Updated: maxOutputTokens=321 boosted to 8192 for the gemini-3.5-flash
  thinking-model floor (existing behaviour, now reflected in the test).
- translate tool schemas via an allowlist (drop propertyNames/$schema/…,
  const→enum, exclusiveMinimum→minimum) to satisfy Vertex's OpenAPI subset
- round-trip Gemini thinking thoughtSignature through tool_use ids
- clamp thinking-model temperature to 1.0 (Google warns <1.0 degrades/loops)
- keep functionResponse turns pure (split trailing text) so the model does
  not return an empty STOP response after a tool result
- add a non-sensitive diagnostic to empty-response errors
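
Two of the fixes in this list lend themselves to a short sketch: keeping `functionResponse` turns pure and raising the temperature for thinking models. Names and types are illustrative, and leaving an unset temperature untouched is an assumption.

```typescript
type TurnPart =
  | { text: string }
  | { functionResponse: { name: string; response: Record<string, unknown> } };

interface Turn {
  role: "user" | "model";
  parts: TurnPart[];
}

// A turn that mixes functionResponse parts with other parts is split in two,
// so the tool result reaches Vertex in a turn of its own.
function splitFunctionResponseTurns(turns: Turn[]): Turn[] {
  const out: Turn[] = [];
  for (const turn of turns) {
    const responses = turn.parts.filter((p) => "functionResponse" in p);
    const others = turn.parts.filter((p) => !("functionResponse" in p));
    if (responses.length > 0 && others.length > 0) {
      out.push({ role: turn.role, parts: responses });
      out.push({ role: turn.role, parts: others });
    } else {
      out.push(turn);
    }
  }
  return out;
}

// Values below 1.0 are raised to 1.0 for thinking models only.
function clampThinkingTemperature(
  temperature: number | undefined,
  thinking: boolean,
): number | undefined {
  if (!thinking || temperature === undefined) return temperature;
  return Math.max(temperature, 1.0);
}
```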

Provider path tested: gemini-3.5-flash on Vertex (global endpoint), real
greeting + Skill tool call. Checks: bun test geminiVertexClient.test.ts
(12 pass), bun run build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@gitdevine force-pushed the feat/native-gemini-vertex branch from a16f0bb to 653434a May 29, 2026 12:58
@gitdevine closed this May 29, 2026
@gitdevine reopened this May 29, 2026
@openbrandev

The failing smoke-and-tests check is pre-existing flakiness on main, not
introduced by this PR.

Proof: main's own PR Checks run at the exact commit this branch is based on
(70b4b07), with none of my changes, fails on the same four tests:
https://github.com/Gitlawb/openclaude/actions/runs/26583255103

  • getAttributionTexts > preserves includeCoAuthoredBy true as an explicit old-default opt-in
  • active auto-compact cooldown blocks before model call with cooldown guidance
  • auto-compact cooldown tracking is carried into the next query call
  • breaker metadata tracking callback publishes a fresh object

The same commit also has a passing run, i.e. these are flaky: they pass in
isolation but fail when the whole suite runs in one process
(bun test --max-concurrency=1), which points to cross-test state leakage
(shared process.env / the shared mutation lock), not a real regression.
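
For readers unfamiliar with this failure mode, one common way to avoid environment leakage between tests is to snapshot and restore the keys a test changes. The helper below is a generic sketch, not code from this repository; `withEnvSketch` is a hypothetical name, it takes the environment object as a parameter, and it only covers synchronous callbacks.

```typescript
// Run `fn` with temporary overrides, then restore the previous values even
// if `fn` throws. An override of `undefined` removes the key for the call.
function withEnvSketch<T>(
  env: Record<string, string | undefined>,
  overrides: Record<string, string | undefined>,
  fn: () => T,
): T {
  const keys = Object.keys(overrides);
  // Record whether each key existed, so absent keys are removed again.
  const saved = keys.map((key) => ({ key, had: key in env, value: env[key] }));
  try {
    for (const key of keys) {
      const value = overrides[key];
      if (value === undefined) delete env[key];
      else env[key] = value;
    }
    return fn();
  } finally {
    for (const { key, had, value } of saved) {
      if (had) env[key] = value;
      else delete env[key];
    }
  }
}
```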

None of these tests are related to the Gemini Vertex provider. This PR's own
tests pass (geminiVertexClient.test.ts, plus the gemini-vertex cases in
client.test.ts).
