feat(salesforce-data-cloud): generic ssot_get agent op for SFDC SSOT reads (YET-1615) #340

Open
benitezfede wants to merge 4 commits into main from fbenitez/yet-1615-agent-sfdc-ssot-get

@benitezfede

Contributor

What & why

Adds a generic ssot_get(path) operation to the Salesforce Data Cloud proxy client so Monte Carlo's data-collector can read the Salesforce SSOT metadata endpoints (/services/data/{v}/ssot/* — data streams, calculated insights, catalogs, schemas) through the agent.

This is the apollo-agent side of YET-1614 and unblocks YET-1616 (the data-collector will route all agent /ssot traffic through this op). It lets a single code path serve both self-hosted and remote-agent connections: the agent builds the URL from the connection's own credentials, authenticates the call, and returns only the JSON body — the customer's token never leaves the agent.

Today, self-hosted Data Cloud connections can't collect Data Streams freshness/volume because the /ssot GET needs a token that the agent won't release to the data-collector (DC); ssot_get closes that gap by keeping the token inside the agent.

How it works

  • ssot_get(path) → GET https://{domain}{path}, with path supplied verbatim by the DC (version-agnostic: the DC owns the API version and any pagination cursor).
  • Auth mirrors the existing list_dataspaces core-REST call: mint a short-lived client-credentials core token (POST /services/oauth2/token) and send it as Authorization: Bearer. SSOT endpoints are core REST on the My Domain, authed with the core token, not the Data Cloud query token from the a360 exchange (that exchange revokes the core token and targets a different host). The shared mint is extracted into _mint_core_token, now used by both list_dataspaces and ssot_get.
  • Security: path must be a relative path beginning with /; absolute and protocol-relative values are rejected so the minted token is only ever sent to the connection's own My Domain. The token is never returned to the caller, and response bodies are redacted (_redact_body) before appearing in errors/logs.
  • Non-200 / non-JSON responses surface as RuntimeError carrying code NNN (so the DC can extract the HTTP status) with a redacted body.
  • Auto-dispatched by name via the agent's _resolve_method — no registry change.
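
The flow in the list above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ssot_get_sketch`, `mint_token`, and `http_get` are hypothetical names, the HTTP transport is injected as a callable so the example runs without a network, the path guard is a basic prefix check, and body redaction is omitted for brevity.

```python
import json
from typing import Callable, Tuple, Union

JsonBody = Union[dict, list]


def ssot_get_sketch(
    domain: str,
    path: str,
    mint_token: Callable[[], str],                     # hypothetical: returns a core token
    http_get: Callable[[str, dict], Tuple[int, str]],  # hypothetical: (url, headers) -> (status, body)
) -> JsonBody:
    """Authenticated GET against the connection's own domain; returns only the JSON body."""
    # Basic guard: only relative paths, so the token is only sent to `domain`.
    if not isinstance(path, str) or not path.startswith("/") or path.startswith("//"):
        raise ValueError("path must be a relative path beginning with '/'")

    token = mint_token()  # the token lives only inside this function
    status, body = http_get(
        f"https://{domain}{path}",
        {"Authorization": f"Bearer {token}"},
    )
    if status != 200:
        # `code NNN` lets the caller recover the HTTP status from the message.
        raise RuntimeError(f"SSOT GET failed with code {status}")
    try:
        return json.loads(body)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        raise RuntimeError(f"SSOT GET returned a non-JSON body (code {status})") from exc
```

Because the caller only ever receives the parsed body or an exception, nothing in the return value carries the token.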

Testing

  • Unit: 4 new test_ssot_get_* tests (token mint + authenticated GET, absolute-URL rejection, protocol-relative rejection, non-200 code NNN + redaction). Full file 33/33 green; pyright (basic) clean. The list_dataspaces refactor onto _mint_core_token is covered by its existing tests.
  • Live: validated against a real Salesforce Data Cloud dev org — ssot_get('/services/data/v62.0/ssot/data-streams?limit=100') returned 10 real data streams; …/ssot/calculated-insights returned its collection; no credential/token leak in result or logs.

Checklist

  • Tests added/updated (unit + live validation against a real org)
  • Types: pyright (basic) clean
  • No new dependencies
  • Token never leaves the agent; path-safety guard added
  • Deployed to dev
  • Integration tested on dev

…eads

Add SalesforceDataCloudProxyClient.ssot_get(path) so the data-collector can
read Salesforce SSOT metadata endpoints (/services/data/{v}/ssot/* — data
streams, catalogs, schemas) through the agent. The agent owns building the URL
from the connection's credentials and authenticating the call, so the
customer's token never leaves the agent — only the JSON body is returned. This
lets a single op serve both the self-hosted and remote-agent paths (YET-1615).

Authentication mirrors list_dataspaces: a short-lived core token is minted via
the client-credentials grant and sent as a Bearer credential. SSOT endpoints
are core REST on the My Domain authed with the core token (NOT the Data Cloud
query token from the a360 exchange). The shared mint logic is extracted into
_mint_core_token, now used by both list_dataspaces and ssot_get.
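
As an illustration of the authentication step, the request shape for a client-credentials mint might look like the sketch below. The helper names `build_core_token_request` and `bearer_headers` are hypothetical and are not taken from this PR; the sketch only builds the request pieces and performs no network call.

```python
from urllib.parse import urlencode


def build_core_token_request(domain: str, client_id: str, client_secret: str):
    """Hypothetical helper: describe the POST that mints a short-lived core token."""
    url = f"https://{domain}/services/oauth2/token"
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    # Standard OAuth 2.0 client-credentials form parameters.
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    )
    return url, headers, body


def bearer_headers(core_token: str) -> dict:
    """Headers for the follow-up core REST call (for example an SSOT GET)."""
    return {"Authorization": f"Bearer {core_token}"}
```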

path must be a relative path beginning with '/'; absolute and protocol-relative
values are rejected so the minted token is only ever sent to the connection's
own My Domain. Non-200/non-JSON responses surface as RuntimeError with `code
NNN` and a redacted body.
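
To show why the `code NNN` convention is useful, here is a small sketch of how a caller might recover the HTTP status from the error text. The exact message wording and the names `format_ssot_error` and `extract_status` are assumptions made for illustration; only the `code NNN` fragment comes from the description above.

```python
import re
from typing import Optional

# Matches the `code NNN` fragment embedded in the error message.
_CODE_RE = re.compile(r"\bcode (\d{3})\b")


def format_ssot_error(status: int, redacted_body: str) -> str:
    """Hypothetical message format; the body is assumed to be redacted already."""
    return f"SSOT request failed with code {status}: {redacted_body}"


def extract_status(message: str) -> Optional[int]:
    """Caller-side helper: pull the HTTP status back out of the error text."""
    match = _CODE_RE.search(message)
    return int(match.group(1)) if match else None
```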

YET-1615

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
benitezfede requested a review from a team as a code owner on June 25, 2026 13:15
@linear

linear Bot commented Jun 25, 2026

YET-1615

benitezfede and others added 2 commits June 25, 2026 10:30
Address apollo-agent#340 review findings (YET-1615):
- ssot_get: reject a non-str/empty path with a clean ValueError (was an opaque
  AttributeError); annotate the return as `dict | list` since some core-REST
  endpoints return a top-level JSON array.
- _redact_body: mask access_token / refresh_token / id_token / signature in
  both single- and double-quoted forms (was access_token, single-quoted only) —
  the new ssot_get error path flows through this redactor. Move the stray
  mid-file `import re` into the top import block.
- tests: add query-string forwarding, non-JSON-200, error-body redaction, and
  a direct _redact_body unit test.
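
The redaction change described in this commit could be approximated with a single regular expression, as in the sketch below. This is not the PR's `_redact_body`; `redact_body_sketch` is a hypothetical stand-in, the `***` mask is an assumption, and the pattern does not handle values that contain escaped quotes.

```python
import re

_SENSITIVE_KEYS = ("access_token", "refresh_token", "id_token", "signature")

# Groups: 1 = key quote, 2 = key, 3 = colon with spacing, 4 = value quote, 5 = value.
_REDACT_RE = re.compile(
    r"""(["'])(%s)\1(\s*:\s*)(["'])(.*?)\4""" % "|".join(_SENSITIVE_KEYS)
)


def redact_body_sketch(body: str) -> str:
    """Mask sensitive values in both single- and double-quoted payload text."""
    # Keep the key, separators, and quotes; replace only the value.
    return _REDACT_RE.sub(r"\1\2\1\3\4***\4", body)
```

Covering the single-quoted form matters when an error message embeds a Python `repr` of a dict rather than raw JSON.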

37 tests pass; pyright (basic) clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Adds a generic ssot_get(path) operation to the Salesforce Data Cloud proxy client so the data-collector can read Salesforce core REST SSOT metadata endpoints (/services/data/{v}/ssot/*) via the agent, keeping OAuth tokens inside the agent.

Changes:

  • Introduces SalesforceDataCloudProxyClient.ssot_get() and factors core OAuth minting into _mint_core_token().
  • Expands _redact_body() to mask additional sensitive token-like keys across both single- and double-quoted payload representations.
  • Adds unit tests covering SSOT GET success, path safety checks, error surfacing/redaction, query forwarding, and non-JSON handling.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| apollo/integrations/db/salesforce_data_cloud_proxy_client.py | Adds ssot_get, refactors core token minting, and broadens response-body redaction. |
| tests/test_salesforce_data_cloud_client.py | Adds unit tests for ssot_get behavior, safety guards, and redaction. |

Comment thread apollo/integrations/db/salesforce_data_cloud_proxy_client.py
Comment thread apollo/integrations/db/salesforce_data_cloud_proxy_client.py Outdated
Comment thread apollo/integrations/db/salesforce_data_cloud_proxy_client.py Outdated
- Validate path via urlsplit (reject scheme/netloc) instead of substring
  matching, so a query string embedding a URL (pagination cursor) is allowed
- Log only the path component plus a has_query flag; query-string values
  never appear in logs or error messages
- Align _mint_core_token and ssot_get docstrings with actual RuntimeError
  formats (code NNN vs HTTP NNN)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
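
The urlsplit-based validation and the query-free logging described in this commit might look roughly like the sketch below. The function names `validate_relative_path` and `loggable_path` are hypothetical, and the error messages are assumptions; the sketch relies only on `urllib.parse.urlsplit` from the standard library.

```python
from urllib.parse import urlsplit


def validate_relative_path(path) -> str:
    """Accept only relative paths; a URL embedded in the query string is fine."""
    if not isinstance(path, str) or not path:
        raise ValueError("path must be a non-empty string")
    parts = urlsplit(path)
    # A scheme or netloc means the value could redirect the request to another host.
    if parts.scheme or parts.netloc or not parts.path.startswith("/"):
        raise ValueError("path must be a relative path beginning with '/'")
    return path


def loggable_path(path: str) -> dict:
    """Fields that are safe to log: the path component and whether a query exists."""
    parts = urlsplit(path)
    return {"path": parts.path, "has_query": bool(parts.query)}
```

Checking the parsed components rather than searching for substrings such as `://` is what allows a pagination cursor that itself contains a URL to pass validation.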
benitezfede added a commit that referenced this pull request Jul 2, 2026