feat(salesforce-data-cloud): generic ssot_get agent op for SFDC SSOT reads (YET-1615) #340
Open
benitezfede wants to merge 4 commits into
Add SalesforceDataCloudProxyClient.ssot_get(path) so the data-collector can
read Salesforce SSOT metadata endpoints (/services/data/{v}/ssot/* — data
streams, catalogs, schemas) through the agent. The agent owns building the URL
from the connection's credentials and authenticating the call, so the
customer's token never leaves the agent — only the JSON body is returned. This
lets a single op serve both the self-hosted and remote-agent paths (YET-1615).
Authentication mirrors list_dataspaces: a short-lived core token is minted via
the client-credentials grant and sent as a Bearer credential. SSOT endpoints
are core REST on the My Domain authed with the core token (NOT the Data Cloud
query token from the a360 exchange). The shared mint logic is extracted into
_mint_core_token, now used by both list_dataspaces and ssot_get.
path must be a relative path beginning with '/'; absolute and protocol-relative
values are rejected so the minted token is only ever sent to the connection's
own My Domain. Non-200/non-JSON responses surface as RuntimeError with
`code NNN` and a redacted body.
YET-1615
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
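For readers skimming the thread, the flow described in this commit message can be sketched roughly as below. This is an illustrative sketch, not the code in the PR: the function signatures, the injected `http_post`/`http_get` transports, and the exact error wording are assumptions made so the example runs without network access. Body redaction is deliberately left out here.

```python
import json
from typing import Any, Callable, Dict, Tuple

# Hypothetical transport signatures (assumptions for this sketch):
# each callable returns (status_code, body_text).
HttpPost = Callable[[str, Dict[str, str]], Tuple[int, str]]
HttpGet = Callable[[str, Dict[str, str]], Tuple[int, str]]


def mint_core_token(domain: str, client_id: str, client_secret: str,
                    http_post: HttpPost) -> str:
    """Mint a short-lived core token via the client-credentials grant."""
    status, body = http_post(
        f"https://{domain}/services/oauth2/token",
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        },
    )
    if status != 200:
        # The body is omitted on purpose; the real client redacts it first.
        raise RuntimeError(f"token mint failed: code {status}")
    return json.loads(body)["access_token"]


def ssot_get(domain: str, path: Any, token: str, http_get: HttpGet) -> Any:
    """GET https://{domain}{path} with the core token as a Bearer credential."""
    # Only relative paths are accepted, so the token can only ever be sent
    # to the connection's own My Domain.
    if not isinstance(path, str) or not path.startswith("/") or path.startswith("//"):
        raise ValueError("path must be a relative path beginning with '/'")
    status, body = http_get(
        f"https://{domain}{path}",
        {"Authorization": f"Bearer {token}"},
    )
    if status != 200:
        raise RuntimeError(f"ssot_get failed: code {status}")
    try:
        # Only the parsed JSON body is returned; the token never is.
        return json.loads(body)
    except ValueError as exc:
        raise RuntimeError(f"ssot_get failed: code {status} (non-JSON body)") from exc
```

Injecting the transports keeps the sketch testable with fakes; the real client presumably uses its own HTTP session.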
Address apollo-agent#340 review findings (YET-1615):
- ssot_get: reject a non-str/empty path with a clean ValueError (was an opaque AttributeError); annotate the return as `dict | list` since some core-REST endpoints return a top-level JSON array.
- _redact_body: mask access_token / refresh_token / id_token / signature in both single- and double-quoted forms (previously access_token, single-quoted only), since the new ssot_get error path flows through this redactor. Move the stray mid-file `import re` into the top import block.
- tests: add query-string forwarding, non-JSON-200, error-body redaction, and a direct _redact_body unit test.
37 tests pass; pyright (basic) clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
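As a rough illustration of the redaction behavior described in this commit, a regex-based masker might look like the sketch below. The name `redact_body`, the `***` mask, and the pattern itself are assumptions for this example; the actual `_redact_body` in the PR may differ. It handles the key list named above in both single- and double-quoted forms, but it does not try to cope with escaped quotes inside values.

```python
import re

# Keys named in the commit message; the mask string is an assumption.
_SENSITIVE_KEYS = ("access_token", "refresh_token", "id_token", "signature")

_QUOTE = "['\"]"  # character class matching a single or a double quote
_KEYS = "|".join(re.escape(k) for k in _SENSITIVE_KEYS)

# Matches  "key": "value"  or  'key': 'value'  (quote styles tracked per side).
_SENSITIVE_RE = re.compile(
    rf"(?P<kq>{_QUOTE})(?P<key>{_KEYS})(?P=kq)"
    rf"(?P<sep>\s*:\s*)(?P<vq>{_QUOTE})(?P<val>.*?)(?P=vq)"
)


def redact_body(text: str) -> str:
    """Replace the value of each sensitive key with a fixed mask."""
    return _SENSITIVE_RE.sub(r"\g<kq>\g<key>\g<kq>\g<sep>\g<vq>***\g<vq>", text)
```

Both a JSON body (`{"access_token": "abc"}`) and a Python-repr style body (`{'access_token': 'abc'}`) go through the same pattern, which is why the key and value quotes are captured separately.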
Pull request overview
Adds a generic ssot_get(path) operation to the Salesforce Data Cloud proxy client so the data-collector can read Salesforce core REST SSOT metadata endpoints (/services/data/{v}/ssot/*) via the agent, keeping OAuth tokens inside the agent.
Changes:
- Introduces SalesforceDataCloudProxyClient.ssot_get() and factors core OAuth minting into _mint_core_token().
- Expands _redact_body() to mask additional sensitive token-like keys across both single- and double-quoted payload representations.
- Adds unit tests covering SSOT GET success, path safety checks, error surfacing/redaction, query forwarding, and non-JSON handling.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| apollo/integrations/db/salesforce_data_cloud_proxy_client.py | Adds ssot_get, refactors core token minting, and broadens response-body redaction. |
| tests/test_salesforce_data_cloud_client.py | Adds unit tests for ssot_get behavior, safety guards, and redaction. |
- Validate path via urlsplit (reject scheme/netloc) instead of substring matching, so a query string embedding a URL (pagination cursor) is allowed
- Log only the path component plus a has_query flag; query-string values never appear in logs or error messages
- Align _mint_core_token and ssot_get docstrings with actual RuntimeError formats (code NNN vs HTTP NNN)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
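The urlsplit-based check and the log-safe view from this commit could look something like the following sketch. The helper names `validate_relative_path` and `loggable` are hypothetical and the error messages are assumptions; only the use of `urllib.parse.urlsplit` and the path/has_query logging shape come from the commit message.

```python
from typing import Any, Dict
from urllib.parse import urlsplit


def validate_relative_path(path: Any) -> str:
    """Accept only a relative path; reject anything carrying a scheme or host."""
    if not isinstance(path, str) or not path:
        raise ValueError("path must be a non-empty string")
    parts = urlsplit(path)
    # A scheme ("https:") or netloc ("//host") would redirect the token
    # away from the connection's own My Domain.
    if parts.scheme or parts.netloc:
        raise ValueError("path must be relative (no scheme or host)")
    if not parts.path.startswith("/"):
        raise ValueError("path must begin with '/'")
    return path


def loggable(path: str) -> Dict[str, Any]:
    """Return only what is safe to log: the path component and a query flag."""
    parts = urlsplit(path)
    return {"path": parts.path, "has_query": bool(parts.query)}
```

Because the check looks at the parsed scheme and netloc rather than searching for substrings, a path such as `/services/data/v62.0/ssot/data-streams?next=https://host/cursor` passes: the embedded URL lives in the query component, not in the scheme or host.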
benitezfede added a commit that referenced this pull request Jul 2, 2026
What & why
Adds a generic ssot_get(path) operation to the Salesforce Data Cloud proxy client so Monte Carlo's data-collector (DC) can read the Salesforce SSOT metadata endpoints (/services/data/{v}/ssot/*: data streams, calculated insights, catalogs, schemas) through the agent.

This is the apollo-agent side of YET-1614 and unblocks YET-1616 (the data-collector will route all agent /ssot traffic through this op). It lets a single code path serve both self-hosted and remote-agent connections: the agent builds the URL from the connection's own credentials, authenticates the call, and returns only the JSON body, so the customer's token never leaves the agent.

Today, self-hosted Data Cloud connections can't collect Data Streams freshness/volume because the /ssot GET needs a token the agent won't release to the DC; ssot_get closes that gap by keeping the token inside the agent.

How it works
- ssot_get(path) → GET https://{domain}{path}, with path supplied verbatim by the DC (version-agnostic: the DC owns the API version and any pagination cursor).
- Authentication mirrors the list_dataspaces core-REST call: mint a short-lived client-credentials core token (POST /services/oauth2/token) and send it as Authorization: Bearer. SSOT endpoints are core REST on the My Domain, authenticated with the core token, not the Data Cloud query token from the a360 exchange (that exchange revokes the core token and targets a different host). The shared mint is extracted into _mint_core_token, now used by both list_dataspaces and ssot_get.
- path must be a relative path beginning with /; absolute and protocol-relative values are rejected so the minted token is only ever sent to the connection's own My Domain. The token is never returned to the caller, and response bodies are redacted (_redact_body) before appearing in errors/logs.
- Non-200 or non-JSON responses raise a RuntimeError carrying code NNN (so the DC can extract the HTTP status) with a redacted body.
- _resolve_method: no registry change.

Testing
- test_ssot_get_* tests (token mint + authenticated GET, absolute-URL rejection, protocol-relative rejection, non-200 code NNN + redaction). Full file 33/33 green; pyright (basic) clean. The list_dataspaces refactor onto _mint_core_token is covered by its existing tests.
- ssot_get('/services/data/v62.0/ssot/data-streams?limit=100') returned 10 real data streams; …/ssot/calculated-insights returned its collection; no credential/token leak in result or logs.

Checklist