Skip to content

fix: normalize OAuth discovery URLs to use host_name config#186

Open
rachana-promantia wants to merge 28 commits into
buildswithpaul:developfrom
rachana-promantia:fix/oauth-discovery-https-normalization
Open

fix: normalize OAuth discovery URLs to use host_name config#186
rachana-promantia wants to merge 28 commits into
buildswithpaul:developfrom
rachana-promantia:fix/oauth-discovery-https-normalization

Conversation

@rachana-promantia
Copy link
Copy Markdown
Contributor

Summary

OAuth discovery endpoint was returning http:// URLs even when host_name
was configured as https://. This breaks hosted OAuth clients like Claude
that reject non-HTTPS token endpoints.

Type of Change

  • Bug fix

Related Issues

Fixes #156

Checklist

  • I have targeted the develop branch (not main)
  • My code follows the existing code style
  • I have tested my changes locally

buildswithpaul and others added 26 commits May 6, 2026 09:42
The repair patch from v2.4.1 hard-coded reads/writes of the skill_name
column on both tabFAC Skill and tabSkill. Some HRMS releases ship the
Skill doctype with autoname=field:skill_name and a dedicated skill_name
column; newer releases drop that field and use name directly. On the
latter, bench migrate fails with:

  pymysql.err.OperationalError: (1054, \"Unknown column 'skill_name' in 'SELECT'\")

Probe both tables for the column at runtime and conditionally include
it in the SELECT and INSERT, so the patch is a no-op safe across either
HRMS schema.
fix: accept JSON-string roles in update_tool_role_access
The pre-insert required-field check ran against the user-supplied data
dict before any defaults were applied, and the docfield filter excluded
only fields with a static 'default'. Fields populated dynamically by
frappe.new_doc() (company from User permissions, selling_price_list
from Selling Settings, naming_series from autoname rules, etc.) carry
no static default on the docfield, so the check rejected requests that
should have succeeded — surfacing as false 'Missing required fields:
company, selling_price_list, naming_series' errors when those values
were already on the doc.

Run the check after assigning the user-supplied values onto the doc,
and check doc.get(fieldname) instead of data[fieldname]. Drops the
'not f.default' filter since the populated-doc check covers both
static-default and dynamically-populated cases without producing false
positives.
The v15 CI install step was failing because `bench get-app` invokes
`yarn install --check-files` against the new package.json, and
@commitlint/cli@20's transitive dep `global-directory@5` requires
Node >= 20. Frappe v15's CI environment uses Node 18, so yarn aborted:

  error global-directory@5.0.0: The engine "node" is incompatible with
  this module. Expected version ">=20". Got "18.20.8"

- package.json: pin @commitlint/cli and config-conventional to ^19.6.0
  (last major still supporting Node 18). Add engines.node >=18.
- .pre-commit-config.yaml: pin additional_dependencies to v19 to match.
- Drop package-lock.json, ship yarn.lock instead. Bench runs yarn
  against every app, so a stray npm lockfile triggers a warning and
  invites resolution drift.
- PRE_COMMIT_SETUP.md: update install instructions to yarn, document
  the Node 18 / v19 constraint so future bumps don't reintroduce it.
filter_sensitive_fields was extended to require a third positional
user_role argument. The two MCP-path callers (get_document,
list_documents) were updated; chatgpt_fetch was missed and crashed on
every call with:

  TypeError: filter_sensitive_fields() missing 1 required positional
  argument: 'user_role'

The ChatGPT fetch tool also bypassed FAC's standard auth pipeline,
calling doc.has_permission("read") directly instead of
validate_document_access. That weaker check skipped role-based doctype
gating and the restricted-doctype list, and was the structural reason
the user_role plumbing went missing in the first place — the tool
never resolved a role at all.

Switch to the same validate_document_access flow get_document uses,
take user_role from its result, and pass it through to
filter_sensitive_fields. Drop the catch-all except Exception that
masked the original TypeError as a misleading 'Error fetching document'
ValueError; let unexpected exceptions propagate to _safe_execute so
the underlying error surfaces in logs.
chore: add commitlint hook and pip-audit pre-commit setup
The output_data column on Assistant Audit Log was rendering token-count
metrics (input_tokens, output_tokens, total_tokens, tokens_used) as
"***REDACTED***" because BaseTool._sanitize_data and audit_trail._sanitize_arguments
both used a substring blocklist that flagged any key containing the word
"token" — designed to catch access_token / api_token, but accidentally
catching the metric shape.

This shows up as soon as a tool returns LLM token counts in its result
dict. The dedicated audit columns (set via the row's own fields) are
unaffected; the readable output_data JSON was the only collateral.

Replaced the substring blocklist with a regex-backed _is_sensitive_key
helper. The regex matches ``token`` only when preceded/followed by a
non-letter, so:

- Redacted: token, access_token, refresh_token, jwt_token, bearer_token
- Preserved: input_tokens, output_tokens, total_tokens, tokens_used
- Plus the unchanged always-sensitive list: password, secret, api_key,
  apikey, auth, bearer, credential, private_key

audit_trail._sanitize_arguments now delegates to the same helper so the
defensive sink and BaseTool agree on what's a credential.

Tests added: TestSensitiveKeyMatcher in test_audit_log.py covering 21 key
shapes plus None/int defensive paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(audit-log): preserve token-count fields in sanitized output
…hild-doctype calls

Closes buildswithpaul#168 (item 1).

Child-table fields can now be updated through the parent doc with patch or
replace semantics, picked per-table from the input shape: any row carrying a
'name' triggers patch mode (matched rows are updated, unmatched rows are
appended, '_delete: true' removes a row, untouched rows are preserved); no
'name' on any row triggers replace mode (clear and re-append, matching
create_document semantics). Restricted-field filtering now also runs against
the child doctype's SENSITIVE/ADMIN_ONLY sets.

Also reject direct calls with a child-table doctype as the target (e.g.
"Sales Order Item"). Saving a child row in isolation bypasses the parent's
validate() pipeline, leaving derived fields like row 'amount' and parent
'total'/'total_qty'/'grand_total' stale. The tool now resolves the parent
doc + table fieldname and returns a structured 'child_doctype_direct_update'
error with a ready-to-retry suggestion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(update_document): support child-table updates; reject direct child-doctype calls
…idate()

Issue buildswithpaul#165 follow-up.

The earlier fix moved the required-fields check from `data` to `doc.get(f)`,
but the check still ran *before* `doc.insert()`. Many doctypes — Quotation
in particular — populate `reqd` fields like `conversion_rate`,
`price_list_currency`, and `plc_conversion_rate` inside the doctype
controller's `set_missing_values()`, which is invoked from `validate()` and
therefore not yet run at the point of the pre-flight check. The result was
that valid payloads kept getting rejected with false positives like:

    Missing required fields: conversion_rate, price_list_currency, plc_conversion_rate

This patch removes the pre-flight check entirely and lets Frappe's own
`_validate_mandatory()` (which runs after `set_missing_values()`) be the
authority. `frappe.MandatoryError` is caught and translated into the same
structured response shape the pre-flight check used to produce, but with
genuine missing-field names (parsed from the exception payload). A
regression test against Quotation pins this: the previously-leaked fields
must never appear in `missing_fields`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…thpaul#165 follow-up)

fix(create_document): drop pre-flight reqd check, defer to Frappe validate() (buildswithpaul#165 follow-up)
…oryError handler

Regression caught in test review: `_, _, fields_part = error_msg.partition(": ")`
made `_` a function-local for the entire `execute()` body, so the later
`_("Document Creation Error")` call inside `frappe.log_error` raised
UnboundLocalError on paths that reached the generic `except Exception` branch
without first reaching the MandatoryError branch (e.g. a LinkValidationError
on an invalid Link reference).

Use `partition(...)[2]` instead of unpacking into named throwaways. Adds two
regression guards: one for the MandatoryError → structured-response path, one
for the generic exception path that proves `_(...)` is still callable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(create_document): don't shadow translation function `_` in MandatoryError handler
`preview_template` is whitelisted to any logged-in user, and stored
`Prompt Template` content can be authored by Assistant users — but the
three Jinja construction sites used a plain `jinja2.Environment`, which
exposes `__class__` / `__mro__` / `__subclasses__` and the full Python
class graph. A template like `{{ ''.__class__.__mro__[1].__subclasses__() }}`
would have executed under the rendering user.

All three sites (PromptTemplateManager._jinja_env, validate_template_syntax,
preview_template) now use `jinja2.sandbox.SandboxedEnvironment`, which
raises `SecurityError` on unsafe attribute lookups. Added regression test
asserting the canonical SSTI probe is blocked through both the manager
render path and the whitelisted preview endpoint.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…de save

`hooks.py` registered a `doc_events["*"]` handler for `on_update`,
`on_submit`, and `on_cancel`. The three callbacks all gated on
`should_log_document()`, which unconditionally returned False — so the
hook fired on every document save in every app on the bench, called a
function, ran a `startswith` check, and no-oped. Strictly performance
overhead with zero behaviour.

Removed the wildcard hook registration and the three dead callbacks
(`log_document_change`, `log_document_submit`, `log_document_cancel`)
plus `should_log_document`. The real audit-trail pipeline
(`log_tool_execution`) is untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… User"

The "All" role grants read access to every session including website
(guest) users. Prompt Category is internal assistant taxonomy; there's
no reason for unauthenticated visitors to enumerate it. Changed the
permissive read rule to "Assistant User", matching how the rest of the
prompt subsystem is gated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… audit

The Frappe Cloud marketplace audit's metadata regex flags any
occurrence of "architecture" in the rendered README description. The
README pointed at `docs/architecture/ARCHITECTURE.md` and
`docs/architecture/MCP_STREAMABLEHTTP_GUIDE.md`. Renamed the directory
to `docs/internals/`, renamed `ARCHITECTURE.md` to `INTERNALS.md`,
relabelled the README link as "Internals", and updated every
cross-reference in the docs tree so links don't 404.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds tesseract alongside paddleocr and ollama. Reintroduces the
implementation removed in buildswithpaul#99 (commit 736b3fc).
- Use host_name from site config if available
- Force https:// when host_name is configured
- Fixes http -> https mismatch for production deployments
- Fixes buildswithpaul#156
@buildswithpaul
Copy link
Copy Markdown
Owner

Thanks for picking this up — I hit the same issue on a customer's GKE deployment (Gateway API in front, plain HTTP to the Frappe nginx pod) and your fix would have resolved it. A couple of suggestions before this lands:

1. The PR includes unrelated changes from a stale develop

The diff currently touches:

  • frappe_assistant_core/api/handlers/prompts.py and prompt_template.py (Jinja SandboxedEnvironment)
  • frappe_assistant_core/utils/audit_trail.py and hooks.py (audit-trail doc_events removal)
  • plugins/data_science/tools/extract_file_content.py (tesseract OCR backend)
  • assistant_core_settings.json (tesseract option) and prompt_category.json (role change)
  • pyproject.toml (chardet constraint)
  • docs/architecture/*docs/internals/* rename and all the doc cross-references
  • tests/test_prompt_template_sandbox.py (new test file)

These all correspond to commits already on upstream develop (3c1768a Jinja sandbox, fbcce43 audit log, f7d7200 tesseract, 3c63675 chardet, b480d4a docs rename). I think your branch was forked before those landed, so a git fetch upstream && git rebase upstream/develop should drop them all from the diff and leave just the OAuth normalization. PRs scoped to a single concern review much faster and bisect cleanly later.

2. The OAuth normalization itself — please extend it to all discovery endpoints, not just authorization_server_metadata

The current patch fixes /.well-known/oauth-authorization-server, but the same http-from-request-scheme drift exists in every other discovery endpoint in this file:

  • openid_configuration() (oauth_discovery.py:55) calls Frappe's openid_configuration first and then adds MCP keys via get_server_url(). The Frappe-inherited keys (issuer, authorization_endpoint, token_endpoint, revocation_endpoint, introspection_endpoint, userinfo_endpoint) come back with the request scheme, identical to the authorization_server_metadata problem. Today's PR doesn't touch this path, so /.well-known/openid-configuration will still return mixed http/https URLs after the fix.
  • _get_frappe_authorization_server_metadata() v15 fallback (line 154) — same get_server_url() issue.
  • mcp_discovery() (line 110) and protected_resource_metadata() (line 264) — both call get_server_url(). The http → https rewrite logic is missing here too.

Also a correctness concern with the current patch: metadata["issuer"] = frappe_url + "/" and the unconditional http → https rewrite are problematic for two reasons:

  • issuer MUST exactly match across all discovery endpoints per RFC 8414 §3 and per the OIDC discovery spec. If authorization_server_metadata returns https://example.com/ but openid_configuration (unpatched) returns http://example.com, OAuth clients that validate issuer consistency will reject the flow.
  • Unconditional http → https rewrite will break local dev on http://localhost:8000. It should be gated on something like frappe.conf.get("force_https") or the result of host_name being explicitly https://.

Suggested approach: one helper, used everywhere

Extract a single _get_public_base_url() and use it from every endpoint in this module. This guarantees issuer stays consistent across all four .well-known endpoints and the per-endpoint http:// problem can't drift back:

def _get_public_base_url() -> str:
    """
    Resolve the canonical public base URL for OAuth/MCP discovery metadata.

    Resolution order:
      1. site_config `host_name` — explicit operator override; used verbatim
         if it includes a scheme (`https://example.com`).
      2. `frappe.oauth.get_server_url()` — existing behavior (Social Login Key
         "frappe" base_url, then frappe.request.url).
      3. If `force_https` is set in site_config and the result still starts
         with `http://`, upgrade to `https://`. Gated so localhost dev still
         works.

    Trailing slash is stripped so callers can append `/api/method/...` safely.
    """
    host_name = frappe.conf.get("host_name")
    if host_name and "://" in host_name:
        base = host_name
    else:
        base = get_server_url()

    if frappe.conf.get("force_https") and base.startswith("http://"):
        base = "https://" + base[len("http://"):]

    return base.rstrip("/")

Then replace get_server_url() at every call site in this file — there are five:

And in both openid_configuration() (after frappe_openid_config()) and authorization_server_metadata() (after _get_frappe_authorization_server_metadata()), overwrite the six Frappe-inherited URL keys with values built from _get_public_base_url() — same list you have today, applied to both functions:

frappe_url = _get_public_base_url()
metadata["issuer"] = frappe_url
metadata["authorization_endpoint"] = f"{frappe_url}/api/method/frappe.integrations.oauth2.authorize"
metadata["token_endpoint"] = f"{frappe_url}/api/method/frappe.integrations.oauth2.get_token"
metadata["revocation_endpoint"] = f"{frappe_url}/api/method/frappe.integrations.oauth2.revoke_token"
metadata["introspection_endpoint"] = f"{frappe_url}/api/method/frappe.integrations.oauth2.introspect_token"
metadata["userinfo_endpoint"] = f"{frappe_url}/api/method/frappe.integrations.oauth2.openid_profile"

Note: I'd drop the trailing slash on issuer (frappe_url not frappe_url + "/") — RFC 8414 examples have no trailing slash, and the OIDC issuer comparison is byte-exact, so consistency across endpoints is what matters. Current Frappe core returns no trailing slash here too.

Why this matters for our deployment

In our case the underlying cause is gunicorn defaulting to --forwarded-allow-ips=127.0.0.1, so when the Frappe nginx pod forwards X-Forwarded-Proto: https from a non-loopback pod IP, gunicorn drops it and frappe.request.scheme stays http. This is the standard situation on any Helm-deployed Frappe where nginx and gunicorn are in separate pods. The Social Login Key "frappe" base_url workaround patches the bug for some endpoints but not all (since authorization_server_metadata doesn't consult it), which is exactly why a customer ended up at issue #156. Centralizing this in _get_public_base_url() makes the fix robust against that whole class of proxy-trust misconfigurations.

Happy to open a follow-up PR with the helper if that's easier than reworking this one — let me know which you prefer.

- Add _get_public_base_url() helper that respects host_name and force_https
- Replace get_server_url() at all 5 call sites in oauth_discovery.py
- Override Frappe-inherited URL keys in both openid_configuration() and
  authorization_server_metadata() to ensure issuer consistency
- Drop trailing slash on issuer per RFC 8414
- Fixes buildswithpaul#156
@rachana-promantia rachana-promantia force-pushed the fix/oauth-discovery-https-normalization branch from 4b0b595 to 190e961 Compare May 18, 2026 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants