Skip to content

Conversation

@everettraven
Copy link
Contributor

Adds an enhancement proposal to outline how we can add support for generically fetching user identity information from external sources to expose as claims in the direct external OIDC feature.

The main motivator for designing this feature is to make it easier for our customers to use the direct external OIDC configuration to work with use cases where not all the identity information for users of a cluster are presented as claims in a JWT.

We are also intentionally trying to approach this in a way that enables us to potentially contribute this logic back to the upstream Structured Authentication Configuration feature.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 12, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Dec 12, 2025

@everettraven: This pull request references CNTRLPLANE-2201 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Adds an enhancement proposal to outline how we can add support for generically fetching user identity information from external sources to expose as claims in the direct external OIDC feature.

The main motivator for designing this feature is to make it easier for our customers to use the direct external OIDC configuration to work with use cases where not all the identity information for users of a cluster are presented as claims in a JWT.

We are also intentionally trying to approach this in a way that enables us to potentially contribute this logic back to the upstream Structured Authentication Configuration feature.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 12, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joepvd for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

- method: GET
url:
type: Expression
expression: "\"https://graph.microsoft.com/v1.0/users/\" + claims.upn + \"/memberOf\""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a CEL expression to generate this URL? What about escaping? If a user has any control over a claim that might be used for this (like a chosen username), could they trick the authenticator into making a bogus request? Simplest might be best here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a CEL expression to generate this URL? ... Simplest might be best here

This is a good question and one I've been wrestling with a bit myself. My main motivation for this is specifically the Entra ID + Graph API flow.

In theory, an id token used against the KAS could be valid against an endpoint like https://graph.microsoft.com/v1.0/me/transitiveMemberOf/microsoft.graph.group?$count=true to get my (i.e the requesting user) groups.

Realistically though, I think the semantics there make it a bit more difficult to achieve because the audience in the token needs to be accepted by both KAS and Microsoft's Graph API.

That's why for this particular use case I've gone with some kind of dynamic approach based on https://learn.microsoft.com/en-us/graph/api/user-list-memberof?view=graph-rest-1.0&tabs=http#request-body .

This doesn't necessarily have to be a CEL expression - it could be a Go template that better enables us to do escaping behind the scenes. I defaulted to CEL though because the rest of the structured authentication API uses CEL expressions in various places and users are likely to be familiar with CEL if they are manipulating this configuration.

What about escaping? If a user has any control over a claim that might be used for this (like a chosen username), could they trick the authenticator into making a bogus request?

This configuration is expected to managed by a cluster administrator and the expectation is that the claims are pulled from trusted sources. If a claim has been manipulated in the JWT used for authentication my expectation is that it would be rejected as an invalid token due to a signature mismatch.

I can certainly look into escaping logic further, and as I mentioned above, Go templates may be an option here.

Overall, I think it is unlikely that end-users manipulate the authenticator into making a bogus request but it certainly isn't impossible. Something we can continue to explore.

Copy link
Contributor Author

@everettraven everettraven Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this a bit more and there are a few paths that I thought of - I'm curious what you think:

  1. Add a CEL function / library for calling net/url.PathEscape(). This would mean that an expression with path escaping could look something like: "https://example.com/" + pathEscape(claims.upn) + "/extra/pathing". This would put the onus on the end-user to ensure they are doing this path escaping.
  2. Use a different API structure that uses Go templates for the url string with parameterization, use CEL expressions for the parameters. The raw strings from the CEL expressions would then be escaped. Something along the lines of:
    ...
    url:
      type: Parameterized
      parameterized:
        template: "https://example.com/{{ index . 0 }}/extra/pathing"
        parameters:
          - "claims.upn" 
  3. Use only Go templating. We would escape all possible values before passing to the template. Something along the lines of:
    ...
    url:
      type: GoTemplate
      goTemplate: "https://example.com/{{ .upn }}/extra/pathing"

My only concern with using a Go template based approach is that it becomes yet another language that an admin needs to understand to properly configure a dynamic URL.

I think if we can stay consistent with using a CEL expression it would make it easier overall on admins configuring this - even if they need to perform the escaping themselves (we could ensure a note in the documentation that warns about not properly escaping parameters and gives some insights as to when to use what escaping functionality).

EDIT: Another potential approach for CEL-based that removes the end-user need to do any escaping is to pre-escape the values passed to the CEL program, but I'm skeptical that pre-escaping is something we could reliably do on an end-user's behalf because escaping is different for path vs query strings.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This configuration is expected to managed by a cluster administrator and the expectation is that the claims are pulled from trusted sources. If a claim has been manipulated in the JWT used for authentication my expectation is that it would be rejected as an invalid token due to a signature mismatch.

Right, the issue wouldn't be forged claims, but values for authentic claims that exploit weaknesses in the template/expression.

I think we just need to make footguns hard to reach. A couple other ideas:

  • Use the CEL type system to make it impossible to fail to escape path elements. For example, require the expression's type to be "URL" (we would have to declare it) and the only means to construct a URL in the evaluation environment is some kind of builder API that accepts each path element as a string.

  • Make the URL part of the config API more granular, e.g. (for https://contoso.com/v1/users/{userid}/etc):

    scheme: https
    host: contoso.com
    pathElements:
    - type: string
      value: "v1"
    - type: String
      value: "users"
    - type: Claim
      claim: "userid"
    - type: String
      value: "etc"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been doing a bit more digging here with the help of Gemini to gather examples of how other projects handle something like this.

It came back with 3 examples - External Secrets Operator (ESO), Crossplane, and ArgoCD.

It looks like ESO uses Go templates, Crossplane uses Go string formatting expressions (i.e https://contoso.com/v1/users/%s/etc), and ArgoCD uses some kind of "override" pattern. I'm not really a fan of any of them and I'm not that happy with the structured path elements list either.

The direction I'm starting to lean is something like:

url:
  base: "https://contoso.com"
  pathExpression: "['v1', 'users', claims.upn, 'etc']"

pathExpression is a CEL expression that must result in a list of string values which will be joined using net/url.PathJoin() and each element of the string list will be escaped using net/url.PathEscape before being joined.

I think this is a good balance that keeps the only expression language CEL and reduces the structural complexity of building a dynamic URL while maintaining the ability to ensure we escape the values we are dynamically adding to the path.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming that's try, why don't we just URL escape all of the claims before passing them into the CEL environment?

Wouldn't that be awkward if the expression does anything with the value of a claim besides directly inserting it into the resulting string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that be awkward if the expression does anything with the value of a claim besides directly inserting it into the resulting string?

Could be. I don't have any use cases for manipulating the value off the top of my head that would be inherently incompatible with this though.

I do like the simplicity of the current approach (direct string insert), but if we want to exercise an abundance of caution and do the path escaping after the fact - I think the list of strings return type is probably the right direction here. @JoelSpeed had mentioned some concern around the list syntax as shown in my example being a bit awkward for end-users, which I agree, but I'm not really seeing what I would consider a better option for an escaping action that happens after the fact.

Copy link
Contributor Author

@everettraven everettraven Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we really wanted to we could consider using a custom Path return type that we supply a helper path.Join("...", "...", ...) type functionality that path escapes each entry (I think Ben suggested something like this earlier), but I'm not a huge fan of requiring a fully custom return type here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the list of strings return type is probably the right direction here

Does that allow mutation, or just direct inserts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the list of strings return type is probably the right direction here

Does that allow mutation, or just direct inserts?

It would allow mutation of the claim values. As an example, you could do something like:

url:
  base: https://contoso.com
  pathExpression: "['v1', 'users', claims.upn.lower().replace('_', '-'), 'etc']"

CEL expressions will have to successfully compile and will be limited in their length to prevent excessive
compilation and run times.

2. Introduction of network latency to authentication

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the webhook authn plugin maintains a cache, too, which should help. Does there need to be any kind of size limitation on TokenReview responses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC the webhook authn plugin maintains a cache, too, which should help

You're right, it looks like it does.

Does there need to be any kind of size limitation on TokenReview responses?

AFAICT, there doesn't look to be any kind of explicit limitation on TokenReview response sizes. I could be reading it wrong, but it looks like TokenReviews are virtual resources and don't get stored in etcd.

I think the biggest concern we will have here is response time and how long it takes to serialize a large list of groups.

IIRC, the KAS already enforces a strict timeout (I think 10s by default) on a response from the webhook so we could try to have some kind of internal limit that enables us to still fetch information, build user metadata, and serialize it in a TokenReview status in the response within a reasonable time frame.

In order to nail this limitation down exactly though, we'd probably need to build this out and do some performance testing to really get a solid handle on what we arbitrarily enforce as a limitation here.

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
…n scenarios

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@everettraven everettraven force-pushed the auth/oidc-large-groups branch from f2bffd5 to aaecd36 Compare January 8, 2026 16:34
# to use to authenticate requests to the provided external claims sources.
# When set to Token, it will attempt to use a user-provided access token
# to authenticate requests to the provided external claims sources.
type: { RequestProvidedToken | ClientCredential | Token }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the flow of Token vs RequestProvidedToken? Not following the difference between these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RequestProvidedToken would be the token provided in the Authorization header of the request against that Kubernetes API server.

Token would be some static token provided in the configuration file given to the webhook.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gothca, so this will have similar security concerns in the API to storing the client credentials.

Is Token a standard naming?

mappings:
# name is the name of the claim to be built.
# this name must be globally unique.
- name: groups
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a need to have an option for replace/adding items if the output is a list? E.g. you have some groups in the token and want to expand that with a call to some API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To do that, I'd say don't overwrite a claim you expect to already exist in the token.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But doesn't K8s expect groups passed to the RBAC via a specific claim? So do you have the option to add to the existing token with a different claim and still have the RBAC work correctly for collecting groups?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structured authentication configuration file is what tells the KAS how to map a token to a cluster identity.

You can specify a specific claim or a CEL expression for setting the groups of an identity. I don't think we have GA support for CEL expressions for mapping the groups values yet on OpenShift but that should be coming soon - and would be my suggestion for that use case.

The configuration would become:

  • New claim name for additional groups
  • CEL expression that uses claim in token + new claim name concatenation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structured authentication configuration file is what tells the KAS how to map a token to a cluster identity.

We have some supported config somewhere for users to configure this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is what the original ExternalOIDC feature this feature is based on enabled.

We have ongoing work to achieve parity with what the upstream configurations are where we deem it makes sense.

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@everettraven
Copy link
Contributor Author

@benluddy @JoelSpeed @liouk This is ready for another round of review whenever you have some time. Thanks!

Copy link
Contributor

@JoelSpeed JoelSpeed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from not being clear on merge vs overwrite behaviour, I'm good with this as it stands

Comment on lines 180 to 182
groups:
- system:authenticated
- foo # fetched from UserInfo endpoint
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies merging behaviour rather than replacement behaviour? I thought in another thread we concluded replacement behaviour was the intention?

Copy link
Contributor Author

@everettraven everettraven Jan 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC system:authenticated is a group that is automatically added to every user that successfully authenticates to the cluster (i.e added out-of-band of our logic).

I could probably clarify that further with a comment.

- method: GET
url:
type: Expression
expression: "\"https://graph.microsoft.com/v1.0/users/\" + claims.upn + \"/memberOf\""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm viewing an old version, this isn't updated at this point in the doc yet

Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 15, 2026

@everettraven: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

so that OpenShift can source user information not provided in the JWT issued
by my identity provider.

* As an OpenShift cluster administrator, I want to remove non-essential PII (like `email` and `full_name`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: PII as an acronym might not be immediately obvious to the reader

claim: groups
externalClaims: # This is the proposed new field
clientAuth:
type: ClientCredential # Use the token from the request as the access token to the UserInfo endpoint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment the same as for RequestProvidedToken, copy-paste leftover?

# When set to ClientCredential, it will attempt to use configured
# client-id and client-secret parameters to fetch an access token
# to use to authenticate requests to the provided external claims sources.
# When set to Token, it will attempt to use a static user-provided access token
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# When set to Token, it will attempt to use a static user-provided access token
# When set to AccessToken, it will attempt to use a static user-provided access token

# When set to ClientCredential, it will attempt to use configured
# client-id and client-secret parameters to fetch an access token
# to use to authenticate requests to the provided external claims sources.
# When set to Token, it will attempt to use a static user-provided access token
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# When set to Token, it will attempt to use a static user-provided access token
# When set to AccessToken, it will attempt to use a static user-provided access token

This component will be responsible for translating a token to user identity information
to be returned to the Kubernetes API server.

It will be constrained such that _only_ the Kubernetes API server can communicate with it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will the Console obtain the extra information that the webhook fetches?

I am not entirely familiar with the implementation of external OIDC at the Console side, so I am not certain that this will be a problem, but AFAIK, the Console will redirect and get a token directly from the IDP upon login, and will use that to any subsequent requests to the kube-apiserver. This means that after obtaining the initial token, it won't have obtained any of the additional information.

I might be missing implementation details, but maybe the Console just needs to make an additional TokenReview call to the kube-apiserver right after login in order to get the full UserInfo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK, Console just cares about having a token and already issues a TokenReview.

With the proposed architecture, nothing would change. The TokenReview would just end up being passed to the webhook authenticator for resolution - in which case it can go fetch the external claim information for constructing the user identity that is returned in the TokenReview status.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants