-
Notifications
You must be signed in to change notification settings - Fork 535
CNTRLPLANE-2201: (auth): EP for generic external claims sourcing #1907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
CNTRLPLANE-2201: (auth): EP for generic external claims sourcing #1907
Conversation
|
@everettraven: This pull request references CNTRLPLANE-2201 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
enhancements/authentication/external-oidc-additional-identity-information-sources.md
Show resolved
Hide resolved
enhancements/authentication/external-oidc-additional-identity-information-sources.md
Show resolved
Hide resolved
| - method: GET | ||
| url: | ||
| type: Expression | ||
| expression: "\"https://graph.microsoft.com/v1.0/users/\" + claims.upn + \"/memberOf\"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a CEL expression to generate this URL? What about escaping? If a user has any control over a claim that might be used for this (like a chosen username), could they trick the authenticator into making a bogus request? Simplest might be best here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a CEL expression to generate this URL? ... Simplest might be best here
This is a good question and one I've been wrestling with a bit myself. My main motivation for this is specifically the Entra ID + Graph API flow.
In theory, an id token used against the KAS could be valid against an endpoint like https://graph.microsoft.com/v1.0/me/transitiveMemberOf/microsoft.graph.group?$count=true to get my (i.e the requesting user) groups.
Realistically though, I think the semantics there make it a bit more difficult to achieve because the audience in the token needs to be accepted by both KAS and Microsoft's Graph API.
That's why for this particular use case I've gone with some kind of dynamic approach based on https://learn.microsoft.com/en-us/graph/api/user-list-memberof?view=graph-rest-1.0&tabs=http#request-body .
This doesn't necessarily have to be a CEL expression - it could be a Go template that better enables us to do escaping behind the scenes. I defaulted to CEL though because the rest of the structured authentication API uses CEL expressions in various places and users are likely to be familiar with CEL if they are manipulating this configuration.
What about escaping? If a user has any control over a claim that might be used for this (like a chosen username), could they trick the authenticator into making a bogus request?
This configuration is expected to managed by a cluster administrator and the expectation is that the claims are pulled from trusted sources. If a claim has been manipulated in the JWT used for authentication my expectation is that it would be rejected as an invalid token due to a signature mismatch.
I can certainly look into escaping logic further, and as I mentioned above, Go templates may be an option here.
Overall, I think it is unlikely that end-users manipulate the authenticator into making a bogus request but it certainly isn't impossible. Something we can continue to explore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this a bit more and there are a few paths that I thought of - I'm curious what you think:
- Add a CEL function / library for calling
net/url.PathEscape(). This would mean that an expression with path escaping could look something like:"https://example.com/" + pathEscape(claims.upn) + "/extra/pathing". This would put the onus on the end-user to ensure they are doing this path escaping. - Use a different API structure that uses Go templates for the url string with parameterization, use CEL expressions for the parameters. The raw strings from the CEL expressions would then be escaped. Something along the lines of:
... url: type: Parameterized parameterized: template: "https://example.com/{{ index . 0 }}/extra/pathing" parameters: - "claims.upn"
- Use only Go templating. We would escape all possible values before passing to the template. Something along the lines of:
... url: type: GoTemplate goTemplate: "https://example.com/{{ .upn }}/extra/pathing"
My only concern with using a Go template based approach is that it becomes yet another language that an admin needs to understand to properly configure a dynamic URL.
I think if we can stay consistent with using a CEL expression it would make it easier overall on admins configuring this - even if they need to perform the escaping themselves (we could ensure a note in the documentation that warns about not properly escaping parameters and gives some insights as to when to use what escaping functionality).
EDIT: Another potential approach for CEL-based that removes the end-user need to do any escaping is to pre-escape the values passed to the CEL program, but I'm skeptical that pre-escaping is something we could reliably do on an end-user's behalf because escaping is different for path vs query strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This configuration is expected to managed by a cluster administrator and the expectation is that the claims are pulled from trusted sources. If a claim has been manipulated in the JWT used for authentication my expectation is that it would be rejected as an invalid token due to a signature mismatch.
Right, the issue wouldn't be forged claims, but values for authentic claims that exploit weaknesses in the template/expression.
I think we just need to make footguns hard to reach. A couple other ideas:
-
Use the CEL type system to make it impossible to fail to escape path elements. For example, require the expression's type to be "URL" (we would have to declare it) and the only means to construct a URL in the evaluation environment is some kind of builder API that accepts each path element as a string.
-
Make the URL part of the config API more granular, e.g. (for https://contoso.com/v1/users/{userid}/etc):
scheme: https host: contoso.com pathElements: - type: string value: "v1" - type: String value: "users" - type: Claim claim: "userid" - type: String value: "etc"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been doing a bit more digging here with the help of Gemini to gather examples of how other projects handle something like this.
It came back with 3 examples - External Secrets Operator (ESO), Crossplane, and ArgoCD.
It looks like ESO uses Go templates, Crossplane uses Go string formatting expressions (i.e https://contoso.com/v1/users/%s/etc), and ArgoCD uses some kind of "override" pattern. I'm not really a fan of any of them and I'm not that happy with the structured path elements list either.
The direction I'm starting to lean is something like:
url:
base: "https://contoso.com"
pathExpression: "['v1', 'users', claims.upn, 'etc']"pathExpression is a CEL expression that must result in a list of string values which will be joined using net/url.PathJoin() and each element of the string list will be escaped using net/url.PathEscape before being joined.
I think this is a good balance that keeps the only expression language CEL and reduces the structural complexity of building a dynamic URL while maintaining the ability to ensure we escape the values we are dynamically adding to the path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming that's try, why don't we just URL escape all of the claims before passing them into the CEL environment?
Wouldn't that be awkward if the expression does anything with the value of a claim besides directly inserting it into the resulting string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't that be awkward if the expression does anything with the value of a claim besides directly inserting it into the resulting string?
Could be. I don't have any use cases for manipulating the value off the top of my head that would be inherently incompatible with this though.
I do like the simplicity of the current approach (direct string insert), but if we want to exercise an abundance of caution and do the path escaping after the fact - I think the list of strings return type is probably the right direction here. @JoelSpeed had mentioned some concern around the list syntax as shown in my example being a bit awkward for end-users, which I agree, but I'm not really seeing what I would consider a better option for an escaping action that happens after the fact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we really wanted to we could consider using a custom Path return type that we supply a helper path.Join("...", "...", ...) type functionality that path escapes each entry (I think Ben suggested something like this earlier), but I'm not a huge fan of requiring a fully custom return type here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the list of strings return type is probably the right direction here
Does that allow mutation, or just direct inserts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the list of strings return type is probably the right direction here
Does that allow mutation, or just direct inserts?
It would allow mutation of the claim values. As an example, you could do something like:
url:
base: https://contoso.com
pathExpression: "['v1', 'users', claims.upn.lower().replace('_', '-'), 'etc']"
enhancements/authentication/external-oidc-additional-identity-information-sources.md
Show resolved
Hide resolved
| CEL expressions will have to successfully compile and will be limited in their length to prevent excessive | ||
| compilation and run times. | ||
|
|
||
| 2. Introduction of network latency to authentication |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the webhook authn plugin maintains a cache, too, which should help. Does there need to be any kind of size limitation on TokenReview responses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC the webhook authn plugin maintains a cache, too, which should help
You're right, it looks like it does.
Does there need to be any kind of size limitation on TokenReview responses?
AFAICT, there doesn't look to be any kind of explicit limitation on TokenReview response sizes. I could be reading it wrong, but it looks like TokenReviews are virtual resources and don't get stored in etcd.
I think the biggest concern we will have here is response time and how long it takes to serialize a large list of groups.
IIRC, the KAS already enforces a strict timeout (I think 10s by default) on a response from the webhook so we could try to have some kind of internal limit that enables us to still fetch information, build user metadata, and serialize it in a TokenReview status in the response within a reasonable time frame.
In order to nail this limitation down exactly though, we'd probably need to build this out and do some performance testing to really get a solid handle on what we arbitrarily enforce as a limitation here.
enhancements/authentication/external-oidc-additional-identity-information-sources.md
Show resolved
Hide resolved
enhancements/authentication/external-oidc-additional-identity-information-sources.md
Show resolved
Hide resolved
enhancements/authentication/external-oidc-additional-identity-information-sources.md
Show resolved
Hide resolved
enhancements/authentication/external-oidc-additional-identity-information-sources.md
Show resolved
Hide resolved
enhancements/authentication/external-oidc-additional-identity-information-sources.md
Outdated
Show resolved
Hide resolved
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
…n scenarios Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
f2bffd5 to
aaecd36
Compare
| # to use to authenticate requests to the provided external claims sources. | ||
| # When set to Token, it will attempt to use a user-provided access token | ||
| # to authenticate requests to the provided external claims sources. | ||
| type: { RequestProvidedToken | ClientCredential | Token } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the flow of Token vs RequestProvidedToken? Not following the difference between these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RequestProvidedToken would be the token provided in the Authorization header of the request against that Kubernetes API server.
Token would be some static token provided in the configuration file given to the webhook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gothca, so this will have similar security concerns in the API to storing the client credentials.
Is Token a standard naming?
enhancements/authentication/external-oidc-additional-identity-information-sources.md
Outdated
Show resolved
Hide resolved
enhancements/authentication/external-oidc-additional-identity-information-sources.md
Outdated
Show resolved
Hide resolved
enhancements/authentication/external-oidc-additional-identity-information-sources.md
Outdated
Show resolved
Hide resolved
enhancements/authentication/external-oidc-additional-identity-information-sources.md
Show resolved
Hide resolved
| mappings: | ||
| # name is the name of the claim to be built. | ||
| # this name must be globally unique. | ||
| - name: groups |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a need to have an option for replace/adding items if the output is a list? E.g. you have some groups in the token and want to expand that with a call to some API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To do that, I'd say don't overwrite a claim you expect to already exist in the token.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But doesn't K8s expect groups passed to the RBAC via a specific claim? So do you have the option to add to the existing token with a different claim and still have the RBAC work correctly for collecting groups?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The structured authentication configuration file is what tells the KAS how to map a token to a cluster identity.
You can specify a specific claim or a CEL expression for setting the groups of an identity. I don't think we have GA support for CEL expressions for mapping the groups values yet on OpenShift but that should be coming soon - and would be my suggestion for that use case.
The configuration would become:
- New claim name for additional groups
- CEL expression that uses claim in token + new claim name concatenation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The structured authentication configuration file is what tells the KAS how to map a token to a cluster identity.
We have some supported config somewhere for users to configure this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that is what the original ExternalOIDC feature this feature is based on enabled.
We have ongoing work to achieve parity with what the upstream configurations are where we deem it makes sense.
enhancements/authentication/external-oidc-additional-identity-information-sources.md
Outdated
Show resolved
Hide resolved
enhancements/authentication/external-oidc-additional-identity-information-sources.md
Show resolved
Hide resolved
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
|
@benluddy @JoelSpeed @liouk This is ready for another round of review whenever you have some time. Thanks! |
JoelSpeed
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from not being clear on merge vs overwrite behaviour, I'm good with this as it stands
| groups: | ||
| - system:authenticated | ||
| - foo # fetched from UserInfo endpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implies merging behaviour rather than replacement behaviour? I thought in another thread we concluded replacement behaviour was the intention?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC system:authenticated is a group that is automatically added to every user that successfully authenticates to the cluster (i.e added out-of-band of our logic).
I could probably clarify that further with a comment.
| - method: GET | ||
| url: | ||
| type: Expression | ||
| expression: "\"https://graph.microsoft.com/v1.0/users/\" + claims.upn + \"/memberOf\"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm viewing an old version, this isn't updated at this point in the doc yet
Signed-off-by: Bryce Palmer <bpalmer@redhat.com>
|
@everettraven: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| so that OpenShift can source user information not provided in the JWT issued | ||
| by my identity provider. | ||
|
|
||
| * As an OpenShift cluster administrator, I want to remove non-essential PII (like `email` and `full_name`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: PII as an acronym might not be immediately obvious to the reader
| claim: groups | ||
| externalClaims: # This is the proposed new field | ||
| clientAuth: | ||
| type: ClientCredential # Use the token from the request as the access token to the UserInfo endpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment the same as for RequestProvidedToken, copy-paste leftover?
| # When set to ClientCredential, it will attempt to use configured | ||
| # client-id and client-secret parameters to fetch an access token | ||
| # to use to authenticate requests to the provided external claims sources. | ||
| # When set to Token, it will attempt to use a static user-provided access token |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # When set to Token, it will attempt to use a static user-provided access token | |
| # When set to AccessToken, it will attempt to use a static user-provided access token |
| # When set to ClientCredential, it will attempt to use configured | ||
| # client-id and client-secret parameters to fetch an access token | ||
| # to use to authenticate requests to the provided external claims sources. | ||
| # When set to Token, it will attempt to use a static user-provided access token |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # When set to Token, it will attempt to use a static user-provided access token | |
| # When set to AccessToken, it will attempt to use a static user-provided access token |
| This component will be responsible for translating a token to user identity information | ||
| to be returned to the Kubernetes API server. | ||
|
|
||
| It will be constrained such that _only_ the Kubernetes API server can communicate with it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will the Console obtain the extra information that the webhook fetches?
I am not entirely familiar with the implementation of external OIDC at the Console side, so I am not certain that this will be a problem, but AFAIK, the Console will redirect and get a token directly from the IDP upon login, and will use that to any subsequent requests to the kube-apiserver. This means that after obtaining the initial token, it won't have obtained any of the additional information.
I might be missing implementation details, but maybe the Console just needs to make an additional TokenReview call to the kube-apiserver right after login in order to get the full UserInfo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, Console just cares about having a token and already issues a TokenReview.
With the proposed architecture, nothing would change. The TokenReview would just end up being passed to the webhook authenticator for resolution - in which case it can go fetch the external claim information for constructing the user identity that is returned in the TokenReview status.
Adds an enhancement proposal to outline how we can add support for generically fetching user identity information from external sources to expose as claims in the direct external OIDC feature.
The main motivator for designing this feature is to make it easier for our customers to use the direct external OIDC configuration to work with use cases where not all the identity information for users of a cluster are presented as claims in a JWT.
We are also intentionally trying to approach this in a way that enables us to potentially contribute this logic back to the upstream Structured Authentication Configuration feature.