YARN-11937. Yarn Proxy Behind a Reverse Proxy #8300
Conversation
💔 -1 overall
This message was automatically generated.
Thanks for the patch @K0K0V0K, LGTM!
Description
When the Yarn Proxy is deployed behind a reverse proxy that is also used in application tracking URLs, the Yarn Proxy should redirect requests to that proxy instead of attempting to proxy them internally.
Use Case
Consider the following scenario:
• A user runs a Spark job.
• The Spark UI is hosted in the Spark History Server (SHS).
• Multiple SHS instances are deployed for high availability (HA).
• The tracking URL points to a Knox Gateway, which routes requests to the available SHS instances.
This setup ensures high availability for the tracking UI. If one SHS instance becomes unavailable, another can continue serving the UI.
Problem Statement
When the Knox Gateway forwards a user’s HTTP request to the Yarn Proxy, the Yarn Proxy attempts to proxy the request back to the Knox Gateway. However, this proxied request does not include the JWT token. As a result, Knox initiates authentication instead of forwarding the request to the appropriate SHS instance.
Proposed Solution
For security reasons, the JWT token must not be forwarded to the tracking URL. Therefore, when an application registers a tracking URL that includes a specific flag indicating that it is served behind a reverse proxy, the Yarn Proxy should redirect the user directly to the tracking URL instead of attempting to proxy the request internally.
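The decision described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the `TrackingUrlDecision` class, the `Action` enum, the `queryContains` helper, and the example hosts are all hypothetical, and `yarn_knox_proxy=true` is only assumed to be the flag string because it is the one mentioned in this PR's conversation.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.function.Predicate;

final class TrackingUrlDecision {
    /** Hypothetical outcome names, not taken from the patch. */
    enum Action { REDIRECT, PROXY }

    /**
     * Returns REDIRECT when the tracking URL is marked as being served behind
     * a reverse proxy, otherwise PROXY (the usual internal proxying).
     * No token is attached to the URL in either case.
     */
    public static Action decide(URI trackingUri, Predicate<URI> isBehindReverseProxy) {
        return isBehindReverseProxy.test(trackingUri) ? Action.REDIRECT : Action.PROXY;
    }

    /** Assumed detection: the flag appears as one '&'-separated query component. */
    public static Predicate<URI> queryContains(String flag) {
        return uri -> uri.getRawQuery() != null
            && Arrays.asList(uri.getRawQuery().split("&")).contains(flag);
    }

    public static void main(String[] args) {
        // Assumed flag string; the real value comes from yarn.web-proxy.redirect-flag.
        Predicate<URI> detector = queryContains("yarn_knox_proxy=true");

        // Hypothetical tracking URLs: one via a Knox Gateway, one pointing at an SHS directly.
        URI viaKnox = URI.create(
            "https://knox.example.com/gateway/shs/history/app_1?yarn_knox_proxy=true");
        URI direct = URI.create("http://shs-1.example.com:18080/history/app_1");

        System.out.println(decide(viaKnox, detector)); // prints REDIRECT
        System.out.println(decide(direct, detector));  // prints PROXY
    }
}
```

Passing the detector in as a `Predicate<URI>` keeps the redirect-or-proxy choice separate from how the flag is recognized, so the detection rule can change without touching the decision itself.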
Config
A new configuration property was added: yarn.web-proxy.redirect-flag
🎊 +1 overall
This message was automatically generated.
Hi all, I created a follow-up PR for this, because the current implementation might not identify this parameter correctly when it is part of a larger query string containing multiple parameters. The logic currently expects the query string to be exactly yarn_knox_proxy=true, so it fails to recognize the flag when other parameters (like doAs, parameter1, or parameter2) are present. The parameter detection should parse the query string and identify the yarn_knox_proxy=true key-value pair regardless of its position or the presence of other parameters. Could you please take a look? Thank you!
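One way the position-independent detection could look is sketched below. This is not the follow-up PR's code: the `RedirectFlagParser` class and `hasFlag` method are hypothetical names, and the example URLs in the usage note use a made-up host. The sketch splits the raw query on `&`, decodes each key and value, and compares them to the expected pair.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class RedirectFlagParser {
    /**
     * Returns true if the URI's query string contains the given key=value
     * pair, regardless of its position or of any other parameters present.
     */
    public static boolean hasFlag(URI uri, String key, String value) {
        String rawQuery = uri.getRawQuery();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return false;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // a bare parameter without a value cannot match key=value
            }
            // Decode after splitting so an encoded '&' or '=' inside a value
            // is not mistaken for a separator.
            String k = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            String v = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            if (key.equals(k) && value.equals(v)) {
                return true;
            }
        }
        return false;
    }
}
```

With this sketch, `hasFlag(URI.create("https://knox.example.com/app?doAs=alice&yarn_knox_proxy=true&parameter1=a"), "yarn_knox_proxy", "true")` returns true, while a URL whose only parameter is `not_yarn_knox_proxy=true` returns false because keys are compared exactly rather than by substring.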