YARN-11937. Yarn Proxy Behind a Reverse Proxy #8300

Merged
brumi1024 merged 1 commit into apache:trunk from K0K0V0K:YARN-11937 on Mar 10, 2026

K0K0V0K commented Mar 5, 2026

Description of PR

When the Yarn Proxy is deployed behind a reverse proxy that is also used in application tracking URLs, the Yarn Proxy should redirect requests to that proxy instead of attempting to proxy them internally.

Use Case
Consider the following scenario:
• A user runs a Spark job.
• The Spark UI is hosted in the Spark History Server (SHS).
• Multiple SHS instances are deployed for high availability (HA).
• The tracking URL points to a Knox Gateway, which routes requests to the available SHS instances.

This setup ensures high availability for the tracking UI. If one SHS instance becomes unavailable, another can continue serving the UI.

Problem Statement
When the Knox Gateway forwards a user’s HTTP request to the Yarn Proxy, the Yarn Proxy attempts to proxy the request back to the Knox Gateway. However, this proxied request does not include the JWT token. As a result, Knox initiates authentication instead of forwarding the request to the appropriate SHS instance.

Proposed Solution
For security reasons, the JWT token must not be forwarded to the tracking URL. Therefore, when an application registers a tracking URL that includes a specific flag indicating that it is served behind a reverse proxy, the Yarn Proxy should redirect the user directly to the tracking URL instead of attempting to proxy the request internally.

Config
A new configuration property was added: yarn.web-proxy.redirect-flag
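
The decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: the class, the `decide` helper, and the `Action` enum are hypothetical names, the flag value `yarn_knox_proxy=true` is assumed for the example, and the sketch compares the whole query string against the flag for simplicity.

```java
import java.net.URI;

class RedirectDecisionSketch {
    // Property name from this PR; how its value is matched is an assumption here.
    static final String REDIRECT_FLAG_KEY = "yarn.web-proxy.redirect-flag";

    enum Action { REDIRECT, PROXY }

    // Hypothetical helper: redirect the user when the tracking URL's query
    // string equals the configured flag, otherwise proxy the request as before.
    static Action decide(URI trackingUri, String redirectFlag) {
        String query = trackingUri.getQuery(); // null when the URL has no query
        if (redirectFlag != null && !redirectFlag.isEmpty()
                && redirectFlag.equals(query)) {
            return Action.REDIRECT;
        }
        return Action.PROXY;
    }

    public static void main(String[] args) {
        String flag = "yarn_knox_proxy=true"; // assumed value of REDIRECT_FLAG_KEY
        URI viaKnox = URI.create(
            "https://knox.example.com/gateway/shs/history/app_1?yarn_knox_proxy=true");
        URI direct = URI.create("http://shs-1.example.com:18080/history/app_1");
        System.out.println(decide(viaKnox, flag)); // REDIRECT
        System.out.println(decide(direct, flag));  // PROXY
    }
}
```

With a redirect, the browser contacts the reverse proxy itself and presents its own credentials, so the Yarn Proxy never has to forward the JWT token.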

How was this patch tested?

  • A unit test was added
  • Deployed a cluster with YARN, Spark, and Knox and verified the behavior there

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

hadoop-yetus commented

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 0m 0s | | Docker mode activated. |
| -1 ❌ | patch | 0m 20s | | #8300 does not apply to trunk. Rebase required? Wrong Branch? See https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute for help. |

| Subsystem | Report/Notes |
|-----------|--------------|
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8300/1/console |
| versions | git=2.34.1 |
| Powered by | Apache Yetus 0.14.1 https://yetus.apache.org |

This message was automatically generated.

p-szucs commented Mar 6, 2026

Thanks for the patch @K0K0V0K, LGTM!

brumi1024 left a comment

Thanks @K0K0V0K for the patch and @p-szucs for the review. Merging to trunk.

hadoop-yetus commented

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|------|-----------|---------|---------|---------|
| +0 🆗 | reexec | 0m 24s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 1s | | xmllint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +0 🆗 | mvndep | 1m 11s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 27m 56s | | trunk passed |
| +1 💚 | compile | 4m 11s | | trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | compile | 3m 51s | | trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | checkstyle | 1m 28s | | trunk passed |
| +1 💚 | mvnsite | 1m 36s | | trunk passed |
| +1 💚 | javadoc | 1m 53s | | trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javadoc | 1m 28s | | trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | spotbugs | 3m 17s | | trunk passed |
| +1 💚 | shadedclient | 17m 51s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 21s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 1s | | the patch passed |
| +1 💚 | compile | 3m 47s | | the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javac | 3m 47s | | the patch passed |
| +1 💚 | compile | 3m 47s | | the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | javac | 3m 47s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 1m 19s | /results-checkstyle-hadoop-yarn-project_hadoop-yarn.txt | hadoop-yarn-project/hadoop-yarn: The patch generated 1 new + 186 unchanged - 0 fixed = 187 total (was 186) |
| +1 💚 | mvnsite | 1m 31s | | the patch passed |
| +1 💚 | javadoc | 1m 22s | | the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javadoc | 1m 25s | | the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | spotbugs | 3m 50s | | the patch passed |
| +1 💚 | shadedclient | 17m 57s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 0m 40s | | hadoop-yarn-api in the patch passed. |
| +1 💚 | unit | 3m 54s | | hadoop-yarn-common in the patch passed. |
| +1 💚 | unit | 1m 5s | | hadoop-yarn-server-web-proxy in the patch passed. |
| +1 💚 | asflicense | 0m 32s | | The patch does not generate ASF License warnings. |
| | | 111m 19s | | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8300/3/artifact/out/Dockerfile |
| GITHUB PR | #8300 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
| uname | Linux f829e3db7be6 5.15.0-164-generic #174-Ubuntu SMP Fri Nov 14 20:25:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 3ee560a |
| Default Java | Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| Multi-JDK versions | /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8300/3/testReport/ |
| Max. process+thread count | 616 (vs. ulimit of 5500) |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy U: hadoop-yarn-project/hadoop-yarn |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8300/3/console |
| versions | git=2.43.0 maven=3.9.11 spotbugs=4.9.7 |
| Powered by | Apache Yetus 0.14.1 https://yetus.apache.org |

This message was automatically generated.

brumi1024 merged commit 8b45c32 into apache:trunk on Mar 10, 2026
4 checks passed

roczei commented Mar 19, 2026

Hi All,

I created a follow-up PR for this because the current implementation might not correctly identify the yarn_knox_proxy parameter when it is part of a larger query string containing multiple parameters. For example, in the following URL:

...?parameter1=true&parameter2=true&yarn_knox_proxy=true&doAs=user

Currently, the logic expects the query string to be exactly yarn_knox_proxy=true, so it fails to recognize the flag when other parameters (like doAs, parameter1, or parameter2) are present.

The parameter detection should be improved to properly parse the query string and identify the yarn_knox_proxy=true key-value pair regardless of its position or the presence of other parameters.
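
One way such position-independent detection could look is sketched here. It is only an illustration of the idea, not the code in the follow-up PR; the class and the `hasParam` helper are hypothetical names.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class QueryFlagSketch {
    // Hypothetical helper: returns true when rawQuery contains key=value as
    // one of its '&'-separated pairs, wherever that pair appears.
    static boolean hasParam(String rawQuery, String key, String value) {
        if (rawQuery == null || rawQuery.isEmpty()) {
            return false;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // a bare key without '=' cannot carry the flag value
            }
            // URLDecoder.decode throws IllegalArgumentException on malformed
            // percent-escapes; real code would need to decide how to handle that.
            String k = URLDecoder.decode(pair.substring(0, eq), StandardCharsets.UTF_8);
            String v = URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            if (key.equals(k) && value.equals(v)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String query = "parameter1=true&parameter2=true&yarn_knox_proxy=true&doAs=user";
        System.out.println(hasParam(query, "yarn_knox_proxy", "true")); // true
        System.out.println(hasParam("doAs=user", "yarn_knox_proxy", "true")); // false
    }
}
```

Comparing decoded key-value pairs, rather than searching for a substring, avoids false matches on keys that merely end with yarn_knox_proxy.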

Could you please take a look? Thank you!

#8357
