-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-38903] Update flink-shaded to 21.0 #27407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
flink-runtime/src/main/java/org/apache/flink/runtime/rest/handler/AbstractRestHandler.java
Outdated
Show resolved
Hide resolved
pom.xml
Outdated
| <groupId>io.netty</groupId> | ||
| <artifactId>netty-bom</artifactId> | ||
| <version>4.1.100.Final</version> | ||
| <version>4.2.6.Final</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we update flink-shaded and bump netty here in different PRs?
Probably it will simplify the review part
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I think that's a great idea. I'll shave off this PR to only contain the flink-shaded upgrade, and create another PR for Netty-specific changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snuyanzin thank you for your reviews, I will create other PRs after this gets merged:
- Upgrade Netty
- Upgrade Pekko
I also experienced a few test failures after upgrading flink-shaded-netty, but I couldn't find out exactly what causes these. What I know is that they are all intermittent failures, caused by various race conditions. My suspicion is that some Netty internals have changed, as all these newly failing tests relied on subtle behavior and timing between components.
I don't think we'll have a successful CI like this, but I'll create PRs to fix each flaky tests.
flink-core/src/main/java/org/apache/flink/configuration/SecurityOptions.java
Show resolved
Hide resolved
2536488 to
0e93417
Compare
|
based on trace from the recent failure https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=71641&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=1ffc5ec2-7913-50ff-0177-3fca16f1b8f0&l=78731 |
| "checkpointed_size" : { | ||
| "type" : "object", | ||
| "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:util.stats:StatsSummaryDto", | ||
| "id" : "urn:jsonschema:org:apache:flink:runtime:rest:messages:util:stats:StatsSummaryDto", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this related to flink-shaded upgrade?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is generated by flink-shaded-jackson-module-jsonSchema, and I had to re-generate it because RuntimeRestAPIStabilityTest was failing otherwise, instructing me to re-generate this file.
According to the test, the API was modified in a compatible way with this upgrade.
| messageHeaders.getResponseStatusCode(), | ||
| responseHeaders)); | ||
| responseHeaders), | ||
| ctx.executor()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@snuyanzin the issue was this stuck thread:
Jan 14 21:15:14 at org.apache.flink.core.testutils.BlockerSync.awaitBlocker(BlockerSync.java:59)
Jan 14 21:15:14 - locked <0x00000000aa141c48> (a java.lang.Object)
Jan 14 21:15:14 at org.apache.flink.runtime.rest.RestServerEndpointITCase.testShouldWaitForHandlersWhenClosing(RestServerEndpointITCase.java:597)
Something has changed internally in Netty, and this test consistently times out on Azure CI. The issue arises when the response is not written from the event loop, and Netty enqueues the write operations on the event loop instead, and then we close the REST server itself (which is the point of the test), the writes don't always go through properly.
By running the write operations on the event loop, this issue goes away. I think it's better to do it this way anyways, so that Netty does not have to create 3 write tasks to eventually execute on the event loop thread.
Nevermind, unfortunately this only solved the issue on my own computer, but the CI still failed at the exact same steps. I'll continue digging...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, nevermind, this is the same issue as before, I was on wrong branch locally. Unfortunately keeping the Netty version as-is did not help get rid of this.
If you try to run this test suite locally, you will see these errors: Received fatal alert: certificate_unknown. This is because starting from Netty 4.2, SSL hostname verification is on by default. Setting security.ssl.verify-hostname has NO effect, because this is not used anywhere. So I suggest to change its default value to false, and use it to set SSL hostname verification, as that would allow us to have the same behavior as before.
The reason you don't see any errors in the CI is that this error causes one of the tests to hang indefinitely, so no output will be shown of the errors.
…EventLoop thread" This reverts commit af71b5a.
What is the purpose of the change
Update flink-shaded to 21.0, with the biggest change being the new Netty version.
Brief change log
rest_api_v1.snapshotVerifying this change
Does this pull request potentially affect one of the following parts:
@Public(Evolving): noDocumentation