-
Notifications
You must be signed in to change notification settings - Fork 9
JPERF-811: Fix flaky shouldTolerateEarlyFinish test by waiting until … #23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -56,7 +56,8 @@ class SshTest { | |
| SshContainer().useSsh { sshHost -> | ||
| installPing(sshHost) | ||
|
|
||
| val fail = sshHost.runInBackground("nonexistent-command") | ||
| val fail = sshHost.runInBackground("nonexistant-command") | ||
| sshHost.waitForAllProcessesToFinish("nonexistant-command", Duration.ofMillis(100)) | ||
| val failResult = fail.stop(Duration.ofMillis(20)) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So this fails due to We can see that
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. BTW. It seems arbitrary to have both
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I traced it down to ancient times behind the wall: https://bulldog.internal.atlassian.com/browse/JPT-292
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, I already had the fixes, but I was checking the effect of each commit via rebase (to populate the CHANGELOG correctly). And it turns out that |
||
|
|
||
| Assert.assertEquals(127, failResult.exitStatus) | ||
|
|
@@ -66,4 +67,11 @@ class SshTest { | |
| private fun installPing(sshHost: Ssh) { | ||
| sshHost.newConnection().use { it.execute("apt-get update -qq && apt-get install iputils-ping -y") } | ||
| } | ||
|
|
||
| private fun Ssh.waitForAllProcessesToFinish( | ||
| processCommand: String, | ||
| timeout: Duration | ||
| ) = this.newConnection().use { | ||
| it.execute("wait `pgrep '$processCommand'`", timeout) | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Probably a bit of an overkill, but I don't have a better idea how we can wait for the process to finish. It's also a shotgun, because we are waiting not only for the one process we started, but also all of other processes that might be named the same. I think we don't expect there to be any other processes like that anyway. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the chance of other processes with the same name is minimal, and this solution should be sufficient. |
||
| } | ||
| } | ||


There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the test shows that the
runInBackgroundAPI cannot be used easily. We should fix the problem in the API rather than in tests.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could wait for the process to start before returning from
runInBackgroundsimilarly to how my solution proposal does it in this test, however this approach is very system specific (usage ofwaitandpgrep). I don't know how we could do that in more portable way.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another approach would be to just return some specific
SshConnection.SshResultas part ofBackgroundProcess.stopwhen we don't really get theexitStatus, however I don't know what would that be and we need to return something if we want to maintain the current API ofBackgroundProcessThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dagguh I'd like to fix the flakiness of the test. This is my main goal.
My understanding of possible implementations of your suggestion is to either make it less portable or break the API (see my 2 previous comments). I don't like any of those options and I'm a bit stuck with this PR. Do you have any other ideas how this could be fixed? If not then maybe you have opinion about which of those 2 is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pczuj why are we afraid to break the API and properly apply semantic versioning?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can just break the API, it's a tool in our belt.
It can require some extra effort. Let's say we release
ssh:3.0.0. Theinfrastructureis asshconsumer:It will have a choice:
ssh:[2.3.0,3.0.0)- stay on oldssh, this is the default state. It takes no effort, but what was the point ofssh:3.0.0, if no consumer needs it?ssh:[2.3.0,4.0.0)- use the new, without dropping the old. Only possible if the removed API was not used.ssh:[3.0.0,4.0.0)- use the new, drop the old. This means that all consumers ofinfrastructure, which are still onssh:[2.0.0,3.0.0), will no longer be compatible. Therefore that bump is breakage ofinfrastructure(its POM contract), which is a another major release. This bump would cascade to multiple libs:aws-infrastructureandjira-performance-tests, even if they don't usesshdirectly.We used scenario 3 many times in the past. It's a bit annoying, but not that scary.
We successfully used scenario 2 as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't mean we have to break API. I meant: the tests found a real flakiness pain, let's fix it for the consumers as well.
I would avoid it too. Not only does it lose generality, but also more moving parts: brittle and complex.