Replace cpu-benchmark with similar stress-ng monte-carlo test #137
quantumsteve wants to merge 11 commits into main
Hi @quantumsteve, it seems the tests for Dask are hanging. Could you please take a look at them? Thanks!
Confirming that the tests hang locally (not only on the GitHub runner). Let me know if you need help or guidance with this, as I implemented the testing infrastructure for translators/loggers. Other tests hang too, so it's not specific to Dask, which is good news: it should be easier to diagnose and fix. Also, some tests fail outright, such as the one for the Bash translator, which should also be relatively easy to diagnose. All the testing code is in
Yes, I'm struggling to run some tests locally. It looks like the containers are unable to write to my /tmp directory. Do I need to change permissions?
The permissions should be fine, as I have dealt with that as well. I'll take a look today and let you know what I find out. Testing with Docker is pretty finicky due to users/permissions.
One thing I noticed is that
Ok, news. Connecting to the container and running wfbench by hand, without involving wfcommons or any of my test infrastructure, hangs: That should be easy to diagnose, and I'll look at it soon-ish.
Ok, so the culprit is
@quantumsteve A fix is to call
bin/wfbench (outdated)
    proc.wait()
    if io_proc is not None and io_proc.is_alive():
        # io_proc.terminate()
        io_proc.terminate()
I think calling terminate() here is causing the io_proc to not finish.
Yes, but the problem is that if we call join() instead of terminate(), it seems to hang, so perhaps the I/O process is not the kind of process that ever terminates on its own? I'll inspect the code tomorrow (Thursday) when I have a minute.
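One way to reconcile the two comments above is to give the I/O process a chance to finish and only terminate it if it does not. The following is a minimal sketch, not the wfbench code: `io_worker`, `stop_io_process`, and the stop event are hypothetical names, and it assumes the worker can be written to check a shared `multiprocessing.Event`.

```python
import multiprocessing as mp
import time


def io_worker(stop_event):
    # Hypothetical stand-in for the I/O process: loops until asked to stop.
    while not stop_event.is_set():
        time.sleep(0.01)


def stop_io_process(proc, stop_event, timeout=2.0):
    """Ask the worker to stop, wait up to `timeout` seconds, then terminate as a last resort."""
    stop_event.set()
    proc.join(timeout)      # returns after `timeout` even if the process is still running
    if proc.is_alive():
        proc.terminate()    # the worker ignored the request, so force it
        proc.join()         # reap the terminated process
        return "terminated"
    return "joined"


if __name__ == "__main__":
    event = mp.Event()
    io_proc = mp.Process(target=io_worker, args=(event,))
    io_proc.start()
    print(stop_io_process(io_proc, event))
```

With this shape, a bare join() cannot hang forever, and terminate() is only reached when the worker never exits by itself.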
the "* 1024 * 1024" to "* 1000 * 1000". The alternative was to document "MiB", but as far as I can tell none of the layers above use 1024. That said, _every time_ we've used any unit other than plain bytes, we've had errors. Perhaps we should change everything to "memory size in bytes" and leave it at that.
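To make the size of the discrepancy concrete, here is a small sketch of the two conversions. The function and constant names are illustrative only and do not come from the wfcommons code base.

```python
MB = 1000 * 1000    # decimal megabyte (SI)
MIB = 1024 * 1024   # binary mebibyte (IEC)


def megabytes_to_bytes(n_mb):
    # Interprets the value as decimal megabytes.
    return int(n_mb * MB)


def mebibytes_to_bytes(n_mib):
    # Interprets the same number as binary mebibytes.
    return int(n_mib * MIB)


if __name__ == "__main__":
    print(megabytes_to_bytes(100))   # 100000000
    print(mebibytes_to_bytes(100))   # 104857600
```

The two readings of "100" differ by about 4.9%, which is why mixing them across layers produces mismatches, and why passing plain bytes everywhere sidesteps the question.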
Signed-off-by: Steven Hahn <hahnse@ornl.gov>
First attempt at replacing cpu-benchmark with a nearly identical test in stress-ng.
Differences:
- int32_t, while samples in cpu-benchmark are int64_t
- stress-ng launches multiple processes, which changes the logic for stopping parent and child processes. A quick search recommended psutil.
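The PR relies on psutil for stopping the parent and its children. As a dependency-free illustration of the same concern (not what this PR does), the sketch below uses POSIX process groups from the standard library. The helper names `start_in_new_group` and `stop_group` are hypothetical, and the approach assumes a POSIX system.

```python
import os
import signal
import subprocess
import time


def start_in_new_group(cmd):
    # start_new_session=True (POSIX only) makes the child a process-group leader,
    # so the processes it forks inherit its process group ID.
    return subprocess.Popen(cmd, start_new_session=True)


def _signal_group(pgid, sig):
    # Signal every process in the group; ignore the case where the group is already gone.
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def stop_group(proc, grace=2.0):
    """Send SIGTERM to the whole group, then SIGKILL if the leader is still running."""
    pgid = proc.pid  # the leader's PID doubles as the group ID
    _signal_group(pgid, signal.SIGTERM)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        _signal_group(pgid, signal.SIGKILL)
        return proc.wait()


if __name__ == "__main__":
    # A shell that spawns two background children, standing in for a multi-process benchmark.
    parent = start_in_new_group(["sh", "-c", "sleep 30 & sleep 30 & wait"])
    time.sleep(0.5)
    print(stop_group(parent))
```

psutil's `Process.children(recursive=True)` covers cases this sketch does not, such as children that move themselves into a different process group.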