Script collects version of driver, toolkit and tensorflow in a way that doesn't require communication with the card
Starting this PR to open the discussion about another attempt at a build-level smoke test script for GPU functionality. The committed script We could then have a JSON file of tested version combinations that this script would compare against, raising a warning if the current combination isn't on the list. This would again be run by config_R_cuda.sh, as in our last attempt. It adds the maintenance burden of keeping the JSON file up to date, but may help catch issues during the build. I'll also work on another PR with an Ansible playbook that spins up an EC2 GPU instance, runs the standalone GPU test script in tests/gpu/misc/test-gpu.sh, and then shuts down the EC2 instance. This could be used to test new combinations and populate the JSON file mentioned above. Ideally the playbook would be triggered by a manual GitHub action. How does this sound?
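As a rough sketch of the comparison step described above: the function below warns when the current driver/toolkit/tensorflow combination is not listed in a file of tested combinations. Everything here is an assumption for illustration, not what the PR implements: the function name check_combination, the key names, the version numbers in the example, and the choice to store one JSON object per line (with fixed key order and no extra whitespace) so that plain grep can do the lookup without a JSON parser.

```shell
# Hypothetical file layout: one tested combination per line, for example
#   {"driver":"1.2.3","toolkit":"4.5","tensorflow":"6.7.8"}
# The version numbers above are placeholders, not real tested combinations.

# check_combination FILE DRIVER TOOLKIT TENSORFLOW
# Returns 0 if the combination is listed in FILE; otherwise prints a
# warning to stderr and returns 1.
check_combination() {
    # Build the exact line we expect to find in the file.
    cc_entry=$(printf '{"driver":"%s","toolkit":"%s","tensorflow":"%s"}' "$2" "$3" "$4")
    # -x: match whole lines only, -F: fixed string (no regex), -q: quiet.
    if grep -qxF -e "$cc_entry" "$1"; then
        return 0
    fi
    echo "WARNING: untested combination: driver=$2 toolkit=$3 tensorflow=$4" >&2
    return 1
}
```

The caller decides whether a non-zero return should fail the build or only be logged. A real JSON parser would be more robust to formatting differences in the file; the line-based match is only the simplest thing that works under the assumptions above.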
eitsupi left a comment
I created the scripts/tests directory earlier; do you think test-config-cuda.sh would be better placed there?
I also had questions about a few details in the scripts, so I commented on them. Please ignore any that don't apply.
scripts/test-config-cuda.sh
echo $VERSION_TOOLKIT

# tensorflow
if ! VERSION_TF_OUTPUT=`python -c 'import tensorflow as tf; print(tf.__version__)' 2>&1`;
$(...) may be preferable to `...`. https://github.com/koalaman/shellcheck/wiki/SC2006
Thanks @eitsupi! I made the change as you suggested in my latest commit. I didn't realize backticks were deprecated. I'll try to use parentheses going forward.
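For reference, a minimal sketch of the $(...) form applied to the pattern in the reviewed line: capture a command's combined stdout and stderr, and branch on its exit status. The helper name capture_version and its error message are hypothetical and not part of the committed script.

```shell
# capture_version CMD [ARGS...]
# Runs the given command, capturing stdout and stderr together.
# On success, prints the captured output and returns 0.
# On failure, prints the captured output to stderr and returns 1.
capture_version() {
    # The exit status of the assignment is the exit status of the command
    # inside $(...), so it can be tested directly with `if !`.
    if ! cv_output=$("$@" 2>&1); then
        echo "could not determine version: $cv_output" >&2
        return 1
    fi
    echo "$cv_output"
}
```

It could be called as VERSION_TF=$(capture_version python -c 'import tensorflow as tf; print(tf.__version__)'). Unlike backticks, $(...) nests without extra escaping, which is the main point of the ShellCheck note linked above.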
tests/gpu/misc/examples_tf.R
@@ -0,0 +1,36 @@
## Tensorflow:
install.packages('keras', repos='http://cran.us.r-project.org')
Is there any reason to use a US CRAN mirror?
Do we need this line at all, since tensorflow is installed in the containers by scripts/install_tensorflow.sh?
When I attempt to run examples_tf.R without it, though, I get the following error:
Error in library(keras) : there is no package called ‘keras’
Execution halted
Running library(tensorflow) in R gives a similar error message.
Running scripts/test-config-cuda.sh reports the correct tensorflow version, but it's checking the version from Python with python -c 'import tensorflow as tf; print(tf.__version__)'.
Do I need to add the path where tensorflow is installed by scripts/install_tensorflow.sh somewhere so R sees it?
What I wanted to point out is that the repos argument may simply be unnecessary.
Yes @eitsupi, you're right. It worked fine when I removed repos. Thanks! I've committed that change.
It still seems like it shouldn't need to install that though. I tried adding the path /opt/venv/reticulate with the following code, but I got the same error:
old_path <- Sys.getenv("PATH")
Sys.setenv(PATH = paste(old_path, "/opt/venv/reticulate", sep = ":"))
Is it fine to install keras here or does the fact that I need to install it indicate that there's an issue with the tensorflow install?
Thanks eitsupi!
and added command line argument checking
I have a question that may have gotten lost in a comment above. Is it ok to install keras here (scripts/tests/examples_tf.R) or does the fact that I need to install it indicate that there's an issue with the tensorflow install? It seems like keras should already be installed, but I get an error if I just run
I think it makes sense to install keras here, as the Python packages are not installed in the base image (see install_tensorflow.sh). I've tried to move away from the idea of a 'global' venv with Python modules pre-installed, since in my experience different users and different projects often want their own versions of all the modules. I think we need to remove the multi-user venv setup created by setting
Oh, I see. That makes sense to me now. Thank you for explaining @cboettig! I think this PR is ready to go. Please let me know if you'd like any other changes.
@noamross can you give this another look over too?