- Tune Spark configuration parameters in a hands-off manner
- Learn from tuning experiences over time to:
  - tune more efficiently,
  - answer counterfactual questions about application performance, and
  - suggest interventions to improve application performance (potentially including code changes or environment updates, not just configuration settings)
This package assumes that Apache Spark is installed and that the SPARK_HOME
environment variable (and, optionally, HADOOP_CONF_DIR) has already been set.
See dev-README.md for details.
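
Since the tuner depends on these variables, it can help to check them before a run. The following is a minimal sketch, not part of the package: the function name check_spark_env is hypothetical, and it assumes only that SPARK_HOME must point to an existing directory while HADOOP_CONF_DIR may be left unset.

```python
import os


def check_spark_env(environ=None):
    """Return a list of problems found with the Spark environment variables.

    Hypothetical helper for illustration; not part of sparktuner.
    """
    env = os.environ if environ is None else environ
    problems = []

    # SPARK_HOME is required and should point to the Spark installation.
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        problems.append("SPARK_HOME is not set")
    elif not os.path.isdir(spark_home):
        problems.append("SPARK_HOME is not a directory: " + spark_home)

    # HADOOP_CONF_DIR is optional, so only validate it when present.
    hadoop_conf = env.get("HADOOP_CONF_DIR")
    if hadoop_conf and not os.path.isdir(hadoop_conf):
        problems.append("HADOOP_CONF_DIR is not a directory: " + hadoop_conf)

    return problems


if __name__ == "__main__":
    for problem in check_spark_env():
        print("warning: " + problem)
```

An empty list means both variables look usable; anything else is worth fixing before launching a tuning run.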
All Python dependencies are listed in:
- requirements.txt
- build.gradle
- setup.py
Useful Gradle commands:
- ./gradlew clean build: download and install all dependencies from scratch, and run tests.
- ./gradlew flake8: lint for style issues.
- ./gradlew pytest: run tests.
- ./gradlew build -x getRequirements: install all dependencies (assumes they've already been downloaded).
Some interesting build artifacts are:
- build/deployable/bin/sparktuner
- build/deployable/bin/sparktuner.pex
- build/distributions/sparktuner-0.1.0.tar.gz
- build/wheel-cache/sparktuner-0.1.0-py2-none-any.whl
The Python virtual environment resides in build/venv, and can be activated using
source build/venv/bin/activate and deactivated using deactivate.
To see usage information, run ./build/deployable/bin/sparktuner --help
Sample commands:
- build/deployable/bin/sparktuner --no-dups --name sartre_spark_sortre --path ../../sparkScala/sort/build/libs/sort-0.1-all.jar --deploy_mode client --master "local[*]" --class com.umayrh.sort.Main --spark_parallelism "1,10" --program_conf "10000 /tmp/sparktuner_sort"
- build/deployable/bin/sparktuner --no-dups --name sartre_spark_sortre --path ../../sparkScala/sort/build/libs/sort-0.1-all.jar --deploy_mode client --master "local[*]" --class com.umayrh.sort.Main --executor_memory "50mb,1gb" --program_conf "10000 /tmp/sparktuner_sort"
- build/deployable/bin/sparktuner --no-dups --name sartre_spark_sortre --path ../../sparkScala/sort/build/libs/sort-0.1-all.jar --deploy_mode client --master "local[*]" --class com.umayrh.sort.Main --driver_memory "1GB,6GB" --program_conf "1000000 /tmp/sparktuner_sort"
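
The three sample commands differ only in the tuned parameter and the program arguments, so they can be assembled from a shared prefix. The sketch below uses only flags that appear in the commands above. The helper name sample_command is hypothetical, and it assumes Python 3 and that the comma-separated value (for example "1,10") is passed to the tool as a single argument.

```python
import shlex

SPARKTUNER = "build/deployable/bin/sparktuner"

# Arguments shared by all three sample commands.
COMMON_ARGS = [
    "--no-dups",
    "--name", "sartre_spark_sortre",
    "--path", "../../sparkScala/sort/build/libs/sort-0.1-all.jar",
    "--deploy_mode", "client",
    "--master", "local[*]",
    "--class", "com.umayrh.sort.Main",
]


def sample_command(tunable_flag, tunable_value, program_conf):
    """Build the argument list for one sample run (hypothetical helper)."""
    return (
        [SPARKTUNER]
        + COMMON_ARGS
        + [tunable_flag, tunable_value, "--program_conf", program_conf]
    )


cmd = sample_command("--spark_parallelism", "1,10", "10000 /tmp/sparktuner_sort")

# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be passed directly to subprocess.run(cmd, check=True) from the package directory once the build has produced the sparktuner binary. Passing a list avoids shell quoting problems with values such as local[*].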