Records of my data investigation, experimentation, and other thoughts are in my Lab Notebook. All relevant Java source code is in src/main/java.
The project is built with Gradle. Specifically, running gradle fatJar creates a single jar file that bundles the project's classes together with its dependencies, so it can be passed directly to Spark. See my Lab Notebook for the relevant spark-submit commands.
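For readers unfamiliar with fat jars, the fragment below is a minimal sketch of how a fatJar task is commonly defined in a Groovy-DSL build.gradle. It is an assumption for illustration only; this project's actual build script may define the task differently. It packages the compiled main classes plus every jar on the runtime classpath into one archive. Spark itself is assumed to be declared as compileOnly (it is already present on the cluster), so it stays out of the fat jar.

```groovy
// Hypothetical sketch of a fatJar task; the project's real build.gradle may differ.
tasks.register('fatJar', Jar) {
    // Adds "-all" to the file name so it does not clash with the default jar task's output.
    archiveClassifier = 'all'
    // Several dependencies can ship files with the same path; keep the first copy.
    duplicatesStrategy = DuplicatesStrategy.EXCLUDE
    // Signature files from signed dependencies would invalidate the merged jar.
    exclude 'META-INF/*.SF', 'META-INF/*.DSA', 'META-INF/*.RSA'

    // The project's own compiled classes and resources.
    from sourceSets.main.output

    // Unpack every runtime dependency jar into the archive.
    // compileOnly dependencies (e.g. Spark) are not on runtimeClasspath, so they are left out.
    dependsOn configurations.runtimeClasspath
    from {
        configurations.runtimeClasspath
            .findAll { it.name.endsWith('.jar') }
            .collect { zipTree(it) }
    }
}
```

With a task like this, Gradle writes the archive to build/libs/ by default, and that jar path is what gets handed to spark-submit along with the main class.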