PredictionIO is an open source machine learning framework.
Two apps are composed to make a basic PredictionIO service:
- Engine: a specialized machine learning app which provides training of a model and then queries against that model; generated from a template or custom code.
- Eventserver: a simple HTTP API app for capturing events to process from other systems; shareable between multiple engines.
This buildpack can deploy either of these apps: the Engine when engine.json is present, otherwise the Eventserver.
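The selection rule above can be pictured as the short shell sketch below. It only illustrates the rule as stated; it is not the buildpack's actual detection script, and the function name `pio_app_kind` is hypothetical.

```shell
# Hypothetical helper: report which PredictionIO app a source directory
# would be deployed as, following the rule "Engine when engine.json is
# present, otherwise Eventserver".
pio_app_kind() {
  local dir="${1:-.}"   # directory to inspect; defaults to the current one
  if [ -f "$dir/engine.json" ]; then
    echo "engine"
  else
    echo "eventserver"
  fi
}
```

For example, `pio_app_kind ./pio-eventserver` prints which kind of app that checkout would become.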
The limited resources of a single dyno restrict use of typically large, statistically significant datasets. Only Performance-L dynos with 14GB RAM (currently $16/day) provide reasonable utility in this configuration.
✏️ Throughout these docs, code terms that start with $ represent a value (shell variable) that should be replaced with a customized value, e.g. $eventserver_name, $engine_name, $postgres_addon_id…
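If you would rather paste commands unchanged, one option is to define the placeholders as shell variables first. The values below are made-up examples; note that a placeholder inside single quotes is not expanded by your local shell, so use double quotes wherever it must be substituted locally.

```shell
# Hypothetical example values for the placeholders used in these docs;
# replace them with your own names.
eventserver_name="my-pio-eventserver"
engine_name="my-pio-engine"
pio_app_name="myapp"

# Left empty here; fill these in from command output in later steps.
postgres_addon_id=""     # e.g. postgresql-aerodynamic-00000
pio_app_access_key=""    # returned by `pio app new` on the eventserver
```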
git clone https://github.com/heroku/predictionio-buildpack.git pio-eventserver
cd pio-eventserver
heroku create $eventserver_name
heroku addons:create heroku-postgresql:hobby-dev
heroku buildpacks:add -i 1 https://github.com/heroku/predictionio-buildpack.git
heroku buildpacks:add -i 2 heroku/scala
- Note the Postgres add-on identifier, e.g. postgresql-aerodynamic-00000; use it below in place of $postgres_addon_id
- You may want to specify heroku-postgresql:standard-0 instead, because the free hobby-dev database is limited to 10,000 records.
We delay deployment until the database is ready.
heroku pg:wait && git push heroku master
Select an engine from the gallery. Download a .tar.gz from GitHub and open/expand it on your local computer.
🚨 Avoid engines that persist their model to the filesystem, which is incompatible with the ephemeral filesystem of Heroku dynos. Such engines must be modified to use Amazon S3 or the database for persistence.
cd into the engine's directory and ensure it is a git repo:
git init
heroku create $engine_name
heroku buildpacks:add -i 1 https://github.com/heroku/heroku-buildpack-jvm-common.git
heroku buildpacks:add -i 2 https://github.com/heroku/predictionio-buildpack.git
heroku run "pio app new $pio_app_name" -a $eventserver_name
- This returns an access key for the app; use it below in place of $pio_app_access_key.
Replace the Postgres ID & eventserver config values with those from above:
heroku addons:attach $postgres_addon_id
heroku config:set \
PIO_EVENTSERVER_HOSTNAME=$eventserver_name.herokuapp.com \
PIO_EVENTSERVER_PORT=80 \
PIO_EVENTSERVER_ACCESS_KEY=$pio_app_access_key \
PIO_EVENTSERVER_APP_NAME=$pio_app_name
- See environment variables for config details.
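Before training, it can help to confirm that the eventserver is reachable at the configured host and port. This sketch assumes `curl` is installed locally and that the eventserver answers `GET /` with a small JSON status (PredictionIO's Event Server responds with `{"status":"alive"}`); the function names are hypothetical.

```shell
# Build the eventserver base URL from the same values used for
# PIO_EVENTSERVER_HOSTNAME and PIO_EVENTSERVER_PORT.
eventserver_url() {
  local host="$1" port="${2:-80}"
  if [ "$port" = "80" ]; then
    echo "http://$host"        # default HTTP port needs no suffix
  else
    echo "http://$host:$port"
  fi
}

# Query the eventserver's root endpoint; --fail makes curl exit non-zero
# on an HTTP error status.
check_eventserver() {
  curl --silent --show-error --fail "$(eventserver_url "$@")/"
}
```

Usage: `check_eventserver "$eventserver_name.herokuapp.com" 80`.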
Modify engine.json to make sure the appName parameter matches the app record created in the eventserver:
"datasource": {
"params" : {
"appName": "$pio_app_name"
}
}
- If the appName param is missing, you may need to upgrade the template.
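A quick local check that engine.json points at the right eventserver app can catch a mismatch before a failed training run. This sketch assumes `python3` is available locally (used only to parse the JSON) and that appName sits at `datasource.params.appName` as in the snippet above; the function names are hypothetical.

```shell
# Print datasource.params.appName from an engine.json file.
engine_app_name() {
  python3 -c 'import json, sys; print(json.load(open(sys.argv[1]))["datasource"]["params"]["appName"])' "${1:-engine.json}"
}

# Succeed only if engine.json's appName equals the expected app name.
check_engine_app_name() {
  local expected="$1" file="${2:-engine.json}" actual
  actual="$(engine_app_name "$file")" || return 1
  if [ "$actual" = "$expected" ]; then
    echo "appName OK: $actual"
  else
    echo "appName mismatch: engine.json has '$actual', expected '$expected'" >&2
    return 1
  fi
}
```

Usage, from the engine's directory: `check_engine_app_name "$pio_app_name"`.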
🚨 Mandatory: data is required for training to succeed and then to serve predictive queries.
This step will vary based on the engine. Typically, a command formatted like the following should be run locally:
python ./data/import_eventserver.py \
--url https://$eventserver_name.herokuapp.com \
--access_key $pio_app_access_key
- Check the engine's data/ directory for exact naming & format.
- pip install predictionio may be required for the import script to run.
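To confirm the access key works before a full import, a single event can be sent through the Event Server's REST endpoint `POST /events.json?accessKey=...`. This is a minimal sketch assuming `curl` locally; the event and entity names are illustrative only (real templates usually expect specific event names and often target entities or properties), values are inserted without JSON escaping, and the function names are hypothetical.

```shell
# Build the JSON body for one PredictionIO event.
# Values are not JSON-escaped, so they must not contain quotes.
event_json() {
  local event="$1" entity_type="$2" entity_id="$3"
  printf '{"event":"%s","entityType":"%s","entityId":"%s"}' \
    "$event" "$entity_type" "$entity_id"
}

# POST one event to the eventserver, authenticating with the app's access key.
post_event() {
  local base_url="$1" access_key="$2"; shift 2
  curl --silent --show-error --fail \
    -X POST "$base_url/events.json?accessKey=$access_key" \
    -H "Content-Type: application/json" \
    -d "$(event_json "$@")"
}
```

Usage: `post_event "https://$eventserver_name.herokuapp.com" "$pio_app_access_key" view user u1`.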
git add .
git commit -m "Initial PIO engine"
git push heroku master
pio train will automatically run during the release phase of the Heroku app. To re-train manually:
heroku run train
# You may need to revive the app from "crashed" state.
heroku restart
Once deployed, scale up the processes to avoid memory issues:
heroku ps:scale \
web=1:Performance-M \
release=0:Performance-L \
train=0:Performance-L
PredictionIO provides an Evaluation mode for engines, which uses cross-validation to help select optimum engine parameters.
Only engines that include src/main/scala/Evaluation.scala support Evaluation mode.
To run evaluation on Heroku, ensure src/main/scala/Evaluation.scala reads the eventserver app name from the environment: check that appName is set to sys.env("PIO_EVENTSERVER_APP_NAME"). For example:
DataSourceParams(appName = sys.env("PIO_EVENTSERVER_APP_NAME"), evalK = Some(5))
♻️ If that change was made, then commit, deploy, & re-train before proceeding.
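To confirm locally that Evaluation.scala reads the app name from the environment, a fixed-string search is a rough check. It only detects the exact expression sys.env("PIO_EVENTSERVER_APP_NAME"), so an equivalent lookup written differently would be reported as missing. The function name is hypothetical.

```shell
# Succeed if the given Scala source contains the exact expression
# sys.env("PIO_EVENTSERVER_APP_NAME"); -F treats the pattern literally.
evaluation_uses_env_app_name() {
  local src="${1:-src/main/scala/Evaluation.scala}"
  grep -qF 'sys.env("PIO_EVENTSERVER_APP_NAME")' "$src"
}
```

Usage, from the engine's directory: `evaluation_uses_env_app_name && echo "appName comes from the environment"`.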
Next, start a console & change to the engine's directory:
heroku run bash --size Performance-L
$ cd pio-engine/
Then, start the process, specifying the evaluation & engine params classes from the Evaluation.scala source file. For example:
$ pio eval \
org.template.classification.AccuracyEvaluation \
org.template.classification.EngineParamsList \
-- --driver-class-path /app/lib/postgresql_jdbc.jar \
--executor-memory 10g
Once pio eval completes, still in the Heroku console, copy the contents of best.json:
$ cat best.json
♻️ Paste into your local engine.json, commit, & deploy.
Engine deployments honor the following config vars:
- PIO_OPTS
  - options passed as pio $opts
  - example:
    heroku config:set PIO_OPTS='--variant best.json'
- PIO_SPARK_OPTS & PIO_TRAIN_SPARK_OPTS
  - deploy & training options passed through to spark-submit $opts
  - example:
    heroku config:set \
      PIO_SPARK_OPTS='--executor-memory 1g' \
      PIO_TRAIN_SPARK_OPTS='--executor-memory 10g'
- PIO_EVENTSERVER_HOSTNAME
  - $eventserver_name.herokuapp.com
- PIO_EVENTSERVER_PORT
  - always 80 for Heroku apps
- PIO_EVENTSERVER_APP_NAME & PIO_EVENTSERVER_ACCESS_KEY
  - generated by running pio app new $pio_app_name on the eventserver
- PIO_TRAIN_ON_RELEASE
  - set false to disable automatic training
  - subsequent deploys will crash a deployed engine until it's retrained; use manual training
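As a hypothetical illustration of how these config vars could map onto command lines, the sketch below places PIO_OPTS before `--` (for pio) and the Spark options after it (for spark-submit), matching the `pio eval ... -- ...` example above. It is an assumption for explanation only, not the buildpack's actual release or web script, and the function names are made up.

```shell
# Hypothetical mapping of config vars to pio command lines.
# Variables are deliberately unquoted so multi-word options split into
# separate words and empty variables disappear.
train_cmd() {
  if [ "${PIO_TRAIN_ON_RELEASE:-true}" = "false" ]; then
    echo "# automatic training disabled"
    return 0
  fi
  echo pio train $PIO_OPTS -- $PIO_TRAIN_SPARK_OPTS
}

deploy_cmd() {
  echo pio deploy $PIO_OPTS -- $PIO_SPARK_OPTS
}
```

With the example values above, `train_cmd` would print a train command carrying `--variant best.json` for pio and `--executor-memory 10g` for Spark.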
pio commands that require DB access will need to have the driver specified as an argument (bug with PIO 0.9.5 + Spark 1.6.1):
pio $command -- --driver-class-path /app/lib/postgresql_jdbc.jar
On Heroku, run it from the engine's directory:
heroku run "cd pio-engine && pio $command -- --driver-class-path /app/lib/postgresql_jdbc.jar"
Check engine status:
heroku run "cd pio-engine && pio status -- --driver-class-path /app/lib/postgresql_jdbc.jar"
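Because every DB-backed pio command repeats the same driver flag, a small local wrapper can save typing. This is a sketch assuming the Heroku CLI is installed and logged in; `pio_db_cmd` and `pio_db` are hypothetical names.

```shell
# Build the command string that runs a pio subcommand in the engine's
# directory with the Postgres JDBC driver on the driver class path
# (the workaround for PIO 0.9.5 + Spark 1.6.1).
pio_db_cmd() {
  echo "cd pio-engine && pio $* -- --driver-class-path /app/lib/postgresql_jdbc.jar"
}

# Run that command on a one-off dyno of the given Heroku app.
pio_db() {
  local app="$1"; shift
  heroku run "$(pio_db_cmd "$@")" -a "$app"
}
```

Usage: `pio_db "$engine_name" status`.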