Heroku buildpack for PredictionIO

PredictionIO is an open source machine learning framework.

Two apps are composed to make a basic PredictionIO service:

Engine: a specialized machine learning app which provides training of a model and then queries against that model; generated from a template or custom code.
Eventserver: a simple HTTP API app for capturing events to process from other systems; shareable between multiple engines.

This buildpack will deploy both of these apps: Engine when engine.json is present and otherwise Eventserver.
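
The selection rule above can be sketched in a few lines. This is a minimal illustration in Python, not the buildpack's actual detection script; the function name `detect_app_type` and the returned labels are hypothetical.

```python
import os
import tempfile


def detect_app_type(build_dir):
    """Return "engine" if engine.json exists in build_dir, else "eventserver"."""
    if os.path.isfile(os.path.join(build_dir, "engine.json")):
        return "engine"
    return "eventserver"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as build_dir:
        # An empty directory is treated as an eventserver.
        print(detect_app_type(build_dir))
        # Adding engine.json switches the result to an engine.
        with open(os.path.join(build_dir, "engine.json"), "w") as f:
            f.write("{}")
        print(detect_app_type(build_dir))
```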

The limited resources of a single dyno restrict use of typically large, statistically significant datasets. Only Performance-L dynos with 14GB RAM (currently $16/day) provide reasonable utility in this configuration.

Docs 📚

✏️ Throughout these docs, code terms that start with $ represent a value (shell variable) that should be replaced with a customized value, e.g. $eventserver_name, $engine_name, $postgres_addon_id…

Eventserver
1. Create the eventserver
2. Deploy the eventserver
Engine
Training
- Automatic training
- Manual training
Scale up
Evaluation
Configuration
- Environment variables
Running commands

Eventserver

Create the eventserver

git clone https://github.com/heroku/predictionio-buildpack.git pio-eventserver
cd pio-eventserver

heroku create $eventserver_name
heroku addons:create heroku-postgresql:hobby-dev
heroku buildpacks:add -i 1 https://github.com/heroku/predictionio-buildpack.git
heroku buildpacks:add -i 2 heroku/scala

Note the Postgres add-on identifier, e.g. postgresql-aerodynamic-00000; use it below in place of $postgres_addon_id.
You may want to specify heroku-postgresql:standard-0 instead, because the free hobby-dev database is limited to 10,000 records.

Deploy the eventserver

We delay deployment until the database is ready.

heroku pg:wait && git push heroku master

Engine

Select an engine from the gallery. Download its .tar.gz archive from GitHub and extract it on your local computer.

🚨 Avoid engines that persist their model to the filesystem, which is incompatible with the ephemeral filesystem of Heroku dynos. These engines must be modified to use Amazon S3 or the database for persistence.

Create an engine

cd into the engine's directory, and ensure it is a git repo:

git init

Create a Heroku app for the engine

heroku create $engine_name
heroku buildpacks:add -i 1 https://github.com/heroku/heroku-buildpack-jvm-common.git
heroku buildpacks:add -i 2 https://github.com/heroku/predictionio-buildpack.git

Create a PredictionIO app in the eventserver

heroku run 'pio app new $pio_app_name' -a $eventserver_name

This returns an access key for the app; use it below in place of $pio_app_access_key.

Configure the Heroku app to use the eventserver

Replace the Postgres ID & eventserver config values with those from above:

heroku addons:attach $postgres_addon_id
heroku config:set \
  PIO_EVENTSERVER_HOSTNAME=$eventserver_name.herokuapp.com \
  PIO_EVENTSERVER_PORT=80 \
  PIO_EVENTSERVER_ACCESS_KEY=$pio_app_access_key \
  PIO_EVENTSERVER_APP_NAME=$pio_app_name

See environment variables for config details.

Update `engine.json`

Modify this file to make sure the appName parameter matches the app record created in the eventserver.

  "datasource": {
    "params" : {
      "appName": "$pio_app_name"
    }
  }

If the appName param is missing, you may need to upgrade the template.
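
For scripted setups, the same edit can be made programmatically. The following is a minimal sketch, assuming engine.json follows the datasource/params layout shown above; the helper `set_app_name` and the sample values are hypothetical.

```python
import json


def set_app_name(engine_config, app_name):
    """Return a copy of the engine.json dict with datasource.params.appName set."""
    updated = json.loads(json.dumps(engine_config))  # deep copy through JSON
    datasource = updated.setdefault("datasource", {})
    params = datasource.setdefault("params", {})
    params["appName"] = app_name
    return updated


if __name__ == "__main__":
    # Hypothetical minimal config; real engine templates carry more keys.
    config = {"datasource": {"params": {"appName": "PLACEHOLDER"}}}
    print(json.dumps(set_app_name(config, "my-pio-app"), indent=2))
```

To apply it to a real file, load engine.json with `json.load`, pass the result through the helper, and write it back with `json.dump`.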

Import data

🚨 Mandatory: data is required for training to succeed and then to serve predictive queries.

This step will vary based on the engine. Typically, a command like the following should be run locally:

python ./data/import_eventserver.py \
  --url https://$eventserver_name.herokuapp.com \
  --access_key $pio_app_access_key

- Check the engine's data/ directory for exact naming & format.
- pip install predictionio may be required for the import script to run.
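
If an engine ships no import script, events can also be sent to the eventserver over HTTP. The sketch below only builds the request and does not send it. It assumes the eventserver accepts JSON events via POST to /events.json with the access key in an accessKey query parameter; the helper name, sample URL, and event fields are illustrative placeholders, so check the engine's documentation for the events it expects.

```python
import json
import urllib.parse
import urllib.request


def build_event_request(base_url, access_key, event):
    """Build (but do not send) a POST request for the eventserver's /events.json."""
    query = urllib.parse.urlencode({"accessKey": access_key})
    url = f"{base_url.rstrip('/')}/events.json?{query}"
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own eventserver name and access key.
    request = build_event_request(
        "https://my-eventserver.herokuapp.com",
        "MY_ACCESS_KEY",
        {"event": "$set", "entityType": "user", "entityId": "u1",
         "properties": {"plan": 1}},
    )
    print(request.get_method(), request.full_url)
    # Sending is left to the caller: urllib.request.urlopen(request)
```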

Deploy to Heroku

git add .
git commit -m "Initial PIO engine"
git push heroku master

Training

Automatic training

pio train runs automatically during the release phase of the Heroku app.

Manual training

heroku run train

# You may need to revive the app from "crashed" state.
heroku restart

Scale up

Once deployed, scale up the processes to avoid memory issues:

heroku ps:scale \
  web=1:Performance-M \
  release=0:Performance-L \
  train=0:Performance-L

Evaluation

PredictionIO provides an Evaluation mode for engines, which uses cross-validation to help select optimum engine parameters.

⚠️ Only engines that contain src/main/scala/Evaluation.scala support Evaluation mode.

Changes required for evaluation

To run evaluation on Heroku, ensure src/main/scala/Evaluation.scala reads the PredictionIO app name from the environment. Check the source file to verify that appName is set to sys.env("PIO_EVENTSERVER_APP_NAME"). For example:

DataSourceParams(appName = sys.env("PIO_EVENTSERVER_APP_NAME"), evalK = Some(5))

♻️ If that change was made, then commit, deploy, & re-train before proceeding.

Perform evaluation

Next, start a console & change to the engine's directory:

heroku run bash --size Performance-L
$ cd pio-engine/

Then, start the process, specifying the evaluation & engine params classes from the Evaluation.scala source file. For example:

$ pio eval \
    org.template.classification.AccuracyEvaluation \
    org.template.classification.EngineParamsList \
    -- --driver-class-path /app/lib/postgresql_jdbc.jar \
      --executor-memory 10g

Re-deploy best parameters

Once pio eval completes, still in the Heroku console, copy the contents of best.json:

$ cat best.json

♻️ Paste into your local engine.json, commit, & deploy.

Configuration

Environment variables

Engine deployments honor the following config vars:

PIO_OPTS
- options passed as pio $opts
- see: pio command reference
- example:
```
heroku config:set PIO_OPTS='--variant best.json'
```
PIO_SPARK_OPTS & PIO_TRAIN_SPARK_OPTS
- deploy & training options passed through to spark-submit $opts
- see: spark-submit reference
- example:
```
heroku config:set \
  PIO_SPARK_OPTS='--executor-memory 1g' \
  PIO_TRAIN_SPARK_OPTS='--executor-memory 10g'
```
PIO_EVENTSERVER_HOSTNAME
- $eventserver_name.herokuapp.com
PIO_EVENTSERVER_PORT
- always 80 for Heroku apps
PIO_EVENTSERVER_APP_NAME & PIO_EVENTSERVER_ACCESS_KEY
- generated by running pio app new $pio_app_name on the eventserver
PIO_TRAIN_ON_RELEASE
- set false to disable automatic training
- subsequent deploys will crash a deployed engine until it's retrained; use manual training

Running commands

pio commands that require DB access will need to have the driver specified as an argument (bug with PIO 0.9.5 + Spark 1.6.1):

pio $command -- --driver-class-path /app/lib/postgresql_jdbc.jar

To run directly with Heroku CLI

heroku run "cd pio-engine && pio $command -- --driver-class-path /app/lib/postgresql_jdbc.jar"

Useful commands

Check engine status:

heroku run "cd pio-engine && pio status -- --driver-class-path /app/lib/postgresql_jdbc.jar"
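
Because every command repeats the same directory change and driver argument, a small wrapper can assemble them. This is a minimal sketch rather than part of the buildpack; the helper name `heroku_pio_command` is hypothetical, and the -a flag selects the Heroku app as in the earlier heroku run example.

```python
import shlex

JDBC_DRIVER_PATH = "/app/lib/postgresql_jdbc.jar"


def heroku_pio_command(pio_args, app=None):
    """Return the argv list that runs a pio command in the engine's directory."""
    remote = "cd pio-engine && pio {} -- --driver-class-path {}".format(
        " ".join(shlex.quote(arg) for arg in pio_args), JDBC_DRIVER_PATH
    )
    argv = ["heroku", "run", remote]
    if app:
        argv += ["-a", app]
    return argv


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(heroku_pio_command(["status"]))
```

The returned list can be passed to `subprocess.run(argv, check=True)` on a machine where the Heroku CLI is installed and logged in.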
