This project is set up for building reproducible models and their associated documentation. This stage of the project covers everything up to sampling.
Bits and pieces in use in this project:
- Packrat
- Git
- Travis-CI
- Makefile
Let's use Titanic data!
- Install PASWR package from CRAN
- Install RSQLite package from CRAN
- Make a titanic db in `data-raw`

      library(DBI)
      library(RSQLite)
      titanicdb <- dbConnect(SQLite(), dbname = "data-raw/titanic.sqlite")

- Add titanic data to the db

      library(PASWR)
      dbWriteTable(titanicdb, "titanic", titanic3, overwrite = TRUE)

- Get data
- Clean it
- Sample it
- Process the features
- GLM all features
- GLM two features
- glmnet - regularized GLM
- gbm - gradient boosted trees, fit in sequence so that each tree corrects the errors of the ones before it
- Compare metrics for all models
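
As a rough illustration of the first few steps above (get data, clean, sample, fit a two-feature GLM, compute a metric), here is a minimal sketch. It is not the project's actual code. The in-memory database, the choice of `sex` and `pclass` as the two features, the 70/30 split, the seed, and the 0.5 cutoff are all assumptions made so the example runs on its own.

```r
library(DBI)
library(RSQLite)
library(PASWR)  # provides the titanic3 data frame

# Assumption: an in-memory database keeps this sketch self-contained.
# The project itself writes to data-raw/titanic.sqlite.
con <- dbConnect(SQLite(), dbname = ":memory:")
dbWriteTable(con, "titanic", titanic3, overwrite = TRUE)

# Get data: pull only the columns used below
titanic <- dbGetQuery(con, "SELECT survived, sex, pclass FROM titanic")
dbDisconnect(con)

# Clean it: drop incomplete rows and restore factors,
# which SQLite stores as plain text
titanic <- titanic[complete.cases(titanic), ]
titanic$sex <- factor(titanic$sex)
titanic$pclass <- factor(titanic$pclass)

# Sample it: hypothetical 70/30 train/test split with a fixed seed
set.seed(42)
train_idx <- sample(nrow(titanic), size = floor(0.7 * nrow(titanic)))
train <- titanic[train_idx, ]
test <- titanic[-train_idx, ]

# GLM two features: logistic regression on sex and passenger class
fit <- glm(survived ~ sex + pclass, data = train, family = binomial)

# One possible metric: holdout accuracy at a 0.5 probability cutoff
prob <- predict(fit, newdata = test, type = "response")
accuracy <- mean((prob > 0.5) == (test$survived == 1))
print(accuracy)
```

The same `train` and `test` data frames could then be passed to the other models in the list, so that every model is scored on identical holdout rows.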