Saturn Data Repository for Apache Spark

A secure, distributed storage service for DataFrames, currently built on top of Apache Cassandra. It integrates with Apache Spark as a Spark Data Source and works across all supported Spark programming languages (Java/Scala, Python, R).

Benefits

Saturn works like a Git repository for your DataFrames. It supports version history and managed, partitioned storage of DataFrames, organized by storage container (called a repository) and by name.

Example Use

In Scala:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc = SparkContext.getOrCreate()
val sqlContext = SQLContext.getOrCreate(sc)

// Load the products DataFrame from the repository - this will load the 'tip' revision
val products = sqlContext.read
    .format("com.c12e.repository.saturn")
    .option("host", "localhost")
    .option("repository", "products")
    .option("name", "acme_products")
    .load()
    
// Create a new DataFrame - just containing Men's shirts
val shirts = products.filter("categoryNamePath LIKE 'Clothing & Accessories > Men > Shirts%'")

// Store the new DataFrame
shirts.write.format("com.c12e.repository.saturn")
    .option("host", "localhost")
    .option("repository", "products")
    .option("name", "acme_mens_shirts")
    .save()
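
The read and write calls above repeat the same three options ("host", "repository", "name"). As a minimal sketch, those options could be grouped in a small value type and passed to Spark in one call. Note that SaturnOptionsSketch, SaturnLocation, toOptions and Format are hypothetical names introduced here for illustration and are not part of Saturn; only the option keys and the format string come from the example above.

```scala
// Hypothetical helper, not part of Saturn: groups the options used in the example above.
object SaturnOptionsSketch {

  // Data source name as it appears in the example above.
  val Format: String = "com.c12e.repository.saturn"

  // One stored DataFrame is addressed by host, repository and name.
  final case class SaturnLocation(host: String, repository: String, name: String) {
    // Builds the option map using the same keys as the example above.
    def toOptions: Map[String, String] = Map(
      "host" -> host,
      "repository" -> repository,
      "name" -> name
    )
  }

  def main(args: Array[String]): Unit = {
    val source = SaturnLocation("localhost", "products", "acme_products")
    // Same host and repository, different DataFrame name.
    val target = source.copy(name = "acme_mens_shirts")
    println(target.toOptions("name")) // prints "acme_mens_shirts"
  }
}
```

With Spark on the classpath, such a map can be handed to the standard options(...) method on the reader or writer, for example sqlContext.read.format(SaturnOptionsSketch.Format).options(source.toOptions).load(), in place of the three separate option(...) calls.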

Getting Started

TBD
