Skip to content
This repository was archived by the owner on Apr 15, 2024. It is now read-only.

CognitiveScale/spark-saturn

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Saturn Data Repository for Apache Spark

A secure and distributed storage service for DataFrames. Currently built on top of Apache Cassandra. Integrates with Apache Spark as a Spark Data Source and works across all support Spark programming languages (Java/Scala, Python, R).

Benefits

Like a Git repository for your DataFrames. Supports version history and managed/partitioned storage of DataFrames by storage container (called repositories) and name.

Example Use

In Scala:

val sc = SparkContext.getOrCreate()
val sqlContext = SqlContext.getOrCreate(sc)

// Load the products DataFrame from the repository - this will load the 'tip' revision
val products = sqlContext.read.
    .format("com.c12e.repository.saturn")
    .option("host", "localhost")
    .option("repository", "products")
    .option("name", "acme_products")
    .load()
    
// Create a new DataFrame - just containing Men's shirts
val shirts = products.filter("categoryNamePath LIKE 'Clothing & Accessories > Men > Shirts%'")

// Store the new DataFrame
shirts.write.format("com.c12e.repository.saturn")
    .option("host", "localhost")
    .option("repository", "products")
    .option("name", "acme_mens_shirts")
    .save()

Getting Started

TBD

About

A secure and distributed storage service for DataFrames.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors