dan1elt0m/polymo

Polymo

A single PySpark connector for all REST APIs.

Welcome to Polymo

Declarative API ingestion with PySpark. Under the hood, Polymo uses the custom data source API introduced in PySpark 4.

Polymo Builder UI - connector preview screen

How does it work?

Define a config file manually or use the recommended, lightweight builder UI. Once you are happy with your config, all you need to do is register the Polymo reader and tell Spark where to find the config:

from pyspark.sql import SparkSession
from polymo import ApiReader

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(ApiReader)

df = (
    spark.read.format("polymo")
    .option("config_path", "./config.yml")  # YAML you saved from the Builder
    .option("token", "YOUR_TOKEN")  # Only if the API needs one
    .load()
)

df.show()
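
The `token` option above is only needed when the API requires one. As a minimal sketch of that rule, the helper below builds the option set conditionally. The names `polymo_options` and `load_polymo` are hypothetical and not part of Polymo; only the `"polymo"` format and the `config_path` and `token` options come from the example above.

```python
from typing import Any, Dict, Optional


def polymo_options(config_path: str, token: Optional[str] = None) -> Dict[str, str]:
    """Collect reader options; `token` is included only when one is given."""
    options = {"config_path": config_path}
    if token:
        options["token"] = token
    return options


def load_polymo(spark: Any, config_path: str, token: Optional[str] = None) -> Any:
    """Load a DataFrame through the registered "polymo" format.

    `spark` is an existing SparkSession on which ApiReader is already registered.
    """
    return (
        spark.read.format("polymo")
        .options(**polymo_options(config_path, token))  # DataFrameReader.options takes keyword options
        .load()
    )
```

With this in place, `load_polymo(spark, "./config.yml")` covers public APIs, and passing `token=os.environ.get("API_TOKEN")` (a hypothetical variable name) keeps the secret out of the source file.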

Streaming works too:

spark.readStream.format("polymo")
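
A fuller streaming sketch might look like the following. It assumes the streaming reader accepts the same `config_path` option as the batch reader, which the line above does not confirm. The function name `start_console_stream` is hypothetical, and the `console` sink is Spark's built-in debugging sink.

```python
from typing import Any


def start_console_stream(spark: Any, config_path: str) -> Any:
    """Start a streaming query that prints each micro-batch to the console."""
    stream_df = (
        spark.readStream.format("polymo")
        .option("config_path", config_path)  # assumption: same option as the batch reader
        .load()
    )
    # The returned StreamingQuery can be stopped with .stop()
    # or blocked on with .awaitTermination().
    return stream_df.writeStream.format("console").start()
```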

Prefer everything in Python? Use the PolymoConfig model.

from pyspark.sql import SparkSession
from polymo import ApiReader, PolymoConfig

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(ApiReader)

jp_posts = PolymoConfig(
    base_url="https://jsonplaceholder.typicode.com",
    path="/posts",
)

df = (
    spark.read.format("polymo")
    .option("config_json", jp_posts.config_json())
    .load()
)
df.show()

Polymo reads in batches and can read pages in parallel, so it can be much faster than row-based solutions such as UDFs.
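
To illustrate why reading pages in parallel helps, here is a standalone sketch that uses only the Python standard library. It is not Polymo's implementation: `fetch_page` is a stand-in for an HTTP call, `FAKE_ROWS` is made-up data, and threads stand in for whatever parallelism Spark provides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

PAGE_SIZE = 3
FAKE_ROWS = [{"id": i} for i in range(1, 11)]  # stand-in for a remote API's data


def fetch_page(page: int) -> List[Dict[str, int]]:
    """Pretend HTTP call: return one page of rows (pages are 1-indexed)."""
    start = (page - 1) * PAGE_SIZE
    return FAKE_ROWS[start : start + PAGE_SIZE]


def fetch_all_parallel(num_pages: int, workers: int = 4) -> List[Dict[str, int]]:
    """Fetch every page concurrently and flatten the results in page order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pages = pool.map(fetch_page, range(1, num_pages + 1))  # map preserves input order
        return [row for page in pages for row in page]


rows = fetch_all_parallel(num_pages=4)
```

Each page is fetched as a whole batch, and the pages do not wait on one another. A row-based UDF, by contrast, would issue one request per row.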

How to start?

Locally you probably want to install polymo along with the Builder UI:

pip install "polymo[builder]"

This pulls in all the dependencies the Builder UI needs, including pyspark.

Running Polymo on a Spark cluster usually doesn't require these UI dependencies. In that case, install only the bare minimum:

pip install polymo

Launch the builder UI

polymo builder

(Optional) Run the Builder in Docker

docker compose up --build builder

Where to Next

Read the docs here

Other material:

Contributing

It's still early days, but Polymo already supports a lot of features! Is there something missing? Raise an issue or contribute!

Contributions and early feedback welcome!