A single PySpark API connector for all REST APIs.
Declarative API ingestion with PySpark. Uses the new PySpark 4 custom data sources under the hood.
Define a config file manually or use the recommended, lightweight Builder UI. Once you are happy with your config, all you need to do is register the Polymo reader and tell Spark where to find the config:
```python
from pyspark.sql import SparkSession
from polymo import ApiReader

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(ApiReader)

df = (
    spark.read.format("polymo")
    .option("config_path", "./config.yml")  # YAML you saved from the Builder
    .option("token", "YOUR_TOKEN")          # Only if the API needs one
    .load()
)

df.show()
```
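For reference, the `config.yml` loaded above could look something like the sketch below. This is hypothetical: the field names simply mirror the `PolymoConfig` example further down (`base_url`, `path`) and are not the full schema; the Builder generates the real file.

```yaml
# Hypothetical minimal config; the Builder UI generates the real file.
# Field names mirror the PolymoConfig example below.
base_url: https://jsonplaceholder.typicode.com
path: /posts
```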
Streaming works too:

```python
spark.readStream.format("polymo")
```
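A fuller streaming sketch, assuming the streaming reader accepts the same `config_path` option as the batch reader (the console sink is just for a quick local check):

```python
from pyspark.sql import SparkSession
from polymo import ApiReader

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(ApiReader)

stream_df = (
    spark.readStream.format("polymo")
    .option("config_path", "./config.yml")
    .load()
)

# Print incoming micro-batches; swap in a real sink for production use.
query = stream_df.writeStream.format("console").start()
query.awaitTermination()
```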
Prefer everything in Python? Use the `PolymoConfig` model:

```python
from pyspark.sql import SparkSession
from polymo import ApiReader, PolymoConfig

spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(ApiReader)

jp_posts = PolymoConfig(
    base_url="https://jsonplaceholder.typicode.com",
    path="/posts",
)

df = (
    spark.read.format("polymo")
    .option("config_json", jp_posts.config_json())
    .load()
)

df.show()
```

Polymo reads in batches and can read pages in parallel, so it can be much faster than row-based solutions like UDFs.
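For contrast, this is the kind of row-based UDF pattern being compared against: one blocking HTTP round trip per row, with no batching or page-level parallelism. A generic sketch against the same JSONPlaceholder API, not Polymo code:

```python
import requests
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

@udf(returnType=StringType())
def fetch_post(post_id: int) -> str:
    # One HTTP call per row: this per-row overhead is what batched,
    # page-parallel reads avoid.
    resp = requests.get(
        f"https://jsonplaceholder.typicode.com/posts/{post_id}"
    )
    return resp.text

ids = spark.range(1, 11)  # post ids 1..10
ids.withColumn("payload", fetch_post(col("id"))).show(truncate=False)
```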
Locally, you probably want to install polymo along with the Builder UI:
pip install "polymo[builder]"This comes with all UI deps such as pyspark
Running Polymo on a Spark cluster usually doesn't require these UI deps. In that case, just install the bare minimum:

```bash
pip install polymo
```

To launch the Builder UI, run:

```bash
polymo builder
```

or start it via Docker:

```bash
docker compose up --build builder
```

- The service listens on port 8000; open http://localhost:8000 once Uvicorn reports it is running.
Read the docs here
Other material:
- Step-by-step example: Medium blog post
It's still early days, but Polymo already supports a lot of features! Is there something missing? Raise an issue or contribute!
Contributions and early feedback welcome!
