🚀 Feast Feature Store — Complete Guide

This guide combines both core commands and advanced usage patterns for Feast — the open-source feature store for machine learning.
Use it for quick reference, project setup, or production design.

🔧 1. Install Feast

pip install feast

To install a specific version:

pip install feast==0.38.0

📁 2. Initialize a New Feature Repository

feast init my_feature_repo
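cd my_feature_repo/feature_repo

This creates (layout in recent Feast releases; older versions may differ):

my_feature_repo/feature_repo/ → feature definitions (example_repo.py)
my_feature_repo/feature_repo/feature_store.yaml → config file
my_feature_repo/feature_repo/data/ → sample data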

🧱 3. Core Concepts

| Concept | Description |
|---|---|
| Entity | Unique key identifying data rows (e.g. `driver_id`) |
| Feature View | Group of features from a common source |
| Feature Service | Bundle of features for model training/serving |
| Online Store | Low-latency serving store (Redis, DynamoDB, etc.) |
| Offline Store | Batch store for training (BigQuery, Parquet, etc.) |
| On-Demand Transform | Computed features in real-time during retrieval |

🧩 4. Example: Entities & Feature Views

from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# join_keys names the column that identifies the entity in source data
driver_entity = Entity(name="driver_id", join_keys=["driver_id"])

driver_stats_source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

driver_stats_fv = FeatureView(
    name="driver_stats",
    entities=[driver_entity],  # Entity objects, not name strings
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    source=driver_stats_source,
)

💾 5. Apply Repository

feast apply

📤 6. Materialize Data to Online Store

feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

or a fixed range:

feast materialize 2025-10-01T00:00:00 2025-10-18T00:00:00
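
The same two operations are available from Python through `FeatureStore.materialize_incremental` and `FeatureStore.materialize`, which is convenient inside a scheduler. A minimal sketch, assuming `store` is a `FeatureStore` (any object exposing those two methods works here); the helper names `materialize_up_to_now` and `backfill` are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def materialize_up_to_now(store, now=None):
    """Load everything newer than the last materialization, up to `now`."""
    end = now or datetime.now(timezone.utc)
    store.materialize_incremental(end_date=end)
    return end


def backfill(store, days, now=None):
    """Materialize a fixed window covering the last `days` days."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    store.materialize(start_date=start, end_date=end)
    return start, end
```

Passing `now` explicitly keeps scheduled runs reproducible, since the window no longer depends on when the job happens to start.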

🔍 7. Retrieve Features

Historical Features (for training)

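import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # directory containing feature_store.yaml
entity_df = pd.DataFrame(  # one row per (entity, timestamp); values are illustrative
    {"driver_id": [1001, 1002], "event_timestamp": pd.to_datetime(["2025-10-10", "2025-10-11"])}
)

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_stats:conv_rate",
        "driver_stats:acc_rate",
        "driver_stats:avg_daily_trips",
    ],
).to_df()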
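
`get_historical_features` performs a point-in-time join: for each row of `entity_df` it returns the most recent feature values at or before that row's timestamp, ignoring values older than the feature view's `ttl`. The following pure-Python sketch illustrates that rule on in-memory rows. It is not Feast's implementation, and the function name, row layout, and sample values are assumptions made for the example:

```python
from datetime import datetime, timedelta


def point_in_time_lookup(feature_rows, driver_id, as_of, ttl):
    """Return the latest feature row for `driver_id` with
    as_of - ttl <= event_timestamp <= as_of, or None if there is none."""
    candidates = [
        row for row in feature_rows
        if row["driver_id"] == driver_id
        and as_of - ttl <= row["event_timestamp"] <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["event_timestamp"])


feature_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2025, 10, 9, 12), "conv_rate": 0.70},
    {"driver_id": 1001, "event_timestamp": datetime(2025, 10, 10, 6), "conv_rate": 0.85},
    {"driver_id": 1001, "event_timestamp": datetime(2025, 10, 11, 6), "conv_rate": 0.90},
]

# A label observed at noon on 10 Oct may only use data known by then.
match = point_in_time_lookup(
    feature_rows, 1001, as_of=datetime(2025, 10, 10, 12), ttl=timedelta(days=1)
)
print(match["conv_rate"])  # prints 0.85
```

The row from 11 Oct is skipped even though it is the newest, which is what prevents future information from leaking into training data.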

Online Serving

# `store` is a FeatureStore instance, e.g. FeatureStore(repo_path=".")
feature_vector = store.get_online_features(
    features=["driver_stats:conv_rate", "driver_stats:acc_rate"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
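
`to_dict()` returns a column-oriented mapping: each key (entity or feature name) points to a list with one value per entity row. A small sketch of turning that into one dict per entity, which is often easier to pass to a model; the helper name is hypothetical and the sample values are made up:

```python
def rows_from_feature_dict(feature_dict):
    """Convert {"name": [v1, v2, ...]} into [{"name": v1}, {"name": v2}, ...]."""
    names = list(feature_dict)
    lengths = {len(values) for values in feature_dict.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    return [dict(zip(names, values)) for values in zip(*feature_dict.values())]


# Shape of a response for two entity rows (values are illustrative).
sample = {
    "driver_id": [1001, 1002],
    "conv_rate": [0.85, None],  # None stands for a feature with no stored value
    "acc_rate": [0.9, 0.4],
}
for row in rows_from_feature_dict(sample):
    print(row)
```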

⚙️ 8. Configuration Example (`feature_store.yaml`)

project: my_feature_repo
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db
offline_store:
  type: file

⚡ 9. Real-Time & Batch Feature Ingestion

Batch Ingestion (Offline)

Batch features come from Parquet, BigQuery, Snowflake, etc. You typically materialize them periodically.

Example (BigQuery):

offline_store:
  type: bigquery
  dataset: feast_offline

Real-Time Ingestion (Online)

You can write features directly to the online store using the Python API.

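from datetime import datetime, timezone
import pandas as pd

now = datetime.now(timezone.utc)
# `df` must be a DataFrame with the entity key, the view's features, and its timestamp columns (values illustrative).
row = {"driver_id": 1001, "conv_rate": 0.85, "acc_rate": 0.9, "avg_daily_trips": 12, "event_timestamp": now, "created": now}
store.write_to_online_store(feature_view_name="driver_stats", df=pd.DataFrame([row]))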

Or stream them using a service like Kafka → consumer → Feast online store.
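
The Kafka → consumer → online store path usually comes down to a loop that groups incoming events into small batches and writes each batch. A minimal sketch with the transport abstracted away: `messages` is any iterable of dicts (a Kafka consumer would be one source), and `write_batch` is a caller-supplied function. Both names, and the batch size, are assumptions for illustration:

```python
def stream_to_online_store(messages, write_batch, batch_size=100):
    """Group events into batches and hand each batch to `write_batch`.

    Returns the number of events written.
    """
    batch, written = [], 0
    for event in messages:
        batch.append(event)
        if len(batch) >= batch_size:
            write_batch(batch)
            written += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        write_batch(batch)
        written += len(batch)
    return written


# With Feast, `write_batch` could wrap write_to_online_store, for example:
#   lambda rows: store.write_to_online_store("driver_stats", df=pd.DataFrame(rows))
received = []
events = [{"driver_id": 1000 + i, "conv_rate": 0.5} for i in range(5)]
total = stream_to_online_store(events, received.append, batch_size=2)
print(total, [len(b) for b in received])  # prints 5 [2, 2, 1]
```

Batching trades a little freshness for fewer round trips to the online store; a real consumer would typically also flush on a timer so a slow stream does not hold events back.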

🔁 10. On-Demand Feature Transformations

Feast supports real-time computed features.

Example:

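import pandas as pd
from feast import Field, RequestSource
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float32

# Values supplied by the caller at request time
input_request = RequestSource(
    name="inputs",
    schema=[
        Field(name="trip_distance", dtype=Float32),
        Field(name="trip_time", dtype=Float32),
    ],
)

@on_demand_feature_view(
    sources=[driver_stats_fv, input_request],
    schema=[Field(name="avg_speed", dtype=Float32)],
)
def compute_avg_speed(inputs: pd.DataFrame) -> pd.DataFrame:
    # `inputs` holds the columns from both sources, one row per request
    return pd.DataFrame({"avg_speed": inputs["trip_distance"] / inputs["trip_time"]})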

Now you can request avg_speed alongside other features in online retrieval.
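
On-demand features are requested as `<view name>:<feature name>`, where the view name defaults to the decorated function's name, and the request-time inputs travel in the same `entity_rows` dict as the entity key. A sketch, assuming `store` is a `FeatureStore` with the definitions above applied; the wrapper function `get_avg_speed` is hypothetical:

```python
def get_avg_speed(store, driver_id, trip_distance, trip_time):
    """Fetch a stored feature plus the on-demand `avg_speed` for one trip."""
    response = store.get_online_features(
        features=[
            "driver_stats:conv_rate",
            "compute_avg_speed:avg_speed",
        ],
        entity_rows=[
            {
                "driver_id": driver_id,
                # Request-time inputs declared in the RequestSource
                "trip_distance": trip_distance,
                "trip_time": trip_time,
            }
        ],
    )
    return response.to_dict()
```

Because the transformation runs at retrieval time, `trip_time` should be validated by the caller; a zero value would make the division meaningless.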

🧩 11. Using Redis, BigQuery, and AWS S3

Redis (Online Store)

online_store:
  type: redis
  connection_string: "localhost:6379"

BigQuery (Offline Store)

offline_store:
  type: bigquery
  dataset: feast_dataset

AWS S3 (Offline via File Source)

from feast import FileSource
driver_stats_source = FileSource(
    path="s3://my-bucket/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

S3 data can also be queried through the Redshift or Athena offline stores instead of reading Parquet files directly.

🔄 12. CI/CD Automation with Feast

Example GitHub Actions Workflow

name: Feast CI
on: [push]

jobs:
  feast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install Feast
        run: pip install feast
      - name: Validate Feast repo
        run: feast plan
      - name: Apply changes
        run: feast apply
      - name: Materialize data
        run: feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

You can also deploy your registry and online store as infrastructure (e.g., managed Redis/BigQuery) and run feast apply via a CI pipeline.

🧪 13. Testing, Monitoring, and Versioning

Testing Features

Use pytest to validate feature definitions and schemas.
Test get_historical_features() outputs with sample data.
Mock online store writes to ensure format correctness.
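
One lightweight way to test retrieval output is to check each row against the expected feature names and Python types before it reaches a model. A pytest-friendly sketch using only the standard library; the `EXPECTED_SCHEMA` mapping and the function names are assumptions that mirror the `driver_stats` view above:

```python
EXPECTED_SCHEMA = {
    "driver_id": int,
    "conv_rate": float,
    "acc_rate": float,
    "avg_daily_trips": int,
}


def schema_errors(rows, schema=EXPECTED_SCHEMA):
    """Return human-readable problems found in `rows` (an empty list means OK)."""
    errors = []
    for index, row in enumerate(rows):
        for name, expected_type in schema.items():
            if name not in row:
                errors.append(f"row {index}: missing '{name}'")
            elif row[name] is not None and not isinstance(row[name], expected_type):
                errors.append(
                    f"row {index}: '{name}' is {type(row[name]).__name__}, "
                    f"expected {expected_type.__name__}"
                )
    return errors


def test_driver_rows_match_schema():
    rows = [{"driver_id": 1001, "conv_rate": 0.85, "acc_rate": 0.9, "avg_daily_trips": 12}]
    assert schema_errors(rows) == []
```

Missing values (`None`) are allowed through on purpose, since an entity with no stored features is a normal online-store response rather than a schema violation.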

Monitoring

Track materialization latency and TTL freshness.
Monitor feature drift using statistical checks on feature distributions.
Log online/offline feature parity to detect inconsistencies.
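
Each of the three checks above can be reduced to a small function. A standard-library sketch, not a Feast API: the function names, the mean-shift drift measure, and the tolerance are all assumptions chosen for illustration, and production systems often use richer statistics:

```python
from datetime import datetime, timedelta, timezone
from statistics import mean, stdev


def is_stale(event_timestamp, ttl, now=None):
    """True when a feature value is older than the view's TTL."""
    now = now or datetime.now(timezone.utc)
    return now - event_timestamp > ttl


def mean_shift(reference, current):
    """Drift score: shift of the current mean, in reference standard deviations.

    `reference` needs at least two values so a standard deviation exists.
    """
    spread = stdev(reference)
    if spread == 0:
        return 0.0 if mean(current) == mean(reference) else float("inf")
    return abs(mean(current) - mean(reference)) / spread


def parity_mismatches(offline, online, tolerance=1e-6):
    """Feature names whose online value differs from the offline value."""
    return sorted(
        name for name, expected in offline.items()
        if name not in online or abs(online[name] - expected) > tolerance
    )
```

A scheduled job could run these against a sample of entities and alert when `mean_shift` exceeds a chosen threshold or `parity_mismatches` returns a non-empty list.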

Versioning

Store each feature repo version in Git.
Use Feast’s built-in registry for lineage tracking.
Pin registry snapshots to model versions for reproducibility.
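
Pinning can be as simple as copying the registry file next to the model artifact and recording its content hash. A sketch assuming a file-based registry such as `data/registry.db`; the manifest layout, file names, and the function itself are hypothetical conventions, not something Feast provides:

```python
import hashlib
import json
import shutil
from pathlib import Path


def pin_registry(registry_path, model_version, out_dir):
    """Copy the registry file and write a manifest tying it to a model version."""
    registry_path, out_dir = Path(registry_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # The content hash identifies the exact feature definitions in use.
    digest = hashlib.sha256(registry_path.read_bytes()).hexdigest()
    snapshot = out_dir / f"registry-{digest[:12]}.db"
    shutil.copyfile(registry_path, snapshot)

    manifest = {
        "model_version": model_version,
        "registry_snapshot": snapshot.name,
        "registry_sha256": digest,
    }
    (out_dir / f"{model_version}.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

Calling `pin_registry("data/registry.db", "model-v1", "artifacts/")` at training time leaves a record that can later be compared against the live registry to detect definition changes.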

📊 14. Advanced Tips

If you need fully managed scale, evaluate managed feature stores such as Tecton or Vertex AI Feature Store as alternatives to self-hosting Feast.
Employ Delta Lake or Iceberg tables for offline feature storage.
Integrate Feast SDK in Airflow or Kubeflow pipelines for automation.
Serve online features with Feast + FastAPI microservices.

🧰 15. Common CLI Commands

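| Command | Description |
|---|---|
| `feast init <repo>` | Initialize new repo |
| `feast apply` | Register entities/features |
| `feast plan` | Preview pending changes |
| `feast materialize <start> <end>` | Load data from a time range to online store |
| `feast materialize-incremental <end>` | Load data since the last materialization |
| `feast serve` | Run a local feature server |
| `feast registry-dump` | Inspect feature registry |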

✅ 16. Production Best Practices

Use Redis or DynamoDB for low-latency online serving.
Keep BigQuery/Snowflake as offline truth source.
Automate materialization via Airflow or Prefect.
Secure secrets and connections via environment variables.
Monitor registry changes in CI/CD.
Regularly validate online/offline feature parity.
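
For the environment-variable practice, one option is to assemble connection settings at start-up rather than committing them. A sketch that builds a Redis connection string by appending comma-separated options to the `host:port` form used in the Redis example earlier. The variable names `REDIS_HOST`, `REDIS_PORT`, `REDIS_PASSWORD`, and `REDIS_SSL` are hypothetical, and the `password=` and `ssl=` option names should be confirmed against the Feast Redis documentation for your version:

```python
import os


def redis_connection_string(env=None):
    """Build "host:port[,password=...][,ssl=true]" from environment variables."""
    env = os.environ if env is None else env
    host = env.get("REDIS_HOST", "localhost")
    port = env.get("REDIS_PORT", "6379")
    parts = [f"{host}:{port}"]
    password = env.get("REDIS_PASSWORD")
    if password:
        parts.append(f"password={password}")
    if env.get("REDIS_SSL", "").lower() == "true":
        parts.append("ssl=true")
    return ",".join(parts)
```

Accepting `env` as a parameter keeps the function easy to test without touching the real process environment.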

🏁 Summary

Feast enables feature standardization, consistency, and scalability across ML systems.
With real-time ingestion, batch retrieval, and CI/CD integration, it bridges data engineering and ML operations.

Learn more:
📘 Feast Docs
💻 GitHub Repository
