AbhaySingh71/Feast-Feature-Store

🚀 Feast Feature Store: Complete Guide

This guide combines core commands and advanced usage patterns for Feast, the open-source feature store for machine learning.
Use it for quick reference, project setup, or production design.


🔧 1. Install Feast

pip install feast

To install a specific version:

pip install feast==0.38.0

πŸ“ 2. Initialize a New Feature Repository

feast init my_feature_repo
cd my_feature_repo/feature_repo

This creates my_feature_repo/ with:

  • feature_repo/ → feature definitions (run the feast commands below from here)
  • feature_repo/feature_store.yaml → config file
  • feature_repo/data/ → sample data

🧱 3. Core Concepts

| Concept | Description |
| --- | --- |
| Entity | Unique key identifying data rows (e.g. driver_id) |
| Feature View | Group of features from a common source |
| Feature Service | Bundle of features for model training/serving |
| Online Store | Low-latency serving store (Redis, DynamoDB, etc.) |
| Offline Store | Batch store for training (BigQuery, Parquet, etc.) |
| On-Demand Transform | Features computed in real time during retrieval |

🧩 4. Example: Entities & Feature Views

from datetime import timedelta
from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

driver_entity = Entity(name="driver_id")

driver_stats_source = FileSource(
    path="data/driver_stats.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created",
)

driver_stats_fv = FeatureView(
    name="driver_stats",
    entities=[driver_entity],  # pass the Entity object, not its name
    ttl=timedelta(days=1),
    schema=[
        Field(name="conv_rate", dtype=Float32),
        Field(name="acc_rate", dtype=Float32),
        Field(name="avg_daily_trips", dtype=Int64),
    ],
    source=driver_stats_source,
)

💾 5. Apply Repository

feast apply

📤 6. Materialize Data to Online Store

feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

or a fixed range:

feast materialize 2025-10-01T00:00:00 2025-10-18T00:00:00
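
To make the two commands above concrete, here is a simplified in-memory sketch of what materialization does conceptually: for each entity, the most recent row inside the time window is copied into a key-value store. The `materialize` function, the row layout, and the dict used as an online store are illustrative assumptions, not Feast's implementation.

```python
from datetime import datetime

def materialize(rows, start, end, online_store):
    """Copy the latest row per driver_id within [start, end] into online_store."""
    for row in rows:
        ts = row["event_timestamp"]
        if not (start <= ts <= end):
            continue  # outside the materialization window
        current = online_store.get(row["driver_id"])
        if current is None or ts > current["event_timestamp"]:
            online_store[row["driver_id"]] = row
    return online_store

# Made-up sample rows for illustration only
rows = [
    {"driver_id": 1001, "conv_rate": 0.80, "event_timestamp": datetime(2025, 10, 2)},
    {"driver_id": 1001, "conv_rate": 0.85, "event_timestamp": datetime(2025, 10, 10)},
    {"driver_id": 1002, "conv_rate": 0.40, "event_timestamp": datetime(2025, 10, 20)},
]
store_snapshot = materialize(rows, datetime(2025, 10, 1), datetime(2025, 10, 18), {})
```

In this sketch driver 1001 ends up with its newer row and driver 1002 is skipped because its only row falls after the window. The incremental command can be thought of as the same call with `start` set to the end of the previous run.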

πŸ” 7. Retrieve Features

Historical Features (for training)

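import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # directory containing feature_store.yaml
# entity_df needs the entity key plus an event_timestamp column
entity_df = pd.DataFrame({"driver_id": [1001], "event_timestamp": [pd.Timestamp("2025-10-10")]})

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_stats:conv_rate",
        "driver_stats:acc_rate",
        "driver_stats:avg_daily_trips",
    ],
).to_df()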
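
Historical retrieval is point-in-time correct: each entity row is matched with the newest feature values at or before its timestamp, and values older than the feature view's TTL are dropped. The sketch below illustrates that rule with plain Python. The `point_in_time_lookup` helper and the sample rows are hypothetical and much simpler than what Feast does internally.

```python
from datetime import datetime, timedelta

def point_in_time_lookup(feature_rows, driver_id, as_of, ttl):
    """Return the newest feature row at or before `as_of`, or None if none is within `ttl`."""
    candidates = [
        r for r in feature_rows
        if r["driver_id"] == driver_id and as_of - ttl <= r["event_timestamp"] <= as_of
    ]
    return max(candidates, key=lambda r: r["event_timestamp"], default=None)

# Made-up feature rows for illustration only
feature_rows = [
    {"driver_id": 1001, "conv_rate": 0.80, "event_timestamp": datetime(2025, 10, 1, 8)},
    {"driver_id": 1001, "conv_rate": 0.85, "event_timestamp": datetime(2025, 10, 1, 20)},
]
ttl = timedelta(days=1)

# At noon only the 08:00 row exists yet, so the later value must not leak in.
hit = point_in_time_lookup(feature_rows, 1001, datetime(2025, 10, 1, 12), ttl)
# Four days later both rows are older than the TTL, so nothing is returned.
miss = point_in_time_lookup(feature_rows, 1001, datetime(2025, 10, 5), ttl)
```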

Online Serving

feature_vector = store.get_online_features(
    features=["driver_stats:conv_rate", "driver_stats:acc_rate"],
    entity_rows=[{"driver_id": 1001}],
).to_dict()
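
The dictionary returned by `to_dict()` is column-oriented: each key maps to a list with one value per entity row. When a model server wants one record per entity instead, a small helper can pivot it. The `columns_to_rows` name and the sample payload are assumptions for illustration.

```python
def columns_to_rows(feature_dict):
    """Turn {"name": [v0, v1, ...]} into [{"name": v0, ...}, {"name": v1, ...}]."""
    names = list(feature_dict)
    return [dict(zip(names, values)) for values in zip(*feature_dict.values())]

# Example payload in the column-oriented shape; the values are made up.
sample_vector = {"driver_id": [1001, 1002], "conv_rate": [0.85, 0.4], "acc_rate": [0.9, 0.7]}
records = columns_to_rows(sample_vector)
```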

βš™οΈ 8. Configuration Example (feature_store.yaml)

project: my_feature_repo
registry: data/registry.db
provider: local
online_store:
  type: sqlite
  path: data/online_store.db
offline_store:
  type: file

⚡ 9. Real-Time & Batch Feature Ingestion

Batch Ingestion (Offline)

Batch features come from Parquet, BigQuery, Snowflake, etc. You typically materialize them periodically.

Example (BigQuery):

offline_store:
  type: bigquery
  dataset: feast_offline

Real-Time Ingestion (Online)

You can write features directly to the online store using the Python API.

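from datetime import datetime, timezone
import pandas as pd

# write_to_online_store expects a pandas DataFrame (df=) carrying the feature view's columns
now = datetime.now(timezone.utc)
row = {"driver_id": 1001, "conv_rate": 0.85, "acc_rate": 0.9, "avg_daily_trips": 12,  # sample values
       "event_timestamp": now, "created": now}
store.write_to_online_store(feature_view_name="driver_stats", df=pd.DataFrame([row]))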

Or stream them using a service like Kafka → consumer → Feast online store.
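
The consumer stage of such a pipeline can be sketched as a loop that groups events into small batches before writing. Everything here is hypothetical: `consume` is an illustrative name, `events` stands in for messages read from a Kafka topic, and `write_batch` stands in for a function that would build a DataFrame and call the online-store write.

```python
def consume(events, write_batch, batch_size=2):
    """Group incoming events into batches and hand each batch to write_batch."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            write_batch(batch)
            batch = []  # start a fresh list so the written batch is not mutated
    if batch:  # flush the final partial batch
        write_batch(batch)

written = []  # stand-in sink that simply records each batch
events = [{"driver_id": i, "conv_rate": 0.5} for i in range(5)]
consume(events, written.append)
```

Batching trades a little latency for fewer writes; a real consumer would usually also flush on a timer so a slow stream does not hold events back.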


πŸ” 10. On-Demand Feature Transformations

Feast supports real-time computed features.

Example:

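import pandas as pd
from feast import Field, RequestSource
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float32

# Values supplied by the caller at request time
input_request = RequestSource(
    name="inputs",
    schema=[
        Field(name="trip_distance", dtype=Float32),
        Field(name="trip_time", dtype=Float32),
    ],
)

@on_demand_feature_view(
    sources=[driver_stats_fv, input_request],
    schema=[Field(name="avg_speed", dtype=Float32)],
)
def compute_avg_speed(inputs: pd.DataFrame) -> pd.DataFrame:
    # inputs holds the columns from every source, one row per entity row
    return pd.DataFrame({"avg_speed": inputs["trip_distance"] / inputs["trip_time"]})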

Now you can request avg_speed alongside other features in online retrieval.
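
Because the transformation runs on caller-supplied values, it is worth deciding what happens when `trip_time` is zero. The following is a minimal plain-Python sketch of one option, returning NaN instead of raising. The standalone `avg_speed` function is an illustration of the arithmetic only, not part of the Feast API.

```python
import math

def avg_speed(trip_distance, trip_time):
    """Return distance / time, or NaN when the time is zero or negative."""
    if trip_time <= 0:
        return math.nan
    return trip_distance / trip_time

# One valid request and one with a zero duration (made-up values)
speeds = [avg_speed(d, t) for d, t in [(10.0, 0.5), (3.0, 0.0)]]
```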


🧩 11. Using Redis, BigQuery, and AWS S3

Redis (Online Store)

online_store:
  type: redis
  connection_string: "localhost:6379"

BigQuery (Offline Store)

offline_store:
  type: bigquery
  dataset: feast_dataset

AWS S3 (Offline via File Source)

from feast import FileSource
driver_stats_source = FileSource(
    path="s3://my-bucket/driver_stats.parquet",
    timestamp_field="event_timestamp",
)

You can also integrate S3 with Redshift or Athena for hybrid ingestion.


🔄 12. CI/CD Automation with Feast

Example GitHub Actions Workflow

name: Feast CI
on: [push]

jobs:
  feast:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install Feast
        run: pip install feast
      - name: Validate Feast repo
        run: feast plan
      - name: Apply changes
        run: feast apply
      - name: Materialize data
        run: feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

You can also deploy your registry and online store as infrastructure (e.g., managed Redis/BigQuery) and run feast apply via a CI pipeline.


🧪 13. Testing, Monitoring, and Versioning

Testing Features

  • Use pytest to validate feature definitions and schemas.
  • Test get_historical_features() outputs with sample data.
  • Mock online store writes to ensure format correctness.
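
A schema check of the kind listed above can be written as ordinary pytest-style functions. This is a sketch under assumptions: `EXPECTED_SCHEMA` mirrors the driver_stats fields from section 4 using Python types, and `schema_errors` is a hypothetical helper rather than a Feast utility.

```python
EXPECTED_SCHEMA = {"conv_rate": float, "acc_rate": float, "avg_daily_trips": int}

def schema_errors(row, expected=EXPECTED_SCHEMA):
    """Return a list of human-readable problems found in one feature row."""
    errors = []
    for name, expected_type in expected.items():
        if name not in row:
            errors.append(f"missing feature: {name}")
        elif not isinstance(row[name], expected_type):
            errors.append(
                f"{name}: expected {expected_type.__name__}, got {type(row[name]).__name__}"
            )
    return errors

def test_driver_stats_row_matches_schema():
    good = {"conv_rate": 0.85, "acc_rate": 0.9, "avg_daily_trips": 12}
    assert schema_errors(good) == []

def test_missing_feature_is_reported():
    incomplete = {"conv_rate": 0.85, "acc_rate": 0.9}
    assert schema_errors(incomplete) == ["missing feature: avg_daily_trips"]
```

The same helper can be pointed at rows produced by a small get_historical_features() run or at the payload a mocked online write receives.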

Monitoring

  • Track materialization latency and TTL freshness.
  • Monitor feature drift using statistical checks on feature distributions.
  • Log online/offline feature parity to detect inconsistencies.
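
The three monitoring checks above can each be reduced to a small function. The sketch below uses only the standard library. The function names, the choice of a mean shift measured in baseline standard deviations as the drift statistic, and the tolerance are all assumptions to adapt, not Feast features.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

def is_stale(event_timestamp, now, ttl):
    """True when a feature value is older than its TTL."""
    return now - event_timestamp > ttl

def mean_shift(baseline, current):
    """Shift of the current mean from the baseline mean, in baseline standard deviations."""
    return abs(mean(current) - mean(baseline)) / stdev(baseline)

def parity_mismatches(offline, online, tolerance=1e-6):
    """Feature names whose offline and online values differ by more than tolerance."""
    return [
        name for name in offline
        if abs(offline[name] - online.get(name, float("inf"))) > tolerance
    ]

# Made-up values for illustration only
stale = is_stale(datetime(2025, 10, 1), datetime(2025, 10, 3), timedelta(days=1))
shift = mean_shift([0.5, 0.6, 0.7], [0.9, 1.0, 1.1])
mismatches = parity_mismatches(
    {"conv_rate": 0.85, "acc_rate": 0.9},
    {"conv_rate": 0.85, "acc_rate": 0.7},
)
```

A feature missing from the online side is treated as a mismatch, which is usually the safer default for a parity alert.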

Versioning

  • Store each feature repo version in Git.
  • Use Feast's built-in registry for lineage tracking.
  • Pin registry snapshots to model versions for reproducibility.
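
One lightweight way to pin a registry snapshot to a model version is to record a hash of the registry file next to the model's metadata. This is a sketch under assumptions: it presumes a file-based registry such as data/registry.db, and `registry_fingerprint`, `pin_registry`, and the manifest layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path

def registry_fingerprint(registry_path):
    """SHA-256 hex digest of the registry file's bytes."""
    return hashlib.sha256(Path(registry_path).read_bytes()).hexdigest()

def pin_registry(registry_path, model_version, manifest_path):
    """Record which registry snapshot a model version was trained against."""
    manifest = {
        "model_version": model_version,
        "registry_sha256": registry_fingerprint(registry_path),
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest
```

At serving time, comparing the stored digest with the current registry's digest shows whether feature definitions changed since training.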

📊 14. Advanced Tips

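  • If you need a fully managed service, evaluate managed feature platforms such as Tecton or Vertex AI Feature Store; these are separate products rather than Feast backends.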
  • Employ Delta Lake or Iceberg tables for offline feature storage.
  • Integrate Feast SDK in Airflow or Kubeflow pipelines for automation.
  • Serve online features with Feast + FastAPI microservices.

🧰 15. Common CLI Commands

| Command | Description |
| --- | --- |
| feast init <repo> | Initialize new repo |
| feast apply | Register entities/features |
| feast plan | Preview pending changes |
| feast materialize <start> <end> | Load data to online store |
| feast serve | Run a local feature server |
| feast registry-dump | Inspect feature registry |

✅ 16. Production Best Practices

  • Use Redis or DynamoDB for low-latency online serving.
  • Keep BigQuery or Snowflake as the offline source of truth.
  • Automate materialization via Airflow or Prefect.
  • Secure secrets and connections via environment variables.
  • Monitor registry changes in CI/CD.
  • Regularly validate online/offline feature parity.
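
As an illustration of keeping connection details out of the repository, the sketch below fills `${VAR}` placeholders in a config template from a mapping of environment variables. `REDIS_HOST`, `REDIS_PORT`, and `render_config` are hypothetical names. Recent Feast versions can expand `${VAR}` references in feature_store.yaml themselves, so verify the behavior of your version before adding a rendering step like this.

```python
from string import Template

CONFIG_TEMPLATE = """\
project: my_feature_repo
online_store:
  type: redis
  connection_string: "${REDIS_HOST}:${REDIS_PORT}"
"""

def render_config(template, env):
    """Fill ${VAR} placeholders from env; raises KeyError if a variable is missing."""
    return Template(template).substitute(env)

# In practice, pass os.environ; a literal dict keeps the example self-contained.
rendered = render_config(CONFIG_TEMPLATE, {"REDIS_HOST": "redis.internal", "REDIS_PORT": "6379"})
```

Failing with a KeyError on a missing variable surfaces misconfiguration in CI instead of at serving time.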

🏁 Summary

Feast enables feature standardization, consistency, and scalability across ML systems.
With real-time ingestion, batch retrieval, and CI/CD integration, it bridges data engineering and ML operations.

Learn more:
📘 Feast Docs
💻 GitHub Repository
