This guide combines core commands and advanced usage patterns for Feast, the open-source feature store for machine learning.
Use it for quick reference, project setup, or production design.

    pip install feast

To install a specific version:

    pip install feast==0.38.0

Create a new project:

    feast init my_feature_repo
    cd my_feature_repo

This creates:
- `feature_repo/`: feature definitions
- `feature_store.yaml`: config file
- `data/`: sample data

| Concept | Description |
|---|---|
| Entity | Unique key identifying data rows (e.g. driver_id) |
| Feature View | Group of features from a common source |
| Feature Service | Bundle of features for model training/serving |
| Online Store | Low-latency serving store (Redis, DynamoDB, etc.) |
| Offline Store | Batch store for training (BigQuery, Parquet, etc.) |
| On-Demand Transform | Features computed at request time during retrieval |
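
The offline store row above hides one detail that is easier to see in code: training retrieval is a point-in-time lookup, so each label only sees feature values recorded at or before its own timestamp and no older than the feature view's TTL. The following is a minimal pure-Python sketch of that idea, not Feast's implementation (which varies by offline store); the function name, the inclusive window bounds, and the sample rows are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def point_in_time_lookup(feature_rows, entity_key, as_of, ttl):
    """Return the newest row for entity_key whose event_timestamp lies in
    [as_of - ttl, as_of], or None when no row qualifies."""
    candidates = [
        row
        for row in feature_rows
        if row["driver_id"] == entity_key
        and as_of - ttl <= row["event_timestamp"] <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["event_timestamp"])


# Made-up sample data for one driver.
feature_rows = [
    {"driver_id": 1001, "event_timestamp": datetime(2025, 10, 1, 8), "conv_rate": 0.80},
    {"driver_id": 1001, "event_timestamp": datetime(2025, 10, 1, 20), "conv_rate": 0.85},
    {"driver_id": 1001, "event_timestamp": datetime(2025, 10, 3, 9), "conv_rate": 0.90},
]
ttl = timedelta(days=1)

# The label was observed on Oct 2 at noon: the Oct 3 row is in the future
# and the Oct 1 08:00 row is older than the TTL, so the 20:00 row is used.
fresh = point_in_time_lookup(feature_rows, 1001, datetime(2025, 10, 2, 12), ttl)
print(fresh["conv_rate"])  # 0.85

# One hour before the Oct 3 row exists, nothing falls inside the TTL window.
stale = point_in_time_lookup(feature_rows, 1001, datetime(2025, 10, 3, 8), ttl)
print(stale)  # None
```

This is why a short TTL can produce missing values in training data: rows outside the window are treated as if they did not exist.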

Define an entity, a batch source, and a feature view:

    from datetime import timedelta

    from feast import Entity, FeatureView, Field, FileSource
    from feast.types import Float32, Int64

    # The entity is the join key shared by feature rows and training labels.
    driver_entity = Entity(name="driver_id", join_keys=["driver_id"])

    driver_stats_source = FileSource(
        path="data/driver_stats.parquet",
        timestamp_field="event_timestamp",
        created_timestamp_column="created",
    )

    driver_stats_fv = FeatureView(
        name="driver_stats",
        # Recent Feast releases expect Entity objects here, not name strings.
        entities=[driver_entity],
        ttl=timedelta(days=1),
        schema=[
            Field(name="conv_rate", dtype=Float32),
            Field(name="acc_rate", dtype=Float32),
            Field(name="avg_daily_trips", dtype=Int64),
        ],
        source=driver_stats_source,
    )

Register the definitions:

    feast apply

Load features into the online store up to the current time:

    feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

or a fixed range:
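
    feast materialize 2025-10-01T00:00:00 2025-10-18T00:00:00

Retrieve training data from the offline store:

    from feast import FeatureStore

    store = FeatureStore(repo_path=".")

    # entity_df: a pandas DataFrame with a driver_id column and an
    # event_timestamp column marking when each label was observed.
    training_df = store.get_historical_features(
        entity_df=entity_df,
        features=[
            "driver_stats:conv_rate",
            "driver_stats:acc_rate",
            "driver_stats:avg_daily_trips",
        ],
    ).to_df()

Fetch the latest values from the online store at serving time:

    feature_vector = store.get_online_features(
        features=["driver_stats:conv_rate", "driver_stats:acc_rate"],
        entity_rows=[{"driver_id": 1001}],
    ).to_dict()

A minimal local `feature_store.yaml`:

    project: my_feature_repo
    registry: data/registry.db
    provider: local
    online_store:
      type: sqlite
      path: data/online_store.db
    offline_store:
      type: file

Batch features come from sources such as Parquet files, BigQuery, or Snowflake. You typically materialize them to the online store on a schedule.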
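Example (BigQuery):

    offline_store:
      type: bigquery
      dataset: feast_offline

You can write features directly to the online store using the Python API.

    from datetime import datetime, timezone

    import pandas as pd

    now = datetime.now(timezone.utc)
    # df= takes a DataFrame; "created" is included because the source declares created_timestamp_column.
    row = {"driver_id": 1001, "conv_rate": 0.85, "acc_rate": 0.9, "event_timestamp": now, "created": now}
    store.write_to_online_store(feature_view_name="driver_stats", df=pd.DataFrame([row]))

Or stream them through a pipeline such as Kafka → consumer → Feast online store.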
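
The consumer stage of that pipeline can be sketched without committing to a particular Kafka client. The example below assumes each message is a JSON object with `driver_id`, `conv_rate`, `acc_rate`, and a Unix-seconds `ts` field; that message layout, the function names, and the batch size are all hypothetical. The `write_batch` callable stands in for whatever performs the actual write, for example a small wrapper around `store.write_to_online_store`.

```python
import json
from datetime import datetime, timezone


def messages_to_rows(raw_messages):
    """Decode JSON messages into rows shaped like the driver_stats feature view."""
    rows = []
    for raw in raw_messages:
        payload = json.loads(raw)
        rows.append(
            {
                "driver_id": int(payload["driver_id"]),
                "conv_rate": float(payload["conv_rate"]),
                "acc_rate": float(payload["acc_rate"]),
                # "ts" (Unix seconds) is an assumed field name.
                "event_timestamp": datetime.fromtimestamp(payload["ts"], tz=timezone.utc),
            }
        )
    return rows


def consume(raw_messages, write_batch, batch_size=100):
    """Group messages into batches and hand each batch to write_batch.

    Returns the number of messages written."""
    batch, written = [], 0
    for raw in raw_messages:
        batch.append(raw)
        if len(batch) >= batch_size:
            write_batch(messages_to_rows(batch))
            written += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        write_batch(messages_to_rows(batch))
        written += len(batch)
    return written


# Stand-in for a Kafka topic: any iterable of JSON strings works.
sample_messages = [
    json.dumps({"driver_id": i, "conv_rate": 0.5, "acc_rate": 0.9, "ts": 0})
    for i in range(5)
]
batches = []
total = consume(sample_messages, batches.append, batch_size=2)
print(total, [len(b) for b in batches])  # 5 [2, 2, 1]
```

Batching keeps the number of online store writes low; a real consumer would also need error handling and offset commits, which are omitted here.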
Feast supports features computed at request time.
Example:

    import pandas as pd
    from feast import Field, RequestSource
    from feast.on_demand_feature_view import on_demand_feature_view
    from feast.types import Float32

    # Values supplied by the caller at request time; the schema is a list of Field objects.
    input_request = RequestSource(
        name="inputs",
        schema=[
            Field(name="trip_distance", dtype=Float32),
            Field(name="trip_time", dtype=Float32),
        ],
    )

    @on_demand_feature_view(
        sources=[driver_stats_fv, input_request],
        schema=[Field(name="avg_speed", dtype=Float32)],
    )
    def compute_avg_speed(inputs: pd.DataFrame) -> pd.DataFrame:
        return pd.DataFrame({"avg_speed": inputs["trip_distance"] / inputs["trip_time"]})

Now you can request avg_speed alongside other features in online retrieval.
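
As a rough illustration of such a request, the sketch below builds the arguments for `get_online_features`, passing the request-time inputs in the same entity row as the entity key. It assumes the on-demand view is registered under its function name (`compute_avg_speed`); the helper names `build_online_request` and `fetch_features` are hypothetical, and `store` is expected to be a `FeatureStore` created elsewhere.

```python
def build_online_request(driver_id, trip_distance, trip_time):
    """Build keyword arguments for store.get_online_features."""
    if trip_time <= 0:
        # The transform divides by trip_time, so reject values that would
        # produce infinities or negative speeds.
        raise ValueError("trip_time must be positive")
    return {
        "features": [
            "driver_stats:conv_rate",
            "driver_stats:acc_rate",
            "compute_avg_speed:avg_speed",
        ],
        "entity_rows": [
            {
                "driver_id": driver_id,
                # Request-time inputs travel alongside the entity key.
                "trip_distance": trip_distance,
                "trip_time": trip_time,
            }
        ],
    }


def fetch_features(store, request):
    """Run the request against a FeatureStore-like object and return a dict."""
    return store.get_online_features(**request).to_dict()


request = build_online_request(driver_id=1001, trip_distance=12.5, trip_time=0.5)
```

Validating `trip_time` before the call is a design choice for this sketch: it surfaces bad input at the API boundary instead of inside the transformation.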

Redis as the online store:

    online_store:
      type: redis
      connection_string: "localhost:6379"

BigQuery as the offline store:

    offline_store:
      type: bigquery
      dataset: feast_dataset

A Parquet file on S3 as a batch source:

    from feast import FileSource

    driver_stats_source = FileSource(
        path="s3://my-bucket/driver_stats.parquet",
        timestamp_field="event_timestamp",
    )

You can also integrate S3 with Redshift or Athena for hybrid ingestion.

Example GitHub Actions workflow:

    name: Feast CI
    on: [push]
    jobs:
      feast:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          - name: Set up Python
            uses: actions/setup-python@v4
            with:
              python-version: "3.10"
          - name: Install Feast
            run: pip install feast
          - name: Validate Feast repo
            run: feast plan
          - name: Apply changes
            run: feast apply
          - name: Materialize data
            run: feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")

You can also deploy your registry and online store as infrastructure (e.g., managed Redis/BigQuery) and run `feast apply` via a CI pipeline.
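
When a pipeline step is written in Python rather than shell, the `$(date -u ...)` substitution has to be reproduced by hand. The following sketch builds the same `feast materialize-incremental` command with a UTC end timestamp; the helper names and the `repo_path` default are assumptions, and `run_materialize` is only defined here, not executed.

```python
import subprocess
from datetime import datetime, timezone


def materialize_command(now=None):
    """Return the CLI arguments for an incremental materialization up to `now` (UTC)."""
    now = now or datetime.now(timezone.utc)
    end = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    return ["feast", "materialize-incremental", end]


def run_materialize(repo_path="."):
    """Run the command inside the feature repo; raises if the CLI exits non-zero."""
    subprocess.run(materialize_command(), cwd=repo_path, check=True)


fixed = datetime(2025, 10, 18, 0, 0, 0, tzinfo=timezone.utc)
print(materialize_command(fixed))
# ['feast', 'materialize-incremental', '2025-10-18T00:00:00']
```

Passing `check=True` makes a failed materialization fail the pipeline step instead of passing silently.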
- Use pytest to validate feature definitions and schemas.
- Test `get_historical_features()` outputs with sample data.
- Mock online store writes to ensure format correctness.
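
A schema check of the kind listed above might look like the following sketch. It relies only on a feature view exposing a `features` list whose items have `name` and `dtype` attributes; the helper name `schema_problems` and the stand-in objects are hypothetical, and in a real pytest test the view would come from something like `FeatureStore(repo_path=".").get_feature_view("driver_stats")`.

```python
from types import SimpleNamespace


def schema_problems(feature_view, expected):
    """Compare a feature view's fields with an expected {name: dtype} mapping.

    Returns a list of human-readable problems; an empty list means the
    expected features are all present with matching dtypes."""
    actual = {field.name: field.dtype for field in feature_view.features}
    problems = []
    for name, dtype in expected.items():
        if name not in actual:
            problems.append(f"missing feature: {name}")
        elif actual[name] != dtype:
            problems.append(f"{name}: expected {dtype}, got {actual[name]}")
    return problems


# Stand-in for a registered feature view, so the sketch runs without a repo.
# Dtypes are plain strings here; with Feast they would be feast.types values.
fake_view = SimpleNamespace(
    features=[
        SimpleNamespace(name="conv_rate", dtype="Float32"),
        SimpleNamespace(name="acc_rate", dtype="Float32"),
    ]
)

print(schema_problems(fake_view, {"conv_rate": "Float32", "acc_rate": "Float32"}))
# []
print(schema_problems(fake_view, {"conv_rate": "Int64", "avg_daily_trips": "Int64"}))
# ['conv_rate: expected Int64, got Float32', 'missing feature: avg_daily_trips']
```

Returning a list of problems rather than raising on the first one lets a single test run report every mismatch.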
- Track materialization latency and TTL freshness.
- Monitor feature drift using statistical checks on feature distributions.
- Log online/offline feature parity to detect inconsistencies.
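
One common statistical check for drift is the Population Stability Index (PSI), which compares how a feature's values fall into bins in a reference sample versus a current sample. The sketch below is one possible implementation using equal-width bins derived from the reference sample; the function name, bin count, and smoothing constant are assumptions, both samples are assumed non-empty, and it is not something Feast provides.

```python
import math


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI between two numeric samples using equal-width bins over the
    reference range. Values outside that range fall into the end bins."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(count / len(values), eps) for count in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


reference = list(range(100))
shifted = [value + 50 for value in reference]

same_score = population_stability_index(reference, reference)
drift_score = population_stability_index(reference, shifted)
print(same_score)          # 0.0
print(drift_score > 0.25)  # True
```

A frequently quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift, but thresholds should be tuned per feature.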
- Store each feature repo version in Git.
- Use Feast's built-in registry for lineage tracking.
- Pin registry snapshots to model versions for reproducibility.
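
One way to pin a registry snapshot to a model version is to store a content hash of the registry file next to the model's metadata. This is a sketch of that idea only: the manifest layout, field names, and helper functions are hypothetical rather than a Feast feature, and it assumes a file-based registry such as `data/registry.db`.

```python
import hashlib
import json
from pathlib import Path


def registry_fingerprint(registry_path):
    """SHA-256 of the registry file's bytes; changes whenever the file changes."""
    return hashlib.sha256(Path(registry_path).read_bytes()).hexdigest()


def write_manifest(manifest_path, model_version, registry_path, git_commit):
    """Record which registry snapshot and repo commit a model was trained against."""
    manifest = {
        "model_version": model_version,
        "registry_path": str(registry_path),
        "registry_sha256": registry_fingerprint(registry_path),
        "feature_repo_commit": git_commit,
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

At deployment time the same fingerprint can be recomputed and compared with the manifest to detect a registry that has changed since training.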
- For managed scale, evaluate hosted feature platforms such as Tecton or Vertex AI Feature Store.
- Employ Delta Lake or Iceberg tables for offline feature storage.
- Integrate Feast SDK in Airflow or Kubeflow pipelines for automation.
- Serve online features with Feast + FastAPI microservices.
| Command | Description |
|---|---|
| `feast init <repo>` | Initialize new repo |
| `feast apply` | Register entities/features |
| `feast plan` | Preview pending changes |
| `feast materialize` | Load data to online store |
| `feast serve` | Run a local feature server |
| `feast registry-dump` | Inspect feature registry |
- Use Redis or DynamoDB for low-latency online serving.
- Keep BigQuery/Snowflake as offline truth source.
- Automate materialization via Airflow or Prefect.
- Secure secrets and connections via environment variables.
- Monitor registry changes in CI/CD.
- Regularly validate online/offline feature parity.
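
A parity check can be as simple as comparing one entity's offline values with what the online store returns. The sketch below assumes both sides have already been fetched into plain dicts keyed by feature name; the helper name and tolerance are hypothetical. Floats are compared with a relative tolerance because a Float32 value read back from a store rarely equals the original Python float exactly.

```python
import math


def parity_mismatches(offline_row, online_row, rel_tol=1e-6):
    """Return {feature: (offline_value, online_value)} for values that differ."""
    mismatches = {}
    for name, offline_value in offline_row.items():
        online_value = online_row.get(name)
        if isinstance(offline_value, float) and isinstance(online_value, float):
            same = math.isclose(offline_value, online_value, rel_tol=rel_tol)
        else:
            same = offline_value == online_value
        if not same:
            mismatches[name] = (offline_value, online_value)
    return mismatches


offline = {"conv_rate": 0.85, "avg_daily_trips": 12}
online = {"conv_rate": 0.8500000238418579, "avg_daily_trips": 13}

diff = parity_mismatches(offline, online)
print(diff)  # {'avg_daily_trips': (12, 13)}
```

A feature missing from the online row shows up as a mismatch against `None`, which is usually the desired signal for a materialization gap.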
Feast enables feature standardization, consistency, and scalability across ML systems.
With real-time ingestion, batch retrieval, and CI/CD integration, it bridges data engineering and ML operations.
Learn more:
- Feast Docs
- GitHub Repository