Jano is a Python library for defining temporal partitions and backtesting schemes over time-correlated datasets.
The missing layer between ML models and production temporal validation.
Documentation: marmurar.github.io/jano
It is designed for cases where a plain `train_test_split()` is not enough: transactional data, production simulations, repeated retraining, walk-forward validation, model monitoring, rule evaluation, or any experiment where the ordering of time matters.
The core accepts `pandas.DataFrame`, `numpy.ndarray` and `polars.DataFrame` inputs through a unified API. Jano keeps native pandas, NumPy and Polars paths for partition planning when that is safe, and falls back to pandas materialization for reporting and user-facing slices.
The project is named after Janus, the Roman god of beginnings, transitions and thresholds. That framing fits the library well: Jano helps define how a dataset moves from training periods into evaluation periods, fold after fold.
Jano also ships an optional local MCP server so AI agents can use the library through a small, explicit tool surface instead of generating Python ad hoc.
Current MCP tools:
- `preview_local_dataset`
- `plan_walk_forward_simulation`
- `run_walk_forward_simulation`
- `run_walk_forward_baseline_model`
Install it in a Python 3.10+ environment:
```bash
python -m pip install "jano[mcp]"
```

Run it locally over stdio:

```bash
jano-mcp
```

Or use the module entrypoint:

```bash
python -m jano.mcp_server
```

Example MCP client configuration:

```json
{
  "mcpServers": {
    "jano": {
      "command": "jano-mcp"
    }
  }
}
```

The MCP layer is intentionally opinionated: it exposes planning, walk-forward simulation and simple baseline-model execution first, while the full Python library remains available when you need custom composition.
This is meant for MCP-aware coding assistants such as Claude Code, Claude Desktop, Cursor, Codex runtimes with MCP support, and other local agent environments. The server runs locally and reads only the file paths you provide to its tools; Jano does not upload datasets anywhere by itself.
Jano includes three surfaces intended to make the project easier for AI agents to use and extend:
- Architecture notes in `docs/architecture/` explain the project layers, accepted decisions, specs and open RFCs.
- The canonical agent guide in `docs/ai/jano-agent-guide.md` explains which Jano API to use for common temporal validation tasks.
- Tool-specific adapters provide lightweight entry points for Codex, Claude and Cursor: `skills/jano/SKILL.md`, `CLAUDE.md` and `.cursor/rules/jano.mdc`.
Use the MCP server when an agent should execute Jano operations over local datasets. Use the skill or agent guide when an agent needs to reason about Jano, write code with the library or modify the repository safely.
Many machine learning datasets are not just tabular; they are structured over time and often across multiple entities such as users, routes, sellers or products. In those settings, a more faithful view of the data is not "a bag of independent rows" but a temporally ordered process.
Standard evaluation tooling usually assumes observations are i.i.d. enough that a static split is acceptable. That assumption breaks quickly when time matters: future information leaks into training, performance estimates become optimistic, and offline validation stops reflecting what really happens in production.
Most train/test utilities answer a simple question:
"How do I split this dataset once?"
Jano is meant to answer a richer one:
"How would this system have behaved over time if I had trained, retrained and evaluated it under a specific temporal policy?"
That difference is the core of the project. Jano treats evaluation as a temporal simulation rather than a static partition. Instead of defining one split, it defines a policy over time: train window, evaluation horizon, shift between iterations and optional leakage-control gaps. Running that policy produces a sequence of causally valid folds rather than one aggregate estimate.
That also makes it a useful way to surface drift in simulation results, because temporal shifts in behavior, performance or calibration become visible fold after fold.
That makes it useful not only for machine learning, but for any workflow where the data is time-dependent:
- Backtesting predictive models on transactional data.
- Simulating daily or weekly retraining in production.
- Comparing rolling versus expanding windows.
- Introducing explicit gaps between training and evaluation periods.
- Defining `train/test` or `train/validation/test` partitions with durations, row counts or percentages.
- Surfacing drift in simulation outcomes by making temporal changes explicit across folds.
Jano is being reshaped as a small, explicit temporal partitioning toolkit with an interface inspired by `sklearn.model_selection`.
The design goals are:
- Clear, composable temporal partition definitions.
- Low hidden state and predictable behavior.
- Compatibility with pandas-first workflows.
- A splitter-style API that can evolve toward stronger scikit-learn interoperability.
- Rich split objects for inspection, auditability and simulation.
The recommended high-level surface is intentionally small:
- `WalkForwardPolicy` for production-like walk-forward evaluation,
- `WalkForwardRunner` when you want Jano to execute a model over those folds and manage retraining cadence,
- `TrainHistoryPolicy` for fixed-test, growing-train questions,
- `DriftMonitoringPolicy` for fixed-train, moving-test questions.
Those classes sit on top of the lower-level building blocks that remain available:
- `TemporalSimulation` for explicit simulation objects,
- `TemporalBacktestSplitter` for manual fold iteration,
- `TrainGrowthPolicy` and `PerformanceDecayPolicy` for lower-level temporal hypothesis primitives.
The workflow is intentionally compositional:
- start simple with predefined layouts and strategies,
- move to `plan()` when you want to inspect or filter iterations before running them,
- use higher-level policies such as `TrainGrowthPolicy` or `PerformanceDecayPolicy` when the question is already encapsulated,
- and fall back to manual fold iteration when you want to compose everything yourself: partitions, gaps, feature history and model training logic.
The cleanest mental model is to treat Jano as five layers that can stay independent:
- `TemporalBacktestSplitter` for temporal geometry and manual fold iteration.
- `plan()` for inspecting and filtering that geometry before materialization.
- `TemporalSimulation` and `WalkForwardPolicy` for fold-level simulation and reporting.
- `WalkForwardRunner` for training, predicting and measuring over temporal folds with explicit retrain rules.
- Higher-level studies and policies for operational questions such as train sufficiency, decay and retraining cadence.
That separation is deliberate. The splitter remains the free-form core. Runners and studies extend what Jano can do at the simulation layer, but they do not replace manual fold iteration.
The splitter supports:

- `single`, `rolling` and `expanding` strategies.
- `train_test` and `train_val_test` layouts.
- Segment sizes defined as durations like `"30D"`, row counts like `5000`, or fractions like `0.7`.
- Calendar-aligned duration windows with `calendar_frequency="D"` when you want complete days instead of elapsed-time windows anchored at the first timestamp.
- Optional gaps before validation or test segments.
- Plain index output through `split()`.
- Rich fold objects through `iter_splits()`.
- Simulation summaries, HTML timeline reports and plot-ready chart data through `describe_simulation()`.
- An adaptive partition engine that keeps pandas, NumPy and Polars inputs native for planning when it is safe, and falls back to pandas when stability is more important.
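A minimal sketch combining those pieces, using only the options listed above (the tiny frame here is purely illustrative):

```python
import pandas as pd

from jano import TemporalBacktestSplitter, TemporalPartitionSpec

frame = pd.DataFrame(
    {
        "timestamp": pd.date_range("2025-01-01", periods=30, freq="D"),
        "value": range(30),
    }
)

splitter = TemporalBacktestSplitter(
    time_col="timestamp",
    partition=TemporalPartitionSpec(
        layout="train_test",
        train_size="14D",
        test_size="7D",
    ),
    step="7D",
    strategy="expanding",  # train grows while the test window rolls forward
)

# iter_splits() yields rich fold objects; split() would yield plain indices.
for fold in splitter.iter_splits(frame):
    print(fold.summary())
```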
```python
import pandas as pd

from jano import TemporalPartitionSpec, WalkForwardPolicy

frame = pd.DataFrame(
    {
        "timestamp": pd.date_range("2024-01-01", periods=60, freq="D"),
        "feature": range(60),
        "target": range(100, 160),
    }
)

policy = WalkForwardPolicy(
    time_col="timestamp",
    partition=TemporalPartitionSpec(
        layout="train_test",
        train_size="30D",
        test_size="1D",
    ),
    step="1D",
    strategy="rolling",
)

result = policy.run(frame, title="One month in production")
print(result.total_folds)
print(result.engine_metadata.to_dict())
print(result.summary.to_frame().head())
print(result.chart_data.segment_stats)
```

By default, `engine="auto"` lets Jano choose the safest fast path for partitioning:
```python
from jano import TemporalPartitionSpec, WalkForwardPolicy, WalkForwardRunner

# Assumes `frame` from the quick-start example and a `model` object you supply.
policy = WalkForwardPolicy(
    time_col="timestamp",
    partition=TemporalPartitionSpec(
        layout="train_test",
        train_size="30D",
        test_size="7D",
    ),
    step="7D",
    strategy="rolling",
)

runner = WalkForwardRunner(
    model=model,
    target_col="target",
    feature_cols=["feature"],
    retrain="periodic",
    retrain_interval=2,
    metrics=["mae", "rmse"],
)

run = runner.run(policy, frame)
print(run.to_frame().head())
print(run.summary())
print(run.metric_trajectory().head())
print(run.retrain_events())
report_data = run.report_data(include_predictions=False)
```

Supported retrain modes are:
- `retrain=True` or `retrain="always"` to refit on every fold.
- `retrain=False` or `retrain="never"` to train once and benchmark a fixed model.
- `retrain="periodic"` with `retrain_interval=K` to refit every `K` folds.
- `retrain_policy=DriftBasedRetrain(...)` when the next retrain decision should depend on previously observed fold metrics.
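For example, benchmarking a fixed model is the `retrain="never"` variant of the runner shown above (a sketch reusing the same parameters):

```python
# Train once, then keep scoring the same fitted model fold after fold.
fixed_runner = WalkForwardRunner(
    model=model,
    target_col="target",
    feature_cols=["feature"],
    retrain="never",
    metrics=["mae", "rmse"],
)

fixed_run = fixed_runner.run(policy, frame)
print(fixed_run.summary())
```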
Runner results are intentionally data-first rather than dashboard-first:
- `run.fold_summary()` returns temporal fold geometry and retraining metadata.
- `run.metric_trajectory()` returns metrics in long format, ready for plotting.
- `run.retrain_events()` returns only folds where the estimator was refit.
- `run.predictions_frame()` returns row-level test predictions.
- `run.report_data()` / `run.to_dict()` return structured dictionaries for notebooks, agents, dashboards or presentation tools.
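Because `metric_trajectory()` is long-format, plotting is one groupby away. A sketch: the column names used here (`fold`, `metric`, `value`) are assumptions about the long format, so inspect the frame first:

```python
import matplotlib.pyplot as plt

traj = run.metric_trajectory()
print(traj.columns)  # verify the actual column names before plotting

# Assumed long format: one row per (fold, metric) pair with a numeric value.
for metric_name, group in traj.groupby("metric"):
    plt.plot(group["fold"], group["value"], label=metric_name)
plt.xlabel("fold")
plt.ylabel("metric value")
plt.legend()
plt.show()
```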
pandas inputs stay pandas, Polars inputs use Polars column extraction, and NumPy arrays use array indexing. You can force a path with `engine="pandas"`, `engine="polars"` or `engine="numpy"` when you need deterministic behavior for a pipeline.
If you want to inspect the full simulation geometry before materializing folds, plan it first:
```python
plan = policy.plan(frame, title="One month in production")
print(plan.total_folds)
print(plan.to_frame().head())

filtered = plan.exclude_windows(
    train=[("2025-12-20", "2026-01-05")],
).select_from_iteration(5)
result = filtered.materialize()
```

That plan frame includes the explicit iteration index, segment boundaries and row counts for each fold.
You can also anchor a simulation to a specific date and limit how many folds are materialized:
```python
policy = WalkForwardPolicy(
    time_col="timestamp",
    partition=TemporalPartitionSpec(
        layout="train_test",
        train_size="15D",
        test_size="4D",
    ),
    step="1D",
    strategy="rolling",
    start_at="2025-09-01",
    max_folds=15,
)

result = policy.run(frame, title="15 daily retraining iterations")
```

The recommended walk-forward surface also supports `end_at` when you want to constrain the simulation to a bounded time window before folds are generated.
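For example, a bounded simulation window (the `end_at` value here is illustrative):

```python
policy = WalkForwardPolicy(
    time_col="timestamp",
    partition=TemporalPartitionSpec(
        layout="train_test",
        train_size="15D",
        test_size="4D",
    ),
    step="1D",
    strategy="rolling",
    start_at="2025-09-01",
    end_at="2025-10-15",  # no folds are generated past this boundary
)
```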
When a single timestamp is not enough, `WalkForwardPolicy`, `TemporalSimulation` and `TemporalBacktestSplitter` can also receive a `TemporalSemanticsSpec`. That lets you keep one column as the reported timeline while using different timestamp columns to decide whether train, validation or test rows are actually eligible. This is useful for production-style leakage control, for example when a target only becomes available at `arrived_at` even if the operational timeline is anchored on `departured_at`.
For numpy.ndarray inputs, use integer column references:
```python
import numpy as np

from jano import TemporalBacktestSplitter, TemporalPartitionSpec

values = np.array(
    [
        ["2025-09-01", 1.2, 10],
        ["2025-09-02", 1.5, 11],
        ["2025-09-03", 1.1, 12],
    ],
    dtype=object,
)

splitter = TemporalBacktestSplitter(
    time_col=0,
    partition=TemporalPartitionSpec(
        layout="train_test",
        train_size="2D",
        test_size="1D",
    ),
    step="1D",
    strategy="single",
)
```

Fraction-based segment sizes work the same way, including `train_val_test` layouts:

```python
from jano import TemporalBacktestSplitter, TemporalPartitionSpec

splitter = TemporalBacktestSplitter(
    time_col="timestamp",
    partition=TemporalPartitionSpec(
        layout="train_val_test",
        train_size=0.6,
        validation_size=0.2,
        test_size=0.2,
    ),
    step=0.2,
    strategy="single",
)

for split in splitter.iter_splits(frame):
    print(split.summary())
```

This is a special use case: it is useful when you want to study whether more training history really improves performance on the same test slice.
```python
from jano import TrainHistoryPolicy

policy = TrainHistoryPolicy(
    "timestamp",
    cutoff="2025-09-15",
    train_sizes=["7D", "14D", "21D", "28D"],
    test_size="4D",
)

result = policy.evaluate(
    frame,
    model=model,
    target_col="target",
    feature_cols=["feature_1", "feature_2"],
    metrics=["mae", "rmse"],
)

print(result.to_frame()[["train_size", "rmse"]])
print(result.find_optimal_train_size(metric="rmse", tolerance=0.01))
```

That pattern keeps test fixed while train expands toward the past. It is a practical way to study data efficiency or to estimate how much history is actually needed.
The opposite special case is also common: keep train fixed and move test forward day by day to estimate how long a model or rule keeps its performance without retraining. The two patterns answer different questions:
- fixed `test` + growing `train`: how much history do I actually need?
- fixed `train` + moving `test`: for how long does performance hold after deployment?
Example of the second pattern:
```python
from jano import DriftMonitoringPolicy

policy = DriftMonitoringPolicy(
    "timestamp",
    cutoff="2025-09-15",
    train_size="30D",
    test_size="3D",
    step="1D",
    max_windows=10,
)

result = policy.evaluate(
    frame,
    model=model,
    target_col="target",
    feature_cols=["feature_1", "feature_2"],
    metrics=["mae", "rmse"],
)

print(result.to_frame()[["window", "test_start", "rmse"]])
print(result.find_drift_onset(metric="rmse", threshold=0.15, baseline="first"))
```

This is the next-level composed question: if each outer test window is allowed to choose its own optimal training history, how much history is needed on average?
```python
from jano import RollingTrainHistoryPolicy, TemporalPartitionSpec

policy = RollingTrainHistoryPolicy(
    "timestamp",
    partition=TemporalPartitionSpec(
        layout="train_test",
        train_size="30D",
        test_size="1D",
    ),
    step="1D",
    strategy="rolling",
    max_folds=10,
    train_sizes=["5D", "10D", "15D", "30D"],
)

result = policy.evaluate(
    frame,
    model=model,
    target_col="target",
    feature_cols=["feature_1", "feature_2"],
    metrics="rmse",
    metric="rmse",
    tolerance=0.01,
)

print(result.to_frame().head())
print(result.summary())
```

The supervised fold can stay fixed while feature engineering still asks for different lookback windows per feature group.
```python
from jano import FeatureLookbackSpec

split = next(splitter.iter_splits(frame))

lookbacks = FeatureLookbackSpec(
    default_lookback="15D",
    group_lookbacks={"lag_features": "65D"},
    feature_groups={"lag_features": ["lag_30", "lag_60"]},
)

history = split.slice_feature_history(
    frame,
    lookbacks,
    time_col="timestamp",
    segment_name="train",
)

recent_context = history["__default__"]
lag_context = history["lag_features"]
```

This is useful when recent features only need a short window while lagged or seasonal features need much deeper historical context for the same model.
```python
summary = splitter.describe_simulation(frame, title="Walk-forward simulation")
html = splitter.describe_simulation(frame, output="html")
chart_data = splitter.describe_simulation(frame, output="chart_data")

print(summary.total_folds)
print(summary.to_frame().head())
print(chart_data.segment_stats)
```

That gives you three ways to consume the same simulation:
- `summary` for tabular metadata and export helpers,
- `html` for a standalone visual report,
- `chart_data` for direct Python plotting without reparsing HTML.
The generated report shows each fold across the dataset timeline, with richer summary cards, clearer segment labels and row counts per partition.
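Because the `html` output is a standalone report, persisting it is plain Python rather than a Jano API (a sketch, assuming the report is returned as a string):

```python
from pathlib import Path

# Write the standalone HTML report next to the notebook or script.
Path("walk_forward_report.html").write_text(html, encoding="utf-8")
```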
Install the current release from PyPI:

```bash
python -m pip install jano
```

To use Polars inputs directly:

```bash
python -m pip install "jano[polars]"
```

For local development:

```bash
python -m pip install -e ".[dev]"
python -m pytest --cov=jano --cov-report=term-missing
python -m sphinx -b html docs docs/_build/html
```

Jano also exposes its runtime version through `jano.__version__`.
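For example, to record the version alongside experiment outputs:

```python
import jano

print(jano.__version__)
```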
The repository includes a dedicated GitHub Actions workflow for PyPI publication through trusted publishing.
The release path is:
- Update `jano/_version.py`.
- Run `python -m pytest -q`.
- Run `python -m build` and `python -m twine check dist/*`.
- Push a tag like `v0.3.0`.
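End to end, and assuming tags are pushed with plain git rather than a project-specific tool, those steps look like:

```bash
python -m pytest -q
python -m build
python -m twine check dist/*
git tag v0.3.0
git push origin v0.3.0
```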
That tag triggers the Publish workflow, which builds the wheel and source distribution and publishes them to PyPI.
In parallel, the repository also includes a GitHub Release workflow that can create a GitHub Release and attach the built wheel and source distribution for any v* tag.
The repository includes:
- GitHub Actions for tests across multiple Python versions.
- GitHub Pages publication for Sphinx documentation.
- Coverage reporting with `pytest-cov`.
- Codecov upload and status tracking.
- A coverage gate set to 99%.
Jano is an early public project with a usable core and an API that is still being refined as the simulation layer grows.
The low-level temporal partitioning surface is the most stable part of the library: `TemporalBacktestSplitter`, `TemporalPartitionSpec`, `TemporalSimulation`, `WalkForwardPolicy` and `plan()` are the foundation for manual fold iteration, auditability and simulation planning.
The higher-level execution and study APIs, including `WalkForwardRunner`, retrain policies, train-history studies and drift-monitoring helpers, are intentionally evolving. They are covered by tests and documented, but naming and ergonomics may still change while Jano is being shaped into a broader temporal experimentation framework.
Current distribution and quality signals:
- PyPI package: jano.
- Latest tested release line: `0.3.x`.
- Test suite: 114 passed.
- Coverage gate: 99% minimum.
- Current measured coverage: 99.25%.
- Documentation: marmurar.github.io/jano.
For production use, pin an explicit version and review release notes before upgrading. For experimentation, temporal validation design work and prototype evaluation pipelines, the project is ready to use.
If you use Jano in research, technical reports, benchmarks or production validation work, please cite it using the metadata in CITATION.cff.
- Marcos Manuel Muraro
Feedback and design discussion are especially valuable right now. If you are using temporal backtesting for ML, analytics, operations or experimentation, that context can help shape the API in the right direction.
