This repo sketches a benchmark harness where AI agents must rediscover hidden physical laws by running experiments inside virtual worlds. Each world is governed by a fully known but potentially "weird" law; we can score agents on how well their discovered formulas predict motion or match the ground truth.
- A lightweight simulation core (`laws_benchmark/`) that supports arbitrary force laws, multiple bodies, and RK4/Euler integrators.
- A library of composable law primitives (`laws.py`) including inverse-power gravity, velocity coupling, time-varying fields, polynomial wells, hidden latent pulses, region-gated switches, nonlinear superposition, stochastic kicks, hidden-coordinate forces, and symbolic laws.
- A law generator (`generator.py`) that assembles easy/medium/hard universes and random initial conditions in any dimension.
- Experiment/rollout support (`simulation.py`) that lets a policy inject forces each step, producing trajectories with actions attached.
- Evaluation helpers (`evaluation.py`) that score a candidate law by acceleration fit, trajectory replay error, and symbolic equivalence checks (`symbolic_expression_similarity`).
- Dataset export/import (`dataset.py`) and a minimal gym-style interface (`envs.py`) so agents can interact step-by-step or store rollouts.
- Example scripts: `examples/basic_run.py` (hard scenario with naive law) and `examples/symbolic_and_export.py` (symbolic matching, dataset export, env demo).
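Taken together, these pieces form a short discovery loop: sample a hidden-law world, probe it, fit a candidate, and score it. The sketch below is a minimal illustration that only uses names mentioned in this README; the argument order of `acceleration_mse` is an assumption and should be checked against `evaluation.py`.

```python
# Minimal discovery-loop sketch. Everything except the acceleration_mse call
# mirrors the snippets elsewhere in this README; that call's signature is assumed.
from laws_benchmark import (
    EnvironmentSpec,
    generate_trajectory,
    axis_sweep_policy,
    acceleration_mse,
)

# 1. Sample a hidden-law world and collect an actively probed rollout.
spec = EnvironmentSpec(dim=3, num_bodies=3, steps=200, complexity="hard", seed=7)
traj, true_law = generate_trajectory(spec, axis_sweep_policy(strength=0.3))

# 2. An agent would fit a candidate law from `traj` here; the true law stands in
#    so the sketch stays self-contained.
candidate_law = true_law

# 3. Score the candidate (argument order assumed: candidate, ground truth, data).
print(acceleration_mse(candidate_law, true_law, traj))
```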
- Run a hard scenario with numeric laws:

  ```bash
  python examples/basic_run.py
  ```

  This builds a 3-body, 3D world with a composite law (gravity + velocity coupling + time field + hidden pulses + region switch + nonlinear mixing), rolls out 200 steps with random pushes, and prints acceleration/position errors for a naive candidate law.

- See symbolic checks, dataset export, and env usage:

  ```bash
  python examples/symbolic_and_export.py
  ```

- Use active-exploration policies and a curriculum (snippet):

  ```python
  from laws_benchmark import (
      EnvironmentSpec,
      generate_trajectory,
      axis_sweep_policy,
      latin_hypercube_policy,
      linear_curriculum,
  )

  spec = EnvironmentSpec(dim=3, num_bodies=3, steps=150, complexity="medium", seed=0)
  policy = axis_sweep_policy(strength=0.3)
  traj, law = generate_trajectory(spec, policy)

  # Build a staged curriculum of increasingly hard worlds
  steps = linear_curriculum(levels=4, start_dim=2, max_dim=4, start_bodies=2, max_bodies=4)
  for step in steps:
      print(step.description)
  ```
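Because the curriculum yields ordinary `EnvironmentSpec` objects (see the notes below), each stage can be rolled out like a hand-written spec. A minimal continuation of the snippet above, assuming the curriculum entries are consumed directly by `generate_trajectory`:

```python
# Roll out every curriculum stage with the same probing policy
# (continues the imports from the snippet above).
for spec in linear_curriculum(levels=4, start_dim=2, max_dim=4,
                              start_bodies=2, max_bodies=4):
    traj, law = generate_trajectory(spec, axis_sweep_policy(strength=0.3))
    print(spec, type(law).__name__)
```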
- Law primitives: `NewtonianInversePowerLaw`, `VelocityCouplingLaw`, `TimeVaryingFieldLaw`, `PolynomialWellLaw`, `HiddenDimensionPulseLaw`, `RegionSwitchLaw`, `NonlinearSuperpositionLaw`, `StochasticKickLaw`, `HiddenCoordinateLaw`, and `SymbolicLaw`. They can be summed via `CompositeLaw`, letting you build exotic universes (velocity-dependent forces, time oscillations, piecewise rules, latent modulation, stochasticity, and symbolic definitions); a composition sketch follows this list.
- Random law generator: `EnvironmentSpec` defines dimension, body count, timestep, and complexity. `generate_trajectory` produces a rollout plus the ground-truth law. Complexity levels adjust which primitives are combined.
- Experiment policies: Provide a function `(time, state) -> forces` to inject probes. Included options: `random_push_policy`, `axis_sweep_policy`, `latin_hypercube_policy`, `scheduled_impulse_policy`, and `zero_policy`. Policies are passed to `Simulation.rollout`; a hand-written probe is sketched after this list.
- Evaluation: `acceleration_mse` compares predicted accelerations against the true law on recorded states; `predictive_position_mse` replays the trajectory with a candidate law under the same actions and measures positional drift; `symbolic_expression_similarity` uses SymPy to test whether a discovered formula matches the ground truth up to numerical tolerance.
- Datasets: `save_trajectory_json` / `load_trajectory_json` serialize trajectories (including actions) for offline analysis; a round-trip sketch (together with an env loop) follows this list.
- Env interface: `BenchmarkEnv` provides `reset`/`step` methods with observations and optional rewards, mirroring gym-style loops without adding dependencies.
- Curriculum: `linear_curriculum` builds a progression of `EnvironmentSpec` objects that increase dimension, body count, steps, and complexity, useful for staged training/evaluation.
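A composition sketch, as referenced in the law-primitives bullet above. The class names come from `laws.py`, but the constructor parameters (`exponent`, `strength`, `amplitude`, `period`), the list-style `CompositeLaw` call, and the top-level re-exports are illustrative assumptions:

```python
# Hypothetical constructor parameters; check laws.py for the real signatures.
# Top-level re-export is assumed; the classes may live in laws_benchmark.laws.
from laws_benchmark import (
    CompositeLaw,
    NewtonianInversePowerLaw,
    VelocityCouplingLaw,
    TimeVaryingFieldLaw,
)

weird_gravity = CompositeLaw([
    NewtonianInversePowerLaw(exponent=2.5),          # slightly non-Newtonian falloff
    VelocityCouplingLaw(strength=0.1),               # drag-like velocity dependence
    TimeVaryingFieldLaw(amplitude=0.5, period=3.0),  # oscillating background field
])
```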
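A hand-written probe, as referenced in the experiment-policies bullet. The `(time, state) -> forces` signature is stated above; the `(num_bodies, dim)` shape of the returned array is an assumption:

```python
import numpy as np

# Custom probe: push body 0 along a direction that rotates with time.
# The (time, state) -> forces signature comes from the notes above; returning a
# (num_bodies, dim) array is an assumption about the expected shape.
def rotating_probe(time, state):
    forces = np.zeros((3, 3))  # 3 bodies, 3 dimensions
    forces[0] = 0.3 * np.array([np.cos(time), np.sin(time), 0.0])
    return forces
```

Such a function can be passed anywhere the built-in policies go, e.g. as the second argument of `generate_trajectory` or to `Simulation.rollout`.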
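A dataset round-trip and env loop, as referenced above. `save_trajectory_json`, `load_trajectory_json`, and `BenchmarkEnv` are named in this README, but the argument order, the env constructor, and the classic four-tuple return of `step` are assumptions modelled on gym conventions:

```python
from laws_benchmark import (
    EnvironmentSpec,
    generate_trajectory,
    axis_sweep_policy,
    save_trajectory_json,
    load_trajectory_json,
    BenchmarkEnv,
)

# Record a rollout and round-trip it through JSON (argument order assumed).
spec = EnvironmentSpec(dim=2, num_bodies=2, steps=100, complexity="easy", seed=1)
traj, law = generate_trajectory(spec, axis_sweep_policy(strength=0.3))
save_trajectory_json(traj, "rollout.json")
traj_again = load_trajectory_json("rollout.json")

# Gym-style interaction; the constructor and step() return shape are assumptions.
env = BenchmarkEnv(spec)
obs = env.reset()
for _ in range(10):
    obs, reward, done, info = env.step(None)  # None = no injected force (assumed)
    if done:
        break
```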
- Expand experiment policies further (gradient probes, adaptive excitation tuned to local curvature).
- Add richer law families (higher-order derivatives, discontinuities, explicit control-affine terms, hidden coordinates that only appear in some regions).
- Add more export formats (Arrow, Parquet) and automated dataset sampling utilities.
- Introduce more curriculum shapes (randomized progressions, competence-based advancement).
- `laws_benchmark/core.py`: dataclasses for bodies, world state, and derivatives.
- `laws_benchmark/integrators.py`: Euler and RK4 integrators.
- `laws_benchmark/laws.py`: law primitives and samplers.
- `laws_benchmark/simulation.py`: simulator and trajectory recording with action hooks.
- `laws_benchmark/generator.py`: environment spec, random initial states, and policies.
- `laws_benchmark/evaluation.py`: scoring utilities.
- `laws_benchmark/dataset.py`: trajectory export/import helpers.
- `laws_benchmark/envs.py`: minimal gym-style interface.
- `examples/basic_run.py`: runnable demo.
- `examples/symbolic_and_export.py`: symbolic matching, dataset export, and env demo.