A unified toolkit for researchers and engineers working on AI physical reasoning. PRKit provides a shared foundation for representing physics problems, running inference with multiple model providers, evaluating outputs with physics-aware comparators, and building structured annotation workflows.
PRKit applies a "unified interface" idea to the full physical-reasoning loop (data → annotation → inference → evaluation), rather than focusing on datasets alone.
PRKit centers on core components that define the physical reasoning ontology. Three integrated subpackages build on this foundation:

- **Core components**: `PhysicsDomain`, `AnswerCategory`, `PhysicsProblem`, `Answer`, `PhysicalDataset`, `PhysicsSolution`, `BaseModelClient`, `create_model_client`, `PRKitLogger` — the shared abstractions used across the toolkit.
- **prkit_datasets**: A Datasets-like hub that downloads/loads benchmarks into the unified schema (`PhysicsProblem`, `PhysicalDataset`).
- **prkit_annotation**: Workflow-oriented tools for structured, lower-level labels (e.g., domain/subdomain, theorem usage).
- **prkit_evaluation**: Evaluate-like components for physics-oriented scoring and comparison (e.g., symbolic/numerical answer matching).
```python
from prkit.prkit_datasets import DatasetHub
from prkit.prkit_core.model_clients import create_model_client

# Load any benchmark into the unified schema (PhysicsProblem, PhysicalDataset)
dataset = DatasetHub.load("physreason", variant="full", split="test")

# Run inference with the unified model client (core component)
client = create_model_client("gpt-4.1-mini")
for problem in dataset[:3]:
    print(client.chat(problem.question)[:200])
```

The same pattern works across datasets and model providers: swap the dataset name or the model identifier.
Quick Links:
- CORE.md - Core components: domain model, model client, logger, and definitions
- DATASETS.md - Complete guide to supported datasets and benchmarks
- EVALUATION.md - Evaluation metrics and comparison strategies
- CHANGELOG.md - Version history and release notes
- Python 3.10+ (required)
```shell
# Install the latest stable version
pip install physical-reasoning-toolkit

# Verify installation
python -c "import prkit; print(prkit.__version__)"
```

**Step 1: Clone the Repository**
```shell
git clone https://github.com/sherryzyh/physical_reasoning_toolkit.git
cd physical_reasoning_toolkit
```

**Step 2: Install**

```shell
# Install the package (regular install for end users)
pip install .

# Verify installation
python -c "import prkit; print('✅ Toolkit installed successfully!')"
```

**Option 1: Export as environment variables**
```shell
# For model provider integration (optional)
export OPENAI_API_KEY="your-openai-api-key"
export GEMINI_API_KEY="your-gemini-api-key"
export DEEPSEEK_API_KEY="your-deepseek-api-key"

# For logging configuration (optional)
export PRKIT_LOG_LEVEL=INFO
export PRKIT_LOG_FILE=/var/log/prkit.log  # Optional: defaults to {cwd}/prkit_logs/prkit.log if not set
```

**Option 2: Create a `.env` file at your project root**
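If PRKit follows the usual `.env` convention (an assumption; see CORE.md for how configuration is actually loaded), the same settings can live in a plain key-value file. Values below are placeholders:

```
OPENAI_API_KEY=your-openai-api-key
GEMINI_API_KEY=your-gemini-api-key
DEEPSEEK_API_KEY=your-deepseek-api-key
PRKIT_LOG_LEVEL=INFO
# PRKIT_LOG_FILE=/var/log/prkit.log   (optional override)
```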
See CORE.md (Model Client section) for supported providers and usage.
```shell
python -c "
import prkit
from prkit.prkit_datasets import DatasetHub
from prkit.prkit_annotation.workflows import WorkflowComposer
print('✅ All packages imported successfully!')
print(f'PRKit version: {prkit.__version__}')
"
```

```
physical_reasoning_toolkit/
├── src/prkit/              # Main package (modern src-layout)
│   ├── prkit_core/         # Core components (domain models, model clients, logging)
│   ├── prkit_datasets/     # Dataset loading and management
│   ├── prkit_annotation/   # Annotation workflows and tools
│   └── prkit_evaluation/   # Evaluation metrics and benchmarks
├── tests/                  # Unit tests
├── pyproject.toml          # Package configuration
├── LICENSE                 # MIT License
└── README.md               # This file
```
Note: The actual dataset files are stored externally (see Environment Setup section). This repository contains only the toolkit code, examples, and documentation.
In Repository (Code & Documentation):
- `src/prkit/`: Complete toolkit with core components and 3 subpackages
- `tests/`: Unit tests (for contributors)

External (Data & Runtime):
- Data Directory: Dataset files (set via `DATASET_CACHE_DIR`)
- API Keys: Model provider credentials (if applicable)
- Log Files: Runtime logs (default: `{cwd}/prkit_logs/prkit.log`, can be overridden via `PRKIT_LOG_FILE`)
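The default-with-override behavior described for log files can be expressed in a few lines of plain Python. This is a sketch of the documented behavior, not `PRKitLogger`'s actual code:

```python
from pathlib import Path

def resolve_log_file(env: dict) -> str:
    """Use PRKIT_LOG_FILE if set, else fall back to {cwd}/prkit_logs/prkit.log."""
    return env.get("PRKIT_LOG_FILE", str(Path.cwd() / "prkit_logs" / "prkit.log"))

print(resolve_log_file({}))                                        # default path under cwd
print(resolve_log_file({"PRKIT_LOG_FILE": "/var/log/prkit.log"}))  # explicit override wins
```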
The toolkit is organized around core components and three subpackages that use them. Subpackages depend only on prkit_core; there are no direct dependencies between prkit_datasets, prkit_annotation, and prkit_evaluation.
| Component | Purpose |
|---|---|
| `prkit_core` | Core components, see below |
| `prkit_datasets` | Dataset hub: loaders, downloaders, unified schema |
| `prkit_evaluation` | Comparators and accuracy metrics |
| `prkit_annotation` | Workflow pipelines for domain/theorem annotation |
The essential building blocks of the physical-reasoning-toolkit. All datasets, inference, evaluation, and annotation workflows use these components.
- **PhysicsDomain** — Enumeration of physics subfields (mechanics, thermodynamics, quantum mechanics, optics, etc.) for problem classification. Aligned with UGPhysics, PHYBench, TPBench. Use `PhysicsDomain.from_string()` for flexible parsing.
- **AnswerCategory** — Enumeration of answer types for normalization and evaluation: `NUMBER`, `PHYSICAL_QUANTITY`, `EQUATION`, `FORMULA`, `TEXT`, `OPTION`. Drives how answers are compared (numerical precision, symbolic equivalence, exact match).
- **PhysicsProblem** — The canonical representation of a physics problem. Required: `problem_id`, `question`. Optional: `answer` (`Answer`), `solution`, `domain`, `image_path`, `problem_type` (MC/OE), `options`, `correct_option`. Supports dictionary-like access and `load_images()` for visual problems.
- **Answer** — Unified answer model. `value` holds the number (NUMBER), numeric part (PHYSICAL_QUANTITY), option string (OPTION), or plain string (EQUATION, FORMULA, TEXT). `unit` is optional and used only for PHYSICAL_QUANTITY. Provides type checks, unit helpers, LaTeX handling, and option indexing.
- **PhysicalDataset** — Collection of `PhysicsProblem` instances. Supports indexing, slicing, `get_by_id()`, `filter_by_domain()`, `take()`, `sample()`, and `save_to_json()`/`from_json()`. Provides `get_statistics()` for domain and problem-type distribution.
- **PhysicsSolution** — Bundles a `PhysicsProblem`, the model's `agent_answer`, and optional `intermediate_steps`. Captures the full solution trace for evaluation and analysis.
- **BaseModelClient** — Abstract base for model clients. Subclasses implement `chat(user_prompt, image_paths=None)`.
- **PRKitLogger** — Centralized logging with colored output, file logging, and env config (`PRKIT_LOG_LEVEL`, `PRKIT_LOG_FILE`, etc.).
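As a concrete illustration of how an answer category can drive comparison, here is a minimal, self-contained sketch in plain Python. It is not `prkit_evaluation`'s implementation; the tolerance and normalization choices are assumptions made for the example:

```python
import math

def compare_answers(category: str, predicted: str, expected: str) -> bool:
    """Toy category-driven comparison; NOT prkit's actual comparator."""
    if category == "NUMBER":
        # Numerical precision: relative tolerance instead of string equality
        return math.isclose(float(predicted), float(expected), rel_tol=1e-3)
    if category == "OPTION":
        # Option labels match case-insensitively after trimming
        return predicted.strip().upper() == expected.strip().upper()
    # TEXT (and, naively here, EQUATION/FORMULA): normalized exact match
    return predicted.strip() == expected.strip()

print(compare_answers("NUMBER", "9.81", "9.8100001"))     # True
print(compare_answers("OPTION", " a", "A"))               # True
print(compare_answers("TEXT", "increases", "decreases"))  # False
```

A real comparator would also handle symbolic equivalence for EQUATION/FORMULA; that part is deliberately elided here.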
See CORE.md for the full domain model, entity relationships, subpackage dependency diagram, and import reference.
Answer comparators (symbolic, numerical, textual, option-based), accuracy evaluator, and physics-focused assessment protocols.
See EVALUATION.md.
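For PHYSICAL_QUANTITY answers, comparison typically normalizes units before matching numbers. A toy sketch follows; the three-entry unit table and the function name are illustrative stand-ins, not anything from prkit:

```python
def compare_quantity(pred: float, pred_unit: str, exp: float, exp_unit: str) -> bool:
    """Toy unit-normalizing comparison: convert both sides to SI, then compare."""
    to_si = {"m": 1.0, "cm": 0.01, "km": 1000.0}  # illustrative unit table only
    return abs(pred * to_si[pred_unit] - exp * to_si[exp_unit]) < 1e-9

print(compare_quantity(100.0, "cm", 1.0, "m"))  # True: 100 cm == 1 m
print(compare_quantity(1.0, "km", 1.0, "m"))    # False
```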
Dataset hub with a Datasets-like interface: DatasetHub.load() for PHYBench, PhysReason, UGPhysics, SeePhys, PhyX (plus JEEBench, TPBench loaders). Auto-download, variant selection, and reproducible sampling.
See DATASETS.md.
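The "reproducible sampling" mentioned above generally comes down to a seeded random generator. A minimal stand-in (not `DatasetHub`'s actual code):

```python
import random

def reproducible_sample(items: list, k: int, seed: int = 42) -> list:
    """Seeded sampling: the same subset is drawn on every run."""
    return random.Random(seed).sample(items, k)

ids = [f"prob_{i}" for i in range(100)]
first = reproducible_sample(ids, 5, seed=7)
second = reproducible_sample(ids, 5, seed=7)
print(first == second)  # True: same seed yields the same subset
```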
Modular workflows (domain classification, theorem extraction) via WorkflowComposer and presets. Model-assisted and human-in-the-loop.
See ANNOTATION.md.
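Workflow composition of the kind `WorkflowComposer` provides can be sketched with plain callables. The two steps below are hypothetical stand-ins for model-assisted domain classification and theorem extraction, shown only to convey the pipeline shape:

```python
from typing import Callable

def compose(*steps: Callable[[dict], dict]) -> Callable[[dict], dict]:
    """Chain annotation steps; each step takes and returns a record dict."""
    def pipeline(record: dict) -> dict:
        for step in steps:
            record = step(record)
        return record
    return pipeline

def classify_domain(rec: dict) -> dict:
    # Stand-in for a model-assisted classifier
    rec["domain"] = "mechanics" if "force" in rec["question"].lower() else "unknown"
    return rec

def extract_theorems(rec: dict) -> dict:
    # Stand-in for theorem-usage extraction
    rec["theorems"] = ["Newton's second law"] if rec.get("domain") == "mechanics" else []
    return rec

annotate = compose(classify_domain, extract_theorems)
result = annotate({"question": "A 10 N force acts on a 2 kg block."})
print(result["domain"], result["theorems"])
```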
```shell
# Check Python version
python --version  # Should be 3.10+

# If using the wrong version, create and activate a virtual environment
python -m venv venv
source venv/bin/activate

# Reinstall in development mode
pip install -e .

# Check installation
pip show physical-reasoning-toolkit
```

```shell
# Set data directory (external to repository)
export DATASET_CACHE_DIR=/path/to/your/data

# Check directory structure
ls -la $DATASET_CACHE_DIR

# Verify dataset files exist
ls -la $DATASET_CACHE_DIR/ugphysics/
ls -la $DATASET_CACHE_DIR/PhysReason/
```

- Review logs: Check logging output for detailed error information
- Verify setup: Run the testing commands above
- Check data: Ensure datasets are properly downloaded and accessible
- Check documentation: Start with the root docs linked below
- GitHub Issues: Report bugs or request features
- Discussions: Share ideas and get help
```shell
# Clone and install in development mode
git clone https://github.com/sherryzyh/physical_reasoning_toolkit.git
cd physical_reasoning_toolkit
pip install -e ".[dev]"

# Run code quality tools
black src/
isort src/
mypy src/

# Run tests
pytest tests/
```

- Follow existing patterns: Use consistent logging and error handling
- Add tests: Include tests for new functionality
- Update documentation: Add examples and update README files
- Maintain compatibility: Ensure changes don't break existing functionality
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Ensure all tests pass
- Submit a pull request with clear description
If you use PRKit in your research, please cite it as follows:
BibTeX:
```bibtex
@software{zhang2026physicalreasoningtoolkit,
  author   = {Zhang, Yinghuan},
  title    = {Physical Reasoning Toolkit},
  year     = {2026},
  license  = {MIT},
  url      = {https://github.com/sherryzyh/physical_reasoning_toolkit},
  abstract = {A unified toolkit for researchers and engineers working on AI physical reasoning. PRKit provides a shared foundation for representing physics problems, running inference with multiple model providers, evaluating outputs with physics-aware comparators, and building structured annotation workflows.}
}
```

For citation files, see CITATION.cff and CITATION.bib in the repository root.
PRKit integrates and builds upon several excellent physics reasoning benchmarks and datasets. We thank the creators of:
- PhysReason, PHYBench, UGPhysics, SeePhys, PhyX, and other benchmark datasets
- The open-source community for their valuable contributions and feedback
Note: For detailed citations and references to the original dataset papers, please see the Citations section in DATASETS.md.
This project is licensed under the MIT License - see the LICENSE file for details.
Ready to advance physics reasoning research!
Quick Links: pip install physical-reasoning-toolkit | GitHub | Documentation | Issues