This is the official code and data release for the paper:
**Neuro-Symbolic Synergy for Interactive World Modeling**
Hongyu Zhao, Siyu Zhou, Haolin Yang, Zengyi Qin, Tianyi Zhou
arXiv:2602.10480
Large language models exhibit strong general-purpose reasoning capabilities, yet they frequently hallucinate when used as world models. NeSyS bridges this gap by integrating the probabilistic semantic priors of LLMs with executable symbolic rules. The symbolic world model directly constrains the LLM by modifying its output probability distribution, and the neural world model is fine-tuned only on transitions not covered by symbolic rules -- reducing training data by 50% without loss of accuracy.
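The constraint mechanism can be illustrated with a minimal toy sketch. Here symbolic rules simply mark some candidate next states as invalid, and those candidates are penalized before the neural distribution is renormalized; the function name and the hard-penalty scheme are illustrative assumptions, not the repo's actual (learned, weighted) implementation.

```python
import math

def rerank(logprobs, rule_valid, penalty=-1e9):
    """Toy sketch: combine neural logprobs with a symbolic validity mask.
    Options the rules mark invalid receive a large negative penalty,
    so the argmax falls on the best rule-consistent option."""
    scores = [lp if ok else lp + penalty for lp, ok in zip(logprobs, rule_valid)]
    # Numerically stable softmax over the constrained scores.
    z = max(scores)
    exps = [math.exp(s - z) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(scores)), key=scores.__getitem__)
    return probs, best

# The neural model slightly prefers option 0, but a rule rules it out,
# so the constrained distribution selects option 1 instead.
probs, best = rerank([-0.4, -0.6, -2.0], [False, True, True])
```

The key property is that the symbolic layer never needs to score every option itself; it only vetoes transitions it can prove inconsistent, leaving ranking among the remaining options to the neural model.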
| Backbone | Method | ScienceWorld | Webshop | Plancraft |
|---|---|---|---|---|
| GPT-5-mini | Strong baseline | 55.4 | 81.4 | 73.8 |
| Llama3.2-1B | SFT (100% data) | 64.4 | 47.5 | 80.5 |
| Llama3.2-1B | Ours (reduced data) | 68.3 (45%) | 92.2 (60%) | 87.7 (35%) |
| Qwen3-4B | SFT (100% data) | 68.3 | 47.3 | 90.1 |
| Qwen3-4B | Ours (reduced data) | 71.0 (45%) | 92.6 (60%) | 88.4 (35%) |

Note: percentages in parentheses indicate the fraction of training data used by our method.
```
nesys/
  dataset/                         # Transition MCQ benchmark (3 envs × 2 splits)
  eval_results/                    # Pre-computed neural evaluation logs (logprobs + predictions)
  final_rules/                     # Final symbolic rule files used in the paper
  create_transition_mcq_rules.py   # Rule creation / evaluation tool
  eval_transition_mcq_logprob.py   # Neural MCQ evaluator
  replicate_our_main_results.sh    # Reproduce NeSyS re-ranking results
  generate_eval_summaries.sh       # Regenerate eval logs
  example_evaluation_script.sh     # Quick-start example
```
We release datasets and model adapters on the Hugging Face Hub.
Dataset: cindermond/nesys-world-model-benchmark
Three tasks (plancraft, scienceworld, webshop) with dev and test splits, stored as data/<task>/<split>.jsonl.
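Each line of a split file is one transition MCQ. A record can be read with standard JSON tooling, roughly as below; the field names (`state`, `action`, `choices`, `answer`) are illustrative assumptions, not the repo's confirmed schema, so inspect one line of `data/<task>/<split>.jsonl` to see the actual keys.

```python
import json

# Hypothetical record layout -- the field names are assumptions for
# illustration only; check a real line of the dataset to confirm.
line = json.dumps({
    "state": "You are in the kitchen. The stove is off.",
    "action": "activate stove",
    "choices": [
        "The stove is now on.",
        "The stove is now off.",
        "Nothing happens.",
    ],
    "answer": 0,
})

record = json.loads(line)
# The gold next-state description is the choice at the answer index.
gold = record["choices"][record["answer"]]
```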
Model adapters (LoRA, PEFT):
Each adapter was trained on the filtered subset of transitions (those not covered by the learned symbolic rules).
```
pip install -r requirements.txt
```

The fastest way to verify our main results is to run the symbolic re-ranking on the pre-computed neural evaluation logs (no GPU needed):
```
cd nesys
bash replicate_our_main_results.sh
```

This loads the neural logprobs from `eval_results/`, applies the final symbolic rules from `final_rules/`, learns per-rule weights on the dev set, and evaluates on the test set, printing the NeSyS accuracy for each environment and model.
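The dev-set weight-learning step can be sketched as a small grid search: each rule gets a scalar weight, an option's final score is its neural logprob plus the weighted rule votes, and the weights that maximize dev accuracy are kept. This toy stand-in (function names, the grid, and the data layout are all assumptions for illustration) is not the repo's actual procedure.

```python
import itertools

def score(logprob, rule_hits, weights):
    # Final score = neural logprob + weighted sum of rule votes.
    return logprob + sum(w * h for w, h in zip(weights, rule_hits))

def fit_weights(dev, grid=(0.0, 0.5, 1.0, 2.0)):
    """Pick per-rule weights maximizing dev accuracy by grid search.
    `dev` is a list of (options, gold_index); each option is a pair
    (neural_logprob, per-rule hit vector)."""
    n_rules = len(dev[0][0][0][1])
    best_w, best_acc = None, -1.0
    for ws in itertools.product(grid, repeat=n_rules):
        acc = sum(
            max(range(len(opts)), key=lambda i: score(*opts[i], ws)) == gold
            for opts, gold in dev
        ) / len(dev)
        if acc > best_acc:
            best_w, best_acc = ws, acc
    return best_w, best_acc

# Toy dev set with one rule: the neural model alone gets the first
# example wrong, and a positive rule weight corrects it.
dev = [
    ([(-0.1, (0,)), (-0.5, (1,))], 1),
    ([(-0.2, (1,)), (-0.3, (0,))], 0),
]
weights, dev_acc = fit_weights(dev)
```

Because the search is over a small discrete grid per rule, it stays cheap as long as the number of rules being weighted jointly is modest.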
To evaluate a base model or fine-tuned adapter on the benchmark from Hugging Face:
```
cd nesys

# Base model only
python eval_transition_mcq_logprob.py \
  --base_model meta-llama/Llama-3.2-1B-Instruct \
  --dataset_repo_id cindermond/nesys-world-model-benchmark \
  --task scienceworld \
  --split test \
  --output_prefix example_test_scienceworld

# With a fine-tuned adapter from HF
python eval_transition_mcq_logprob.py \
  --base_model meta-llama/Llama-3.2-1B-Instruct \
  --adapter cindermond/world-model-scienceworld-llama3-2-1b-instruct-filtered \
  --dataset_repo_id cindermond/nesys-world-model-benchmark \
  --task scienceworld \
  --split test \
  --output_prefix eval_results/scienceworld_sft_test_llama3-2-1b-instruct_filtered
```

You can also pass local JSONL files via `--dataset_paths` instead of `--dataset_repo_id`/`--task`/`--split`.
To regenerate all the evaluation summaries used in the paper:
```
cd nesys
bash generate_eval_summaries.sh
```

If you find our work useful in your research or applications, please consider citing the following paper and starring this repository.
```bibtex
@article{zhao2026nesys,
  title   = {Neuro-Symbolic Synergy for Interactive World Modeling},
  author  = {Zhao, Hongyu and Zhou, Siyu and Yang, Haolin and Qin, Zengyi and Zhou, Tianyi},
  journal = {arXiv preprint arXiv:2602.10480},
  year    = {2026}
}
```

The code in this repository is released for research purposes. The WebShop dataset files contain web page text from the original WebShop environment; please comply with the original dataset's terms when redistributing derivatives.