GIMO-MemoryLab

GIMO-MemoryLab is a research and engineering repository for multi-turn recommendation, memory-aware feedback modeling, and closed-loop evaluation. The codebase provides executable pipelines for structured memory experiments, scope-aware critique processing, controlled preference-pair construction, and artifact-oriented evaluation outputs.

What This Repository Contains

The current codebase includes the following components:

DriftAware-GIMO: structured memory for positive, negative, hard, and soft preference tracking under interest drift.
CritiqueScope-GIMO: fast/slow critique memory that distinguishes temporary feedback from durable user constraints.
CritiqueWorld: a CPU-only, API-free closed-loop testbed for checking whether critique memory actually changes future recommendation slates.
CDPO bridge tooling: controlled preference-pair export, validation, manifest generation, train/dev split materialization, and readable audit reports.

For the current implementation and experiment status, start with RESEARCH_STATUS.md.

Quick Start

Clone this repository and install the Python dependencies:

git clone https://github.com/Rongfeng-Guo/agent-rec.git
cd agent-rec
pip install -r requirements.txt

Local Data and API Setup

AILO environment assets

The AILO simulator path expects the embedding index assets used by the project task pipeline.

Download the index file.
Unzip the downloaded file into user_simulator/embedding/.

Additional simulator notes are available in user_simulator/readme.md.

API configuration

Any path in this repository that calls an LLM uses an OpenAI-compatible API interface.

Put your endpoint and key in config/api_config.json.
Closed-source models can be configured directly through that file.
Open-source models can be exposed through a local OpenAI-compatible server such as vLLM.

The CritiqueWorld evaluation path does not require an API key.
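
A minimal sketch of loading such a configuration and failing early when a value is missing. The key names `base_url`, `api_key`, and `model` are hypothetical and are not taken from this repository; check config/api_config.json for the actual schema.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ApiConfig:
    base_url: str  # hypothetical key: OpenAI-compatible endpoint URL
    api_key: str   # hypothetical key: secret for the endpoint
    model: str     # hypothetical key: model name served at the endpoint


def load_api_config(path):
    """Read the JSON config and raise if a required key is absent or empty."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [k for k in ("base_url", "api_key", "model") if not raw.get(k)]
    if missing:
        raise ValueError(f"missing config keys: {', '.join(missing)}")
    # Strip a trailing slash so later URL joins stay predictable.
    return ApiConfig(raw["base_url"].rstrip("/"), raw["api_key"], raw["model"])
```

For a locally served open-source model, the endpoint would typically point at the local server (for example a URL ending in /v1), and the key can be a placeholder if the server does not check it.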

Main Workstreams

1. Baseline GIMO training path

This repository includes the SFT, GPE, HAP, and CDPO entry points used by the project workflow. They can be run once datasets, model weights, and GPU resources are configured. In the commands below, replace {dataset} with the dataset name.

SFT:

cd LLaMA-Factory
bash gimo/{dataset}/sft/sft.sh

GPE rollout:

cd GPE_HAP
python rewrite_v3.py --domain {dataset}

CDPO training:

cd LLaMA-Factory
bash gimo/{dataset}/gimo/adpo_v1_sample1.sh

2. DriftAware-GIMO

StructuredMemory adds explicit slots for positive preferences, negative preferences, hard constraints, and soft preferences, which supports preference-drift analysis and memory-state inspection.

Example:

# UserAgentEnv must be imported from its module in this repository before use.
env = UserAgentEnv(
    persona_path="user_simulator/task/Yelp_test.jsonl",
    user_id=0,
    item_id=0,
    config_path="config/api_config.json",
    format_path="config",
    domain="restaurant",
    model_type="openai",
    memory_mode="structured",
)

Run the offline benchmark:

python -m user_simulator.evaluation.drift_memory_eval

Protocol details are documented in docs/driftaware_gimo.md.
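
To make the four-slot idea concrete, here is a small illustrative sketch. It is not the repository's StructuredMemory class: the class name, method names, and scoring weights are hypothetical, and hard constraints are assumed to be attributes an item must have.

```python
from dataclasses import dataclass, field


@dataclass
class StructuredMemorySketch:
    """Hypothetical stand-in for the four slots described above."""
    positive: set = field(default_factory=set)
    negative: set = field(default_factory=set)
    hard: set = field(default_factory=set)   # assumed: attributes an item must have
    soft: set = field(default_factory=set)   # assumed: nice-to-have attributes

    def like(self, attr):
        # Under drift, a newer positive signal replaces an older negative one.
        self.negative.discard(attr)
        self.positive.add(attr)

    def dislike(self, attr):
        self.positive.discard(attr)
        self.negative.add(attr)

    def allows(self, item_attrs):
        # An item passes only if it carries every hard-constraint attribute.
        return self.hard <= set(item_attrs)

    def score(self, item_attrs):
        attrs = set(item_attrs)
        # Illustrative weights: +1 per liked, -1 per disliked, +0.5 per soft match.
        return (len(attrs & self.positive) - len(attrs & self.negative)
                + 0.5 * len(attrs & self.soft))
```

Keeping the slots separate makes the memory state easy to inspect after each turn, which is the property the drift analysis relies on.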

3. CritiqueScope-GIMO

CritiqueScopeMemory models natural-language feedback as scope-aware memory updates with separate handling for temporary and durable signals.

Fast memory handles temporary fatigue, session context, and immediate diversity requests.
Slow memory keeps durable constraints and preferences that are supported by persistent language or repeated evidence.

Example:

# UserAgentEnv must be imported from its module in this repository before use.
env = UserAgentEnv(
    persona_path="user_simulator/task/Yelp_test.jsonl",
    user_id=0,
    item_id=0,
    config_path="config/api_config.json",
    format_path="config",
    domain="restaurant",
    model_type="openai",
    memory_mode="critiquescope",
)

Run the memory-level diagnostic benchmark:

python -B -m user_simulator.evaluation.critique_scope_eval

Build controlled preference pairs:

python -B -m user_simulator.evaluation.critique_uplift_pairs --output critique_pairs.jsonl

See docs/critiquescope_gimo.md for the full schema and protocol.
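
The fast/slow distinction described in this section can be illustrated with a short sketch. This is not the repository's CritiqueScopeMemory: the class name, the turn-based expiry, and the promotion threshold are all assumptions chosen for illustration.

```python
from collections import Counter


class FastSlowMemorySketch:
    """Hypothetical two-tier memory: fast entries expire, slow entries persist."""

    def __init__(self, fast_ttl=2, promote_after=2):
        self.fast_ttl = fast_ttl            # turns a temporary signal stays active
        self.promote_after = promote_after  # repeats needed to become durable
        self.turn = 0
        self.fast = {}                      # signal -> turn at which it expires
        self.slow = set()
        self.evidence = Counter()

    def observe(self, signal, durable=False):
        self.evidence[signal] += 1
        # Persistent language or repeated evidence moves a signal to slow memory.
        if durable or self.evidence[signal] >= self.promote_after:
            self.slow.add(signal)
            self.fast.pop(signal, None)
        else:
            self.fast[signal] = self.turn + self.fast_ttl

    def next_turn(self):
        self.turn += 1
        # Drop fast signals whose expiry turn has been reached.
        self.fast = {s: t for s, t in self.fast.items() if t > self.turn}

    def active(self):
        return self.slow | set(self.fast)
```

With the defaults, a one-off remark such as "tired of pizza" influences two turns and then disappears, while a signal marked durable, or one repeated twice, stays in slow memory.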

4. CritiqueWorld closed-loop evaluation

CritiqueWorld is a closed-loop evaluation environment for measuring how memory interventions affect recommendation trajectories, branch rollouts, and counterfactual preference-pair exports.

Recommended full pipeline, oracle parser:

python -B -m user_simulator.evaluation.run_closed_loop_pipeline \
  --modes none flat structured time_decay critiquescope \
  --scenarios all \
  --seeds 0 1 2 3 4 \
  --max-turns 12 \
  --top-k 5 \
  --parser-mode oracle \
  --output-dir outputs/closed_loop_oracle

Deterministic parser:

python -B -m user_simulator.evaluation.run_closed_loop_pipeline \
  --modes none flat structured time_decay critiquescope \
  --scenarios all \
  --seeds 0 1 2 \
  --max-turns 12 \
  --top-k 5 \
  --parser-mode deterministic \
  --output-dir outputs/closed_loop_deterministic

This pipeline runs the benchmark, validates cdpo_pairs.jsonl, materializes cdpo_train.jsonl and cdpo_dev.jsonl, builds the dataset manifest, and writes closed_loop_report.md plus pipeline_metadata.json.
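
The validation and split steps can be sketched as follows. This is an illustration only, not the pipeline's implementation: the field names `prompt`, `chosen`, and `rejected` and the hash-based split rule are assumptions, and the actual schema is defined by the repository's tooling.

```python
import hashlib
import json

REQUIRED = ("prompt", "chosen", "rejected")  # hypothetical field names


def validate_pair(record):
    """Return a list of problems; an empty list means the pair looks usable."""
    problems = [f"missing {k}" for k in REQUIRED if not record.get(k)]
    if not problems and record["chosen"] == record["rejected"]:
        problems.append("chosen equals rejected")
    return problems


def assign_split(record, dev_fraction=0.2):
    """Hash the prompt so the same prompt always lands in the same split."""
    digest = hashlib.sha256(record["prompt"].encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "dev" if bucket < dev_fraction else "train"


def split_pairs(lines, dev_fraction=0.2):
    """Parse JSONL lines, skip blank or invalid pairs, and assign a split."""
    splits = {"train": [], "dev": []}
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        if not validate_pair(record):
            splits[assign_split(record, dev_fraction)].append(record)
    return splits
```

Hashing the prompt rather than shuffling keeps the split reproducible across runs and prevents the same prompt from appearing in both train and dev.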

Validity gate:

python -B -m user_simulator.evaluation.run_validity_gate \
  --modes none flat structured time_decay critiquescope \
  --scenarios all \
  --seeds 0 1 2 3 4 \
  --max-turns 12 \
  --top-k 5 \
  --output-dir outputs/validity_gate \
  --fail-on-critical-invariant

Pipeline with validity gate:

python -B -m user_simulator.evaluation.run_closed_loop_pipeline \
  --modes none flat structured time_decay critiquescope \
  --scenarios all \
  --seeds 0 1 2 3 4 \
  --max-turns 12 \
  --top-k 5 \
  --parser-mode oracle \
  --run-validity-gate \
  --fail-on-critical-invariant \
  --output-dir outputs/closed_loop_oracle
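
When running several of these configurations, the command line can be assembled from Python. The sketch below only builds the argument list from the module name and flags shown above; the helper name `pipeline_command` is hypothetical and is not part of the repository.

```python
import sys

MODES = ["none", "flat", "structured", "time_decay", "critiquescope"]


def pipeline_command(parser_mode, seeds, output_dir, validity_gate=False):
    """Build the argv list for run_closed_loop_pipeline using the flags above."""
    cmd = [
        sys.executable, "-B", "-m",
        "user_simulator.evaluation.run_closed_loop_pipeline",
        "--modes", *MODES,
        "--scenarios", "all",
        "--seeds", *[str(s) for s in seeds],
        "--max-turns", "12",
        "--top-k", "5",
        "--parser-mode", parser_mode,
        "--output-dir", output_dir,
    ]
    if validity_gate:
        # Same two flags as the "Pipeline with validity gate" command.
        cmd += ["--run-validity-gate", "--fail-on-critical-invariant"]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so a failing validity gate stops a batch of runs.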

Interpretation: the branch-level uplift and regret numbers are controlled counterfactual rollout proxies intended for diagnostic evaluation.
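
For intuition, here is one simple way uplift and regret proxies of this kind could be computed from branch rollouts. The definitions below are assumptions for illustration, not the repository's formulas; docs/critique_world.md is the authoritative source.

```python
from statistics import mean


def branch_uplift(treated_rewards, control_rewards):
    """Assumed definition: mean reward with the memory intervention
    minus mean reward of the matched control branch."""
    return mean(treated_rewards) - mean(control_rewards)


def mean_regret(achieved_rewards, best_rewards):
    """Assumed definition: average per-turn gap between the best
    attainable reward and the reward actually achieved."""
    if len(achieved_rewards) != len(best_rewards):
        raise ValueError("reward sequences must have the same length")
    return mean(b - a for a, b in zip(achieved_rewards, best_rewards))
```

Because both branches start from the same state and differ only in the intervention, the difference isolates the effect of memory within the simulator.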

More detail lives in docs/critique_world.md and docs/experiment_protocol.md.

Repository Outputs

The main generated artifacts currently tracked in this repository include:

outputs/memory_baselines
outputs/memory_baselines_noisy
outputs/closed_loop_oracle
outputs/closed_loop_deterministic
outputs/validity_gate

These folders contain JSONL trajectories, summary tables, validation files, dataset manifests, train/dev split files, and Markdown audit reports for the current controlled experiments.
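
A quick completeness check for a closed-loop output folder might look like the sketch below. The file names are the ones named in the pipeline description; the assumption that they sit directly inside the output directory, and the helper name `missing_artifacts`, are hypothetical.

```python
from pathlib import Path

# File names taken from the closed-loop pipeline description in this README.
EXPECTED = [
    "cdpo_pairs.jsonl",
    "cdpo_train.jsonl",
    "cdpo_dev.jsonl",
    "closed_loop_report.md",
    "pipeline_metadata.json",
]


def missing_artifacts(output_dir):
    """Return expected files that are absent (assumes a flat directory layout)."""
    root = Path(output_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]
```

For example, `missing_artifacts("outputs/closed_loop_oracle")` returns an empty list when every listed file is present under that layout.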

Current Position

Available and validated:

controlled memory-level and closed-loop evaluation without calling paid APIs
CDPO bridge export with validation and dataset manifests
materialized train/dev split generation
deterministic regression tests for CritiqueScope and CritiqueWorld

Requires external setup:

full SFT and CDPO training
real AILO rollout evaluation
OpenAI-compatible parser mode
GPU-backed model experiments

Dependency Notes

This repository reuses parts of the GIMO project structure and related training tooling. The following external components remain relevant to the current implementation:

GIMO training and simulator structure
ECPO for related evaluation ideas
LLaMA-Factory for the training framework used by the original stack
