feat: LoRA self-reference RL — use policy trainer as KL reference by mayinghan · Pull Request #299 · fw-ai/cookbook

mayinghan · 2026-04-06T00:39:30Z

When training with LoRA, the base model (without adapter) can serve as the KL divergence reference model. This eliminates the need for a separate FORWARD_ONLY reference trainer job, halving reference GPU cost.
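As a rough illustration of what the reference logprobs are used for, the sketch below computes a per-token KL penalty from the policy's and the base model's logprobs on the sampled tokens. The function name `kl_penalty` and the choice of the `k3` estimator are assumptions made for illustration; this PR does not specify which estimator the cookbook uses.

```python
import math
from typing import List


def kl_penalty(policy_logprobs: List[float], ref_logprobs: List[float]) -> List[float]:
    """Per-token estimate of KL(policy || reference) on tokens sampled from the policy.

    Uses the k3 estimator exp(d) - d - 1 with d = ref_logprob - policy_logprob.
    Each term is non-negative and is zero when the two logprobs agree.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("logprob sequences must have the same length")
    penalties = []
    for lp_policy, lp_ref in zip(policy_logprobs, ref_logprobs):
        d = lp_ref - lp_policy
        penalties.append(math.exp(d) - d - 1.0)
    return penalties


# Policy logprobs come from the LoRA-adapted model, reference logprobs from
# the same weights with the adapter disabled (values here are made up).
print(kl_penalty([-0.5, -1.5], [-0.7, -1.0]))
```

At the start of training, when the adapter contributes nothing, the two logprob sequences match and the penalty is zero for every token.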

Changes:

- Add BaseReferenceClient: lightweight forward-only wrapper for base model logprobs, sharing the policy client's service session
- Add ReconnectableClient.create_base_reference(): creates a base-only model handle (base-<hex>) on the same LORA_TRAINER, with LoRA adapters disabled for forward passes
- Store _svc on ReconnectableClient so the service session can be reused for both policy and reference model handles
- New cookbook example: training/examples/rl/lora_self_ref/ demonstrating GRPO with LoRA where no separate reference training shape is needed
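The reason one trainer can serve both roles is that a LoRA layer computes the frozen base projection plus a low-rank update, so skipping the update reproduces the base model's output exactly. A toy sketch in plain Python, where `LoRALinear` and its `use_adapter` flag are hypothetical stand-ins and not the trainer's actual implementation:

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Toy linear layer: y = W x + scale * B (A x), with W frozen."""

    def __init__(self, w: Matrix, a: Matrix, b: Matrix, scale: float = 1.0):
        self.w = w          # frozen base weight, shape (out, in)
        self.a = a          # LoRA down-projection, shape (rank, in)
        self.b = b          # LoRA up-projection, shape (out, rank)
        self.scale = scale

    def forward(self, x: Vector, use_adapter: bool = True) -> Vector:
        base = matvec(self.w, x)
        if not use_adapter:
            # Reference path: adapter skipped, output equals the base model's.
            return base
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]


layer = LoRALinear(
    w=[[1.0, 0.0], [0.0, 1.0]],
    a=[[1.0, 1.0]],        # rank 1
    b=[[0.5], [-0.5]],
)
x = [2.0, 3.0]
policy_out = layer.forward(x)                        # base + adapter
reference_out = layer.forward(x, use_adapter=False)  # base only
print(policy_out, reference_out)
```

Both calls read the same frozen weights, which is why no second copy of the model is needed for the reference forward pass.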

Depends on SDK changes:

- tinker CreateModelRequest.base_only field
- FiretitanServiceClient.create_base_training_client() method
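Because the example relies on these SDK changes, a caller may want to check for support before choosing the self-reference path. The sketch below is hypothetical: `supports_self_reference` and both stub classes are illustrative, and only the method name `create_base_training_client` comes from this PR. Its signature is not shown here, so the sketch never calls it.

```python
def supports_self_reference(service_client: object) -> bool:
    """Return True if the client exposes a callable create_base_training_client."""
    return callable(getattr(service_client, "create_base_training_client", None))


class OldServiceClient:
    """Stand-in for an SDK version without the new method."""


class NewServiceClient:
    """Stand-in for an SDK version with the new method (signature assumed)."""

    def create_base_training_client(self, *args, **kwargs):
        raise NotImplementedError("stub for illustration only")


for client in (OldServiceClient(), NewServiceClient()):
    if supports_self_reference(client):
        mode = "self-reference"
    else:
        mode = "separate reference trainer"
    print(type(client).__name__, "->", mode)
```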

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mayinghan wants to merge 1 commit into main from feat/lora-self-ref-rl-example
