feat: LoRA self-reference RL — use policy trainer as KL reference#299

Open: mayinghan wants to merge 1 commit into main from feat/lora-self-ref-rl-example
@mayinghan
Contributor

When training with LoRA, the base model (the same weights with the adapter disabled) can serve as the KL divergence reference model. This eliminates the need for a separate FORWARD_ONLY reference trainer job, along with the GPUs that job would otherwise occupy.
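
To illustrate the idea, here is a minimal pure-Python sketch, not the code in this PR. It assumes a toy single-layer model whose logits are `W @ x` plus a low-rank LoRA update `B @ A @ x`. The class `ToyLoRAModel`, the `use_adapter` flag, and all weights are hypothetical and exist only for this example. It shows that skipping the adapter term on the same weights yields reference log-probabilities, so the KL term needs no second copy of the model.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(logits)
    lse = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return [z - lse for z in logits]


class ToyLoRAModel:
    """Hypothetical toy model: frozen base weights W plus a low-rank update B @ A."""

    def __init__(self, W, A, B, scale=1.0):
        self.W, self.A, self.B, self.scale = W, A, B, scale

    def logprobs(self, x, use_adapter=True):
        logits = matvec(self.W, x)
        if use_adapter:
            # Policy path: add the LoRA delta on top of the base logits.
            delta = matvec(self.B, matvec(self.A, x))
            logits = [z + self.scale * d for z, d in zip(logits, delta)]
        # With use_adapter=False this is exactly the base (reference) model.
        return log_softmax(logits)


def kl_divergence(logp, logq):
    """Exact KL(p || q) for two categorical distributions given as log-probs."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(logp, logq))


# Illustrative weights: vocabulary of 3, hidden size 2, LoRA rank 1.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]
A = [[0.3, -0.1]]
B = [[0.2], [-0.5], [0.4]]

model = ToyLoRAModel(W, A, B)
x = [1.0, 2.0]

policy_logp = model.logprobs(x, use_adapter=True)   # base + adapter
ref_logp = model.logprobs(x, use_adapter=False)     # base only, same weights
kl = kl_divergence(policy_logp, ref_logp)
print(f"KL(policy || base) = {kl:.6f}")
```

If `B` is all zeros, as is conventional at LoRA initialization, the two paths coincide and the KL is zero. The penalty therefore measures only how far the adapter has moved the policy away from the base model.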

Changes:

  • Add BaseReferenceClient: lightweight forward-only wrapper for base model logprobs, sharing the policy client's service session
  • Add ReconnectableClient.create_base_reference(): creates a base-only model handle (base-<hex>) on the same LORA_TRAINER, with LoRA adapters disabled for forward passes
  • Store _svc on ReconnectableClient so the service session can be reused for both policy and reference model handles
  • New cookbook example: training/examples/rl/lora_self_ref/ demonstrating GRPO with LoRA where no separate reference training shape is needed
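
The shape of these changes can be sketched with stand-in classes. This is a hedged illustration of the pattern only (one service session, two model handles), not the PR's implementation or the SDK's API. `FakeServiceSession`, `ToyPolicyClient`, `ToyBaseReference`, their method signatures, the 8-character hex suffix, and the toy log-prob values are all assumptions made for this example.

```python
import secrets


class FakeServiceSession:
    """Hypothetical stand-in for a trainer service session."""

    def __init__(self):
        self.handles = {}

    def create_model(self, model_id, base_only):
        # Record whether this handle should skip the LoRA adapter.
        self.handles[model_id] = {"base_only": base_only}
        return model_id

    def forward(self, model_id, tokens):
        # Toy log-probs: a real service would run the model here and
        # skip the adapter for base-only handles.
        offset = 0.0 if self.handles[model_id]["base_only"] else -0.1
        return [-1.0 + offset for _ in tokens]


class ToyBaseReference:
    """Forward-only wrapper: exposes log-probs, no training methods."""

    def __init__(self, svc, model_id):
        self._svc = svc
        self.model_id = model_id

    def forward(self, tokens):
        return self._svc.forward(self.model_id, tokens)


class ToyPolicyClient:
    """Keeps the session on _svc so a reference handle can reuse it."""

    def __init__(self, svc):
        self._svc = svc
        self.model_id = svc.create_model(
            "policy-" + secrets.token_hex(4), base_only=False
        )

    def forward(self, tokens):
        return self._svc.forward(self.model_id, tokens)

    def create_base_reference(self):
        # Same session, new base-only handle: no second trainer job.
        model_id = self._svc.create_model(
            "base-" + secrets.token_hex(4), base_only=True
        )
        return ToyBaseReference(self._svc, model_id)


session = FakeServiceSession()
policy = ToyPolicyClient(session)
reference = policy.create_base_reference()

tokens = [11, 42, 7]
policy_logp = policy.forward(tokens)
ref_logp = reference.forward(tokens)
```

Keeping the session on the policy client means that, in this sketch, both handles go through the same connection, so if the real clients follow the same pattern the reference should need no separate connection management.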

Depends on SDK changes:

  • tinker CreateModelRequest.base_only field
  • FiretitanServiceClient.create_base_training_client() method

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>