feat: Explore parametric personality layer via LoRA fine-tuning #35

@marknutter

Description

Summary

RLM's memory is entirely non-parametric — everything is stored as text in SQLite and retrieved via FTS5 search. This works well for discrete facts and episode recall, but misses a class of knowledge that's hard to capture as database rows: implicit working patterns, communication style preferences, domain expertise contours, and behavioral tendencies that emerge across hundreds of sessions.

This issue proposes a parametric personality layer: a LoRA-fine-tuned local model (Llama 3.1 8B) trained on archived conversations that captures these implicit patterns and periodically distills them into artifacts the existing RLM pipeline can consume.

Priority: Future exploration (Phase 5) — depends on structured fact extraction being implemented first

Core Insight

LLMs are excellent at encoding patterns from training data. The same mechanism could encode "who this user is to work with" if fine-tuned on their conversation history. The key is scoping it correctly:

  • NOT for discrete fact recall — structured storage (SQLite FTS5) is strictly better
  • FOR pattern encoding — preferences, working style, domain fluency, implicit knowledge
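The first bullet is easy to make concrete. A minimal sketch of why structured storage wins for discrete fact recall, using Python's stdlib `sqlite3` with an FTS5 virtual table (the table and column names here are illustrative, not RLM's actual schema):

```python
import sqlite3

# In-memory DB for illustration; RLM persists to a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE facts USING fts5(subject, detail)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?)",
    [
        ("editor", "prefers Neovim with LSP enabled"),
        ("timezone", "works in US Central time"),
        ("testing", "writes pytest tests before refactoring"),
    ],
)

# Exact, ranked, millisecond-latency recall -- no GPU, no model.
rows = conn.execute(
    "SELECT subject, detail FROM facts WHERE facts MATCH ? ORDER BY rank",
    ("pytest",),
).fetchall()
print(rows)  # [('testing', 'writes pytest tests before refactoring')]
```

No fine-tuned model can beat this for "what editor does the user prefer?" — which is exactly why the parametric layer should be scoped to patterns, not facts.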

Architecture: Offline Distillation Oracle

The personality model is not a live oracle during sessions. It's a batch process that periodically distills implicit knowledge into text artifacts consumed by existing pipelines.

Step 1: Synthetic Q&A generation (Claude API, batch mode)
  Input: 20-30 recent transcripts → Output: ~500-1000 Q&A pairs

Step 2: LoRA fine-tune (Llama 3.1 8B + QLoRA via Unsloth)
  ~30-60 min on consumer GPU → ~/.rlm/personality/adapter/

Step 3: Knowledge extraction (query fine-tuned model, batch)
  Output: USER_PROFILE.md + structured facts + pattern entries
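Step 1 might look roughly like the sketch below. The prompt text and record shape are assumptions, not the real pipeline; the Claude batch call is abstracted behind a `generate` callable so the shaping logic stands alone:

```python
import json
from pathlib import Path

# Hypothetical Step 1 prompt; per "Key Risks", the real prompt is
# make-or-break and would be iterated on heavily.
QA_PROMPT = (
    "From the transcript below, write question/answer pairs that capture "
    "the user's working style, preferences, and domain fluency -- not "
    "discrete facts.\n\nTranscript:\n{transcript}"
)

def build_dataset(transcript_dir: str, out_path: str, generate) -> int:
    """Turn archived transcripts into a JSONL instruction-tuning dataset.

    `generate` is any callable prompt -> list[(question, answer)]; in the
    real pipeline it would wrap the Claude batch API.
    """
    count = 0
    with open(out_path, "w") as out:
        for path in sorted(Path(transcript_dir).glob("*.txt")):
            prompt = QA_PROMPT.format(transcript=path.read_text())
            for question, answer in generate(prompt):
                # Standard instruction-tuning record shape for Step 2.
                out.write(json.dumps(
                    {"instruction": question, "output": answer}) + "\n")
                count += 1
    return count
```

Steps 2 and 3 then consume the JSONL file: QLoRA training reads the instruction/output pairs, and extraction queries the resulting adapter in batch.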

Why offline distillation, not a live oracle

                           Live Oracle                   Offline Distillation
  GPU during sessions      Required                      Not needed
  Latency                  2-5 s per query               Zero (pre-generated text)
  Integration complexity   New MCP tool, model server    Feeds into existing pipeline
  User hardware            Needs GPU every session       Needs GPU once a month

CLI Interface

rlm personality build-dataset   # Generate Q&A pairs from transcripts
rlm personality train           # Fine-tune LoRA adapter
rlm personality extract         # Query model, generate artifacts
rlm personality run             # All 3 steps in sequence
rlm personality status          # Show adapter age, last run, stats
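Since the whole pipeline is offline and periodic, it could be scheduled rather than run by hand. A crontab fragment, assuming the CLI above (the schedule and log path are illustrative):

```shell
# crontab entry: run the full distillation pipeline monthly, 3am on the 1st
0 3 1 * * rlm personality run >> ~/.rlm/personality/run.log 2>&1
```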

Engineering Effort

  Component                    Lines         Effort
  Training data pipeline       ~800-1000     2 weeks
  Fine-tuning orchestration    ~400          1 week
  Knowledge extraction         ~500          1 week
  CLI + hooks + MCP            ~300          3-4 days
  Evaluation framework         ~200          3-4 days
  Total                        ~2200-2400    ~5-6 weeks

Ongoing Costs

~$1-2/month (Claude Haiku batch API + cloud GPU time)

Key Risks

  1. Training data quality — Q&A generation prompt is make-or-break
  2. 8B model expressiveness — small model may not capture subtle patterns
  3. Evaluation difficulty — "it feels like it knows me" is hard to quantify
  4. GPU accessibility — limits adoption
  5. Marginal value — if structured fact extraction captures 90% of value, this may not justify complexity

Recommended First Step

Manual proof-of-concept: take 5 archived transcripts, generate Q&A pairs via Claude, and assess whether the results are specific and surprising or merely generic platitudes.
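Part of that assessment can be automated with a crude first-pass filter. The stock-phrase list below is an illustrative guess at what "generic platitude" looks like; real evaluation would still be manual review:

```python
# Crude specificity check for generated Q&A pairs: answers built from
# stock phrases are probably platitudes, not learned patterns.
GENERIC_PHRASES = [
    "best practices", "clean code", "it depends",
    "good communication", "attention to detail",
]

def looks_specific(answer: str) -> bool:
    """Heuristic: reject very short answers and ones using stock phrases."""
    text = answer.lower()
    if len(answer.split()) < 8:
        return False
    return not any(phrase in text for phrase in GENERIC_PHRASES)

pairs = [
    ("How does the user test?", "They follow best practices."),
    ("How does the user test?",
     "They write a failing pytest case first, then fix, then refactor "
     "once coverage is green."),
]
specific = [answer for _, answer in pairs if looks_specific(answer)]
print(len(specific))  # 1
```

If most generated pairs fail a filter like this, the Q&A generation prompt (risk #1) needs work before any fine-tuning is worthwhile.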

Dependencies

  • Structured fact extraction (separate issue) should be implemented first
  • Python: unsloth, transformers, peft, trl (training only)
  • Optional: Ollama for local model serving

🤖 Generated with Claude Code
