Clean context in. Better answers out.
An open framework for giving any AI durable, structured, honest context about you — the layer for what your AI knows, not how it talks. Part of the PersonaSync family.
Every AI you use is a stranger with amnesia. You re-explain your whole life story every session just to get a useful answer, and none of it carries over to the next one. An AI is only as good as what you feed it.
There's a second, quieter problem: it has nothing real to measure your progress against, so when you ask how you're doing, it flatters you instead. "Great work!" is not a baseline, a target, and a delta. Honest feedback needs something to compare against.
PCL separates the two jobs that solve this:
- Context — a structured, portable record of what you're working toward, that travels with you so the conversation just starts.
- Honest measurement — scoring that compares where you are against the baselines and goals you set, and reports how sure it is instead of faking certainty.
This repo ships the first job (extraction into structured context) as a working tool, and specifies the second (honest scoring) as the next build.
PCL is two things in one repository:
- A framework — a typed schema for the things you're working toward (called axes), an LLM extraction prompt that turns plain-language goals into that schema, an evaluation of how well it does so, a working MCP read tool that serves that context to any AI, and a design spec for the scoring layer.
- A web tool — a no-account interface where you paste your goals the way you'd actually say them, and Claude returns them as structured, confidence-aware axes you confirm before anything is saved.
The framework is the moat; the web tool is one surface of it — the extraction layer, the first slice that went live. What's agentic about PCL, and what's still in progress, are covered in the sections that follow.
Built for AI 502 (Generative AI) at Grand Valley State University, summer 2026. The architecture and evaluation methodology are designed to extend beyond the course timeline.
PCL is built to be the grounding layer for an agentic system. The target is an MCP server that exposes a person's declared context as tools any AI agent can call — so the agent pulls what it needs when a task needs it, and acts on real stated goals instead of guessing from behavior. Pull, not push: nothing auto-acts or nags; the agent fetches context on request.
What's live in this repo is the full grounding path — extraction turns free text into typed, confirmed
context, and an MCP server now exposes that context to outside agents through a single read tool,
describe_axes. An outside client (Claude Desktop) calls it and pulls the stored axes on demand, no
pasting. The server serves declared context only: the scoring engine is still in progress, so an agent
can read your context but not yet a score of it. That measurement layer is the next build.
The diagram shows what's live (extraction, and the MCP read surface that serves the stored context) alongside the scoring layer still in progress.
┌──────────────────────┐     ┌───────────────────────────┐
│  Free text           │  →  │  Extraction layer (LIVE)  │
│  your goals/habits,  │     │  Claude +                 │
│  in your own words   │     │  extraction_prompt.txt    │
└──────────────────────┘     └───────────────────────────┘
                                           │
                                           ▼
                           ┌───────────────────────────────┐
                           │ Structured axes               │
                           │ typed · baseline · target ·   │
                           │ cadence · confidence flag     │
                           │ tagged as a draft you confirm │
                           └───────────────────────────────┘
                                           │
                        ┌──────────────────┴───────────────────┐
                        ▼                                      ▼
          ┌───────────────────────────┐          ┌───────────────────────────┐
          │ Valuation engine          │          │ MCP context layer         │
          │ honest scoring against    │          │ any AI pulls your context │
          │ your own baselines        │          │ via describe_axes         │
          │ (IN PROGRESS)             │          │ (LIVE: read tool)         │
          └───────────────────────────┘          └───────────────────────────┘
Single source of truth. You state your goals once, in plain language. The axis schema is the contract every later layer reads from — the scoring engine scores axes, the MCP layer serves axes — so the thing you type is the thing that travels.
No invented precision. The extraction never makes up detail it wasn't given. Thin data and vague goals come back flagged, not faked, and every axis is a draft you confirm before it counts.
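As a rough illustration of the axis contract described above, the sketch below models one axis as a Python dataclass. The field names, the axis type labels, and the three confidence values (stated, inferred, missing) are assumptions drawn from this README's wording, not the schema the repo actually defines.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical confidence labels: was the value stated by the user,
# inferred by the model, or missing from the input entirely?
CONFIDENCE_LEVELS = ("stated", "inferred", "missing")


@dataclass
class Axis:
    """One thing the user is working toward (illustrative fields only)."""
    name: str
    axis_type: str                  # e.g. "habit" or "plan"; labels are assumed
    baseline: Optional[str] = None  # None means "not given", never a guess
    target: Optional[str] = None
    cadence: Optional[str] = None
    confidence: str = "missing"
    confirmed: bool = False         # extraction output starts as a draft

    def __post_init__(self) -> None:
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence: {self.confidence!r}")

    def needs_review(self) -> bool:
        """Flag axes the user should look at before confirming."""
        return self.target is None or self.confidence != "stated"


# A vague goal comes back flagged rather than filled in with invented numbers.
vague = Axis(name="save more money", axis_type="plan")
clear = Axis(name="running", axis_type="habit", target="3 runs",
             cadence="weekly", confidence="stated")
print(vague.needs_review(), clear.needs_review())  # True False
```

Keeping unknown values as None, rather than a default number, is what lets later layers tell "not stated" apart from "stated as zero".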
- Open the live tool: the Hugging Face Space
- Paste your goals and habits into the box, however you'd actually say them (or click one of the examples)
- Click Structure my context
- Read the structured axes that come back — typed, with baselines/targets/cadences where you stated them, and a confidence flag where the model is unsure
No account, no login — the hosted demo stores nothing.
To run it locally:
git clone https://github.com/artiebowman/personasync-pcl.git
cd personasync-pcl
pip install -r requirements.txt
export ANTHROPIC_API_KEY=sk-ant-... # your Anthropic API key
python app.py
Run locally and the Save button is live (hidden on the hosted demo, which has nowhere to write): confirm the extracted axes and they're written to a local SQLite store at local/pcl.db.
The agentic step — an outside AI calling your context. The server needs Python 3.12+ (the mcp SDK requires ≥3.10), so it runs in its own environment, separate from the Gradio app:
uv venv .venv-mcp --python 3.12
uv pip install --python .venv-mcp -r requirements-mcp.txt
Register it with Claude Desktop: edit ~/Library/Application Support/Claude/claude_desktop_config.json and add a pcl entry under mcpServers, using absolute paths to .venv-mcp/bin/python, mcp_server.py, and PCL_DB_PATH=local/pcl.db. Fully quit Claude Desktop (Cmd+Q) and reopen, then ask: "Use the pcl tool to describe my axes." It calls describe_axes, pulls your saved axes, and answers from them. This is one read tool serving declared context only; scoring is still in progress (see the Build log).
Image: The pcl server registered as a connector in Claude Desktop.
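The registration step can be error-prone because every path has to be absolute. The helper below is a small sketch that builds the pcl entry from a repo directory and prints JSON to paste into claude_desktop_config.json. It assumes mcp_server.py takes no command-line arguments and reads PCL_DB_PATH from its environment; the function names are hypothetical and not part of the repo.

```python
import json
from pathlib import Path


def pcl_server_entry(repo_dir: str) -> dict:
    """Build the pcl entry for Claude Desktop's mcpServers block."""
    repo = Path(repo_dir).resolve()  # turn a relative repo path into an absolute one
    return {
        "command": str(repo / ".venv-mcp" / "bin" / "python"),
        "args": [str(repo / "mcp_server.py")],
        "env": {"PCL_DB_PATH": str(repo / "local" / "pcl.db")},
    }


def merge_into_config(config: dict, repo_dir: str) -> dict:
    """Return a copy of config with the pcl server added, others untouched."""
    servers = dict(config.get("mcpServers", {}))
    servers["pcl"] = pcl_server_entry(repo_dir)
    return {**config, "mcpServers": servers}


# Print the JSON for a checkout in the current directory.
print(json.dumps(merge_into_config({}, "."), indent=2))
```

If the config file already lists other servers, load it with json.load, pass the result to merge_into_config, and write the returned dict back so the existing entries survive.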
personasync-pcl/
├── app.py ← the live extraction demo (Gradio) + local Save
├── extract_core.py ← shared extraction call (app + eval)
├── save_core.py ← shared save transforms (CLI + UI)
├── save_axes.py ← CLI: confirm + save extracted axes locally
├── run_eval.py ← runs the extraction eval for real
├── mcp_server.py ← MCP server: the describe_axes read tool (stdio)
├── schema.sql ← SQLite schema for the local context store
├── db.py ← storage layer over the local store
├── extraction_prompt.txt ← the system prompt that drives extraction
├── sample-extraction.json ← example extraction output (save-path fixture)
├── requirements.txt ← pinned dependencies (Gradio 4.44.1 + Anthropic SDK)
├── requirements-mcp.txt ← MCP server deps (Python 3.12+)
├── test_*.py ← storage + MCP stdio round-trip tests
├── prd.md ← product requirements + honesty constraints
├── domain-primer.md ← the axis model and why it's shaped this way
├── architecture.md ← extraction, scoring, and MCP-layer design
├── evaluation.md ← eval protocol + extraction test cases
├── extraction-eval.md ← live extraction-layer eval (qualitative)
├── future-work.md ← scoped roadmap beyond the course
├── feedback-log.md ← running log of design decisions + feedback
├── claude.md ← project guide for AI collaborators
├── source/ ← early ideation notes
│ ├── personasync-idea.md
│ └── project-arc-outline.md
├── LICENSE ← MIT
└── README.md ← this file
A short narrative of the design decisions behind this version.
PCL's full arc is extraction → honest scoring → a context layer any AI can query. The scoring engine is the conceptual core, but it's also the largest build, and a half-working scorer demonstrates nothing. The extraction layer, by contrast, is both genuinely useful on its own and the input contract everything downstream depends on — so it was chosen as the slice to ship and evaluate first. Getting the schema and the honesty behavior right here de-risks every later layer.
The formative draft was one prompt, one response — structured extraction, nothing more. The final's job was to make that context callable: a tool an AI actually invokes, not a roadmap promise. So this stage built the layer the whole framework points at.
Three pieces. A local SQLite store holds confirmed axes (the §1.2 schema). A confirm-before-save step — a CLI and a Save button in the local app — turns extraction's draft into stored, user-authored context, filling required blanks and resolving anything left undetermined rather than guessing. And an MCP server exposes that store over stdio through a single read tool, describe_axes, behind an omit gate applied before anything leaves.
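The confirm-before-save step can be pictured as a merge that refuses to guess. The sketch below is illustrative only: the dict-based axis shape, the REQUIRED_FIELDS list, and the confirm_axes name are assumptions, not the code in save_core.py.

```python
REQUIRED_FIELDS = ("name", "axis_type")  # assumed; the real list may differ


def confirm_axes(drafts: list[dict], answers: dict[str, dict]) -> list[dict]:
    """Merge user answers into draft axes; refuse to save unresolved ones.

    answers maps an axis name to the fields the user filled in or corrected.
    Nothing is guessed: a required field that is still blank raises an error.
    """
    confirmed = []
    for draft in drafts:
        axis = {**draft, **answers.get(draft.get("name", ""), {})}
        blanks = [field for field in REQUIRED_FIELDS if not axis.get(field)]
        if blanks:
            raise ValueError(f"axis {axis.get('name')!r} still needs: {blanks}")
        axis["confirmed"] = True  # now user-authored context, not a model draft
        confirmed.append(axis)
    return confirmed


# The extraction left axis_type undetermined; the user resolves it explicitly.
drafts = [{"name": "running", "axis_type": None, "target": "3 runs"}]
saved = confirm_axes(drafts, {"running": {"axis_type": "habit"}})
print(saved[0]["axis_type"], saved[0]["confirmed"])  # habit True
```

Calling confirm_axes(drafts, {}) raises a ValueError instead of saving, which is the behavior the paragraph above describes as resolving blanks rather than guessing.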
The pattern is pull, not push: an outside client (Claude Desktop) calls describe_axes and pulls the declared context on demand — the agent lives in the consumer, not in this app. A cross-client query from Claude Desktop, returning the stored axes with no pasting, is the working demonstration.
Image: Claude Desktop calls describe_axes and answers from the saved axes — no pasting.
Honest scope holds here too: the server exposes one read tool, declared context only. The scoring engine is still in progress, so it pulls your context — it doesn't yet measure it. get_scores and get_profile are deliberately not exposed, since neither can be served honestly without the scorer. Privacy stays a design stance: the store is local and gitignored, the omit gate is structurally present, but user-facing privacy controls aren't fully wired.
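To make the read path concrete, here is a minimal sketch of a describe_axes function with an omit gate over SQLite, plus one way it might be exposed through the MCP Python SDK's FastMCP helper. The table layout and column names are assumptions (the repo's schema.sql is the real contract), and the serve function is not called here because it needs the mcp package and blocks on stdio.

```python
import sqlite3

# Hypothetical table layout; the repo's schema.sql is the real contract.
SCHEMA = """
CREATE TABLE IF NOT EXISTS axes (
    name TEXT PRIMARY KEY,
    axis_type TEXT NOT NULL,
    baseline TEXT,
    target TEXT,
    cadence TEXT,
    omit INTEGER NOT NULL DEFAULT 0
)
"""


def describe_axes(conn: sqlite3.Connection) -> list[dict]:
    """Return stored axes as plain dicts, dropping any the user marked omit."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT name, axis_type, baseline, target, cadence "
        "FROM axes WHERE omit = 0 ORDER BY name"
    ).fetchall()
    return [dict(row) for row in rows]


def serve(db_path: str) -> None:
    """Expose describe_axes over stdio (needs the mcp package installed)."""
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("pcl")

    @server.tool(name="describe_axes")
    def _describe_axes() -> list[dict]:
        """Read-only: the user's declared axes, omit gate already applied."""
        return describe_axes(sqlite3.connect(db_path))

    server.run()  # stdio is the default transport


# Demo against an in-memory store: the omitted axis never leaves the query.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO axes (name, axis_type, target, omit) VALUES (?, ?, ?, ?)",
    [("running", "habit", "3 runs", 0), ("sleep", "habit", None, 1)],
)
print([axis["name"] for axis in describe_axes(conn)])  # ['running']
```

Applying the gate inside the SQL query, rather than filtering afterwards, means an omitted row is never loaded into the response at all. No score-serving tool appears in the sketch, matching the scope stated above.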
The extraction prompt was iterated against a small set of hand-written cases covering everyday habits, deliberately thin data, and multi-year plans. The v1 prompt structured clear goals well but two behaviors needed correcting:
- It occasionally manufactured a baseline or target the user never stated, rather than leaving it unknown. v2 made the "flag, don't fake" rule explicit and tied confidence to whether the value was stated, inferred, or missing.
- It mislabeled some maintenance goals as improvement goals — e.g. "keep my coffee to 1–2 cups" read as a reduction target rather than a band to hold. This is logged as a known limitation, not yet fully fixed: mode-from-intent on band-style goals is the first item in the next build.
The hardest constraint in this project was resisting the urge to look more finished than it is. Three rules held throughout:
- The confidence shown in the demo is illustrative — for testing and demonstration, not the output of a real scoring engine.
- The scoring engine is in progress, not built. The repo says so everywhere, including in the live tool's roadmap.
- Privacy is a design stance (local-first, omit-by-design), not yet enforced by code — so it's described as intent, never claimed as a shipped guarantee.
Getting the tool live on Hugging Face Spaces surfaced three real, instructive breakages, each fixed by pinning the environment back to what the app was verified against:
- audioop removed in Python 3.13. The Space defaulted to 3.13; Gradio's pydub dependency imports the audioop stdlib module, which 3.13 dropped. Fixed by pinning python_version: "3.11".
- Starlette/FastAPI too new for Gradio 4.44.1. Newer Starlette changed the template-response signature, so Gradio passed arguments in the wrong order and every page load threw unhashable type: 'dict'. Fixed by pinning fastapi==0.112.2 and starlette==0.38.6.
- An invisible character in the API key. A U+2028 line-separator hitched a ride when the key was pasted into the Space secret, breaking the ASCII-only HTTP header (UnicodeEncodeError). Fixed by stripping the key on read, and by surfacing real error messages in the UI instead of a blank "Error".
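The API-key fix can be sketched as follows. This is an illustration of the idea (strip on read, fail with a readable message), not the code in app.py; the function names are hypothetical.

```python
import os


def clean_api_key(raw: str) -> str:
    """Strip stray whitespace (including U+2028) and insist on plain ASCII."""
    key = raw.strip()  # str.strip() treats U+2028 LINE SEPARATOR as whitespace
    if not key:
        raise ValueError("API key is empty after stripping whitespace")
    if not key.isascii():
        # Fail with a readable message instead of a UnicodeEncodeError
        # raised later from inside the HTTP client.
        raise ValueError("API key contains non-ASCII characters; re-paste it")
    return key


def load_api_key(env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Read the key from the environment and clean it before use."""
    return clean_api_key(os.environ.get(env_var, ""))


print(clean_api_key("sk-ant-example\u2028"))  # sk-ant-example
```

A stray character in the middle of the key is rejected rather than silently removed, since quietly editing a secret would hide the real problem from the person who pasted it.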
Review of the draft was direct: deployment and documentation held up, but nothing agentic ran — one prompt and one response, with the MCP server and scoring engine still on paper. The response wasn't a token tool bolted onto the app; it was to build the MCP server itself — the load-bearing piece the rest of the framework, and the product it's headed toward, depend on.
The eval moved the same way — from described to run. run_eval.py now makes a real extraction call per documented case and checks the behaviors structurally. Running it for real earned its keep immediately: the thin-data money goal turned out to be handled two honest ways across runs — left out entirely, or included with a null, flagged target — but never with an invented number. The eval was rewritten to test the behavior the project actually promises (no fabricated figure) rather than one brittle outcome, and extraction-eval.md now owns both variants.
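A structural check for "no fabricated figure" might look like the sketch below. It is not the logic in run_eval.py: the dict shape and field names are assumptions, and the regex only catches digits, so a number spelled out as a word would slip past it.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def stated_numbers(text: str) -> set[str]:
    """Collect every digit-based number that appears in the text."""
    return set(NUMBER.findall(text))


def no_fabricated_figures(source_text: str, axes: list[dict]) -> bool:
    """True if every number in a baseline/target also appears in the input."""
    stated = stated_numbers(source_text)
    for axis in axes:
        for field in ("baseline", "target"):
            value = axis.get(field)
            if value is None:
                continue  # a null, flagged value is an honest outcome
            if not stated_numbers(str(value)) <= stated:
                return False
    return True


text = "I run 3 times a week and I want to save more money."
omitted = [{"name": "running", "target": "3 runs"}]
flagged = omitted + [{"name": "savings", "target": None, "confidence": "missing"}]
invented = omitted + [{"name": "savings", "target": "5000 dollars"}]

print(no_fabricated_figures(text, omitted))   # True
print(no_fabricated_figures(text, flagged))   # True
print(no_fabricated_figures(text, invented))  # False
```

Both honest variants pass (the money goal left out, or included with a null target), while an invented number fails. That is the property-based shape described above: the check tests the promise rather than one exact output.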
Documented as future work, not cut from scope:
- The valuation engine: the statistically honest scorer the whole framework points at.
- The evaluation harness: the planted-signals scorer-validation harness specified in evaluation.md, plus the synthetic-data generator it runs against (specified, not yet built).
- All axis types + maintenance mode: including the mode-from-intent fix for band-style goals.
- The rest of the context layer: describe_axes ships now (see Build log); further tools, and anything that serves a score, wait on the valuation engine.
- Light templates: optional pre-filled starting points to reduce the blank-page cost.
Two layers, each evaluated at the stage it's built.
Live extraction layer — measured. "Good" is defined and tested in extraction-eval.md: a
four-criterion qualitative review (correct typing, stated-only values, flag-don't-fake, correct mode)
run against the tool's built-in example inputs and hand-scored pass/partial/fail. It records what
works and the one known weakness — maintenance-vs-improvement mode on band-style goals.
Scoring engine — designed, not yet run. evaluation.md specifies the planted-signals harness
that will validate the in-progress scorer: synthetic data with known properties paired to pre-asserted
outputs, across a per-axis-type coverage matrix. The harness and its data generator are build work
ahead; evaluation.md is the contract they'll be built against.
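The planted-signals idea can be shown in miniature. The sketch below is not the harness specified in evaluation.md: the generator, the least-squares slope used as a stand-in scorer, the three cases, and the tolerance are all assumptions chosen to show the pattern of pairing synthetic data with an output asserted in advance.

```python
import random


def make_series(slope: float, days: int = 60, noise_sd: float = 1.0,
                seed: int = 0) -> list[float]:
    """Synthetic daily values with a known (planted) linear trend."""
    rng = random.Random(seed)
    return [10.0 + slope * day + rng.gauss(0.0, noise_sd) for day in range(days)]


def estimate_slope(values: list[float]) -> float:
    """Stand-in scorer: ordinary least-squares slope against the day index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


# Each case pairs a planted property with the output asserted in advance.
CASES = [("improving", 0.5), ("flat", 0.0), ("declining", -0.5)]
TOLERANCE = 0.05  # chosen for this sketch, well above the expected noise

for label, planted in CASES:
    recovered = estimate_slope(make_series(planted))
    assert abs(recovered - planted) < TOLERANCE, (label, recovered)
print("all planted signals recovered")
```

The flat case matters as much as the trending ones: a scorer that reports improvement on data with no planted trend is flattering the user, which is the failure the framework is built to avoid.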
PCL has a sibling, PersonaSync (repo), built for the same course.
- PersonaSync personalizes how the AI talks to you — voice, style, anti-patterns.
- PCL personalizes what the AI knows about you — goals, baselines, progress.
They're designed to compose: PersonaSync's prompt-assembly contract reserves an optional context block, and PCL's MCP layer is what would fill it at runtime. One handles voice, one handles context, under one roof.
Looking ahead — Project 3. PersonaSync (Project 1) and PCL (Project 2) are two halves of one idea. Project 3 ships them as a single product: a web app plus Chrome extension that carries both your voice and your context into whatever AI you're using — no copy-paste, no re-explaining. P1 gave your AI a voice; P2 gives it context; P3 is both, everywhere you work.
MIT License. See LICENSE for full text.
Built by Artie Bowman for AI 502 (Generative AI) at Grand Valley State University, summer 2026. Instructor: Zach DeBruine.
Forks and issues welcome — the framework is built to be extended.
- Live now: Free text → structured, confidence-aware context · an MCP read tool (describe_axes) any AI can call, demoed cross-client from Claude Desktop
- In progress: The valuation engine (honest scoring against your own baselines)
- Future state: The rest of the context layer (more tools, score-serving) · light templates · joining PersonaSync's voice layer
- Research: testing whether declared, structured goals beat inferred ones as a context signal, explored as a separate track (see future-work.md).

