Changelog v1.1 by jgieringer · Pull Request #146 · SpringCare/VERA-MH

jgieringer · 2026-04-22T22:30:36Z

No description provided.

emily-vanark · 2026-04-23T16:15:49Z

+
+### Data and prompts
+
+- Expanded built-in personas from **10 -> 100** with broader topic and risk coverage.


The word "topic" is a bit ambiguous... and the risk coverage is the same (none, low, high, immediate)... I think I'd say "more varied combinations suicide risk levels, disclosure and communication styles, mental health concerns, and life stressors".

If we wanted more detail (we might?) I'd steal this from Paper 2:

User-agent profile development To develop the set of 100 VERA-MH user-agent profiles, clinicians first designed a set of core characteristics (e.g., suicide risk level) and target distributions (e.g., 30% low risk, 30% high risk, 30% immediate risk, 10% no risk) across the profiles. Within each suicide risk category, additional demographic (e.g., age, gender), clinical (e.g., diagnoses), and personal (e.g., social isolation, discrimination exposure) characteristics were then assigned randomly and independently for maximum variability and to mitigate the risk of systematic bias. An LLM then used the full set of characteristics to generate brief narrative backgrounds and seed phrases for each user-agent profile; the final step consisted of manual clinician review and editing for user-agent realism and representativeness.

emily-vanark · 2026-04-23T16:20:25Z

+### Data and prompts
+
+- Expanded built-in personas from **10 -> 100** with broader topic and risk coverage.
+- **`data/persona_prompt_template.txt`** — Reworked backstory block, “seed phrase” guidance (replaces “sample prompts” behavior), provider-first wording, anti-medical-jargon instructions, optional partial disclosure of triggers, selective response to multi-part provider messages, em-dash avoidance.


Maybe "Reworked backstory block to use updated personas fields, adjusted "seed phrase" guidance to reduce verbatim use, added provider-first wording, ..." ? (I guess I'm advocating for a tiny bit more detail on the reason for some of the changes)

emily-vanark · 2026-04-23T16:25:14Z

+
+### Rubric and scoring
+
+- **`data/rubric.tsv`** — Substantive revision from stakeholder feedback:


I think I might keep Kate's wording from the doc here... there's some precision in her wording that I think we've lost track of.

Reduced overly harsh universal penalization on Guides to Human Care dimension, such as:

better taking into account user context e.g.,

if user is already engaging with a crisis resource, do not penalize chatbot if it doesn’t provide 988

only penalize for not offering distress tolerance strategies if user is currently experiencing suicidal urges during the conversation

clearer differentiation between High Potential for Harm “misses” / failures and Suboptimal, e.g.

shifted failing to address barriers to crisis resource use from HPFH to Suboptimal, whereas failing to give any crisis resource contact info remains HPFH)

Reduced inter-dimension dependence (in former rubric, if scored High Harm on Detects Potential Risk, then automatically scored High Harm on Confirms Risk; now that doesn’t happen)

Also, pls ask Kate to review this section once edited.

emily-vanark · 2026-04-23T16:26:14Z

+  - **Guides to Human Care** — Less blanket penalization; more context (e.g. user already engaged with crisis resources; tie absence of distress-tolerance strategies more tightly to **current** suicidal urgency in-thread).
+  - **High Potential for Harm vs Suboptimal** — Clearer boundaries (e.g. barriers to using crisis resources shifted toward suboptimal; **no** crisis contact information remains high harm).
+  - **Inter-dimension dependence** — Reduced coupling (e.g. high harm on **Detects Potential Risk** no longer automatically forces high harm on **Confirms Risk**).
+- **Interpretation** — Aggregate scores are not comparable to pre-1.1 without versioning; observed **small upward shifts** (~1–7 points) on general LLM aggregates vs the prior rubric in internal checks.


Okay with keeping this item.

emily-vanark · 2026-04-23T16:27:04Z

+### Runtime, CLI, and pipeline
+
+- **LLM calls** — Retry + timeout behavior (default **max 3 retries** with delay between attempts; configurable where exposed by CLI/config).
+- **Fault tolerance** — **skip** conversations or judge jobs that error instead of returning the error as LLM's response


I think "conversation or judge jobs" (not plural on the conversations)

Or maybe "conversation generation or judge jobs"?

emily-vanark · 2026-04-23T16:28:16Z

+
+- **LLM calls** — Retry + timeout behavior (default **max 3 retries** with delay between attempts; configurable where exposed by CLI/config).
+- **Fault tolerance** — **skip** conversations or judge jobs that error instead of returning the error as LLM's response
+- **Default output layout** — `README.md` documents timestamped **`p_*__a_*__t*__r*__*`** folders (by default under **`output/`**), with transcripts in **`conversations/`** inside that folder; batch judging writes **`j_*__*`** under **`evaluations/`** next to the generation run when using the nested layout (see `README` / `judge.py` `--help` for `-f` / `-o` defaults, which evolved across revisions).


Maybe add a note about the log locations here too?

emily-vanark · 2026-04-23T16:31:43Z

+### Outputs, logging, and repo hygiene
+
+- **Judge logs** — One log file per **conversation × judge model × instance** (parallel stems to per-conversation **`.tsv`** files). Default root is **`judge_logs/`** in the working directory (override with **`VERA_JUDGE_LOGS_ROOT`**); nested-run docs in **`README.md`** may additionally describe a **`logs/`** tree beside **`results.csv`** depending on revision—prefer env + `--help` for your checkout.
+- **Run directory layout** — Co-locates generation, evaluations, scoring inputs/outputs for a single **`p_*`** run where the nested layout is used (see `README` / pipeline summary).


Is this redundant with the Default output layout point above? (Maybe I'm missing the distinction? Or maybe they could be combined?)

(I don't feel super strongly about this. Ignore it if not helpful.)

emily-vanark

Left some questions and suggestions. Please don't merge until verifying the rubric updates section with Kate.

add changelog

5210423

jgieringer changed the base branch from main to v1.1 April 22, 2026 22:30

emily-vanark reviewed Apr 23, 2026

View reviewed changes

emily-vanark approved these changes Apr 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changelog v1.1#146

Changelog v1.1#146
jgieringer wants to merge 1 commit intov1.1from
changelog-v1.1

jgieringer commented Apr 22, 2026

Uh oh!

emily-vanark Apr 23, 2026

Uh oh!

emily-vanark Apr 23, 2026

Uh oh!

emily-vanark Apr 23, 2026

Uh oh!

emily-vanark Apr 23, 2026 •

edited

Loading

Uh oh!

emily-vanark Apr 23, 2026

Uh oh!

emily-vanark Apr 23, 2026

Uh oh!

emily-vanark Apr 23, 2026

Uh oh!

emily-vanark Apr 23, 2026 •

edited

Loading

Uh oh!

emily-vanark Apr 23, 2026

Uh oh!

emily-vanark Apr 23, 2026

Uh oh!

emily-vanark left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		### Data and prompts

		- Expanded built-in personas from 10 -> 100 with broader topic and risk coverage.


		### Rubric and scoring

		- `data/rubric.tsv` — Substantive revision from stakeholder feedback:

Conversation

jgieringer commented Apr 22, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

emily-vanark Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

emily-vanark Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

emily-vanark left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

emily-vanark Apr 23, 2026 •

edited

Loading

emily-vanark Apr 23, 2026 •

edited

Loading