dx_chat_entropy is a Python-first research repository for estimating, auditing, and
comparing likelihood ratios (LRs) in clinical diagnostic-reasoning tasks. It supports
transcript-to-assessment workflows, pairwise differential LR estimation, and one-vs-rest LR
estimation with a Bayes-coherent projection step.
The manuscript associated with this work is under journal review. It is not accepted or published, and there is no public article DOI, PMID, PMCID, volume, issue, or page/article number to cite yet. Until a public scholarly record exists, cite the repository software and the commit or release used.
Prerequisites:
- Python 3.11+
- uv
- OPENAI_API_KEY in your environment or .env for model-backed runs
Initial setup:
make uv-sync-notebooks
make notebook-kernel
make fmt
make lint
make test
make audit
If you use VS Code notebooks, select the kernel Python (dx-chat-entropy).
Most notebook import problems in this repo are kernel-selection problems rather than missing packages.
This public repository contains research software, public workflow inputs, and generated aggregate/model-output artifacts needed to inspect the current analysis workflow. It does not contain an accepted manuscript, a public preprint, publisher text, private reviewer materials, API keys, or local machine state.
The internal preprint/manuscript draft is intentionally not mirrored here. When a public
preprint, accepted manuscript, or final article appears, update README.md, llms.txt,
CITATION.cff, and GitHub metadata in the same pull request.
Maintainer and corresponding repository contact:
- Brian W. Locke, MD; ORCID: 0000-0002-3588-5238; GitHub: @reblocke
Author order, affiliations, funding, acknowledgments, and conflicts of interest should be taken from the eventual public manuscript record when it exists. Do not infer or publish unverified publication metadata from private drafts.
- config/: scenario registries and workflow configuration
- data/raw/: active public source spreadsheets, transcript PDFs, and assessment templates
- data/processed/: generated manifests, intermediate workbooks, and model outputs retained for review of the current workflow
- notebooks/: interactive entry points, QA notebooks, and older analysis notebooks
- scripts/: canonical batch runtimes, audit tools, and packaging tools
- src/dx_chat_entropy/: shared parsing, runtime, audit, and bundle logic
- docs/: specifications, design decisions, pipeline notes, and data-management policy
- archive/: historical source code, run notes, and legacy provenance material that is not part of the active workflow unless explicitly named by a script
In general, the scripts are the canonical batch entry points. The notebooks are for interactive execution, QA, or older workflows that are still kept for reference.
Committed active inputs are clinical-reasoning scenarios, LR matrices, assessment templates, and transcript artifacts used by the current workflow. Do not add private manuscript drafts, private reviewer materials, personal correspondence, API outputs containing secrets, local machine state, or third-party/publisher PDFs.
For machine-readable variable and artifact documentation, see:
- data_dictionary.md
- data_dictionary.csv
- docs/SPECIFICATION.md
- docs/DATA_MANAGEMENT.md
If a raw source has a defect, do not edit it in place. Preserve the raw file and correct the issue in code or in a generated output layer.
If you are not sure which path you need:
- use the assessment workflow to turn transcripts into assessment workbooks,
- use the differential workflow when each finding should compare two diagnoses at a time,
- use the one-vs-rest workflow when you want a full diagnosis-by-finding LR table plus a coherent version of that table.
Use this when you want to extract findings from transcript PDFs and populate the assessment workbook.
Run order:
- notebooks/10_assessment_extract_features.ipynb
- notebooks/11_assessment_estimate_lrs.ipynb
Inputs:
- data/raw/chatbot_transcripts/*.pdf
- data/raw/assessment_templates/asssessment_template_new.xlsx
Outputs:
- data/processed/assessments/answers_*.xlsx
- data/processed/assessments/completed_lrs.xlsx
Use this when each scenario should be broken into all diagnosis-pair comparisons and each finding should receive a differential LR for that pair.
Run order:
- notebooks/20_differential_build_inputs.ipynb
- scripts/run_differential_batch.py or notebooks/21_differential_estimate_lrs.ipynb
- scripts/audit_differential_outputs.py
Example run:
DX_MODEL_ID=gpt-5.3-chat-latest \
DX_RESUME_MODE=skip_passing \
uv run --group notebooks python scripts/run_differential_batch.py
uv run --group notebooks python scripts/audit_differential_outputs.py \
--manifest data/processed/lr_differential/manifests/pairs_manifest.csv \
--outputs-root data/processed/lr_differential/outputs_by_model \
--summary-out data/processed/lr_differential/manifests/quality_summary.csv \
--invalid-out data/processed/lr_differential/manifests/invalid_rows.csv
Inputs:
- canonical raw LR matrices under data/raw/lr_matrices/
- scenario registry in config/lr_differential_scenarios.yaml
Outputs:
- pairwise input workbooks in data/processed/lr_differential/inputs/
- model-scoped filled outputs in data/processed/lr_differential/outputs_by_model/
- manifests, ledgers, logs, and audit CSVs in data/processed/lr_differential/manifests/
Notes:
- The model ID in the example is illustrative; swap in the model you actually want to run.
- 21_differential_estimate_lrs.ipynb is the interactive wrapper around the same runtime logic used by the script.
- 22_differential_prepare_inputs_qa.ipynb is for inspection and QA, not the canonical transformation step.
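As a rough illustration of what a differential LR for a diagnosis pair expresses, the sketch below uses the textbook definition: the probability of a finding under diagnosis A divided by its probability under diagnosis B, applied to the pair's prior odds. The function names and numbers are hypothetical, and whether the repository's prompts and runtime define the quantity in exactly this way is an assumption.

```python
def differential_lr(p_finding_given_a: float, p_finding_given_b: float) -> float:
    """Textbook differential LR for one finding: P(finding | A) / P(finding | B)."""
    if not 0.0 <= p_finding_given_a <= 1.0:
        raise ValueError("P(finding | A) must lie in [0, 1]")
    if not 0.0 < p_finding_given_b <= 1.0:
        raise ValueError("P(finding | B) must lie in (0, 1]")
    return p_finding_given_a / p_finding_given_b


def update_pair_odds(prior_odds_a_vs_b: float, lr: float) -> float:
    """Bayes' rule in odds form, restricted to the two diagnoses in the pair."""
    return prior_odds_a_vs_b * lr


# Hypothetical numbers: the finding is four times as likely under A as under B.
lr = differential_lr(0.5, 0.125)
posterior_odds = update_pair_odds(0.5, lr)  # prior odds A:B of 1:2
print(f"LR = {lr:.2f}, posterior odds A:B = {posterior_odds:.2f}")
```

An LR above 1 shifts the pair toward diagnosis A, an LR below 1 shifts it toward B, and an LR of 1 leaves the pairwise odds unchanged.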
Use this when you want a full LR table for each diagnosis versus all others in a scenario, followed by a coherence step that converts independently estimated one-vs-rest LRs into a Bayes-coherent multiclass version.
Run order:
- scripts/build_one_vs_rest_inputs.py
- scripts/run_one_vs_rest_batch.py
- scripts/project_one_vs_rest_coherent_lrs.py
- scripts/audit_one_vs_rest_outputs.py
Example run:
uv run --group notebooks python scripts/build_one_vs_rest_inputs.py \
--config config/lr_differential_scenarios.yaml
uv run --group notebooks python scripts/run_one_vs_rest_batch.py \
--manifest data/processed/lr_one_vs_rest/manifests/inputs_manifest.csv \
--model-id gpt-5.3-chat-latest
uv run --group notebooks python scripts/project_one_vs_rest_coherent_lrs.py \
--model-id gpt-5.3-chat-latest
# audit raw outputs
uv run --group notebooks python scripts/audit_one_vs_rest_outputs.py \
--manifest data/processed/lr_one_vs_rest/manifests/inputs_manifest.csv \
--outputs-root data/processed/lr_one_vs_rest/outputs_by_model \
--summary-out data/processed/lr_one_vs_rest/manifests/quality_summary_gpt-5.3-chat-latest.csv \
--invalid-out data/processed/lr_one_vs_rest/manifests/invalid_cells_gpt-5.3-chat-latest.csv \
--model-id gpt-5.3-chat-latest
# audit coherent outputs
uv run --group notebooks python scripts/audit_one_vs_rest_outputs.py \
--manifest data/processed/lr_one_vs_rest/manifests/inputs_manifest.csv \
--coherence-mode \
--priors-manifest data/processed/lr_one_vs_rest/manifests/schema_priors.csv \
--raw-outputs-root data/processed/lr_one_vs_rest/outputs_by_model \
--outputs-root data/processed/lr_one_vs_rest/coherent_outputs_by_model \
--summary-out data/processed/lr_one_vs_rest/manifests/coherent_quality_summary_gpt-5.3-chat-latest.csv \
--invalid-out data/processed/lr_one_vs_rest/manifests/coherent_invalid_cells_gpt-5.3-chat-latest.csv \
--coherence-summary-out data/processed/lr_one_vs_rest/manifests/coherence_quality_summary_gpt-5.3-chat-latest.csv \
--coherence-invalid-out data/processed/lr_one_vs_rest/manifests/coherence_invalid_rows_gpt-5.3-chat-latest.csv \
--model-id gpt-5.3-chat-latest
Inputs:
- raw LR matrices in data/raw/lr_matrices/
- scenario registry in config/lr_differential_scenarios.yaml
Outputs:
- normalized one-vs-rest input workbooks in data/processed/lr_one_vs_rest/inputs/
- raw model outputs in data/processed/lr_one_vs_rest/outputs_by_model/
- coherent outputs in data/processed/lr_one_vs_rest/coherent_outputs_by_model/
- manifests and quality summaries in data/processed/lr_one_vs_rest/manifests/
Notes:
- The model ID in the example is illustrative; swap in the model you actually want to run.
- The coherence step is a separate local projection stage. It does not overwrite the raw one-vs-rest outputs.
- notebooks/32_one_vs_rest_project_coherent_lrs.ipynb is a notebook wrapper for the coherence projection over existing raw outputs.
- notebooks/30_one_vs_rest_estimate_lrs.ipynb and notebooks/31_one_vs_rest_compare_lr_estimates.ipynb are the older comparison workflow, not the canonical batch runtime.
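To make the coherence requirement concrete, the sketch below checks one necessary condition for a single finding: when each one-vs-rest LR is applied to its diagnosis's prior, the resulting posteriors should sum to 1. It does not reproduce the projection in scripts/project_one_vs_rest_coherent_lrs.py. The function names, priors, and LR values are hypothetical and chosen only to show the constraint.

```python
def posterior_from_ovr_lr(prior: float, lr: float) -> float:
    """Posterior for one diagnosis from its prior and its one-vs-rest LR."""
    odds = prior / (1.0 - prior) * lr
    return odds / (1.0 + odds)


def coherent_ovr_lrs(priors: list[float], likelihoods: list[float]) -> list[float]:
    """One-vs-rest LRs implied by a single shared set of P(finding | diagnosis)."""
    lrs = []
    for i, (p_i, like_i) in enumerate(zip(priors, likelihoods)):
        # P(finding | not diagnosis i): prior-weighted mean over the other diagnoses.
        rest_mass = sum(
            p * like
            for j, (p, like) in enumerate(zip(priors, likelihoods))
            if j != i
        )
        lrs.append(like_i / (rest_mass / (1.0 - p_i)))
    return lrs


def posterior_sum(priors: list[float], lrs: list[float]) -> float:
    """Sum of implied posteriors; a Bayes-coherent LR set gives 1."""
    return sum(posterior_from_ovr_lr(p, lr) for p, lr in zip(priors, lrs))


priors = [0.5, 0.3, 0.2]        # hypothetical scenario priors
likelihoods = [0.8, 0.4, 0.1]   # hypothetical P(finding | diagnosis)
coherent = coherent_ovr_lrs(priors, likelihoods)
independent = [3.0, 1.0, 0.5]   # hypothetical independently estimated LRs

print("coherent sum:", round(posterior_sum(priors, coherent), 6))
print("independent sum:", round(posterior_sum(priors, independent), 6))
```

LRs derived from one shared likelihood vector pass the check by construction. Independently estimated values generally do not, which is the gap a projection step has to close.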
For external review or handoff:
- Differential pipeline: scripts/run_differential_and_package.sh
- One-vs-rest pipeline: scripts/package_one_vs_rest_review_bundle.py
These packages are meant to ship the relevant code, manifests, and outputs for a workflow. They are not full-repository snapshots and should not include private manuscript drafts, API keys, or local system state.
Core package dependencies are declared in pyproject.toml. Use the notebooks dependency
group for model-backed notebook and batch runs.
| Use | Command | Notes |
|---|---|---|
| Core development | uv sync | Installs package, tests, and lint tooling. |
| Notebook/model workflows | make uv-sync-notebooks | Installs the notebooks dependency group. |
| Notebook kernel | make notebook-kernel | Registers Python (dx-chat-entropy). |
| Repository checks | make fmt && make lint && make test && make audit | Run before PRs. |
Use the document that matches the question:
- llms.txt: compact machine-readable index for LLMs and search systems
- README.md: what this repo does, which workflow to choose, and how to run it
- data_dictionary.md and data_dictionary.csv: source, derived, and output artifact dictionary
- docs/SPECIFICATION.md: detailed pipeline contracts, artifact paths, manifests, and review-bundle scope
- docs/PIPELINES.md: short index of current pipelines and notebook/script order
- docs/DATA_MANAGEMENT.md: active-vs-archive data placement and provenance rules
- docs/DECISIONS.md: non-obvious design and policy decisions
- AGENTS.md: project-specific coding-agent instructions
Until a public paper, preprint, or conference record exists, cite the repository software:
Locke BW. dx_chat_entropy: Clinical reasoning entropy and likelihood-ratio workflows.
GitHub. https://github.com/reblocke/dx_chat_entropy. Commit or release used.
Machine-readable citation metadata are in CITATION.cff. Do not add a
preferred-citation for the manuscript until the public scholarly record exists.
Repository code is released under the MIT License. Third-party materials, clinical source documents, private drafts, publisher artifacts, and externally supplied data remain under their original terms and should not be copied into the public branch unless their public license and provenance are documented.
Usually the notebook is attached to the wrong interpreter.
Fix:
- Run make uv-sync-notebooks
- Run make notebook-kernel
- In VS Code, switch to Python (dx-chat-entropy)
- Restart the kernel
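After restarting, one quick way to confirm that the notebook is attached to the intended interpreter is to run a small check in the first cell. The sketch below uses only the standard library. The helper name interpreter_report is hypothetical, and the import name dx_chat_entropy is assumed from the src/dx_chat_entropy/ directory.

```python
import importlib.util
import sys


def interpreter_report(package: str = "dx_chat_entropy") -> dict:
    """Report which Python is running and whether the package is importable."""
    spec = importlib.util.find_spec(package)
    return {
        "executable": sys.executable,   # path of the active interpreter
        "prefix": sys.prefix,           # environment root for that interpreter
        "package_found": spec is not None,
    }


report = interpreter_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

If package_found is False, or the executable path points outside the project's environment, the kernel selection is the likely cause rather than a missing dependency.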
Treat that as a migration or legacy-path issue. Active workflows should read from
data/raw/ and write to data/processed/ unless the workflow is explicitly labeled archival.
Use GitHub issues or pull requests for repository-specific questions. For publication or data-access questions, contact Brian W. Locke through the contact route listed on his public GitHub profile or CV.