Add portfolio demo: README, Streamlit app, synthetic data, CI #1

Merged: edgarbc merged 4 commits into main from copilot/portfoliodocs-patient-analyzer on Dec 2, 2025

Conversation

Copilot AI commented Nov 29, 2025

Contributor

Transforms patient_analyzer into a reproducible portfolio project with an interactive demo, synthetic data, and CI smoke tests. Uses lightweight TF-IDF embeddings to avoid large model downloads in CI.

Changes

README.md

  • TL;DR, problem statement, quickstart (venv + pip), demo instructions
  • Privacy statement emphasizing 100% synthetic data
  • Screenshot of running demo

demo/

  • sample_patients.csv: 10 synthetic patients with age, gender, diagnosis_code, text_note
  • app.py: Streamlit app with TF-IDF embeddings, nearest neighbor search, 2D PCA visualization
    • Comments indicate where to plug sentence-transformers/Bio_ClinicalBERT for production
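The pipeline described for app.py (TF-IDF embeddings, nearest neighbor search, 2D PCA) can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the function names `embed_notes`, `nearest_neighbors`, and `project_2d` and the example notes are hypothetical, and cosine similarity is an assumed choice of metric.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical synthetic notes standing in for the text_note column.
notes = [
    "type 2 diabetes follow-up, metformin dose adjusted",
    "diabetes check, blood glucose elevated, metformin continued",
    "asthma exacerbation, inhaler prescribed",
    "seasonal asthma, inhaler refill requested",
    "hypertension review, blood pressure stable",
]


def embed_notes(texts):
    """Return a dense TF-IDF matrix with one row per note."""
    return TfidfVectorizer(stop_words="english").fit_transform(texts).toarray()


def nearest_neighbors(embeddings, query_idx, k=3):
    """Return indices of the k rows most cosine-similar to query_idx, excluding itself."""
    sims = cosine_similarity(embeddings[query_idx : query_idx + 1], embeddings)[0]
    sims[query_idx] = -np.inf  # never return the query patient as its own neighbor
    return np.argsort(sims)[::-1][:k].tolist()


def project_2d(embeddings):
    """Reduce embeddings to two PCA components for a scatter plot."""
    return PCA(n_components=2).fit_transform(embeddings)


embeddings = embed_notes(notes)
neighbors = nearest_neighbors(embeddings, query_idx=0, k=3)
coords = project_2d(embeddings)
```

In a Streamlit app, `neighbors` would typically feed a table of similar patients and `coords` a scatter plot; swapping `embed_notes` for a sentence-transformers encoder would leave the other two functions unchanged.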

tests/smoke_test.py

  • 6 tests covering data loading, embedding computation, neighbor search, PCA reduction, full pipeline
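As an illustration of what such smoke tests might look like, here is a self-contained sketch covering two of the listed areas (data loading and neighbor search). It is not the test file from the PR: the inline CSV, `load_patients`, and `top_k_neighbors` are hypothetical stand-ins, and a real test would more likely import helpers from demo/app.py.

```python
import io

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

# Hypothetical inline stand-in for demo/sample_patients.csv (same four columns).
SAMPLE_CSV = """age,gender,diagnosis_code,text_note
54,F,E11,type 2 diabetes follow-up with metformin
61,M,E11,diabetes review and metformin refill
33,F,J45,asthma flare treated with inhaler
47,M,I10,hypertension visit blood pressure stable
"""


def load_patients(source):
    """Read the patient table from a path or file-like object."""
    return pd.read_csv(source)


def top_k_neighbors(texts, query_idx, k):
    """Return the k nearest rows to query_idx by cosine distance on TF-IDF vectors."""
    vectors = TfidfVectorizer().fit_transform(texts)
    # Ask for k + 1 neighbors because the query row is its own closest match.
    model = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(vectors)
    _, idx = model.kneighbors(vectors[query_idx])
    return [int(i) for i in idx[0] if i != query_idx][:k]


def test_data_loading():
    df = load_patients(io.StringIO(SAMPLE_CSV))
    assert list(df.columns) == ["age", "gender", "diagnosis_code", "text_note"]
    assert len(df) == 4


def test_neighbor_search():
    df = load_patients(io.StringIO(SAMPLE_CSV))
    neighbors = top_k_neighbors(df["text_note"], query_idx=0, k=2)
    assert len(neighbors) == 2
    assert 0 not in neighbors
```

Because the tests only exercise plain functions, pytest can run them without starting a Streamlit server or downloading any model.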

.github/workflows/ci.yml

  • Installs minimal deps, runs pytest smoke tests
  • Explicit permissions block for security

requirements.txt

  • Minimal deps: numpy, pandas, scikit-learn, streamlit, matplotlib
  • Heavy deps (torch, transformers) commented as optional
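One way to keep the heavy dependencies optional at runtime is to fall back to TF-IDF whenever sentence-transformers is not installed. The sketch below is an assumption about how this could be wired, not the approach taken in the PR; the `embed` function, its `use_transformer` flag, and the default model name `all-MiniLM-L6-v2` are illustrative choices.

```python
import importlib.util

from sklearn.feature_extraction.text import TfidfVectorizer


def has_sentence_transformers():
    """Check for the optional package without importing it."""
    return importlib.util.find_spec("sentence_transformers") is not None


def embed(texts, use_transformer=False, model_name="all-MiniLM-L6-v2"):
    """Embed texts with a transformer if requested and available, else TF-IDF."""
    if use_transformer and has_sentence_transformers():
        # Imported lazily so the minimal install never needs torch.
        from sentence_transformers import SentenceTransformer

        return SentenceTransformer(model_name).encode(list(texts))
    return TfidfVectorizer().fit_transform(texts).toarray()


# Default path: lightweight TF-IDF, suitable for CI.
vectors = embed(["asthma inhaler refill", "diabetes metformin review"])
```

With `use_transformer=False` as the default, CI stays on the lightweight path even if the optional packages happen to be present.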

Screenshot

Demo Screenshot

Usage

pip install -r requirements.txt
streamlit run demo/app.py

Do not merge without owner approval.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • checkip.amazonaws.com
    • Triggering command: /home/REDACTED/.local/bin/streamlit streamlit run demo/app.py --server.headless true --server.port 8501 (dns block)
    • Triggering command: /home/REDACTED/.local/bin/streamlit streamlit run demo/app.py --server.headless true --server.port 8502 (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

Goal: Improve patient_analyzer to be a showcase portfolio project by adding a clear README, a small interactive demo (Streamlit app) that visualizes patient embeddings and nearest neighbors using synthetic data, a sample dataset, and a CI smoke test that runs the demo script headlessly. Create branch portfolio/docs-patient-analyzer and open PR to main. Do not merge.

Required changes:

  1. README.md - TL;DR, problem statement (comparing patients in embedding space), quickstart (venv + pip install -r requirements.txt), how to run the Streamlit demo locally, privacy & data-use statement, expected outputs screenshot placeholder, license.
  2. requirements.txt - minimal deps: "numpy", "pandas", "scikit-learn", "streamlit", "sentence-transformers" (or omit heavy transformer and instead include a synthetic embedding generator). To keep CI light, the smoke test will not download large models.
  3. demo/sample_patients.csv - small synthetic CSV with 10 patients and a few features (age, gender, diagnosis_code, text_note). Add a script to generate synthetic embeddings for them.
  4. demo/app.py - Streamlit app that reads sample_patients.csv, computes simple TF-IDF or random embeddings (to avoid large model download in CI), shows a selected patient and top-3 nearest patients with a simple visualization (table and 2D scatter via PCA). Include comments on where to plug real embeddings (sentence-transformers) in production.
  5. tests/smoke_test.py - script that runs python -m demo.app (or uses streamlit to run in headless mode) or simply imports app functions and runs a small function to compute neighbors; this should be invoked by CI to ensure demos run.
  6. .github/workflows/ci.yml - runs tests/smoke_test.py; installs requirements and executes the test.

PR description: Add README, Streamlit demo with synthetic data, and CI smoke test to make patient_analyzer reproducible and demonstrable for portfolio. Emphasize synthetic data & privacy. Do not merge without owner approval.

Notes: Keep CI lightweight and avoid downloading large transformer models. Where heavy models are helpful, add instructions in README for optional steps ('extras' extras_require).

This pull request was created as a result of the prompt above, from Copilot chat.


Copilot AI and others added 3 commits November 29, 2025 17:29
Co-authored-by: edgarbc <4164895+edgarbc@users.noreply.github.com>
Co-authored-by: edgarbc <4164895+edgarbc@users.noreply.github.com>
Co-authored-by: edgarbc <4164895+edgarbc@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve patient_analyzer with README and demo features" to "Add portfolio demo: README, Streamlit app, synthetic data, CI" on Nov 29, 2025
Copilot AI requested a review from edgarbc November 29, 2025 17:35

edgarbc left a comment

Owner

lgtm

edgarbc marked this pull request as ready for review December 2, 2025 14:08
edgarbc merged commit 6543a65 into main Dec 2, 2025
3 checks passed

2 participants