Add HealthOS bundle privacy foundation by alanshurafa · Pull Request #106 · OpenHealthForAll/open-health

alanshurafa · 2026-05-15T23:11:31Z

Summary

This is the first upstream-safe HealthOS integration slice from RFC #105.

It adds a privacy-first foundation before any HealthOS engine or connector code is migrated:

adds SECURITY.md guidance for PHI-free contributions and incident response
adds a privacy audit script and CI workflow
adds a HealthOS Bundle v1 TypeScript contract and runtime validation
adds an authenticated POST /api/healthos/import route using the existing HealthData table
summarizes HEALTHOS_BUNDLE records before chat context injection instead of dumping raw JSON
adds a synthetic bundle fixture and Vitest coverage

No private HealthOS runtime data, raw exports, reports, real fixtures, local paths, or person-specific clinical facts are included.

Verification

npm run privacy:audit
npm test
npm run lint
npm run build

npm run lint and npm run build report pre-existing React hook warnings in onboarding/chat/source components, but pass.

gemini-code-assist

Code Review

This pull request integrates HealthOS as a validated health data engine, introducing a new security policy, an RFC, and a privacy audit script to prevent sensitive data exposure. The implementation includes API endpoints for bundle imports, logic for summarizing health data in chat contexts, and a privacy scanner. Review feedback highlights opportunities to improve the privacy scanner's coverage of sensitive terms, ensure only validated fields are persisted in the database, optimize file I/O in the audit script, and replace magic numbers with named constants for better maintainability.

gemini-code-assist · 2026-05-15T23:13:04Z

+const forbiddenStringPatterns = [
+    /[A-Za-z]:\\/,
+    /\/Users\/[^/\s]+/,
+    /\bruntime\//i,
+    /\.sqlite3?\b/i,
+    /\.db\b/i,
+    /\.pdf\b/i,
+    /\.vcf\b/i,
+    /\braw_payload\b/i,
+    /\badvisor[_ -]?chat\b/i,
+];


The regex patterns for forbidden strings do not account for camelCase variants (e.g., advisorChat, rawPayload), which are explicitly listed in forbiddenKeys. Additionally, several other sensitive keys like mood_log and unapproved_genetics are missing from the string pattern check entirely. This could allow sensitive information to bypass the scanner if it appears within string values using these naming conventions.

const forbiddenStringPatterns = [ /[A-Za-z]:\\/, /\/Users\/[^/\s]+/, /\bruntime\//i, /\.sqlite3?\b/i, /\.db\b/i, /\.pdf\b/i, /\.vcf\b/i, /\b(raw_?payload|rawPayload)\b/i, /\b(advisor[_ -]?chat|advisorChat)\b/i, /\b(mood_?log|moodLog)\b/i, /\b(unapproved_?genetics|unapprovedGenetics)\b/i, ];

gemini-code-assist · 2026-05-15T23:13:04Z

+    const bundle = {
+        ...input,
+        bundle_version: HEALTHOS_BUNDLE_VERSION,
+        generated_at: requireString(input, "generated_at"),
+        source_freshness: validateFreshness(requireArray(input, "source_freshness")),
+        landing_summary: validateLandingSummary(requireArray(input, "landing_summary")),
+        clinician_packet: requireRecord(input, "clinician_packet") as HealthOsClinicianPacket,
+        metrics: requireArray(input, "metrics") as JsonRecord[],
+        provenance: requireArray(input, "provenance") as JsonRecord[],
+        warnings: requireArray(input, "warnings").map((warning, index) => {
+            if (typeof warning !== "string") throw new Error(`warnings[${index}] must be a string`);
+            return warning;
+        }),
+    } satisfies HealthOsBundleV1;


The parseHealthOsBundle function uses the spread operator (...input) when constructing the bundle object. This preserves any extra, unvalidated properties from the raw input and includes them in the final object. Since this object is subsequently stored in the database (via buildHealthOsImportData), this could lead to the persistence of unexpected or sensitive data that was not intended to be part of the HealthOS bundle schema. It is safer to explicitly pick the validated fields.

const bundle: HealthOsBundleV1 = { bundle_version: HEALTHOS_BUNDLE_VERSION, generated_at: requireString(input, "generated_at"), source_freshness: validateFreshness(requireArray(input, "source_freshness")), landing_summary: validateLandingSummary(requireArray(input, "landing_summary")), clinician_packet: requireRecord(input, "clinician_packet") as HealthOsClinicianPacket, metrics: requireArray(input, "metrics") as JsonRecord[], provenance: requireArray(input, "provenance") as JsonRecord[], warnings: requireArray(input, "warnings").map((warning, index) => { if (typeof warning !== "string") throw new Error(`warnings[${index}] must be a string`); return warning; }), };

gemini-code-assist · 2026-05-15T23:13:04Z

+        if (!isLikelyText(file)) continue;
+
+        const text = readFileSync(file, "utf8").toLowerCase();


The script reads the content of each file twice: once in isLikelyText to check for null bytes, and again here to perform the term search. For a large number of files, this significantly increases I/O overhead. Consider refactoring isLikelyText to return the file content if it's valid text, or caching the result to avoid redundant reads.

gemini-code-assist · 2026-05-15T23:13:04Z

+        .slice(0, 12)
+        .join("; ");
+    const landingSummary = bundle.landing_summary
+        .map((metric) => {
+            const unit = metric.unit ? ` ${metric.unit}` : "";
+            const asOf = metric.as_of ? ` as of ${metric.as_of}` : "";
+            return `${metric.label}: ${compact(metric.value)}${unit}${asOf}`;
+        })
+        .slice(0, 20)
+        .join("; ");
+    const warnings = bundle.warnings.slice(0, 8).join("; ");


The array slicing limits (12, 20, 8) are hardcoded magic numbers. Defining these as named constants at the top of the file would improve readability and make it easier to adjust these limits (e.g., for prompt token management) in the future.

Add HealthOS bundle privacy foundation

1a0d2ed

gemini-code-assist Bot reviewed May 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add HealthOS bundle privacy foundation#106

Add HealthOS bundle privacy foundation#106
alanshurafa wants to merge 1 commit into
OpenHealthForAll:mainfrom
alanshurafa:codex/healthos-bundle-privacy

alanshurafa commented May 15, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 15, 2026

Uh oh!

gemini-code-assist Bot May 15, 2026

Uh oh!

gemini-code-assist Bot May 15, 2026

Uh oh!

gemini-code-assist Bot May 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		if (!isLikelyText(file)) continue;

		const text = readFileSync(file, "utf8").toLowerCase();

Conversation

alanshurafa commented May 15, 2026

Summary

Verification

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant