Skip to content

Add HealthOS bundle privacy foundation#106

Open
alanshurafa wants to merge 1 commit into
OpenHealthForAll:mainfrom
alanshurafa:codex/healthos-bundle-privacy
Open

Add HealthOS bundle privacy foundation#106
alanshurafa wants to merge 1 commit into
OpenHealthForAll:mainfrom
alanshurafa:codex/healthos-bundle-privacy

Conversation

@alanshurafa
Copy link
Copy Markdown

Summary

This is the first upstream-safe HealthOS integration slice from RFC #105.

It adds a privacy-first foundation before any HealthOS engine or connector code is migrated:

  • adds SECURITY.md guidance for PHI-free contributions and incident response
  • adds a privacy audit script and CI workflow
  • adds a HealthOS Bundle v1 TypeScript contract and runtime validation
  • adds an authenticated POST /api/healthos/import route using the existing HealthData table
  • summarizes HEALTHOS_BUNDLE records before chat context injection instead of dumping raw JSON
  • adds a synthetic bundle fixture and Vitest coverage

No private HealthOS runtime data, raw exports, reports, real fixtures, local paths, or person-specific clinical facts are included.

Verification

  • npm run privacy:audit
  • npm test
  • npm run lint
  • npm run build

npm run lint and npm run build report pre-existing React hook warnings in onboarding/chat/source components, but pass.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request integrates HealthOS as a validated health data engine, introducing a new security policy, an RFC, and a privacy audit script to prevent sensitive data exposure. The implementation includes API endpoints for bundle imports, logic for summarizing health data in chat contexts, and a privacy scanner. Review feedback highlights opportunities to improve the privacy scanner's coverage of sensitive terms, ensure only validated fields are persisted in the database, optimize file I/O in the audit script, and replace magic numbers with named constants for better maintainability.

Comment on lines +6 to +16
const forbiddenStringPatterns = [
/[A-Za-z]:\\/,
/\/Users\/[^/\s]+/,
/\bruntime\//i,
/\.sqlite3?\b/i,
/\.db\b/i,
/\.pdf\b/i,
/\.vcf\b/i,
/\braw_payload\b/i,
/\badvisor[_ -]?chat\b/i,
];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The regex patterns for forbidden strings do not account for camelCase variants (e.g., advisorChat, rawPayload), which are explicitly listed in forbiddenKeys. Additionally, several other sensitive keys like mood_log and unapproved_genetics are missing from the string pattern check entirely. This could allow sensitive information to bypass the scanner if it appears within string values using these naming conventions.

const forbiddenStringPatterns = [
    /[A-Za-z]:\\/,
    /\/Users\/[^/\s]+/,
    /\bruntime\//i,
    /\.sqlite3?\b/i,
    /\.db\b/i,
    /\.pdf\b/i,
    /\.vcf\b/i,
    /\b(raw_?payload|rawPayload)\b/i,
    /\b(advisor[_ -]?chat|advisorChat)\b/i,
    /\b(mood_?log|moodLog)\b/i,
    /\b(unapproved_?genetics|unapprovedGenetics)\b/i,
];

Comment on lines +104 to +117
const bundle = {
...input,
bundle_version: HEALTHOS_BUNDLE_VERSION,
generated_at: requireString(input, "generated_at"),
source_freshness: validateFreshness(requireArray(input, "source_freshness")),
landing_summary: validateLandingSummary(requireArray(input, "landing_summary")),
clinician_packet: requireRecord(input, "clinician_packet") as HealthOsClinicianPacket,
metrics: requireArray(input, "metrics") as JsonRecord[],
provenance: requireArray(input, "provenance") as JsonRecord[],
warnings: requireArray(input, "warnings").map((warning, index) => {
if (typeof warning !== "string") throw new Error(`warnings[${index}] must be a string`);
return warning;
}),
} satisfies HealthOsBundleV1;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The parseHealthOsBundle function uses the spread operator (...input) when constructing the bundle object. This preserves any extra, unvalidated properties from the raw input and includes them in the final object. Since this object is subsequently stored in the database (via buildHealthOsImportData), this could lead to the persistence of unexpected or sensitive data that was not intended to be part of the HealthOS bundle schema. It is safer to explicitly pick the validated fields.

    const bundle: HealthOsBundleV1 = {
        bundle_version: HEALTHOS_BUNDLE_VERSION,
        generated_at: requireString(input, "generated_at"),
        source_freshness: validateFreshness(requireArray(input, "source_freshness")),
        landing_summary: validateLandingSummary(requireArray(input, "landing_summary")),
        clinician_packet: requireRecord(input, "clinician_packet") as HealthOsClinicianPacket,
        metrics: requireArray(input, "metrics") as JsonRecord[],
        provenance: requireArray(input, "provenance") as JsonRecord[],
        warnings: requireArray(input, "warnings").map((warning, index) => {
            if (typeof warning !== "string") throw new Error(`warnings[${index}] must be a string`);
            return warning;
        }),
    };

Comment thread scripts/privacy-audit.mjs
Comment on lines +85 to +87
if (!isLikelyText(file)) continue;

const text = readFileSync(file, "utf8").toLowerCase();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The script reads the content of each file twice: once in isLikelyText to check for null bytes, and again here to perform the term search. For a large number of files, this significantly increases I/O overhead. Consider refactoring isLikelyText to return the file content if it's valid text, or caching the result to avoid redundant reads.

Comment on lines +17 to +27
.slice(0, 12)
.join("; ");
const landingSummary = bundle.landing_summary
.map((metric) => {
const unit = metric.unit ? ` ${metric.unit}` : "";
const asOf = metric.as_of ? ` as of ${metric.as_of}` : "";
return `${metric.label}: ${compact(metric.value)}${unit}${asOf}`;
})
.slice(0, 20)
.join("; ");
const warnings = bundle.warnings.slice(0, 8).join("; ");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The array slicing limits (12, 20, 8) are hardcoded magic numbers. Defining these as named constants at the top of the file would improve readability and make it easier to adjust these limits (e.g., for prompt token management) in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant