[Feature]: Remove internal Oak Health Insurance data from GitHub history, replace with external dataset #53

@haroldship

Feature Request

Remove the internal/proprietary Oak Health Insurance benchmark data from GitHub, including from git history, and replace it with an externally sourced (public/shareable) dataset.

Motivation / Problem

The Oak Health Insurance benchmark (benchmarks/oak_health_insurance/) currently ships internal data (e.g. oak_data.json, oak_health_test_suite_v1.json, oak_policies.py and the oak-* policy definitions) that should not live in this repository or its history. The data needs to be scrubbed from the git history (not just removed in a new commit) and replaced with an equivalent dataset sourced externally/publicly, so that the benchmark continues to function for all contributors.

Use Case

  • Contributors without access to the internal Oak data should still be able to clone the repo, run benchmarks/oak_health_insurance/eval.sh, and reproduce Oak benchmark results using the external dataset.
  • The repo can be shared/open-sourced without exposing internal/proprietary content, including in its commit history.

Proposed Solution

  • Identify all internal Oak data files and references (oak_data.json, oak_health_test_suite_v1.json, oak_policies.py, and any other internal oak-* content under benchmarks/oak_health_insurance/).
  • Source or construct an equivalent external/public dataset that exercises the same capabilities, and swap it in as the benchmark's data source (update loaders/config/registry as needed so eval.sh/compare.sh keep working).
  • Rewrite git history to remove the internal data (e.g. via git filter-repo or BFG Repo-Cleaner). This is a destructive, history-rewriting operation that requires coordination: it needs a force-push, and all existing clones and forks must afterwards be re-cloned or reset onto the rewritten history. Plan and communicate accordingly.
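
The history-rewrite step above could be sketched roughly as follows, assuming git filter-repo is the chosen tool. The helper names (build_filter_repo_cmd, scrub_history) are hypothetical and not part of this repo, and the path list covers only the files named in this issue, so it would likely need extending after an audit of benchmarks/oak_health_insurance/. The --path and --invert-paths options are standard git filter-repo flags. The helper defaults to a dry run that only returns the command.

```python
import subprocess

# Paths named in this issue. Assumption: the list is incomplete until the
# directory has been audited for other internal oak-* content.
INTERNAL_PATHS = [
    "benchmarks/oak_health_insurance/oak_data.json",
    "benchmarks/oak_health_insurance/oak_health_test_suite_v1.json",
    "benchmarks/oak_health_insurance/oak_policies.py",
]


def build_filter_repo_cmd(paths):
    """Build a git filter-repo command that drops `paths` from all history.

    --invert-paths turns the --path selections into an exclusion list, so
    everything except the listed files is kept.
    """
    cmd = ["git", "filter-repo", "--invert-paths"]
    for path in paths:
        cmd += ["--path", path]
    return cmd


def scrub_history(repo_dir, paths, dry_run=True):
    """Return the command; run it in `repo_dir` only when dry_run is False."""
    cmd = build_filter_repo_cmd(paths)
    if not dry_run:
        # Destructive: rewrites every commit that touched the listed paths.
        subprocess.run(cmd, cwd=repo_dir, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: print the command for review instead of executing it.
    print(" ".join(scrub_history(".", INTERNAL_PATHS)))
```

git filter-repo expects to run in a fresh clone, so the real run would probably happen in a throwaway clone, followed by the coordinated force-push. If the external dataset reuses any of these file names, it would need to be committed after the rewrite rather than before.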

Alternatives Considered

N/A

Priority

High

Additional Context

Relevant files under benchmarks/oak_health_insurance/: oak_data.json, oak_health_test_suite_v1.json, oak_policies.py, eval_bench_sdk.py.
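
One way to audit which of these files still appear in history, before and after the rewrite, is sketched below. It is a minimal sketch that shells out to git log, using the standard --all and --format=%H options and a "--" path separator. The function names are hypothetical. An empty result for a path would suggest that no commit reachable from any ref still touches it. The check does not follow renames, so a renamed copy of the data would need a separate search.

```python
import subprocess


def build_log_cmd(path):
    """Command listing the hashes of all commits (on any ref) touching `path`."""
    return ["git", "log", "--all", "--format=%H", "--", path]


def parse_commit_list(stdout):
    """Split git log output (one hash per line) into a list of hashes."""
    return stdout.split()


def commits_touching(repo_dir, path):
    """Return the commit hashes in `repo_dir` whose changes include `path`."""
    result = subprocess.run(
        build_log_cmd(path),
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_commit_list(result.stdout)


def audit(repo_dir, paths):
    """Map each path to the number of commits that still reference it."""
    return {path: len(commits_touching(repo_dir, path)) for path in paths}
```

For example, audit(".", ["benchmarks/oak_health_insurance/oak_data.json"]) run in a clone of the rewritten repository should report zero commits for that path if the scrub succeeded.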

Metadata

  • Assignees: none
  • Labels: enhancement (New feature or request)
  • Status: Backlog
  • Milestone: none