PASS evaluates clinical data across six dimensions to quantify an OMOP CDM's fitness for research and analytics

Analyticsphere/PASS

PASS: Profile of Analytic Suitability Score

A data quality assessment tool for OMOP Common Data Model databases. The Profile of Analytic Suitability Score (PASS) evaluates clinical data across six dimensions to quantify its fitness for research and analytics.

Overview

PASS calculates standardized metrics (0-1 scale) that measure different aspects of OMOP CDM data quality:

  • Accessibility: Are clinical facts present and discoverable?
  • Provenance: How well are facts coded and traceable to source data?
  • Standards: Are OHDSI standard concepts being used?
  • Concept Diversity: Is there variety in the concepts represented?
  • Source Diversity: How many different data sources contribute?
  • Temporal: How is data distributed over time?

Each metric produces field-level, table-level, and overall scores with 95% confidence intervals. A composite PASS score aggregates the individual metrics into a single quality measure.

Metrics

Accessibility

Evaluates whether clinical facts exist in concept_id fields. Each fact is scored by the best evidence available: 1.0 (concept present), 0.5 (source code only), 0.05 (text only), or 0.0 (absent). Includes pseudo-fields for custom completeness checks (e.g., measurement results, note text).
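The accessibility tiers can be read as a fall-through rule: take the highest tier for which evidence exists. The following is a minimal sketch of that reading, not the package's implementation. The function name, its argument names, and the rule that 0 or NA means "missing" are all assumptions made for illustration, and the IDs are toy values.

```r
# Hypothetical helper (not part of the pass package): score each record
# by the best evidence available. Treating 0 or NA as "missing" is an assumption.
accessibility_tier <- function(concept_id, source_concept_id, source_value) {
  has_concept     <- !is.na(concept_id) & concept_id != 0
  has_source_code <- !is.na(source_concept_id) & source_concept_id != 0
  has_text        <- !is.na(source_value) & nzchar(source_value)
  ifelse(has_concept, 1.0,
    ifelse(has_source_code, 0.5,
      ifelse(has_text, 0.05, 0.0)))
}

# Toy records: the concept IDs and source values are made up.
records <- data.frame(
  concept_id        = c(1001, 0, 0, NA),
  source_concept_id = c(2001, 2001, 0, NA),
  source_value      = c("I10", "I10", "high blood pressure", NA),
  stringsAsFactors  = FALSE
)

scores <- accessibility_tier(
  records$concept_id,
  records$source_concept_id,
  records$source_value
)
print(scores)
```

The four toy records fall into the four tiers in order, from a mapped concept down to a fully absent fact.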

Provenance

Measures coding quality and source traceability. Native vocabulary usage scores 1.0, mapped codes 0.95, mapped text 0.75, and untraceable concepts 0.0.

Standards

Binary assessment of OHDSI standard concept usage. Standard concepts score 1.0, non-standard 0.0.

Concept Diversity

Computes the Shannon entropy of the concept distribution within each field, normalized to [0,1]: 1.0 indicates perfect diversity and 0.0 indicates no variety (a single concept).
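As an illustration of normalized Shannon entropy, the sketch below divides the entropy by its maximum for the observed number of distinct concepts. That choice of normalizer is an assumption, since the text does not say how PASS normalizes, and the function name is hypothetical.

```r
# Hypothetical helper: Shannon entropy of a concept_id vector, scaled to [0, 1].
# Assumption: normalize by log(number of distinct concepts observed).
normalized_entropy <- function(concept_ids) {
  counts <- table(concept_ids)          # NA values are dropped by default
  if (length(counts) < 2) return(0)     # zero or one concept: no variety
  p <- as.numeric(counts) / sum(counts)
  -sum(p * log(p)) / log(length(counts))
}

even_score   <- normalized_entropy(c(1, 2, 3, 4))                   # uniform
single_score <- normalized_entropy(c(7, 7, 7, 7))                   # one concept
skewed_score <- normalized_entropy(c(rep(1, 9), 2))                 # dominated by one concept
print(c(even = even_score, single = single_score, skewed = skewed_score))
```

Under this normalizer a uniform distribution scores 1, a single repeated concept scores 0, and a skewed distribution lands in between.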

Source Diversity

Counts the unique type_concept_id values per table and normalizes that count n with an exponential saturation curve, 1 - exp(-n/k), where k is a scaling constant. The score asymptotically approaches 1.0 as the source count increases.
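To show how the 1 - exp(-n/k) curve behaves, here is a small sketch. The function name and the value k = 2 are assumptions chosen only for illustration; the text does not state the k that PASS uses.

```r
# Hypothetical helper: saturating score for n distinct type_concept_id values.
# k = 2 is an illustrative value, not the package default.
source_diversity_curve <- function(n_types, k = 2) {
  1 - exp(-n_types / k)
}

n_types <- c(0, 1, 2, 5, 10)
curve_scores <- source_diversity_curve(n_types)
print(data.frame(n_types = n_types, score = round(curve_scores, 3)))
```

Each additional source adds less than the one before it, so the first few sources move the score the most.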

Temporal

Combines three sub-scores: range (years of coverage), density (rows per patient per quarter), and consistency (temporal stability via coefficient of variation).
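The consistency sub-score rests on the coefficient of variation (standard deviation divided by mean). The sketch below only computes that quantity for two made-up series of quarterly row counts. How PASS maps it onto a 0-1 score is not described here, so no mapping is shown, and the function name is hypothetical.

```r
# Hypothetical helper: coefficient of variation of per-quarter row counts.
coefficient_of_variation <- function(x) {
  sd(x) / mean(x)
}

# Toy data: row counts per quarter for two imaginary tables.
steady_quarters <- c(120, 130, 110, 125, 115, 128)
bursty_quarters <- c(5, 400, 10, 0, 350, 8)

cv_steady <- coefficient_of_variation(steady_quarters)
cv_bursty <- coefficient_of_variation(bursty_quarters)
print(c(steady = cv_steady, bursty = cv_bursty))
```

A lower coefficient of variation means the volume of data is more stable from quarter to quarter.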

Usage

Basic Example

library(pass)

# Create database connection
conn <- create_pass_connection(
  project_id = "my-project",
  dataset = "omop_cdm",
  jdbc_driver_path = "~/bigquery_driver/"
)

# Load default configuration
config <- load_pass_config()

# Calculate all metrics
results <- calculate_pass(
  conn = conn,
  schema = "my-project.omop_cdm",
  config = config,
  metrics = "all",
  output_dir = "output/"
)

# Disconnect
disconnect_pass(conn)

Calculate Specific Metrics

# Run only accessibility and temporal
results <- calculate_pass(
  conn = conn,
  schema = "my-project.omop_cdm",
  config = config,
  metrics = c("accessibility", "temporal"),
  output_dir = "output/"
)

Custom Configuration

# Load custom configuration files
config <- load_pass_config(
  concept_fields_path = "path/to/custom_concept_fields.csv",
  type_fields_path = "path/to/custom_type_fields.csv",
  date_fields_path = "path/to/custom_date_fields.csv"
)

results <- calculate_pass(
  conn = conn,
  schema = "my-project.omop_cdm",
  config = config
)

Custom Composite Weights

# Adjust metric weights in composite score
results <- calculate_pass(
  conn = conn,
  schema = "my-project.omop_cdm",
  config = config,
  metrics = "all",
  composite_weights = list(
    accessibility = 1.5,
    provenance = 1.0,
    standards = 1.0,
    concept_diversity = 0.5,
    source_diversity = 1.0,
    temporal = 1.0
  )
)
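To see what such weights do, the sketch below combines six made-up metric scores as a weighted mean. Treating the composite as a plain weighted mean is an assumption for illustration; the package's actual aggregation (including its confidence interval) may differ.

```r
# Toy overall scores per metric (made-up values).
metric_scores <- c(
  accessibility = 0.9, provenance = 0.8, standards = 0.7,
  concept_diversity = 0.6, source_diversity = 0.5, temporal = 0.4
)

# Same weights as in the example above.
weights <- c(
  accessibility = 1.5, provenance = 1.0, standards = 1.0,
  concept_diversity = 0.5, source_diversity = 1.0, temporal = 1.0
)

# Assumption: composite = weighted mean of the metric scores.
w <- weights[names(metric_scores)]
composite <- weighted.mean(metric_scores, w)

# Each metric's share of the composite.
contributions <- metric_scores * w / sum(w)
print(composite)
print(round(contributions, 4))
```

Raising a weight above 1.0 pulls the composite toward that metric's score, and lowering it below 1.0 reduces that metric's influence.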

Configuration

The package includes default configuration files that define which fields to evaluate. These can be customized by providing your own CSV files.

Default Configuration Files

Configuration files are located in inst/config/:

concept_fields_with_weights.csv - Defines which concept_id fields to evaluate and their analytical importance weights (the multiplier column, 0-1 scale).

table,concept_id_field,source_concept_id_field,source_value_field,multiplier,rationale
condition_occurrence,condition_concept_id,condition_source_concept_id,condition_source_value,1.0,Primary diagnosis field

type_concept_id_fields.csv - Specifies type_concept_id fields for source diversity analysis.

table,type_concept_id
condition_occurrence,condition_type_concept_id

date_fields.csv - Defines primary date fields for temporal analysis.

table,date_field
condition_occurrence,condition_start_date

Customizing Field Weights

To adjust field importance in your analysis:

  1. Export the default configuration:

default_config <- system.file("config", "concept_fields_with_weights.csv", package = "pass")
file.copy(default_config, "my_custom_config.csv")

  2. Edit my_custom_config.csv to adjust the multipliers.

  3. Load the custom configuration:

config <- load_pass_config(concept_fields_path = "my_custom_config.csv")
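The edit step can also be scripted with base R rather than done by hand. The sketch below works on a temporary stand-in file so it runs on its own. The column names come from the header shown earlier, while the measurement row and the new multiplier of 0.5 are illustrative assumptions.

```r
# Stand-in for my_custom_config.csv so the sketch is self-contained.
# The second row is a toy entry added for illustration.
custom_path <- tempfile(fileext = ".csv")
writeLines(c(
  "table,concept_id_field,source_concept_id_field,source_value_field,multiplier,rationale",
  "condition_occurrence,condition_concept_id,condition_source_concept_id,condition_source_value,1.0,Primary diagnosis field",
  "measurement,measurement_concept_id,measurement_source_concept_id,measurement_source_value,1.0,Toy example row"
), custom_path)

# Read, adjust one multiplier, and write the file back.
fields <- read.csv(custom_path, stringsAsFactors = FALSE)
fields$multiplier[fields$table == "measurement"] <- 0.5
write.csv(fields, custom_path, row.names = FALSE)

# Confirm the change survived the round trip.
reloaded <- read.csv(custom_path, stringsAsFactors = FALSE)
print(reloaded[, c("table", "concept_id_field", "multiplier")])
```

The resulting path can then be passed as concept_fields_path, as in the load step above.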

Output

Results are written to the directory given by output_dir (output/ in the examples above) as five consolidated CSV files:

Consolidated Output Files

pass_field_level.csv - Field-level scores for all metrics

  • One row per metric per field (e.g., accessibility + person.gender_concept_id)
  • Includes metric column to filter by specific metric
  • Standardized field_score column across all metrics
  • Metric-specific columns (e.g., score breakdowns) filled with NA where not applicable

pass_table_level.csv - Table-level aggregated scores for all metrics

  • One row per metric per table (e.g., provenance + measurement table)
  • Includes metric column to filter by specific metric
  • Standardized table_score column across all metrics

pass_overall.csv - Dataset-wide overall scores for all metrics

  • One row per metric with standardized columns:
    • overall_score - Main quality score (0-1 scale)
    • mean_score, median_score, sd_score, variance_score - Distribution statistics
    • ci_95_lower, ci_95_upper - 95% confidence intervals
    • total_entities - Count of entities evaluated (rows, fields, or tables depending on metric)
  • Metric-specific columns filled with NA where not applicable

pass_composite_overall.csv - Weighted composite PASS score

  • Single row with overall composite score and confidence interval

pass_composite_components.csv - Breakdown of individual metric contributions to composite

  • One row per metric showing weights and contributions
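Because the consolidated files stack all metrics in one table, a typical first step is to filter on the metric column. The sketch below builds a toy stand-in for pass_field_level.csv so it runs on its own. The metric and field_score columns are described above, while the table and field column names and all values are assumptions.

```r
# Toy stand-in for output/pass_field_level.csv (values are made up;
# the "table" and "field" column names are assumptions).
toy_path <- tempfile(fileext = ".csv")
write.csv(data.frame(
  metric      = c("accessibility", "provenance", "accessibility"),
  table       = c("person", "measurement", "condition_occurrence"),
  field       = c("gender_concept_id", "measurement_concept_id", "condition_concept_id"),
  field_score = c(0.98, 0.91, 0.42)
), toy_path, row.names = FALSE)

# Load the consolidated file and keep one metric.
field_level <- read.csv(toy_path, stringsAsFactors = FALSE)
accessibility_rows <- field_level[field_level$metric == "accessibility", ]

# Sort ascending to surface the weakest fields first.
weakest_first <- accessibility_rows[order(accessibility_rows$field_score), ]
print(weakest_first)
```

The same pattern applies to pass_table_level.csv with its table_score column.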

Package Structure

pass/
├── DESCRIPTION                # Package metadata
├── NAMESPACE                  # Exported functions
├── R/                         # R source code
│   ├── calculate_pass.R       # Main user function
│   ├── config.R               # Internal config helpers
│   ├── config_helpers.R       # Config loading (exported)
│   ├── connection_helpers.R   # Database connection (exported)
│   ├── composite_score.R      # Composite score calculation
│   ├── utils_gcs.R            # GCS utilities (exported)
│   ├── accessibility.R        # ACCESSIBILITY metric
│   ├── provenance.R           # PROVENANCE metric
│   ├── standards.R            # STANDARDS metric
│   ├── concept_diversity.R    # CONCEPT DIVERSITY metric
│   ├── source_diversity.R     # SOURCE DIVERSITY metric
│   └── temporal.R             # TEMPORAL metric
├── inst/
│   ├── config/               # Default configuration files
│   │   ├── concept_fields_with_weights.csv
│   │   ├── type_concept_id_fields.csv
│   │   └── date_fields.csv
│   └── examples/             # Example usage scripts
│       └── calculate_pass_example.R
├── man/                      # Function documentation (auto-generated)
├── vignettes/                # Package vignettes
│   └── scoring_methodology.Rmd
└── README.md

Programmatic Access

Access results programmatically without saving to files:

results <- calculate_pass(
  conn = conn,
  schema = "my-project.omop_cdm",
  config = config,
  output_dir = NULL  # Don't save CSV files
)

# Access overall scores (all metrics use standardized column names)
accessibility_score <- results$accessibility$overall$overall_score
temporal_score <- results$temporal$overall$overall_score
composite_score <- results$composite$overall$composite_score

# Access field-level details
field_scores <- results$accessibility$field_level

# Access field-level details for another metric
provenance_fields <- results$provenance$field_level
