A data quality assessment tool for OMOP Common Data Model databases. The Profile of Analytic Suitability Score (PASS) evaluates clinical data across six dimensions to quantify its fitness for research and analytics.
PASS calculates standardized metrics (0-1 scale) that measure different aspects of OMOP CDM data quality:
- Accessibility: Are clinical facts present and discoverable?
- Provenance: How well are facts coded and traceable to source data?
- Standards: Are OHDSI standard concepts being used?
- Concept Diversity: Is there variety in the concepts represented?
- Source Diversity: How many different data sources contribute?
- Temporal: How is data distributed over time?
Each metric produces field-level, table-level, and overall scores with 95% confidence intervals. A composite PASS score aggregates the individual metrics into a single quality measure.
**Accessibility**: Evaluates whether clinical facts exist in `concept_id` fields. Scores range from 1.0 (concept present) to 0.5 (source code only) to 0.05 (text only) to 0.0 (absent). Pseudo-fields allow custom completeness checks (e.g., measurement results, note text).
**Provenance**: Measures coding quality and traceability to the source data. Native vocabulary usage scores 1.0, mapped codes 0.95, mapped text 0.75, and untraceable concepts 0.0.
**Standards**: Binary assessment of OHDSI standard concept usage. Standard concepts score 1.0, non-standard concepts 0.0.
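The three tier-based dimensions above boil down to per-row lookups. The sketch below is illustrative only: the branching on `source_concept_id` / `source_value` for the accessibility tiers is an assumed reading of the score descriptions, not the package's internal logic.

```r
# Illustrative sketch of tiered per-row scoring; not the package's internals.
score_row_accessibility <- function(concept_id, source_concept_id, source_value) {
  if (!is.na(concept_id) && concept_id != 0) {
    1.0    # a coded clinical fact is present in the concept_id field
  } else if (!is.na(source_concept_id) && source_concept_id != 0) {
    0.5    # only a source code is available
  } else if (!is.na(source_value) && nzchar(source_value)) {
    0.05   # only free text is available
  } else {
    0.0    # the fact is absent
  }
}

# Provenance and standards tiers expressed as lookup tables
provenance_tiers <- c(native = 1.00, mapped_code = 0.95, mapped_text = 0.75, untraceable = 0.00)
standards_tiers  <- c(standard = 1.0, non_standard = 0.0)
```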
**Concept Diversity**: Shannon entropy of the concept distribution within each field, normalized to [0, 1], where 1.0 indicates maximal diversity and 0.0 indicates no variety.
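A minimal sketch of one standard normalization consistent with the description above (dividing the entropy by its maximum for the number of distinct concepts observed):

```r
# Normalized Shannon entropy of the concepts observed in one field.
normalized_entropy <- function(concept_ids) {
  p <- prop.table(table(concept_ids))   # relative frequency of each concept
  if (length(p) <= 1) return(0)         # a single concept carries no diversity
  -sum(p * log(p)) / log(length(p))     # entropy divided by its maximum -> [0, 1]
}

normalized_entropy(c(1, 1, 1, 1))   # 0: no variety
normalized_entropy(c(1, 2, 3, 4))   # 1: all concepts equally represented
```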
**Source Diversity**: Counts unique `type_concept_id` values per table and applies exponential decay normalization (1 - exp(-n/k)), so the score asymptotically approaches 1.0 as the source count increases.
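For example, with the stated 1 - exp(-n/k) normalization (k is a tuning constant; the default value used by the package is not stated here):

```r
# Exponential-decay normalization of a table's count of distinct type_concept_id values.
source_diversity_score <- function(n, k) 1 - exp(-n / k)

source_diversity_score(n = 1, k = 3)   # ~0.28
source_diversity_score(n = 5, k = 3)   # ~0.81, approaching 1.0 as n grows
```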
**Temporal**: Combines three sub-scores: range (years of coverage), density (rows per patient per quarter), and consistency (temporal stability measured by the coefficient of variation).
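A minimal sketch of the consistency sub-score, assuming it is derived from the coefficient of variation of quarterly rows per patient; this is an illustration, not the package's exact formula:

```r
# Stability of quarterly volume: a lower coefficient of variation gives a higher score.
temporal_consistency <- function(rows_per_patient_by_quarter) {
  cv <- sd(rows_per_patient_by_quarter) / mean(rows_per_patient_by_quarter)
  max(0, 1 - cv)
}

temporal_consistency(c(2.0, 2.1, 1.9, 2.0))   # steady volume -> close to 1
temporal_consistency(c(0.2, 4.0, 0.1, 3.5))   # erratic volume -> floors at 0
```

In practice none of these sketches need to be written by hand; all metric scores are produced by a single call to `calculate_pass()`, as shown below.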
```r
library(pass)

# Create database connection
conn <- create_pass_connection(
  project_id = "my-project",
  dataset = "omop_cdm",
  jdbc_driver_path = "~/bigquery_driver/"
)

# Load default configuration
config <- load_pass_config()

# Calculate all metrics
results <- calculate_pass(
  conn = conn,
  schema = "my-project.omop_cdm",
  config = config,
  metrics = "all",
  output_dir = "output/"
)

# Disconnect
disconnect_pass(conn)
```

```r
# Run only accessibility and temporal
results <- calculate_pass(
  conn = conn,
  schema = "my-project.omop_cdm",
  config = config,
  metrics = c("accessibility", "temporal"),
  output_dir = "output/"
)
```

```r
# Load custom configuration files
config <- load_pass_config(
  concept_fields_path = "path/to/custom_concept_fields.csv",
  type_fields_path = "path/to/custom_type_fields.csv",
  date_fields_path = "path/to/custom_date_fields.csv"
)

results <- calculate_pass(conn, schema, config)
```

```r
# Adjust metric weights in composite score
results <- calculate_pass(
  conn = conn,
  schema = "my-project.omop_cdm",
  config = config,
  metrics = "all",
  composite_weights = list(
    accessibility = 1.5,
    provenance = 1.0,
    standards = 1.0,
    concept_diversity = 0.5,
    source_diversity = 1.0,
    temporal = 1.0
  )
)
```

The package includes default configuration files that define which fields to evaluate. These can be customized by providing your own CSV files.
Configuration files are located in inst/config/:
concept_fields_with_weights.csv
Defines which concept_id fields to evaluate and their analytical importance weights (0-1 scale).
```
table,concept_id_field,source_concept_id_field,source_value_field,multiplier,rationale
condition_occurrence,condition_concept_id,condition_source_concept_id,condition_source_value,1.0,Primary diagnosis field
```

type_concept_id_fields.csv
Specifies type_concept_id fields for source diversity analysis.
```
table,type_concept_id
condition_occurrence,condition_type_concept_id
```

date_fields.csv
Defines primary date fields for temporal analysis.
```
table,date_field
condition_occurrence,condition_start_date
```

To adjust field importance in your analysis:
- Export default configuration:

  ```r
  default_config <- system.file("config", "concept_fields_with_weights.csv", package = "pass")
  file.copy(default_config, "my_custom_config.csv")
  ```

- Edit `my_custom_config.csv` to adjust multipliers
- Load custom configuration:

  ```r
  config <- load_pass_config(concept_fields_path = "my_custom_config.csv")
  ```

Results are written to the `output/` directory as 5 consolidated CSV files:
pass_field_level.csv - Field-level scores for all metrics
- One row per metric per field (e.g., accessibility + person.gender_concept_id)
- Includes `metric` column to filter by specific metric (see the example after this file list)
- Standardized `field_score` column across all metrics
- Metric-specific columns (e.g., score breakdowns) filled with NA where not applicable
pass_table_level.csv - Table-level aggregated scores for all metrics
- One row per metric per table (e.g., provenance + measurement table)
- Includes `metric` column to filter by specific metric
- Standardized `table_score` column across all metrics
pass_overall.csv - Dataset-wide overall scores for all metrics
- One row per metric with standardized columns:
  - `overall_score` - Main quality score (0-1 scale)
  - `mean_score`, `median_score`, `sd_score`, `variance_score` - Distribution statistics
  - `ci_95_lower`, `ci_95_upper` - 95% confidence intervals
  - `total_entities` - Count of entities evaluated (rows, fields, or tables depending on metric)
- Metric-specific columns filled with NA where not applicable
pass_composite_overall.csv - Weighted composite PASS score
- Single row with overall composite score and confidence interval
pass_composite_components.csv - Breakdown of individual metric contributions to composite
- One row per metric showing weights and contributions
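For example, after a run that used the default `output/` directory, the consolidated field-level file can be filtered to a single metric (assuming the `metric` labels match the names used in the `metrics` argument):

```r
# Load the consolidated field-level results and keep one metric's rows.
field_level <- read.csv("output/pass_field_level.csv")
accessibility_fields <- subset(field_level, metric == "accessibility")
head(accessibility_fields[, c("metric", "field_score")])
```

The composite can be thought of as a weighted average of the six overall metric scores. The sketch below reuses the weights from the `composite_weights` example above with made-up scores; it is illustrative arithmetic, not the package's exact aggregation:

```r
weights <- c(accessibility = 1.5, provenance = 1.0, standards = 1.0,
             concept_diversity = 0.5, source_diversity = 1.0, temporal = 1.0)
scores  <- c(accessibility = 0.92, provenance = 0.88, standards = 0.75,
             concept_diversity = 0.64, source_diversity = 0.70, temporal = 0.81)  # hypothetical values
sum(weights * scores) / sum(weights)   # ~0.81
```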
```
pass/
├── DESCRIPTION # Package metadata
├── NAMESPACE # Exported functions
├── R/ # R source code
│ ├── calculate_pass.R # Main user function
│ ├── config.R # Internal config helpers
│ ├── config_helpers.R # Config loading (exported)
│ ├── connection_helpers.R # Database connection (exported)
│ ├── composite_score.R # Composite score calculation
│ ├── utils_gcs.R # GCS utilities (exported)
│ ├── accessibility.R # ACCESSIBILITY metric
│ ├── provenance.R # PROVENANCE metric
│ ├── standards.R # STANDARDS metric
│ ├── concept_diversity.R # CONCEPT DIVERSITY metric
│ ├── source_diversity.R # SOURCE DIVERSITY metric
│ └── temporal.R # TEMPORAL metric
├── inst/
│ ├── config/ # Default configuration files
│ │ ├── concept_fields_with_weights.csv
│ │ ├── type_concept_id_fields.csv
│ │ └── date_fields.csv
│ └── examples/ # Example usage scripts
│ └── calculate_pass_example.R
├── man/ # Function documentation (auto-generated)
├── vignettes/ # Package vignettes
│ └── scoring_methodology.Rmd
└── README.md
```
Access results programmatically without saving to files:
```r
results <- calculate_pass(
  conn = conn,
  schema = "my-project.omop_cdm",
  config = config,
  output_dir = NULL  # Don't save CSV files
)

# Access overall scores (all metrics use standardized column names)
accessibility_score <- results$accessibility$overall$overall_score
temporal_score <- results$temporal$overall$overall_score
composite_score <- results$composite$overall$composite_score

# Access field-level details
field_scores <- results$accessibility$field_level

# Filter by metric in consolidated results
provenance_fields <- results$provenance$field_level
```
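Assuming every metric's results follow the same shape as the examples above (a list element per metric with an `overall$overall_score` entry), the six overall scores can be gathered into one data frame:

```r
# Collect each metric's overall score from the in-memory results list.
metric_names <- c("accessibility", "provenance", "standards",
                  "concept_diversity", "source_diversity", "temporal")
data.frame(
  metric = metric_names,
  overall_score = sapply(metric_names, function(m) results[[m]]$overall$overall_score)
)
```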