[Feature]: validate column name existence #87

@nick-youngblut

Guidelines

  • I agree to follow this project's Contributing Guidelines.

Description

A modified version of the README example:

library(magrittr)
library(data.validator)

report <- data_validation_report()

validate(mtcars, name = "Verifying cars dataset") %>%
  validate_if(drat > 0, description = "Column drat has only positive values") %>%
  validate_cols(in_set(c(0, 2)), WRONG_COLUMN_NAME, vs, am, description = "vs and am values equal 0 or 2 only") %>%
  validate_cols(within_n_sds(1), mpg, description = "mpg within 1 sds") %>%
  validate_rows(num_row_NAs, within_bounds(0, 2), vs, am, mpg, description = "not too many NAs in rows") %>%
  validate_rows(maha_dist, within_n_mads(10), everything(), description = "maha dist within 10 mads") %>%
  add_results(report)

print(report)

The error:

> validate(mtcars, name = "Verifying cars dataset") %>%
+   validate_if(drat > 0, description = "Column drat has only positive values") %>%
+   validate_cols(in_set(c(0, 2)), WRONG_COLUMN_NAME, vs, am, description = "vs and am values equal 0 or 2 only") %>%
+   validate_cols(within_n_sds(1), mpg, description = "mpg within 1 sds") %>%
+   validate_rows(num_row_NAs, within_bounds(0, 2), vs, am, mpg, description = "not too many NAs in rows") %>%
+   validate_rows(maha_dist, within_n_mads(10), everything(), description = "maha dist within 10 mads") %>%
+   add_results(report)
Error in `dplyr::select()` at assertr/R/assertions.R:102:2:
! Can't subset columns that don't exist.
✖ Column `WRONG_COLUMN_NAME` doesn't exist.

As far as I can tell, if the user provides a table that lacks one of the validated columns, the validate workflow throws an error instead of producing a report stating that validation failed because required columns are missing.

Problem

data.validator does not check that the validated columns exist in the provided data.frame.
As a result, the column-existence check must be placed outside of the generate-validation-report workflow.
The feedback to the user is then split into at least 2 validations, 1) a check for the required columns and 2) the validation report, instead of a single all-encompassing validation report.

Proposed Solution

Include assertr::has_all_names in the validation report, or if that is already possible, provide an example in the package README.
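
For reference, a minimal sketch of how has_all_names behaves in plain assertr, outside of data.validator. It assumes assertr's verify() together with its success_logical and error_logical handlers. Whether data.validator's validate_if() accepts the same predicate is not confirmed here, which is the open question of this request.

```r
library(assertr)

# All requested columns exist in mtcars, so verify() succeeds.
ok <- verify(
  mtcars,
  has_all_names("vs", "am", "mpg"),
  success_fun = success_logical,
  error_fun = error_logical
)

# One requested column is missing; error_logical turns the failure
# into a logical value instead of stopping with an error.
bad <- verify(
  mtcars,
  has_all_names("WRONG_COLUMN_NAME", "vs", "am"),
  success_fun = success_logical,
  error_fun = error_logical
)

print(ok)
print(bad)
```

The point of the sketch is that a missing column becomes an ordinary failed check rather than an uncaught dplyr::select() error.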

Alternatives Considered

I'm currently validating the existence of the required columns prior to using data.validator, and providing user feedback on the column existence via shiny::showNotification().
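
A minimal base R sketch of this kind of pre-check. The helper name missing_columns and the required vector are hypothetical and chosen for illustration; they are not part of data.validator or assertr.

```r
# Hypothetical helper: returns the required column names
# that are absent from `data`.
missing_columns <- function(data, required) {
  setdiff(required, names(data))
}

# Columns the validation chain in this issue refers to.
required <- c("WRONG_COLUMN_NAME", "vs", "am", "mpg", "drat")
absent <- missing_columns(mtcars, required)

if (length(absent) > 0) {
  # Report the problem instead of letting the validate() chain error out.
  message("Missing required columns: ", paste(absent, collapse = ", "))
} else {
  message("All required columns present; safe to run the validation chain.")
}
```

In a Shiny app, the text passed to message() could instead go to shiny::showNotification(). This keeps the app from crashing, but the column check still lives outside the validation report.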

Labels: enhancement (New feature or request)
