
[Feature]: make flexible validations #76

@nbbn


Guidelines

  • I agree to follow this project's Contributing Guidelines.

Description

data.validator is strongly built around the idea of a single table, with validations running on that table.
IMO this doesn't fit most use cases.

For example, I do:

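library(data.validator)

# Collects the results of every validation chain
report <- data_validation_report()

# data.frame() is only a placeholder: none of the checks below use it
validate(data.frame(), name = "Comparing testing vs postgres data") |>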
  validate_if(
    identical(
      names(get_cols(...)),
      names(get_cols(...))
    ),
    description = "Column names are the same in 1 table"
  ) |>
  validate_if(
    identical(
      as.vector(get_cols(...)),
      as.vector(get_cols(...))
    ),
    description = "Column types are the same in 1 table"
  ) |>
  add_results(report)

As you can see, I have to pass an empty data frame to validate() even though I never use it.

Then, when I call print(report), I see:

|table_name                         |description                          |type    | total_violations|
|:----------------------------------|:------------------------------------|:-------|----------------:|
|Comparing testing vs postgres data |Column names are the same in 1 table |success |               NA|
|Comparing testing vs postgres data |Column types are the same in 1 table |success |               NA|

The column name table_name doesn't make sense to me in this situation. Maybe it should be Group?

Also, Violated data doesn't work with this flexible approach.

Another example from practice

We used data.validator to show rows returned by queries. The queries were written so that they return only invalid rows, and nothing is returned if there is no invalid data. More documentation on how to adapt data.validator to cases like this would be nice.
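A minimal base-R sketch of this pattern, assuming the query result has already been fetched into a data frame: the check passes only when the query returns zero rows, each returned row counts as one violation, and the rows themselves are kept so they can be displayed. `check_invalid_rows()`, its `group` column and the sample data are hypothetical; none of this is part of data.validator's API.

```r
# Hypothetical helper (not part of data.validator). The query is assumed to
# return only invalid rows, so the check passes exactly when it returns no
# rows, and every returned row counts as one violation.
check_invalid_rows <- function(rows, group, description) {
  n <- nrow(rows)
  list(
    summary = data.frame(
      group = group,
      description = description,
      type = if (n == 0) "success" else "error",
      total_violations = n,
      stringsAsFactors = FALSE
    ),
    # Keep the offending rows so they can be shown as violated data
    violated_data = rows
  )
}

# Stand-in for a query result; in practice this would come from the database
bad_rows <- data.frame(id = c(3L, 7L), price = c(NA, -1))

res <- check_invalid_rows(bad_rows, group = "Orders query",
                          description = "Prices are present and non-negative")
print(res$summary)
```

Because the result is computed from whatever the query returns, no separate data frame has to be passed in up front.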

Problem

My use of this package doesn't fit its standard use. I think the package should be more flexible and allow validations based on multiple data frames without specifying them explicitly in the validate() call.

Proposed Solution

  1. Change the column names in the report object.
  2. Remove the requirement to pass a data frame to validate().
  3. Update the docs with examples of more advanced and customized use cases.
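To illustrate points 1 and 2, here is a base-R sketch, independent of data.validator, of checks that compare two data frames directly and return a report-like table whose first column is `group` instead of `table_name`. `compare_tables()` is a hypothetical helper, not an existing data.validator function, and the two sample data frames only stand in for the testing and postgres tables.

```r
# Hypothetical helper (not part of data.validator): runs table-independent
# checks on two data frames and returns a report-like data frame whose first
# column is `group` rather than `table_name`.
compare_tables <- function(x, y, group) {
  # First class of each column, e.g. "integer", "numeric", "character"
  col_types <- function(df) vapply(df, function(col) class(col)[1], character(1))

  # Each check is a single TRUE/FALSE value computed from both inputs
  checks <- list(
    "Column names are the same" = identical(names(x), names(y)),
    "Column types are the same" = identical(col_types(x), col_types(y))
  )
  passed <- vapply(checks, isTRUE, logical(1))

  data.frame(
    group = group,
    description = names(checks),
    type = ifelse(unname(passed), "success", "error"),
    stringsAsFactors = FALSE
  )
}

# Stand-ins for the two tables being compared
testing <- data.frame(id = 1:3, value = c(1.5, 2, 3))
postgres <- data.frame(id = 1:3, value = c("1.5", "2", "3"))

res <- compare_tables(testing, postgres, group = "Comparing testing vs postgres data")
print(res)
```

Since each check is just an expression over both inputs, no placeholder data frame is needed, and the grouping label describes the comparison rather than a single table.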

Alternatives Considered

Keep the current design, but state explicitly in the docs that the package is dedicated to working with data frames.

Metadata


Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
