[Feature]: Document that Report object is mutable #71

@alexverse

Guidelines

  • I agree to follow this project's Contributing Guidelines.

Description

Passing mutable objects through a data workflow can undermine reproducibility: a result that has already been created and inspected may silently change when a later step runs.

Example:

library(dplyr, warn.conflicts = FALSE)
library(data.validator)
library(assertr)

validator_a <- function(data_) {
  report <- data_validation_report()
  validate(data_) %>%
    validate_cols(
      \(x) not_na(x),
      Sepal.Length,
      description = "Sepal.Length not_na"
    ) %>%
    add_results(report)
  report
}

validator_b <- function(data_, report) {
  validate(data_) %>%
    validate_if(
      nrow(data_) > 0,
      description = "Non empty table"
    ) %>%
    add_results(report)
  report
}


report_a <- validator_a(iris)
print(report_a)
#> Validation summary: 
#>  Number of successful validations: 1
#>  Number of failed validations: 0
#>  Number of validations with warnings: 0
#> 
#> Advanced view: 
#> 
#> 
#> |table_name |description         |type    | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_      |Sepal.Length not_na |success |               NA|

report_b <- validator_b(iris, report_a)

### report_a is mutated
print(report_a)
#> Validation summary: 
#>  Number of successful validations: 1
#>  Number of failed validations: 1
#>  Number of validations with warnings: 0
#> 
#> Advanced view: 
#> 
#> 
#> |table_name |description         |type    | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_      |Non empty table     |error   |                1|
#> |data_      |Sepal.Length not_na |success |               NA|

Created on 2023-06-20 with reprex v2.0.2

Problem

The report_a object changes even though it is never reassigned. In a functional data analysis workflow, most users are unlikely to expect this behavior.
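For context, the same effect can be reproduced without data.validator. This is a minimal sketch that assumes the R6 package is installed; `Counter`, `bump`, and `bump_list` are hypothetical names used only for illustration and are not part of any package.

```r
library(R6)

# Hypothetical R6 class, used only to illustrate reference semantics.
Counter <- R6Class("Counter",
  public = list(
    count = 0,
    add = function(n = 1) {
      self$count <- self$count + n
      invisible(self)
    }
  )
)

# Modifies the caller's object in place, then returns that same object.
bump <- function(counter) {
  counter$add()
  counter
}

a <- Counter$new()
b <- bump(a)
a$count          # 1: `a` changed although it was never reassigned
identical(a, b)  # TRUE: both names refer to the same object

# A plain list, by contrast, is copied on modification.
bump_list <- function(x) {
  x$count <- x$count + 1
  x
}

l <- list(count = 0)
m <- bump_list(l)
l$count          # 0: the original list is untouched
```

The R6 object behaves like report_a in the example above, while the list behaves the way most R users expect function arguments to behave.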

Proposed Solution

Update the documentation to highlight that the Report object uses reference semantics, and show that a copy can be passed to downstream functions by calling the R6 clone() method.

Example:

library(dplyr, warn.conflicts = FALSE)
library(data.validator)
library(assertr)

validator_a <- function(data_) {
  report <- data_validation_report()
  validate(data_) %>%
    validate_cols(
      \(x) not_na(x),
      Sepal.Length,
      description = "Sepal.Length not_na"
    ) %>%
    add_results(report)
  report
}

validator_b <- function(data_, report) {
  validate(data_) %>%
    validate_if(
      nrow(data_) > 0,
      description = "Non empty table"
    ) %>%
    add_results(report)
  report
}


report_a <- validator_a(iris)
print(report_a)
#> Validation summary: 
#>  Number of successful validations: 1
#>  Number of failed validations: 0
#>  Number of validations with warnings: 0
#> 
#> Advanced view: 
#> 
#> 
#> |table_name |description         |type    | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_      |Sepal.Length not_na |success |               NA|

report_b <- validator_b(iris, report_a$clone())

### report_a is not mutated
print(report_a)
#> Validation summary: 
#>  Number of successful validations: 1
#>  Number of failed validations: 0
#>  Number of validations with warnings: 0
#> 
#> Advanced view: 
#> 
#> 
#> |table_name |description         |type    | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_      |Sepal.Length not_na |success |               NA|

Created on 2023-06-20 with reprex v2.0.2
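If the documentation recommends the clone() pattern, the call could also be wrapped so that it is harder to forget. The following is only a sketch: `add_results_to_copy` is a hypothetical helper that does not exist in data.validator, and it uses the native pipe, so it assumes R 4.1 or later.

```r
library(data.validator)
library(assertr)

# Hypothetical helper, not part of data.validator: add results to a clone of
# `report` and return the clone, so the caller's report is left as it was.
add_results_to_copy <- function(validated_data, report) {
  report_copy <- report$clone()
  add_results(validated_data, report_copy)
  report_copy
}

report_a <- data_validation_report()
validate(iris) |>
  validate_cols(\(x) not_na(x), Sepal.Length, description = "Sepal.Length not_na") |>
  add_results(report_a)

report_b <- validate(iris) |>
  validate_cols(\(x) not_na(x), Sepal.Width, description = "Sepal.Width not_na") |>
  add_results_to_copy(report_a)

identical(report_a, report_b)  # FALSE: two distinct report objects
```

Note that R6's clone() is shallow by default. The reprex above suggests a shallow copy is enough here, but if the Report ever holds other reference objects in its fields, clone(deep = TRUE) may be needed; that is an assumption to check against the package internals.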

Alternatives Considered

Maybe refactor the Report so that it follows R's standard copy-on-modify (value) semantics instead of reference semantics.


Labels: enhancement (New feature or request)
