Open
Labels
enhancement: New feature or request
Guidelines
- I agree to follow this project's Contributing Guidelines.
Description
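Using mutable objects in a data workflow can destroy reproducibility evidence: a report that is passed to a downstream function can be modified in place, so the state it had at an earlier step is lost.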
Example:
library(dplyr, warn.conflicts = FALSE)
library(data.validator)
library(assertr)

validator_a <- function(data_) {
  report <- data_validation_report()
  validate(data_) %>%
    validate_cols(
      \(x) not_na(x),
      Sepal.Length,
      description = "Sepal.Length not_na"
    ) %>%
    add_results(report)
  report
}

validator_b <- function(data_, report) {
  validate(data_) %>%
    validate_if(
      nrow(data_) > 0,
      description = "Non empty table"
    ) %>%
    add_results(report)
  report
}

report_a <- validator_a(iris)
print(report_a)
#> Validation summary:
#> Number of successful validations: 1
#> Number of failed validations: 0
#> Number of validations with warnings: 0
#>
#> Advanced view:
#>
#>
#> |table_name |description |type | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_ |Sepal.Length not_na |success | NA|
report_b <- validator_b(iris, report_a)
# report_a has been mutated: it now also contains the result added by validator_b
print(report_a)
#> Validation summary:
#> Number of successful validations: 1
#> Number of failed validations: 1
#> Number of validations with warnings: 0
#>
#> Advanced view:
#>
#>
#> |table_name |description |type | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_ |Non empty table |error | 1|
#> |data_ |Sepal.Length not_na |success | NA|

Created on 2023-06-20 with reprex v2.0.2
Problem
validator_b() modifies report_a in place, even though it appears to return a new report. In a functional data analysis workflow, where functions are expected to leave their inputs unchanged, most users will not expect this behavior.
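The difference can be illustrated without data.validator. The following is a minimal base R sketch, assuming only that the report behaves like an environment (R6 objects are built on environments). The helper names `add_to_list()` and `add_to_env()` are hypothetical and exist only for this illustration.

```r
# A list follows copy-on-modify: the caller's object is left untouched.
add_to_list <- function(report, entry) {
  report$results <- c(report$results, entry)
  report
}

# An environment follows reference semantics: the caller's object changes in place.
add_to_env <- function(report, entry) {
  report$results <- c(report$results, entry)
  invisible(report)
}

list_report <- list(results = "check 1")
updated_list <- add_to_list(list_report, "check 2")
length(list_report$results)   # the original list still holds one entry
length(updated_list$results)  # only the returned copy holds two entries

env_report <- new.env()
env_report$results <- "check 1"
add_to_env(env_report, "check 2")
length(env_report$results)    # the original environment now holds two entries
```

The same function body gives different results for the caller depending only on the type of object passed in, which is why the mutation is easy to miss.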
Proposed Solution
Update the documentation to highlight that the report uses reference semantics, and show that a report can be passed to downstream functions safely by copying it first with the R6 clone() method.
Example:
library(dplyr, warn.conflicts = FALSE)
library(data.validator)
library(assertr)

validator_a <- function(data_) {
  report <- data_validation_report()
  validate(data_) %>%
    validate_cols(
      \(x) not_na(x),
      Sepal.Length,
      description = "Sepal.Length not_na"
    ) %>%
    add_results(report)
  report
}

validator_b <- function(data_, report) {
  validate(data_) %>%
    validate_if(
      nrow(data_) > 0,
      description = "Non empty table"
    ) %>%
    add_results(report)
  report
}

report_a <- validator_a(iris)
print(report_a)
#> Validation summary:
#> Number of successful validations: 1
#> Number of failed validations: 0
#> Number of validations with warnings: 0
#>
#> Advanced view:
#>
#>
#> |table_name |description |type | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_ |Sepal.Length not_na |success | NA|
report_b <- validator_b(iris, report_a$clone())
# report_a is not mutated, because validator_b received a clone
print(report_a)
#> Validation summary:
#> Number of successful validations: 1
#> Number of failed validations: 0
#> Number of validations with warnings: 0
#>
#> Advanced view:
#>
#>
#> |table_name |description |type | total_violations|
#> |:----------|:-------------------|:-------|----------------:|
#> |data_ |Sepal.Length not_na |success | NA|

Created on 2023-06-20 with reprex v2.0.2
Alternatives Considered
Maybe refactor the report so that it does not rely on reference semantics, which are non-standard for most R objects.
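One way to picture this alternative is a report stored as a plain list, so that adding a result returns a new object and leaves the input unchanged. This is only a sketch in base R and not how data.validator is implemented: `new_value_report()`, `add_result()`, and the `value_report` class are hypothetical names.

```r
# Hypothetical value-semantics report: a plain list with an S3 class.
new_value_report <- function() {
  structure(
    list(results = data.frame(description = character(), type = character())),
    class = "value_report"
  )
}

# Returns a new report with one more row; the input report is never modified.
add_result <- function(report, description, type) {
  report$results <- rbind(
    report$results,
    data.frame(description = description, type = type)
  )
  report
}

report_a <- add_result(new_value_report(), "Sepal.Length not_na", "success")
report_b <- add_result(report_a, "Non empty table", "error")

nrow(report_a$results)  # report_a keeps only its own result
nrow(report_b$results)  # report_b holds both results
```

With this design the caller has to capture the returned report, which makes every change to a report visible in the calling code.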