Feature or bug? ard_stack_hierarchical drops levels of by variables that are not present in denominator data #525

@bzkrouse

What happened?

If a by variable is not present in the denominator dataset and a subject has records at more than one level of that variable, only one of those records is retained.
In the following example, a subject has 1 mild and 1 moderate AE in the same SOC, but only the last one (moderate) is counted.

  library(cards)
  library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
  
  # load data
  adsl <- pharmaverseadam::adsl |>
    filter(SAFFL == "Y", SITEID=="701")
  adae <- pharmaverseadam::adae |>
    filter(SAFFL == "Y", SITEID=="701")
  # subset to the first two SOCs to keep the example small
  adae <- adae |>
    filter(AESOC %in% unique(AESOC)[1:2]) |>
    unique()
  
  # duplicate one of the USUBJID records & add it
  adae_row_add <- adae |> slice(1)
  print(adae_row_add |> select(USUBJID, TRT01A, AESOC, AESEV))
#> # A tibble: 1 × 4
#>   USUBJID     TRT01A  AESOC                                                AESEV
#>   <chr>       <chr>   <chr>                                                <chr>
#> 1 01-701-1015 Placebo GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS MILD
  
  adae <- adae |>
    bind_rows(
      adae_row_add |> mutate(AESEV = "MODERATE")
    )
  
  # calculate manually
  manual_counts <- adae |>
    select(USUBJID, TRT01A, AESEV, AESOC) |>
    unique() |>
    group_by(TRT01A, AESEV, AESOC) |>
    tally() |>
    ungroup()
  
  # calculate with ard_stack_hierarchical()
  ard_counts <- ard_stack_hierarchical(
    data = adae,
    by = c(TRT01A, AESEV),
    variables = AESOC,
    statistic = ~ "n",
    denominator = adsl,
    id = USUBJID,
    by_stats = FALSE
  ) |>
    unlist_ard_columns()
#> ℹ Denominator set by "TRT01A" column in `denominator` data frame.
  
  # manual count: 3 MILD, 1 MODERATE
  manual_counts |>
    filter(TRT01A=="Placebo",
           AESOC=="GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS")
#> # A tibble: 2 × 4
#>   TRT01A  AESEV    AESOC                                                    n
#>   <chr>   <chr>    <chr>                                                <int>
#> 1 Placebo MILD     GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS     3
#> 2 Placebo MODERATE GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS     1
  
  # ard_stack_hierarchical(): 2 MILD, 1 MODERATE (the subject's MILD record is dropped)
  ard_counts |>
    filter(group1_level=="Placebo",
           variable_level=="GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS")
#> {cards} data frame: 3 x 13
#>   group1 group1_level group2 group2_level variable variable_level stat_name
#> 1 TRT01A      Placebo  AESEV         MILD    AESOC      GENERAL …         n
#> 2 TRT01A      Placebo  AESEV     MODERATE    AESOC      GENERAL …         n
#> 3 TRT01A      Placebo  AESEV       SEVERE    AESOC      GENERAL …         n
#>   stat_label stat
#> 1          n    2
#> 2          n    1
#> 3          n    0
#> ℹ 4 more variables: context, fmt_fun, warning, error

Created on 2025-11-24 with reprex v2.1.1

I'm not 100% sure if this is a bug or intentional behavior, but the result is unexpected in some cases and may be confusing to users. The help file does describe the logic accurately, so it's probably intentional and might just need additional clarification.
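
For anyone trying to see the difference without the full ADaM data, here is a minimal dplyr-only sketch on made-up toy data. It assumes, based only on the behavior observed above, that one row per subject is kept within each combination of the by variables found in the denominator (here only TRT01A) and the hierarchy variable, and that the kept row is the last one after sorting. The names `toy_adae`, `all_levels` and `one_row_per_subject` are hypothetical, and this is not the actual {cards} implementation.

```r
library(dplyr)

# toy data (hypothetical): subject "A" has a MILD and a MODERATE event in the same SOC
toy_adae <- data.frame(
  USUBJID = c("A", "A", "B", "C"),
  TRT01A  = "Placebo",
  AESOC   = "SOC1",
  AESEV   = c("MILD", "MODERATE", "MILD", "MILD")
)

# manual approach: count each subject once per severity level
all_levels <- toy_adae |>
  distinct(USUBJID, TRT01A, AESEV, AESOC) |>
  count(TRT01A, AESEV, AESOC)

# assumed approach: AESEV is not in the denominator, so it is not part of the
# deduplication key; only the last sorted row per subject/TRT01A/AESOC survives
one_row_per_subject <- toy_adae |>
  arrange(USUBJID, TRT01A, AESEV, AESOC) |>
  group_by(USUBJID, TRT01A, AESOC) |>
  slice_tail(n = 1) |>
  ungroup() |>
  count(TRT01A, AESEV, AESOC)

all_levels
one_row_per_subject
```

Under that assumption the first table gives 3 MILD and 1 MODERATE, while the second gives 2 MILD and 1 MODERATE, which is the same pattern as in the reprex above.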
