on getCatchWgt #46

@einarhjorleifsson

This issue is not a bug, but rather a question of how to prevent inadvertent misuse of the data product downstream.

The help file currently states: "Calculate the total reported catch weight by species and haul." I initially interpreted this as the actual catch in the haul. But if I understand things correctly, the interpretation of the value in CatCatchWgt in the HL-data depends on the DataType in the HH-data, i.e. it is standardized to one hour if DataType is "C", and otherwise it is as reported for the haul.

So the question is whether this function should in addition return an explicit cpue value, with the help file making a clear distinction between that value and CatchWgt (and also whether zeros should be returned explicitly, see the side note below).

Below is code that illustrates the difference these two values would make in a typical downstream user analysis:

library(icesDatras)
library(tidyverse)
dr_add_id <- function (d) {
  d |> 
    dplyr::mutate(.id = paste(Survey, Year, Quarter, Country, 
                               Ship, Gear, StNo, HaulNo, sep = ":"))
}
cw <- 
  icesDatras::getCatchWgt("NS-IBTS", years = 2020:2025, quarters = 1, aphia = 126437) |> 
  dr_add_id() |>  
  select(.id, Valid_Aphia, CatchWgt)
hh <- 
  icesDatras::getDATRAS("HH", "NS-IBTS", 2020:2025, 1) |> 
  dr_add_id() |> 
  select(.id, DataType, HaulDur, Year)
d <- 
  hh |> 
  left_join(cw,
            by = join_by(.id)) |> 
  as_tibble() |> 
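  mutate(CatchWgt = replace_na(CatchWgt, 0),    # This may not be kosher in some cases
         # DataType "C" is standardized to one hour: scale back to the actual haul catch
         wgt = case_when(DataType == "C" ~ CatchWgt/60 * HaulDur,
                         .default = CatchWgt),
         # Catch per hour of towing, comparable across DataType values
         cpue = wgt / HaulDur * 60)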
d |> 
  count(Year, DataType) |> 
  ggplot(aes(Year, n, colour = DataType)) +
  geom_point()
d |> 
  select(.id, Year, CatchWgt, cpue) |> 
  pivot_longer(-c(.id, Year), names_to = "var", values_to = "value") |> 
  ggplot(aes(Year, value, colour = var)) +
  stat_summary(fun.data = "mean_cl_boot")

Side note: In the above, missing CatchWgt values are interpreted as zero. I have, however, come across cases where CatCatchWgt is missing for some CatIdentifier within the same tow. I guess these should return NA as the CatchWgt of that particular haul (or possibly be a QC-flag issue), whereas cases where the species is not reported at all in the HL-data for a given haul should be explicitly set to zero.
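
A minimal sketch of that NA-versus-zero distinction, using toy data rather than real DATRAS output. The objects `hh_toy`, `hl_toy`, `cw_toy` and the column `n_records` are hypothetical names introduced only for illustration, and the toy table assumes one row per CatIdentifier (real HL-data would first need to be reduced to that). It is not meant to describe how getCatchWgt works internally.

```r
library(dplyr)
library(tidyr)

# Toy haul list (stand-in for HH): three hauls
hh_toy <- tibble(.id = c("h1", "h2", "h3"))

# Toy catch records (stand-in for HL, one row per CatIdentifier):
#   h1: two complete records
#   h2: one record with a missing weight
#   h3: species not recorded at all
hl_toy <- tibble(
  .id           = c("h1", "h1", "h2", "h2"),
  CatIdentifier = c(1, 2, 1, 2),
  CatCatchWgt   = c(100, 50, 200, NA)
)

# Sum per haul WITHOUT na.rm, so a missing weight propagates to NA
cw_toy <- hl_toy |>
  group_by(.id) |>
  summarise(CatchWgt  = sum(CatCatchWgt),
            n_records = n(),
            .groups   = "drop")

res <- hh_toy |>
  left_join(cw_toy, by = ".id") |>
  mutate(
    # Hauls with no matching record get a record count of zero
    n_records = replace_na(n_records, 0L),
    # Zero only where the species was not recorded at all;
    # an NA caused by a missing weight is left as NA
    CatchWgt  = if_else(n_records == 0L, 0, CatchWgt)
  )

res
```

Here h1 gets the summed weight, h2 stays NA because one of its records lacks a weight, and only h3 is set to zero. A blanket replace_na(CatchWgt, 0) after the join would instead turn h2 into zero as well.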

Just some food for thought.
