Skip to content

Conversation

@cyrilrader
Copy link
Contributor

In the details of the commit, you can see the changes between the BODC files of this month and the ones from april but it's weird because the repo don't contain the ones from april anymore. I don't think it should be a problem since it wants to make changes in a file that doesn't exist anymore.

@cyrilrader cyrilrader requested review from Copilot and rubenpp7 July 2, 2025 11:46
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances duplicate detection for eMoF records, adds an exact duplicate check, and updates unit definitions and documentation accordingly.

  • Added R logic to flag duplicate measurementType and measurementTypeID at both occurrence and event levels, plus an exact-eMoF duplicate check.
  • Updated README.md to describe the new duplicate checks.
  • Cleaned up BODCunits_20250701.csv by removing duplicates and appending new unit codes.

Reviewed Changes

Copilot reviewed 3 out of 8 changed files in this pull request and generated 3 comments.

File Description
R/checkdataset.R New blocks for duplicate measurementType/measurementTypeID and exact eMoF checks
README.md Added entries for the new duplicate checks in the general issues table
files/BODCunits_20250701.csv Removed duplicate unit entries and inserted new codes
Comments suppressed due to low confidence (2)

R/checkdataset.R:647

  • [nitpick] For the measurementTypeID duplicate check, consider setting field = 'measurementTypeID' instead of 'measurementType' to clearly indicate which column is in error.
                            mutate (field = 'measurementType', 

README.md:80

  • [nitpick] The description uses "duplicated measurementTypes" but the next entries refer to measurementTypeID; consider consistent singular/plural naming, e.g. "Check if there are duplicated measurementTypeIDs linked to the same occurrenceID."
| Check if there are duplicated measurementTypes linked to the same occurrenceID / Does the same measurement occurs more than once for the same occurrenceID? | Duplicate measurementType linked to the same occurrence | IPTreport\$dtb\$general_issues | Ideally a given occurrenceID can be linked to the same measurementType only once. E.g. It doesn't make sense to say that one same occurrenceID contains adults in one record but larvae in another record, it must be or either adults or larvae or both, but all expressed in only one record. | 

Comment on lines +670 to +675
select (id, measurementTypeID) %>%
group_by (id, measurementTypeID) %>%
summarize(count = n()) %>%
filter (count >1 ) %>%
inner_join((eMoF) %>% mutate (row = row_number()),
by = c("id", "measurementTypeID") ) %>%
Copy link

Copilot AI Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In duplicated_measurementTypeID_ev(), you’re selecting and grouping by id instead of eventID. It should use eventID to match the filter on event-level records.

Suggested change
select (id, measurementTypeID) %>%
group_by (id, measurementTypeID) %>%
summarize(count = n()) %>%
filter (count >1 ) %>%
inner_join((eMoF) %>% mutate (row = row_number()),
by = c("id", "measurementTypeID") ) %>%
select (eventID, measurementTypeID) %>%
group_by (eventID, measurementTypeID) %>%
summarize(count = n()) %>%
filter (count >1 ) %>%
inner_join((eMoF) %>% mutate (row = row_number()),
by = c("eventID", "measurementTypeID") ) %>%

Copilot uses AI. Check for mistakes.
} else {mof_oc_Value_war <-NULL }

mof_oc_dubs <- eMoF %>% filter (!is.na(occurrenceID), !is.na(measurementTypeID)) %>%
duplicated_measurementType_oc <- eMoF %>% filter (!is.na(occurrenceID),is.na(measurementTypeID)) %>%
Copy link

Copilot AI Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filter for duplicate measurementType currently excludes measurementTypeID but does not ensure measurementType is not NA. Consider adding !is.na(measurementType) to avoid grouping NA values.

Suggested change
duplicated_measurementType_oc <- eMoF %>% filter (!is.na(occurrenceID),is.na(measurementTypeID)) %>%
duplicated_measurementType_oc <- eMoF %>% filter (!is.na(occurrenceID), !is.na(measurementType), is.na(measurementTypeID)) %>%

Copilot uses AI. Check for mistakes.
SDN:P06::M2MS,Moles per square metre per second,mol/m^2/s,false,http://vocab.nerc.ac.uk/collection/P06/current/M2MS/
SDN:P06::MLPS,Moles per second,mol/s,false,http://vocab.nerc.ac.uk/collection/P06/current/MLPS/
SDN:P06::SQM2,Square metres per second squared,m^2/s^2,false,http://vocab.nerc.ac.uk/collection/P06/current/SQM2/
SDN:P06::XXXX,Not applicable,NA,false,http://vocab.nerc.ac.uk/collection/P06/current/XXXX/
Copy link

Copilot AI Jul 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] New unit codes are inserted in multiple spots; consider keeping the CSV in sorted or grouped order to simplify future maintenance.

Copilot uses AI. Check for mistakes.
@cyrilrader cyrilrader marked this pull request as ready for review July 23, 2025 09:31
@cyrilrader cyrilrader requested review from LynnDelgat and removed request for rubenpp7 July 24, 2025 12:23
@cyrilrader cyrilrader closed this Jul 24, 2025
@cyrilrader
Copy link
Contributor Author

Will redo but on dev branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant