Skip to content

Conversation

@cyrilrader
Copy link
Contributor

Removed this filter '!is.na(measurementTypeID' so the emof with no measurementTypeID are not excluded from the check. Added the column measurementType to the 'select' and 'group_by'.

Changes have been tested on my fork of the tool first and with this modified dataset:
test_modified_biofun.zip

@cyrilrader cyrilrader requested a review from rubenpp7 June 23, 2025 08:26
@rubenpp7
Copy link
Collaborator

Hi @cyrilrader

I feel that this commit is replacing one functionality by another and it may introduce a bug.

For example:

  • cases where measurementTypeID exist AND it's the same for 2 rows within the same eventID&occurrenceID BUT where the measurementType field is populated and different for those 2 rows will NOT be detected anymore.

I thought that the issue we wanted to solve was to detect exact duplicated rows, why not create a different check that looks at all the fields, instead of only at one more field?

@cyrilrader
Copy link
Contributor Author

cyrilrader commented Jun 23, 2025

Hi @rubenpp7

Yes, you are right. In the case you described, the duplicates won't be detected. But, it would be the same for a check that looks at all the fields no ?
I'll try to find another solution that solve this

@rubenpp7
Copy link
Collaborator

@cyrilrader exactly, that's why the point would be to add a new check instead of replacing an already existing one

@cyrilrader cyrilrader closed this Jun 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants