Handling Duplicated Clinical Records #202

@odikia

Issue Name: General Convention for Handling "Duplicate" Clinical Data Points

Issue Type: Multiple ways to store source data in the OMOP CDM to represent an idea of interest.

Description: This convention is meant to generalize and extend the logic of the Themis conventions for drug exposure into general guidance for ETL'ers.

Related forum post: Populating CONDITION_OCCURRENCE using EPIC Clarity and its duplications

Suggested solution: When rows in an OMOP clinical table carry an attribute other than the time point that reasonably distinguishes them, they may be kept as "duplicates" of a particular datum, provided each row adheres to the table's other conventions as stated in the documentation for that OMOP-CDM version. For example, if a diagnosis code is given on the same day, and near the same time, but by different providers (first by a nurse as the reason for a lab draw, then by a physician as the diagnosis resulting from a visit), then it is reasonable to allow duplicate records: each record contains a meaningful differentiator, and together they preserve the fact that the patient was given the diagnosis multiple times that day. Furthermore, in this situation, a "type concept" of "EHR order" for the nurse's lab draw and "EHR outpatient note" for the physician's diagnosis could provide meaningful differentiation that would support later analysis. From the data alone, it may also not be clear which diagnosis is the "correct" one, and the nurse's diagnosis may reflect an earlier outside diagnosis listed on the patient's file.
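As a rough illustration of the rule above, the following sketch keeps same-day condition rows that differ by provider or type concept and drops only exact repeats. The column names follow CONDITION_OCCURRENCE, but the choice of differentiating columns, the function name, and the concept IDs (1001, 1002, 4001) are placeholders assumed for this example, not real vocabulary values or an agreed convention.

```python
# Columns that identify "the same datum at the same time point".
BASE_KEY = ("person_id", "condition_concept_id", "condition_start_date")
# Columns assumed, for this sketch only, to be meaningful differentiators.
DIFFERENTIATORS = ("provider_id", "condition_type_concept_id")


def drop_undifferentiated(rows):
    """Keep the first row per (BASE_KEY + DIFFERENTIATORS); drop exact repeats."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[col] for col in BASE_KEY + DIFFERENTIATORS)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Placeholder IDs: 1001 stands in for "EHR order", 1002 for "EHR outpatient note".
rows = [
    {"person_id": 1, "condition_concept_id": 4001,
     "condition_start_date": "2024-03-01",
     "provider_id": 11, "condition_type_concept_id": 1001},  # nurse, lab draw
    {"person_id": 1, "condition_concept_id": 4001,
     "condition_start_date": "2024-03-01",
     "provider_id": 22, "condition_type_concept_id": 1002},  # physician, visit
    {"person_id": 1, "condition_concept_id": 4001,
     "condition_start_date": "2024-03-01",
     "provider_id": 22, "condition_type_concept_id": 1002},  # exact repeat
]

deduped = drop_undifferentiated(rows)
```

Here the nurse's and physician's records both survive because they differ on a differentiating column, while the third row is dropped as a true repeat.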

In contrast, drug exposures in a source system commonly contain audit trails in which the drug is ordered by a physician assistant, modified by a physician, updated to a generic code by the pharmacist, and months later marked as complete by a different physician, all without any change to the underlying ingredient the patient is exposed to. During ETL, an implementer may fear losing the history of the drug's pathway to the patient if this information is collapsed to a single record with a start and stop date based on the record for that order number (see the Themis convention). However, at the time of this writing, the stated convention for the drug_exposure table is to represent a patient's exposure to a drug; it is not to establish the operational facts of how the drug was prescribed, if that process has not led to a change in the patient's true exposure to the drug. In this way, the broader purpose of the OMOP-CDM, to remain patient-centric, is maintained. Unless the patient's actual exposure to the drug has changed in some way since it was first prescribed, the exposure should be collapsed to the best information available for that patient's drug exposure at the time of actual administration. Whether the provider_id should be collapsed to the physician assistant, the physician, or the pharmacist is not covered by this convention and could be addressed elsewhere. Where no such convention exists, it is important for an implementation team to communicate to their users, as well as to network study members, the nature of the deduplication process and the ramifications of the decisions made (see Rabbit in a Hat, or LLMs positioned against a code base, such as GitHub Copilot, for assistance with creating this documentation from the ETL steps).
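A minimal sketch of this kind of collapse is shown below. It assumes a hypothetical source extract with one row per audit event, keyed by `order_id`, with an `ingredient_concept_id` already mapped; none of these source fields or concept IDs come from the issue. Taking the earliest event as the start date, the latest as the end date, and the most recent drug code as the "best information" are assumptions of this example, not a statement of the Themis convention.

```python
from collections import defaultdict
from datetime import date


def collapse_order_events(events):
    """Collapse audit events to one exposure per order when the ingredient is stable."""
    by_order = defaultdict(list)
    for ev in events:
        by_order[ev["order_id"]].append(ev)

    exposures, needs_review = [], []
    for order_id, evs in by_order.items():
        evs.sort(key=lambda e: e["event_date"])
        ingredients = {e["ingredient_concept_id"] for e in evs}
        if len(ingredients) > 1:
            # The true exposure may have changed; do not collapse automatically.
            needs_review.append(order_id)
            continue
        exposures.append({
            "person_id": evs[0]["person_id"],
            "drug_concept_id": evs[-1]["drug_concept_id"],  # latest code
            "drug_exposure_start_date": evs[0]["event_date"],
            "drug_exposure_end_date": evs[-1]["event_date"],
        })
    return exposures, needs_review


# Placeholder concept IDs: 9001 (brand), 9002 (generic), 500 (ingredient).
events = [
    {"order_id": "A1", "person_id": 1, "event": "ordered",
     "event_date": date(2024, 1, 5), "drug_concept_id": 9001,
     "ingredient_concept_id": 500},
    {"order_id": "A1", "person_id": 1, "event": "modified",
     "event_date": date(2024, 1, 6), "drug_concept_id": 9001,
     "ingredient_concept_id": 500},
    {"order_id": "A1", "person_id": 1, "event": "generic substitution",
     "event_date": date(2024, 1, 6), "drug_concept_id": 9002,
     "ingredient_concept_id": 500},
    {"order_id": "A1", "person_id": 1, "event": "completed",
     "event_date": date(2024, 4, 2), "drug_concept_id": 9002,
     "ingredient_concept_id": 500},
]

exposures, needs_review = collapse_order_events(events)
```

Orders whose ingredient changes along the audit trail are set aside rather than collapsed, since that may reflect a real change in the patient's exposure.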

Great care should be taken to determine whether a duplicated data point can be meaningfully collapsed by OHDSI tools or by custom scripts, or whether the duplicate would cause unexplained clustering of data. If the duplication would interfere with an OHDSI tool's aggregation capabilities, and the rows cannot be collapsed across a differentiating attribute within the table itself, then the duplicate should be allowed only with caution, and the process should be clearly documented. If, for example, the deduplication of a data point depends solely on a process that requires a join to another table (e.g., drug_exposure -> observation), it is likely that an OMOP instance's user group, or network study collaborators, would not be aware of the local deduplication convention.
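One way an implementer might audit for this situation is to look for rows that are identical on every in-table column except the primary key, since such rows cannot be told apart without joining elsewhere. The sketch below assumes rows are available as dictionaries with hashable values; the function name and sample values are hypothetical.

```python
from collections import Counter


def indistinguishable_groups(rows, id_column):
    """Return the shared attributes of rows that differ only by their primary key."""
    counts = Counter(
        tuple(sorted((k, v) for k, v in row.items() if k != id_column))
        for row in rows
    )
    return [dict(key) for key, n in counts.items() if n > 1]


drug_rows = [
    {"drug_exposure_id": 1, "person_id": 1, "drug_concept_id": 9002,
     "drug_exposure_start_date": "2024-01-05", "provider_id": 7},
    {"drug_exposure_id": 2, "person_id": 1, "drug_concept_id": 9002,
     "drug_exposure_start_date": "2024-01-05", "provider_id": 7},
    {"drug_exposure_id": 3, "person_id": 1, "drug_concept_id": 9002,
     "drug_exposure_start_date": "2024-01-05", "provider_id": 8},
]

flagged = indistinguishable_groups(drug_rows, "drug_exposure_id")
```

Rows 1 and 2 are flagged because nothing inside the table separates them; row 3 is not, because its provider_id differs.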

Reported By: Daniel Smith (Emory University)
