Data Processing and Logical Issues in Generating OSCEs from MIMIC Data, Leading to Unreliable Results #7

@Warren-swr

After reviewing and testing the repository, I have identified several major issues in generate_cases/gen_mimic_tutorial.py. Details are below:

  1. case_studies only uses patient IDs without actual patient information

    • In the snippet below, the prompt passed to the large language model interpolates only _case, which is just the patient ID (the dictionary key) and carries no patient details:
      messages = [
          {"role": "system", "content": "..."},
          {"role": "user", "content": "Generate a OSCE for the following case study {}.".format(_case) + ...}
      ]
    • case_studies[_case] holds the actual patient information, but only _case (the patient ID) is passed to the LLM, so the generated OSCE has no real connection to the patient data and is therefore unreliable.
  2. The logic to limit data size does not take effect; case_studies is not actually truncated to 300 records

    • The code includes a mechanism intended to limit the number of cases to 300:
      # Choose only cases with diagnoses == 1
      for _ in num_diagnoses:
          if num_diagnoses[_] < 2:
              num += 1
              if num >= 300: break
              patlist.append(_)
      ...
      pats_file = [_ for _ in pats_file if _[0] in patlist]
    • The subsequent processing and output of the patient_info dictionary (i.e., case_studies) never filters or removes entries based on patlist. As a result, case_studies still contains all the data.
  3. Inefficient CSV reading approach, especially for the very large labevents.csv

    • The code uses list(csv.reader(f)) to load the entire CSV file into memory at once, for example:
      with open(base_str + "hosp/labevents.csv", "r") as f:
          labenvt_file = list(csv.reader(f))
    • For data as large as MIMIC, and labevents.csv in particular, this approach places a heavy burden on memory and processing speed, making the script impractical to run on servers with typical configurations.
  4. Potential lack of reliability in the paper’s experimental results

    • As described in item 1, the script passes only the patient ID to the large language model, not the actual patient information. This could lead to significant bias or unreliability in the experimental results.
    • If the paper’s analysis or conclusions rely on these generated results, the validity of the findings should be carefully reevaluated.
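
One possible direction for items 1 and 2, as a minimal sketch rather than a drop-in patch: apply patlist to case_studies before generation, and interpolate the record (case_studies[_case]) into the prompt instead of the key. The helper names select_patients, filter_cases and build_messages, the limit and system_prompt parameters, and the assumption that each record can be formatted as a string are mine and not from the repository. The `< 2` condition mirrors the original code.

```python
def select_patients(num_diagnoses, limit=300):
    """Return up to `limit` patient IDs with fewer than 2 diagnoses."""
    selected = []
    for patient_id, count in num_diagnoses.items():
        if count < 2:  # same condition as the original loop
            selected.append(patient_id)
            if len(selected) >= limit:
                break
    return selected


def filter_cases(case_studies, patlist):
    """Keep only the case studies whose patient ID appears in patlist."""
    keep = set(patlist)  # set membership is O(1) per lookup
    return {pid: info for pid, info in case_studies.items() if pid in keep}


def build_messages(patient_id, case_studies, system_prompt):
    """Build chat messages that contain the patient's record, not just the ID."""
    record = case_studies[patient_id]  # assumed to be string-formattable
    user_prompt = "Generate an OSCE for the following case study:\n{}".format(record)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
```

With this shape, the generation loop would iterate over filter_cases(case_studies, patlist) and call build_messages for each ID, so the limit and the patient details both reach the LLM call.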

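For item 3, a streaming read avoids holding the whole file in memory. The sketch below iterates over the CSV row by row and keeps only rows for the selected patients. The function name iter_rows_for_patients is hypothetical, and the "subject_id" column name is an assumption about the labevents.csv header that should be checked against the actual file.

```python
import csv


def iter_rows_for_patients(path, patient_ids, id_column="subject_id"):
    """Yield CSV rows one at a time, keeping only rows for the given patients."""
    wanted = set(patient_ids)
    with open(path, "r", newline="") as f:
        # DictReader reads lazily, so only one row is held in memory at a time.
        for row in csv.DictReader(f):
            if row[id_column] in wanted:
                yield row
```

Because this is a generator, memory use depends on how many matching rows the caller keeps, not on the size of the file. The full file is still scanned once, so runtime remains proportional to its length.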