After reviewing and testing the repository, I have identified several major issues specifically related to the generate_cases/gen_mimic_tutorial.py file. Below are the details:
- `case_studies` only uses patient IDs without actual patient information
  - In the code snippet below, the prompt passed to the large language model only uses `_case` as content, while `_case` is essentially just a patient ID string and does not contain any corresponding patient details:
        messages = [
            {"role": "system", "content": "..."},
            {"role": "user", "content": "Generate a OSCE for the following case study {}.".format(_case) + ...}
        ]
  - `case_studies[_case]` contains the actual patient information, but only `_case` (the patient ID) is passed to the LLM here, so the generated OSCE has no real connection to the patient data and is therefore unreliable.
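A minimal sketch of one possible fix, assuming `case_studies` maps each patient ID to a JSON-serializable record. The helper name `build_osce_messages`, the example record, and the exact prompt wording are hypothetical, and the elided parts of the original prompt are left out:

```python
import json


def build_osce_messages(case_id, case_studies, system_prompt="..."):
    """Build chat messages that include the patient record, not only its ID."""
    # Look up the actual patient information for this ID.
    record = case_studies[case_id]
    # Serialize the record so the model sees the data itself.
    record_text = json.dumps(record, indent=2, default=str)
    user_prompt = (
        "Generate an OSCE for the following case study "
        "(patient ID {}):\n{}".format(case_id, record_text)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# Hypothetical record, for illustration only.
example_cases = {"p001": {"diagnosis": "example diagnosis", "labs": ["example lab"]}}
messages = build_osce_messages("p001", example_cases)
```

With this shape, the user message carries the record contents, so the generated OSCE can be checked against the underlying patient data.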
- The logic to limit data size does not take effect; `case_studies` is not actually truncated to 300 records
  - Although there is a mechanism intended to limit the number of cases to 300:
        # Choose only cases with diagnoses == 1
        for _ in num_diagnoses:
            if num_diagnoses[_] < 2:
                num += 1
                if num >= 300: break
                patlist.append(_)
        ...
        pats_file = [_ for _ in pats_file if _[0] in patlist]
  - In the subsequent processing and output of the `patient_info` dictionary (i.e., `case_studies`), there is no real filtering or removal based on `patlist`. As a result, `case_studies` still contains all the data.
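One way the limit could be enforced is to filter the dictionary by `patlist` before it is used or written out. This is only a sketch: the helper name `restrict_to_selected` is hypothetical, and it assumes the dictionary keys have the same type as the entries in `patlist`:

```python
def restrict_to_selected(case_studies, patlist):
    """Return only the entries of case_studies whose key is in patlist."""
    keep = set(patlist)  # set membership is O(1) per lookup
    return {pid: info for pid, info in case_studies.items() if pid in keep}


# Hypothetical data, for illustration only.
all_cases = {"p001": {"n": 1}, "p002": {"n": 2}, "p003": {"n": 3}}
selected = restrict_to_selected(all_cases, ["p001", "p003"])
```

Converting `patlist` to a set also avoids the repeated linear scans that `in patlist` performs on a list.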
- Inefficient CSV reading approach, especially for the very large `labevents.csv`
  - The code uses `list(csv.reader(f))` to load the entire CSV file into memory at once, for example:
        with open(base_str + "hosp/labevents.csv", "r") as f:
            labenvt_file = list(csv.reader(f))
  - For massive MIMIC data, particularly `labevents.csv`, this approach places a heavy burden on memory and processing speed, making it impractical to run on servers with typical configurations.
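A possible alternative is to stream the file row by row and keep only rows for the selected patients. The sketch below is not the repository's code: the function name `iter_rows_for_patients` and its parameters are hypothetical, and the position of the patient ID column (`id_column`) is an assumption that would need to be checked against the actual file:

```python
import csv


def iter_rows_for_patients(path, patient_ids, id_column=0, has_header=True):
    """Yield CSV rows whose ID column is in patient_ids, one row at a time."""
    keep = set(patient_ids)
    with open(path, "r", newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row if present
        for row in reader:
            # Skip blank lines; keep only rows for the selected patients.
            if row and row[id_column] in keep:
                yield row
```

Because this is a generator, memory use depends on the number of rows kept rather than on the size of the file, e.g. `rows = list(iter_rows_for_patients(path, patlist))`.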
- Potential lack of reliability in the paper’s experimental results
  - As described in Issue 1, the script does not pass actual patient information to the large language model, but rather just the patient ID. This could lead to significant bias or unreliability in the experimental results.
  - If the paper’s analysis or conclusions rely on these generated results, the validity of the findings should be carefully reevaluated.