PathReportParsing_MRA_DeepDerm

This repo provides a tool for automatically extracting and processing pathology report data for the MRA-DeepDerm Study. The module is tailored to extract structured data for use in research and clinical applications.

Features

Accession Number Extraction: Automatically identifies and extracts accession numbers from pathology text.
Specimen Information Parsing: Organizes specimen identifiers and their descriptions into a structured format.
Clinical Impressions and Diagnoses: Captures detailed impressions and diagnostic details for each specimen.
Microscopic Descriptions: Extracts findings from microscopic sections of pathology reports.
Batch Processing Support: Processes multiple pathology reports from a single input Excel file.
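To illustrate the kind of pattern matching the accession number feature implies, here is a minimal sketch. It is not the code in utils.py: the accession format (one to three capital letters, a two-digit year, a hyphen, then up to six digits) and the names `ACCESSION_PATTERN` and `find_accession_numbers` are assumptions made for this example, and real accession formats vary by institution.

```python
import re

# Hypothetical accession format, e.g. "SP23-04567".
# This is an assumption for illustration; utils.py may use a different rule.
ACCESSION_PATTERN = re.compile(r"\b[A-Z]{1,3}\d{2}-\d{1,6}\b")


def find_accession_numbers(report_text):
    """Return accession-like tokens, de-duplicated, in order of first appearance."""
    found = []
    for token in ACCESSION_PATTERN.findall(report_text):
        if token not in found:
            found.append(token)
    return found


sample = "Accession: SP23-04567. Compared with prior biopsy SP22-00981 (see SP23-04567)."
print(find_accession_numbers(sample))
```

Keeping the pattern in one named constant makes it easy to swap in the format your institution actually uses.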

Run in Colab

For ease of use without local setup, you can run the Python notebook in Google Colab. Use the provided badge to access the notebook directly in your browser.

Prepare Your Input File: Ensure your input file is an Excel file containing a sheet named "Per Lesion". Include a column titled Path Report Text containing the pathology report text to be processed.
Run the Notebook: Update the input_file path to the location of your Excel file:
```
input_file = "/path/to/the/excel/file"
```

Run Locally

If you prefer to run the notebook on your local machine, follow the installation and usage instructions below.

Installation

Clone the repository:

git clone git@github.com:DaneshjouLab/PathReportParsing_MRA_DeepDerm.git
cd PathReportParsing_MRA_DeepDerm

Install the required dependencies:
```
pip install -r requirements.txt
```

Usage

Prepare Your Input File: Ensure your input file is an Excel file containing a sheet named "Per Lesion". Include a column titled Path Report Text containing the pathology report text to be processed.
Run the Notebook: Update the input_file path to the location of your Excel file:
```
input_file = "/path/to/the/excel/file"
```
Execute the notebook or script. If the Path Report Text column is present, the script will process the data and save the results to a new file named Processed_Pathology_Reports.xlsx.
Check the Output: The processed data will be saved as an Excel file in the working directory. Example file name: Processed_Pathology_Reports.xlsx.

# Import packages
import pandas as pd
from utils import process_pathology_reports

# Load the Excel file into a DataFrame
input_file = "/path/to/the/excel/file"
df = pd.read_excel(input_file, sheet_name="Per Lesion", engine="openpyxl")

# Process the reports only if the 'Path Report Text' column exists
if "Path Report Text" in df.columns:
    processed_df = process_pathology_reports(df)

    # Save the processed DataFrame to a new Excel file
    output_file = "Processed_Pathology_Reports.xlsx"
    processed_df.to_excel(output_file, engine="openpyxl", index=False)
    print(f"Processed data saved to {output_file}")
else:
    print("Error: 'Path Report Text' column not found in the input file.")
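The script above only checks for the column after the sheet has been loaded, so a missing "Per Lesion" sheet surfaces as a pandas error. A minimal sketch of an up-front layout check follows. The helper names `find_layout_problems` and `read_layout` are hypothetical and not part of utils.py; the pandas calls assume a recent pandas with openpyxl installed.

```python
REQUIRED_SHEET = "Per Lesion"
REQUIRED_COLUMN = "Path Report Text"


def find_layout_problems(sheets):
    """Check a {sheet name: [column names]} mapping; return a list of problems."""
    if REQUIRED_SHEET not in sheets:
        return [f"Missing sheet: {REQUIRED_SHEET!r}"]
    if REQUIRED_COLUMN not in sheets[REQUIRED_SHEET]:
        return [f"Sheet {REQUIRED_SHEET!r} is missing column: {REQUIRED_COLUMN!r}"]
    return []


def read_layout(path):
    """Read only the header row of every sheet in the workbook (hypothetical helper)."""
    import pandas as pd  # imported lazily so the check above works without pandas

    with pd.ExcelFile(path, engine="openpyxl") as workbook:
        return {
            name: [str(col) for col in workbook.parse(name, nrows=0).columns]
            for name in workbook.sheet_names
        }


# Example with an in-memory layout; for a real file use read_layout(input_file).
layout = {"Per Lesion": ["Lesion ID", "Path Report Text"]}
for problem in find_layout_problems(layout):
    print("Error:", problem)
```

An empty list means the workbook has the sheet and column the script expects, so nothing is printed for the example layout.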
