An AI tool that extracts text and named entities from PDF documents using Python, pdfplumber, and spaCy
This is a simple AI-powered tool built using Python that:
- Extracts text from PDF documents
- Identifies useful entities like names, dates, locations, and more using spaCy
- Extracts plain text from any .pdf file using pdfplumber
- Performs Named Entity Recognition (NER) using spaCy's en_core_web_sm model
- Outputs both the full text and the named entities to a .txt file
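The two core steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of document_extractor.py: the function names (`join_page_texts`, `extract_text`, `extract_entities`) are hypothetical. It assumes pdfplumber and the en_core_web_sm model are installed, and it imports them lazily so the small text-joining helper can be used on its own.

```python
from __future__ import annotations

from collections.abc import Iterable


def join_page_texts(page_texts: Iterable[str | None]) -> str:
    """Join per-page text with newlines, skipping pages that yielded no text."""
    return "\n".join(t for t in page_texts if t)


def extract_text(pdf_path: str) -> str:
    """Read every page of the PDF and return its text as one string."""
    import pdfplumber  # lazy import: only needed when a PDF is actually opened

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return an empty result for image-only pages
        return join_page_texts(page.extract_text() for page in pdf.pages)


def extract_entities(text: str, model: str = "en_core_web_sm") -> list[tuple[str, str]]:
    """Run spaCy NER over the text and return (entity text, label) pairs."""
    import spacy  # lazy import: loading the model is the slow part

    nlp = spacy.load(model)
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
```

With the dependencies installed, `extract_entities(extract_text(path))` would yield pairs such as a person's name with the label PERSON; the exact entities depend on the document and the model.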
- Python 3.10+
- pdfplumber
- spaCy
- Install dependencies:
pip install pdfplumber spacy
python -m spacy download en_core_web_sm
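To confirm the installation worked before running the tool, a quick check like the one below may help. It is an optional sketch and not part of the project itself; the helper name `missing_modules` is hypothetical. It relies on the fact that the spaCy model is installed as an importable package named en_core_web_sm.

```python
from importlib.util import find_spec


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Packages this tool depends on; the spaCy model installs as its own package.
REQUIRED = ["pdfplumber", "spacy", "en_core_web_sm"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```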
- Run the script:
python document_extractor.py
- Enter the full path to your PDF file (e.g. C:/Users/Name/Desktop/sample_resume.pdf)

📦 Output
- Extracted text saved in extracted_info.txt
- Named entities displayed in terminal
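One way the output step could look is sketched below. The function name `save_results`, the section headings, and the line format are assumptions made for illustration; the real script's file layout is not documented here. The sketch writes both the text and the entities to the file and also prints the entities to the terminal.

```python
from pathlib import Path


def save_results(
    text: str,
    entities: list[tuple[str, str]],
    out_path: str = "extracted_info.txt",
) -> Path:
    """Write the text and entities to a file and print the entities."""
    lines = ["=== Extracted text ===", text, "", "=== Named entities ==="]
    lines += [f"{label}: {value}" for value, label in entities]
    target = Path(out_path)
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")

    # Echo the entities to the terminal, label first for easy scanning.
    for value, label in entities:
        print(f"{label:<10} {value}")
    return target
```

Here `entities` is expected to be a list of (entity text, label) pairs, for example `[("Alice", "PERSON")]`.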
📫 Contact
Made with ❤️ by Laiba Azhar
GitHub: laibatechX