PDF Word Cloud Generator

A desktop application that extracts text from PDF files and generates beautiful, customizable word clouds with interactive frequency analysis. It is a PyQt6-based GUI that wraps amueller's word_cloud library.

Features

PDF Text Extraction: Extract and analyze text from multi-page PDF documents
Interactive Word Cloud: Generate stunning visualizations with multiple shape options and color schemes
Frequency Analysis: View and edit word frequencies in an interactive table
Advanced Filtering:
- Filter by word length
- Toggle number inclusion
- Automatic stop word removal (articles, prepositions, pronouns, etc.)
- Custom word exclusion
Customizable Visualization:
- Multiple shape options (Rectangle, Circle, Heart, Star, Diamond, Hexagon, Triangle)
- 50+ built-in colormaps plus custom color palette support
- Adjustable dimensions, font sizes, and background colors
- Custom image shapes
Save & Export:
- Save word lists as CSV for future editing
- Export word clouds as PNG or SVG
Persistent Settings: Application remembers your preferences between sessions

Requirements

Python 3.8 or higher
See requirements.txt for Python package dependencies

Installation

1. Clone the Repository

git clone <repository-url>
cd WordCloud

2. Create Virtual Environment (Recommended)

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

Usage

Running the Application

python src/pdf_wordcloud.py

Workflow

Open a PDF File
- Click "Open PDF" or use Ctrl+O
- Select any PDF file from your computer
- The application automatically extracts text from all pages
Generate Word Cloud
- Adjust settings in the left control panel (optional)
- Click "Generate" (Ctrl+G) to create the word cloud visualization
- The frequency table updates automatically
Refine the Results
- View the "Word Frequency" tab to see all extracted words
- Double-click frequency values to edit them manually
- Check the "Exclude" checkbox to remove specific words
- Click "Update" to regenerate with your changes
Save Your Work
- Save the word list as CSV: Click "Save List" (Ctrl+S)
- Export the image: Click "Save Cloud" (Ctrl+Shift+S)
- Both PNG and SVG formats are supported
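
The first two workflow steps (extract text from every page, then count words) can be sketched as follows. This is a minimal illustration, not the application's code: it assumes the pypdf package for extraction, since this README does not say which PDF library is used, and the names extract_text and count_words are hypothetical.

```python
import re
from collections import Counter


def extract_text(pdf_path):
    """Concatenate the text of every page (assumes the pypdf package)."""
    from pypdf import PdfReader  # assumed dependency, imported lazily

    reader = PdfReader(pdf_path)
    # Image-only pages may yield no text, so fall back to an empty string.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def count_words(text):
    """Lowercase the text, split it into words, and count occurrences."""
    # \w+ matches a run of letters/digits; the optional group keeps
    # contractions such as "don't" together as one word.
    words = re.findall(r"\w+(?:'\w+)?", text.lower())
    return Counter(words)


counts = count_words("The cat saw the other cat.")
print(counts.most_common(2))
```

If extract_text returns an empty string for a document, the PDF is probably an image-only scan, which matches the "Word cloud is empty" case under Troubleshooting.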

Advanced Features

Reset to Original

To clear all manual edits and regenerate from the source PDF:

Go to Cloud menu → Reset List (Ctrl+Shift+L)
This clears all exclusions and custom frequencies

Load a Previously Saved Word List

Click "Open List" or use Ctrl+Shift+O
Select a CSV file previously saved by the application
The word list and all settings are restored
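
A saved word list can be pictured as a simple CSV round trip. The sketch below is an assumption about the file layout, not the application's documented format: the column names word, frequency, and excluded are hypothetical, and it covers only the word list, not the settings that the application also restores.

```python
import csv
import io


def save_word_list(rows, fileobj):
    """Write (word, frequency, excluded) rows with a header line."""
    writer = csv.writer(fileobj)
    writer.writerow(["word", "frequency", "excluded"])  # hypothetical columns
    for word, freq, excluded in rows:
        writer.writerow([word, freq, int(excluded)])


def load_word_list(fileobj):
    """Read the rows back, restoring the int and bool types."""
    reader = csv.DictReader(fileobj)
    return [
        (row["word"], int(row["frequency"]), row["excluded"] == "1")
        for row in reader
    ]


rows = [("geology", 12, False), ("the", 40, True)]
buffer = io.StringIO()  # stands in for a file opened with newline=""
save_word_list(rows, buffer)
buffer.seek(0)
restored = load_word_list(buffer)
```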

Randomize Layout

Toggle the "Randomize" button to enable/disable random layout generation
When enabled: generates a different layout each time you update
When disabled: uses a fixed seed for consistent layouts
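
The difference between the two modes comes down to how the random number generator is seeded. word_cloud's WordCloud class accepts a random_state parameter; the sketch below assumes the "Randomize" toggle maps onto it. The function names and the seed value 42 are hypothetical, and sample_positions is only a stand-in for a real layout pass.

```python
import random


def layout_random_state(randomize, fixed_seed=42):
    """Return None for a fresh layout each time, or a fixed seed."""
    return None if randomize else fixed_seed


def sample_positions(seed, n=3):
    """Stand-in for a layout pass: draw n pseudo-random positions."""
    rng = random.Random(seed)  # seed=None draws the seed from the system
    return [(rng.randint(0, 799), rng.randint(0, 599)) for _ in range(n)]


fixed = layout_random_state(randomize=False)
first = sample_positions(fixed)
second = sample_positions(fixed)
print(first == second)  # a fixed seed repeats the same positions
```

Under this assumption the value would be passed as WordCloud(random_state=layout_random_state(randomize)).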

Configuration

Word Cloud Settings

Size: Set width and height of the generated word cloud (default: 800x600)
Background Color: Choose the background color
Colormap: Select from 50+ matplotlib colormaps or custom palettes
Font Size: Set minimum and maximum font sizes (default: 10-150 pt)
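
These settings correspond closely to keyword arguments of word_cloud's WordCloud class (width, height, background_color, colormap, min_font_size, max_font_size). The sketch below shows one way they might be collected; it is not the application's code. The size and font defaults come from the list above, while "white" and "viridis" are assumed defaults, and both function names are hypothetical.

```python
def wordcloud_kwargs(width=800, height=600, background_color="white",
                     colormap="viridis", min_font_size=10, max_font_size=150):
    """Collect the settings above as WordCloud keyword arguments."""
    if min_font_size > max_font_size:
        raise ValueError("min_font_size must not exceed max_font_size")
    return {
        "width": width,
        "height": height,
        "background_color": background_color,  # assumed default: white
        "colormap": colormap,                  # assumed default: viridis
        "min_font_size": min_font_size,
        "max_font_size": max_font_size,
    }


def render(frequencies, **settings):
    """Build a word cloud from a {word: frequency} mapping."""
    from wordcloud import WordCloud  # imported lazily; needs word_cloud

    cloud = WordCloud(**wordcloud_kwargs(**settings))
    return cloud.generate_from_frequencies(frequencies)


kwargs = wordcloud_kwargs(width=1024, height=768)
print(kwargs["width"], kwargs["height"])
```

With word_cloud installed, render({"pdf": 5, "cloud": 3}).to_file("cloud.png") would write a PNG, and the to_svg() method returns the SVG markup as a string.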

Text Filtering

Min Word Length: Filter out words shorter than specified length (default: 1)
Max Words: Limit the number of words displayed (default: 200)
Include Numbers: Toggle whether numeric words are included
Ignore Stop Words: Automatically remove common words (articles, prepositions, etc.)
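
The four filters can be read as successive tests applied to a word-count mapping. The following is a minimal sketch under stated assumptions: the stop word set is a tiny illustrative sample rather than the application's list, the default states of the two toggles are guesses, and filter_counts is a hypothetical name.

```python
from collections import Counter

# A tiny illustrative stop list; the application's actual list is not shown.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "it", "to"}


def filter_counts(counts, min_length=1, max_words=200,
                  include_numbers=False, ignore_stop_words=True):
    """Apply the length, number, and stop word filters, then cap the size."""
    kept = Counter()
    for word, freq in counts.items():
        if len(word) < min_length:
            continue  # shorter than Min Word Length
        if not include_numbers and word.isdigit():
            continue  # purely numeric word
        if ignore_stop_words and word in STOP_WORDS:
            continue  # common word
        kept[word] = freq
    # Keep only the max_words most frequent entries.
    return dict(kept.most_common(max_words))


raw = Counter({"the": 9, "rock": 4, "2024": 3, "of": 2, "ore": 1})
filtered = filter_counts(raw, min_length=4)
print(filtered)
```

word_cloud itself exposes max_words, min_word_length, include_numbers, and stopwords parameters; whether the application relies on those or filters beforehand, as sketched here, is not stated in this README.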

Shape Options

Rectangle (default)
Circle
Heart
Star
Diamond
Hexagon
Triangle
Custom Image (load your own mask image)
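
In word_cloud, a shape is supplied as a mask array in which pure white (255) entries are masked out and all other entries are free to draw on. The sketch below builds a circle mask this way; how the application actually produces its built-in shapes is not documented here, and both function names are hypothetical.

```python
def circle_mask(width, height):
    """Return rows of 0 (drawable) and 255 (masked out) forming a circle."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    radius = min(width, height) / 2
    return [
        [0 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else 255
         for x in range(width)]
        for y in range(height)
    ]


def as_wordcloud_mask(rows):
    """Convert the nested lists to a uint8 array for WordCloud(mask=...)."""
    import numpy as np  # imported lazily; numpy is a word_cloud dependency

    return np.array(rows, dtype=np.uint8)


mask = circle_mask(8, 8)
for row in mask:
    print("".join("#" if value == 0 else "." for value in row))
```

For a custom image, the usual word_cloud pattern is to load the picture into an array (for example with Pillow and numpy) and pass it as the mask in the same way.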

File Structure

WordCloud/
├── README.md                      # This file
├── SPEC.md                        # Detailed specification
├── requirements.txt               # Python dependencies
├── src/
│   └── pdf_wordcloud.py          # Main application
├── resources/
│   ├── custom_colormaps.csv      # Custom color palettes
│   ├── settings.json             # Application settings
│   └── icons/                    # UI icons
└── saved/
    └── GDA_wordlist.csv          # Example saved word list
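
The tree above places persistent settings in resources/settings.json. A common way to handle such a file is to overlay stored values on built-in defaults, so a missing or partial file still yields a full configuration. This is a sketch of that pattern, not the application's code: the key names are hypothetical, and the default values are taken from the Configuration section.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical keys; default values come from the Configuration section.
DEFAULTS = {"width": 800, "height": 600, "min_word_length": 1, "max_words": 200}


def load_settings(path):
    """Return DEFAULTS overlaid with whatever the JSON file provides."""
    settings = dict(DEFAULTS)
    file = Path(path)
    if file.exists():
        settings.update(json.loads(file.read_text(encoding="utf-8")))
    return settings


def save_settings(path, settings):
    """Write the settings as indented JSON."""
    Path(path).write_text(json.dumps(settings, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "settings.json"
    first_run = load_settings(target)   # file missing: defaults only
    save_settings(target, {**first_run, "max_words": 50})
    second_run = load_settings(target)  # stored value overrides the default
```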

Keyboard Shortcuts

Shortcut	Action
Ctrl+O	Open PDF
Ctrl+Shift+O	Open Word List
Ctrl+S	Save List
Ctrl+Shift+S	Save Cloud
Ctrl+G	Generate Word Cloud
Ctrl+Shift+L	Reset List (menu only)

Creating Custom Color Palettes

Edit resources/custom_colormaps.csv to create custom color schemes:

name,color1,color2,color3,...
ocean,#1a5276,#2874a6,#5dade2
sunset,#d04526,#f39c12,#f9e79f

Each row defines a colormap with a name and 2-5 hex colors.
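
Reading this file amounts to mapping each name to its list of colors. The sketch below parses the format shown above and, as an optional step, turns a palette into a matplotlib colormap. It assumes the file starts with the header row shown, and it is an assumption that colors are interpolated rather than used as discrete steps; parse_palettes and to_colormap are hypothetical names.

```python
import csv
import io


def parse_palettes(fileobj):
    """Map each palette name to its list of hex colors."""
    palettes = {}
    for row in csv.reader(fileobj):
        cells = [cell.strip() for cell in row if cell.strip()]
        if not cells or cells[0].lower() == "name":
            continue  # skip blank lines and the header row
        name, colors = cells[0], cells[1:]
        if not 2 <= len(colors) <= 5:
            raise ValueError(f"{name}: expected 2-5 colors, got {len(colors)}")
        palettes[name] = colors
    return palettes


def to_colormap(name, colors):
    """Build a matplotlib colormap that interpolates between the colors."""
    from matplotlib.colors import LinearSegmentedColormap  # lazy import

    return LinearSegmentedColormap.from_list(name, colors)


sample = io.StringIO(
    "name,color1,color2,color3\n"
    "ocean,#1a5276,#2874a6,#5dade2\n"
    "sunset,#d04526,#f39c12,#f9e79f\n"
)
palettes = parse_palettes(sample)
print(palettes["ocean"])
```

WordCloud's colormap argument accepts either a colormap name or a matplotlib colormap object, so the result of to_colormap could be passed to it directly.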

Troubleshooting

Application won't start

Ensure Python 3.8+ is installed: python --version
Verify all dependencies are installed: pip install -r requirements.txt
Try reinstalling PyQt6: pip install --upgrade PyQt6
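
The first two checks can also be run from Python itself. This small sketch reports whether the interpreter is new enough and whether the key modules can be found. The module list is an assumption based on the libraries named in this README (PyQt6 and word_cloud, imported as wordcloud); see requirements.txt for the full set.

```python
import importlib.util
import sys


def check_environment(modules=("PyQt6", "wordcloud")):
    """Report the Python version check and which modules can be found."""
    report = {"python>=3.8": sys.version_info >= (3, 8)}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for item, ok in check_environment().items():
    print(f"{item}: {'ok' if ok else 'MISSING'}")
```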

PDF text extraction fails

Ensure the PDF is not encrypted or corrupted
Try opening the PDF with a PDF reader first to confirm it's readable
Some scanned PDFs require OCR (not supported by this application)

Word cloud is empty

Check your filter settings (minimum word length, stop words, etc.)
Ensure the PDF contains extractable text (not an image-only scan)
Try disabling stop word filtering to see if that helps

Author

Derrick Hasterok

License

MIT License - see LICENSE file for details

Note: This project uses PyQt6 for the GUI. PyQt6 is licensed under GPLv3 unless a commercial license is purchased, so redistribution of this application may be subject to PyQt6 licensing restrictions.

Contributing

Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests.

Future Enhancements

Real-time preview updates
Batch processing for multiple PDFs
Word frequency statistics and analytics
Theme customization
Additional input file types (txt, docx, etc.)
Color picker for backgrounds and custom colormap generation
