This Java-based project reads a large text file and efficiently indexes every word using a Trie (Prefix Tree) data structure. Once indexed, users can search for any word to check if it exists in the file and how many times it appears.
- Reads large files line-by-line using `BufferedReader`
- Uses Trie for fast word insertion and lookup
- Strips punctuation and matches words case-insensitively
- Interactive CLI: search for words or exit anytime
- Shows word frequency if present in the file
- The program prompts for a file name (supports relative paths).
- Reads the file, extracts valid words, and inserts them into the trie.
- Accepts user input to search words interactively.
- Prints how many times the searched word occurs, or a not-found message.
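
The insert-and-lookup steps above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the class name `WordTrie` and its `insert` and `count` methods are hypothetical, and the sketch assumes each word has already been lowercased and stripped of punctuation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical trie that keeps an occurrence count at the node where each word ends. */
class WordTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count; // how many times a word ending at this node was inserted
    }

    private final Node root = new Node();

    /** Records one occurrence of the word, creating missing nodes along the way. */
    void insert(String word) {
        if (word.isEmpty()) {
            return; // nothing to index
        }
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.count++;
    }

    /** Returns how many times the word was inserted, or 0 if it was never seen. */
    int count(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return 0; // path breaks off: the word is not in the trie
            }
        }
        return node.count;
    }
}
```

With this sketch, inserting `"apple"` twice and `"app"` once makes `count("apple")` return 2 and `count("app")` return 1, while `count("ap")` returns 0 because no inserted word ends at that prefix. A `HashMap` per node is used here for simplicity; a fixed 26-slot array is a common alternative when words contain only the letters a to z.
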
- Java 11+
- Trie Data Structure
- BufferedReader
- Scanner
- Show suggestions for near matches (fuzzy search)
- Export indexed data as a report
- GUI integration using JavaFX or Swing
- Add support for multiple files or file types
- Input file must be in `.txt` format.
- Words are normalized: lowercase and stripped of punctuation.
- The file path must be valid; otherwise the program exits gracefully.
- Due to GitHub's file size limitations, the test dataset (170,000+ rows) has not been uploaded. However, the project has been successfully tested on this large data file locally.
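
The normalization and file-handling notes above could look roughly like the sketch below. The class `WordReader` and its methods `extractWords` and `forEachWord` are hypothetical names, not taken from the project. The sketch also assumes that a "word" is a run of the letters a to z, so digits are dropped and a token like `don't` is split in two; the project's real rules may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

/** Hypothetical helper: normalizes lines into words and streams a file line by line. */
class WordReader {

    /** Lowercases the line and splits it on anything that is not a letter a-z (assumed rule). */
    static List<String> extractWords(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) { // split can yield an empty leading token
                words.add(token);
            }
        }
        return words;
    }

    /** Reads the file one line at a time and passes every normalized word to the sink. */
    static void forEachWord(Path path, Consumer<String> sink) throws IOException {
        // try-with-resources closes the reader even if reading fails midway
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : extractWords(line)) {
                    sink.accept(word);
                }
            }
        }
    }
}
```

A caller can wrap `forEachWord` in a `try`/`catch` for `IOException`, print a short message, and return instead of letting a stack trace surface, which is one way to get the graceful exit described above. The sink can be any word consumer, such as a trie's insert method, so only one line of the file is held in memory at a time.
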
Monika
B.Tech, CSE (Data Science)
LinkedIn: https://www.linkedin.com/in/monika-nahadiya-a99558289/
Email: monikanahadiya@gmail.com