Monikanahadiya/Large_file_Indexing

Large File Data Indexing and Word Search using Trie

This Java project reads a large text file and indexes every word in a Trie (prefix tree). Once the file is indexed, users can search for any word to check whether it occurs in the file and how many times it appears.


Features

  • Reads large files line-by-line using BufferedReader
  • Uses Trie for fast word insertion and lookup
  • Strips punctuation and matches words case-insensitively
  • Interactive CLI: search for words or exit anytime
  • Shows word frequency if present in the file
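
The trie behind these features can be pictured with a minimal sketch. The repository's source is not reproduced in this README, so the class and method names below (WordTrie, insert, count) are hypothetical, and the map-based child storage is an assumption rather than the project's actual layout. Each node keeps a counter that is incremented when a word ends there, which is what makes frequency lookup possible.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name; the project's real classes may be organized differently.
final class WordTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count; // how many times a word ending at this node was inserted
    }

    private final Node root = new Node();

    /** Inserts one already-normalized word and increments its frequency. */
    void insert(String word) {
        if (word.isEmpty()) {
            return; // nothing to index
        }
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            // Create the child for this character only if it is missing.
            node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
        }
        node.count++;
    }

    /** Returns how often the word was inserted, or 0 if it was never seen. */
    int count(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.children.get(word.charAt(i));
            if (node == null) {
                return 0; // the path breaks off, so the word is absent
            }
        }
        return node.count; // 0 when the query is only a prefix of indexed words
    }
}
```

Both operations walk one node per character, so their cost depends on the length of the word and not on the number of words already indexed.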

How It Works

  1. The program prompts for a file name (supports relative paths).
  2. Reads the file, extracts valid words, and inserts them into the trie.
  3. Accepts user input to search words interactively.
  4. Prints how many times the searched word occurs, or a not-found message.
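
The four steps above could be wired together roughly as follows. This is a sketch under stated assumptions, not the repository's code: FileIndexer, index, lookup, and runCli are hypothetical names, the prompt and message texts are invented for illustration, the file is assumed to be UTF-8, and the normalization (lowercase, keep only letters and digits) is one plausible reading of "stripped of punctuation".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Scanner;

// Hypothetical names throughout; the repository's real classes may differ.
final class FileIndexer {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        int count; // occurrences of the word that ends at this node
    }

    private final Node root = new Node();

    /** Assumed normalization: lowercase, then drop everything but letters and digits. */
    private static String normalize(String raw) {
        return raw.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{Nd}]", "");
    }

    /** Step 2: read line by line, normalize each token, insert it into the trie. */
    void index(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            for (String token : line.split("\\s+")) {
                String word = normalize(token);
                if (word.isEmpty()) {
                    continue; // token was blank or pure punctuation
                }
                Node node = root;
                for (char c : word.toCharArray()) {
                    node = node.children.computeIfAbsent(c, k -> new Node());
                }
                node.count++;
            }
        }
    }

    /** Step 4: build the answer for one query. */
    String lookup(String query) {
        String word = normalize(query);
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                break;
            }
        }
        int count = (node == null) ? 0 : node.count;
        return count > 0
                ? "'" + word + "' appears " + count + " time(s)."
                : "'" + word + "' was not found.";
    }

    /** Steps 1 and 3: prompt for a file, index it, then answer queries until "exit". */
    static void runCli(Scanner in, PrintStream out) {
        out.print("Enter file name: ");
        if (!in.hasNextLine()) {
            return;
        }
        FileIndexer indexer = new FileIndexer();
        try (BufferedReader reader =
                Files.newBufferedReader(Paths.get(in.nextLine().trim()), StandardCharsets.UTF_8)) {
            indexer.index(reader);
        } catch (IOException | InvalidPathException e) {
            out.println("Could not read file: " + e.getMessage());
            return; // leave without a stack trace
        }
        while (true) {
            out.print("Search word (or 'exit'): ");
            if (!in.hasNextLine()) {
                break;
            }
            String query = in.nextLine().trim();
            if (query.equalsIgnoreCase("exit")) {
                break;
            }
            out.println(indexer.lookup(query));
        }
    }
}
```

A main method would only need to call FileIndexer.runCli(new Scanner(System.in), System.out). Passing the Scanner and PrintStream in as parameters keeps the loop easy to exercise from a test.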

Technologies Used

  • Java 11+
  • Trie Data Structure
  • BufferedReader
  • Scanner

Future Enhancements

  • Show suggestions for near matches (fuzzy search)
  • Export indexed data as a report
  • GUI integration using JavaFX or Swing
  • Add support for multiple files or file types

Notes

  • Input file must be in .txt format.
  • Words are normalized: lowercase and stripped of punctuation.
  • If the file path is invalid, the program exits gracefully.
  • Due to GitHub's file size limitations, the test dataset (170,000+ rows) is not included in the repository. The project was tested successfully against this dataset locally.
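
The normalization rule in the notes above (lowercase, punctuation stripped) could look like the sketch below. WordNormalizer, normalize, and wordsOf are hypothetical names, and the exact rule is an assumption: here tokens are split on whitespace and every character that is not a letter or digit is removed, so "It's" becomes "its". The project may treat apostrophes or hyphens differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; the README does not show the project's actual rule.
final class WordNormalizer {
    /** Lowercases a raw token and removes everything that is not a letter or digit. */
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{Nd}]", "");
    }

    /** Splits a line on whitespace and returns the non-empty normalized words. */
    static List<String> wordsOf(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            String word = normalize(token);
            if (!word.isEmpty()) {
                words.add(word); // tokens that were pure punctuation are skipped
            }
        }
        return words;
    }
}
```

Applying the same function to both the indexed text and the search query is what makes lookups case-insensitive.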

Author

Monika
B.Tech, CSE (Data Science)
LinkedIn: https://www.linkedin.com/in/monika-nahadiya-a99558289/
Email: monikanahadiya@gmail.com

About

Implementation of a Trie-based indexing system in Java to enable optimized word lookup and scalable search performance on large text files.
