
RYeeshuDhurandhar/Statistical-Language-Modeling-Using-N-Grams


Statistical-Language-Modeling-Using-N-Grams

Using a trigram language model, the project predicts the most probable next word and estimates how correct an input English sentence is.
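Trigram next-word prediction can be illustrated with a minimal sketch. It assumes lowercase whitespace tokenization and a tiny toy corpus standing in for the Wikipedia data. `train_trigrams` and `predict_next` are hypothetical names, not functions from this repository. The sketch counts which words follow each two-word context and returns the most frequent follower.

```python
from collections import Counter, defaultdict


def train_trigrams(sentences):
    """Hypothetical helper: map each (w1, w2) context to a Counter of next words.

    Each sentence is padded with two "<s>" tokens and one "</s>" token,
    so the first words of a sentence also get a two-word context.
    """
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
            model[(w1, w2)][w3] += 1
    return model


def predict_next(model, w1, w2):
    """Hypothetical helper: most frequent word after (w1, w2), or None if the context is unseen."""
    followers = model.get((w1.lower(), w2.lower()))  # .get() does not insert into the defaultdict
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus (an assumption for illustration, not the project's data).
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the dog sat on the mat",
]
model = train_trigrams(corpus)
print(predict_next(model, "sat", "on"))  # the
print(predict_next(model, "on", "the"))  # mat
```

Here "on the" is followed by "mat" twice and "sofa" once, so "mat" is predicted. An unseen context returns `None`. A real model would back off to bigram or unigram counts in that case.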

Course project for MA 202 (Probability and Statistics)


Abstract

Using natural language processing (NLP), the model predicts the most probable next word and estimates the correctness of an input English sentence. To achieve high accuracy, a large, reliable corpus is extracted from Wikipedia, preprocessed, and analyzed before it is used to train the model. Analyzing and visualizing the dataset gives useful insight into the corpus before training begins. Choosing an appropriate model is a crucial step. In our case, a trigram model proved to be the best trade-off: it conditions on more context than a bigram model while suffering less from data sparsity than higher-order n-gram models. Finally, the trained model is used to predict the next word and to compute the perplexity of a given sentence. A lower perplexity means the model considers the sentence more probable, and therefore more likely to be correct.
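The perplexity step described in the abstract can be sketched as follows. For a sentence of N predicted tokens, perplexity is PP = exp(-(1/N) · Σ log P(w_i | w_{i-2}, w_{i-1})). The abstract does not say how unseen trigrams are handled, so this sketch assumes add-one (Laplace) smoothing, P = (C(w1, w2, w3) + 1) / (C(w1, w2) + V), and maps out-of-vocabulary words to a `<unk>` token. The helpers `train_counts` and `perplexity` and the toy corpus are hypothetical illustrations, not the repository's code.

```python
import math
from collections import Counter


def pad(sentence):
    """Lowercase, split on whitespace, and add sentence-boundary tokens."""
    return ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]


def train_counts(sentences):
    """Hypothetical helper: collect trigram counts, context counts, and the vocabulary."""
    trigrams, contexts = Counter(), Counter()
    vocab = {"<unk>", "</s>"}  # every token the model can predict
    for sentence in sentences:
        tokens = pad(sentence)
        vocab.update(tokens[2:])  # skip the two leading "<s>" tokens
        for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
            trigrams[(w1, w2, w3)] += 1
            contexts[(w1, w2)] += 1
    return trigrams, contexts, vocab


def perplexity(sentence, trigrams, contexts, vocab):
    """Hypothetical helper: trigram perplexity with add-one (Laplace) smoothing."""
    tokens = [t if t in vocab or t == "<s>" else "<unk>" for t in pad(sentence)]
    v = len(vocab)
    log_prob, n = 0.0, 0
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        p = (trigrams[(w1, w2, w3)] + 1) / (contexts[(w1, w2)] + v)
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)


# Toy corpus (an assumption for illustration, not the project's data).
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the dog sat on the mat",
]
trigrams, contexts, vocab = train_counts(corpus)
fluent = perplexity("the cat sat on the mat", trigrams, contexts, vocab)
scrambled = perplexity("mat the on sat cat the", trigrams, contexts, vocab)
print(f"fluent: {fluent:.2f}, scrambled: {scrambled:.2f}")
```

The well-formed sentence receives a noticeably lower perplexity than the scrambled one, which is how perplexity can serve as a correctness score. Add-one smoothing is only the simplest choice. Interpolation or backoff usually gives better estimates on a corpus as large as Wikipedia.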

Problem Statement

Computers were once thought of as “dumb terminals,” and human interaction with them followed the principle of “garbage in, garbage out”: computers could only be instructed through elaborate hand-coded rules. Natural language processing (NLP) bridges the gap between humans and computers by letting humans interact with computers in natural human languages. Its use cases include voice assistants, speech recognition, computer-assisted coding, and word and sentence prediction. The many possibilities in NLP that are yet to be explored motivate us to work in this field.

Requirements

  • nltk

License

The code is licensed under the MIT License: anyone may use, modify, and distribute it, provided the copyright and license notice is retained.
