This repository demonstrates basic NLP tasks using the NLTK and spaCy libraries. It contains a collection of Jupyter notebooks with Python code and explanations for tasks such as tokenization, POS tagging, N-gram language modeling, named entity recognition (NER), text classification, stemming, and more.
To get started with this repository, follow the steps below:
- Python 3.7+
- Install the required libraries:

```bash
pip install nltk spacy
python -m spacy download en_core_web_sm
```
- 📘 `n_gram_language_model.ipynb`: Build and evaluate N-gram language models using NLTK.
- 🌍 `named_entity_recognition.ipynb`: Perform NER with NLTK and spaCy.
- 🏷️ `pos_tagging.ipynb`: POS tagging using NLTK and spaCy.
- ✅ `spelling_correction.ipynb`: Demonstrates spelling correction techniques.
- 🌱 `stemming_stopwords.ipynb`: Covers stemming types and stopword removal methods.
- 📊 `text_classification.ipynb`: Basic text classification using NLTK.
- ✂️ `tokenization.ipynb`: Different tokenization techniques using NLTK and spaCy.
Splitting a sentence into words or subwords.
```python
import nltk
nltk.download("punkt")  # one-time download of the tokenizer models
from nltk.tokenize import word_tokenize

sentence = "Natural Language Processing is exciting!"
tokens = word_tokenize(sentence)
print(tokens)
```

📌 Output: `['Natural', 'Language', 'Processing', 'is', 'exciting', '!']`
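Rule-based tokenizers offer finer control over what counts as a token. A minimal sketch using NLTK's `RegexpTokenizer` (the pattern here is an illustrative choice, not taken from the notebooks):

```python
from nltk.tokenize import RegexpTokenizer

# Keep runs of word characters together; emit punctuation as separate tokens.
tokenizer = RegexpTokenizer(r"\w+|[^\w\s]")
sentence = "Natural Language Processing is exciting!"
print(tokenizer.tokenize(sentence))
# → ['Natural', 'Language', 'Processing', 'is', 'exciting', '!']
```

Unlike `word_tokenize`, this needs no downloaded models, which makes it handy for quick experiments.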
Building N-grams and predicting the next word.
```python
from nltk import ngrams

sentence = "I am learning NLP."
n_grams = list(ngrams(sentence.split(), 2))
print(n_grams)
```

🔗 Output: `[('I', 'am'), ('am', 'learning'), ('learning', 'NLP.')]`
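Bigram counts like these can drive a toy next-word predictor: count how often each bigram occurs in a corpus, then pick the most frequent continuation of a given word. A minimal sketch (the `corpus` and the `predict_next` helper are illustrative, not part of the notebooks):

```python
from collections import Counter
from nltk import ngrams

corpus = "i am learning nlp and i am enjoying nlp".split()
bigram_counts = Counter(ngrams(corpus, 2))

def predict_next(word):
    # Among bigrams starting with `word`, return the most frequent second word.
    candidates = {b: c for b, c in bigram_counts.items() if b[0] == word}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)[1]

print(predict_next("i"))  # → am  ("i am" occurs twice in the corpus)
```

Real N-gram language models add smoothing and probability estimates; the notebook covers evaluation in more depth.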
Identifying entities like names, locations, and dates in text using spaCy.
```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Barack Obama was the 44th President of the USA.")
for ent in doc.ents:
    print(ent.text, ent.label_)
```

📌 Output:

```
Barack Obama PERSON
44th ORDINAL
USA GPE
```
Tagging words in a sentence with their respective parts of speech.
```python
import nltk
nltk.download("punkt")  # one-time downloads for the
nltk.download("averaged_perceptron_tagger")  # tokenizer and tagger models
from nltk import pos_tag
from nltk.tokenize import word_tokenize

sentence = "NLTK makes POS tagging simple."
tags = pos_tag(word_tokenize(sentence))
print(tags)
```

📌 Output: `[('NLTK', 'NNP'), ('makes', 'VBZ'), ('POS', 'NNP'), ('tagging', 'NN'), ('simple', 'JJ'), ('.', '.')]`
Correcting misspelled words using NLTK's edit_distance.
```python
from nltk.metrics.distance import edit_distance

def correct_spelling(word, vocab):
    # Return the vocabulary word with the smallest edit distance to `word`.
    return min(vocab, key=lambda x: edit_distance(word, x))

vocab = {"learning", "machine", "intelligence"}
print(correct_spelling("lerning", vocab))
```

📌 Output: `learning`
Reducing words to their base form using algorithms like Porter and Lancaster stemmers.
```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
print(stemmer.stem("running"))  # Output: run
```

🔗 Other examples: `learning` → `learn`, `connected` → `connect`
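The notebook also covers the Lancaster stemmer, which is more aggressive than Porter and often truncates words further. A quick side-by-side comparison (the word list is illustrative):

```python
from nltk.stem import PorterStemmer, LancasterStemmer

porter = PorterStemmer()
lancaster = LancasterStemmer()

for word in ["running", "connected", "maximum"]:
    # Print each word alongside both stemmers' results for comparison.
    print(f"{word}: porter={porter.stem(word)}, lancaster={lancaster.stem(word)}")
```

Which stemmer to use depends on the task: Porter is gentler and more widely used; Lancaster collapses more word forms at the cost of readability.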
Removing common stopwords that do not add much meaning.
```python
import nltk
nltk.download("stopwords")  # one-time download of the stopword lists
from nltk.corpus import stopwords

words = ["I", "am", "learning", "NLP", "with", "NLTK"]
stop_words = set(stopwords.words("english"))
# Compare lowercased, since NLTK's stopword list is lowercase ("i", not "I").
filtered_words = [w for w in words if w.lower() not in stop_words]
print(filtered_words)
```

📌 Output: `['learning', 'NLP', 'NLTK']`
Classifying text into predefined categories using NLTK.
```python
from nltk.classify import NaiveBayesClassifier

train_data = [({"word": "love"}, "positive"), ({"word": "hate"}, "negative")]
classifier = NaiveBayesClassifier.train(train_data)
print(classifier.classify({"word": "love"}))
```

📌 Output: `positive`
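To classify whole sentences rather than single words, a common approach is word-presence ("bag of words") features. A minimal sketch, assuming a tiny hand-made training set and a `document_features` helper (both illustrative, not from the notebooks):

```python
from nltk.classify import NaiveBayesClassifier

train_sentences = [
    ("I love this movie", "positive"),
    ("This film is great", "positive"),
    ("I hate this movie", "negative"),
    ("This film is terrible", "negative"),
]

def document_features(text):
    # One boolean feature per lowercase token in the sentence.
    return {word: True for word in text.lower().split()}

train_data = [(document_features(t), label) for t, label in train_sentences]
classifier = NaiveBayesClassifier.train(train_data)
print(classifier.classify(document_features("I love this film")))  # → positive
```

Larger datasets and richer features (bigrams, TF-IDF weights) generally improve accuracy; this sketch only shows the mechanics of the API.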
- Clone the repository:

```bash
git clone https://github.com/rushikeshraghatate90/Natural_Language_Processing.git
```

- Navigate to the project directory:

```bash
cd Natural_Language_Processing
```

- Open any `.ipynb` file in Jupyter Notebook or JupyterLab to explore the code.
Contributions are welcome! Please fork the repository and submit a pull request with your changes.
This project is licensed under the MIT License - see the LICENSE file for details.