Dataset:
A Hindi-English parallel corpus of 15k aligned sentence pairs.
The following pre-processing steps were performed:
- Removing special characters, quotes and extra whitespace
- Adding start and end tokens to both the English and the Hindi sentences
- Building the Hindi and English vocabularies
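The three pre-processing steps can be sketched in plain Python as below. This is a minimal illustration, not the project's actual code: the README does not give the exact regular expressions, token spellings or vocabulary ordering, so the function names clean, add_tokens and build_vocab, the markers <start>/<end>, lower-casing, whitespace tokenization and reserving id 0 for padding are all assumptions.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical marker spellings; the README does not say which tokens were used.
START, END = "<start>", "<end>"

def clean(sentence: str) -> str:
    """Lower-case, drop quotes, replace special characters, collapse whitespace."""
    sentence = sentence.lower().strip()
    # Delete straight and curly quotes/apostrophes outright ("don't" -> "dont").
    sentence = re.sub("[\"'\u2018\u2019\u201c\u201d]", "", sentence)
    # Keep only Latin letters, digits, the Devanagari block (U+0900-U+097F)
    # and spaces; every other character becomes a space.
    sentence = re.sub("[^a-z0-9\u0900-\u097f ]", " ", sentence)
    # The Devanagari danda and double danda are punctuation, so replace them too.
    sentence = re.sub("[\u0964\u0965]", " ", sentence)
    # Collapse runs of whitespace into a single space.
    return re.sub(r"\s+", " ", sentence).strip()

def add_tokens(sentence: str) -> str:
    """Wrap a cleaned sentence in start/end markers for the decoder."""
    return f"{START} {sentence} {END}"

def build_vocab(sentences: List[str]) -> Dict[str, int]:
    """Map each whitespace token to an integer id, most frequent first.

    Id 0 is left free for padding (an assumption, not stated in the README).
    Ties are broken alphabetically so the mapping is deterministic.
    """
    counts = Counter(tok for s in sentences for tok in s.split())
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: idx for idx, word in enumerate(ordered, start=1)}

# Usage on a single toy pair (English, Hindi).
pairs = [("Hello, \"world\"!", "नमस्ते, दुनिया!")]
english = [add_tokens(clean(en)) for en, _ in pairs]
hindi = [add_tokens(clean(hi)) for _, hi in pairs]
en_vocab, hi_vocab = build_vocab(english), build_vocab(hindi)
```

Separate vocabularies are built for each language because the encoder and decoder use different embedding tables.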
Model Architecture:
The model uses an encoder-decoder architecture built from LSTM modules, with an attention mechanism to capture long-range dependencies.
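The README does not say which attention variant is used (for example Bahdanau-style additive scoring or Luong-style dot-product scoring). The sketch below assumes dot-product attention and shows a single decoder step in plain Python rather than in a deep-learning framework: the current decoder LSTM state is scored against every encoder LSTM state, the scores are normalised with a softmax, and the resulting weights form a context vector. The function name dot_product_attention is hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]

def dot_product_attention(decoder_state: Vector,
                          encoder_states: List[Vector]) -> Tuple[Vector, Vector]:
    """One decoder step of (assumed) Luong-style dot-product attention.

    decoder_state:  current decoder LSTM hidden state s_t (length d)
    encoder_states: encoder LSTM hidden states h_1..h_n (each length d)
    Returns (attention weights a_t, context vector c_t).
    """
    # Alignment score for each source position: e_i = s_t . h_i
    scores = [sum(s * h for s, h in zip(decoder_state, h_i)) for h_i in encoder_states]
    # Softmax over source positions (subtract the max for numerical stability).
    peak = max(scores)
    exps = [math.exp(e - peak) for e in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Context vector: attention-weighted sum of encoder states, c_t = sum_i a_i h_i.
    dim = len(encoder_states[0])
    context = [sum(a * h_i[k] for a, h_i in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context

# Usage: three source positions with 2-dimensional hidden states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = dot_product_attention([2.0, 0.0], encoder_states)
```

Because the context vector is recomputed at every decoder step from all encoder states, the decoder is not limited to a single fixed-length summary of the source sentence, which is what helps with long sentences. In the Luong formulation, c_t is then combined with s_t to predict the next target token.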
Model Evaluation:
The model is evaluated with the BLEU score from the NLTK module, which measures n-gram overlap between the model's translations and the reference translations.
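A minimal sketch of corpus-level BLEU with NLTK's corpus_bleu, assuming one tokenized reference per sentence and removing the start/end markers before scoring so they do not inflate the n-gram matches. The helper names strip_markers and evaluate_bleu, the marker spellings, and the choice of SmoothingFunction().method1 are assumptions; the README does not say which BLEU settings were used. The translation direction is also not stated, so the toy sentences are in English.

```python
from typing import List

from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu

# Hypothetical marker spellings; they should not count towards BLEU.
SPECIAL = {"<start>", "<end>"}

def strip_markers(tokens: List[str]) -> List[str]:
    """Drop start/end markers from a token list."""
    return [t for t in tokens if t not in SPECIAL]

def evaluate_bleu(references: List[List[str]], hypotheses: List[List[str]]) -> float:
    """Corpus-level BLEU with NLTK's default uniform 1- to 4-gram weights.

    references[i] is the single tokenized reference for hypotheses[i].
    """
    # method1 adds a small epsilon to n-gram orders with zero matches, so a
    # short sentence with no matching 4-grams does not collapse the score to 0.
    smooth = SmoothingFunction().method1
    # corpus_bleu expects a list of references for each hypothesis.
    refs = [[strip_markers(r)] for r in references]
    hyps = [strip_markers(h) for h in hypotheses]
    return corpus_bleu(refs, hyps, smoothing_function=smooth)

# Usage on one toy sentence pair.
reference = "<start> the cat sat on the mat <end>".split()
hypothesis = "<start> the cat sat on a mat <end>".split()
score = evaluate_bleu([reference], [hypothesis])
```

BLEU lies between 0 and 1 (it is often reported multiplied by 100); it rewards n-gram overlap with the reference rather than exact-match accuracy, so a near-miss translation still receives partial credit.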