MachineTranslation

Dataset:
A Hindi-English parallel corpus of 15k sentence pairs.


Pre-processing steps performed:
  • Removing special characters, quotes and extra spaces
  • Adding start and end tokens to both the English and Hindi sentences
  • Building the Hindi and English vocabularies
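
The three pre-processing steps can be sketched in plain Python as follows. This is a minimal illustration, not the repository's code. The token strings `START_` and `_END`, the exact set of "special characters" (ASCII punctuation, curly quotes and the Devanagari danda `।`), and leaving id 0 free for padding are all assumptions. Characters are removed with an explicit set rather than a `[^\w\s]` regex because Python's `\w` does not treat Devanagari vowel signs (combining marks) as word characters, so such a regex would corrupt Hindi words.

```python
import string

# Hypothetical marker strings; the README does not say which tokens were used.
START, END = "START_", "_END"

# Assumed "special characters": ASCII punctuation, curly quotes and the danda.
SPECIAL_CHARS = set(string.punctuation) | set("‘’“”।")


def clean(sentence: str) -> str:
    """Drop special characters and quotes, then collapse extra spaces."""
    sentence = "".join(ch for ch in sentence if ch not in SPECIAL_CHARS)
    return " ".join(sentence.split())  # split() also trims leading/trailing space


def add_tokens(sentence: str) -> str:
    """Wrap a cleaned sentence in start and end tokens."""
    return f"{START} {sentence} {END}"


def build_vocab(sentences: list[str]) -> dict[str, int]:
    """Map every distinct word to an integer id; id 0 is left free for padding."""
    words = {word for s in sentences for word in s.split()}
    return {word: i for i, word in enumerate(sorted(words), start=1)}


# Made-up example pair, not taken from the corpus.
pairs = [('He said, "Hello!"', 'उसने कहा, "नमस्ते!"')]
english = [add_tokens(clean(en)) for en, _ in pairs]
hindi = [add_tokens(clean(hi)) for _, hi in pairs]
eng_vocab, hin_vocab = build_vocab(english), build_vocab(hindi)
```

Sorting the words before numbering them is only there to make the ids deterministic between runs; a frequency-ordered vocabulary would work just as well.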

Model Architecture:
Used an encoder-decoder architecture built from LSTM modules, with an attention mechanism to capture long-range dependencies.
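
The README does not say which attention variant was used. The sketch below assumes Luong-style dot-product scoring (Bahdanau's additive attention is an equally common choice) and shows a single decoder step in plain Python. The current decoder LSTM hidden state is scored against every encoder hidden state, the scores are normalized with a softmax, and the weighted sum of encoder states gives the context vector. Because the context can draw on any source position directly, information from distant words does not have to survive the whole recurrent chain, which is why attention helps with long-range dependencies. The function names are hypothetical.

```python
import math

Vector = list[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(decoder_state: Vector,
                          encoder_states: list[Vector]) -> tuple[Vector, Vector]:
    """One attention step (assumed Luong dot-product variant).

    decoder_state:  hidden state of the decoder LSTM at the current step
    encoder_states: one encoder LSTM hidden state per source token
    Returns (attention weights over source tokens, context vector).
    """
    # Alignment score: dot product of the decoder state with each encoder state.
    scores = [sum(d * h for d, h in zip(decoder_state, h_i)) for h_i in encoder_states]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * h_i[k] for w, h_i in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context


# Toy example: 3 source tokens, hidden size 2 (made-up numbers).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = dot_product_attention([2.0, 0.0], encoder_states)
```

In a full model the context vector is typically concatenated with the decoder state before the output layer that predicts the next Hindi or English word.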

Model Evaluation:
Used the BLEU score from the NLTK module to evaluate the quality of the model's translations.
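
NLTK exposes BLEU as `sentence_bleu` and `corpus_bleu` in `nltk.translate.bleu_score`. To make the metric concrete without depending on NLTK, the sketch below re-implements corpus-level BLEU in plain Python: clipped n-gram precisions for n = 1..4, their geometric mean with uniform weights, and a brevity penalty based on the closest reference length. It is meant to follow the same definition as NLTK's `corpus_bleu` with default weights and no smoothing, but it is an illustration rather than NLTK's code and may differ in edge cases.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count every n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: list[list[list[str]]],
                hypotheses: list[list[str]], max_n: int = 4) -> float:
    """Corpus BLEU with uniform weights and no smoothing.

    references[i] holds one or more tokenized reference translations
    for the tokenized hypothesis hypotheses[i].
    """
    clipped = [0] * max_n  # matched n-grams, clipped by reference counts
    totals = [0] * max_n   # n-grams in the hypotheses
    hyp_len = ref_len = 0
    for refs, hyp in zip(references, hypotheses):
        hyp_len += len(hyp)
        # Closest reference length; ties go to the shorter reference.
        ref_len += len(min(refs, key=lambda r: (abs(len(r) - len(hyp)), len(r))))
        for n in range(1, max_n + 1):
            max_ref = Counter()
            for r in refs:
                max_ref |= ngram_counts(r, n)  # element-wise max over references
            hyp_ngrams = ngram_counts(hyp, n)
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    # Without smoothing, a zero n-gram precision makes the geometric mean 0.
    if min(clipped) == 0:
        return 0.0
    log_mean = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_mean)


ref = "the cat is on the mat".split()
perfect = corpus_bleu([[ref]], [ref])
short = corpus_bleu([[ref]], ["the cat is on the".split()])
```

Here `perfect` is 1.0, while `short` has every n-gram matched but is one token shorter than the reference, so it is scaled by the brevity penalty exp(1 - 6/5). NLTK's `corpus_bleu` takes its arguments in the same nested-list shape; for short sentences, passing a `SmoothingFunction` method avoids scores collapsing to zero when no 4-gram matches.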