Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision (CV) and natural language processing (NLP). Our goal is a method that, given an image, outputs an English sentence describing its content. We present a generative model based on a deep neural network (DNN) architecture that combines recent advances in CV and machine translation.
The code covers the whole process: preprocessing, feature extraction, building the vocabulary, converting words to indexes, training the model, evaluating the model, and generating predictions.
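
The vocabulary and word-to-index steps can be illustrated with a minimal sketch. It is not taken from this repository: the function names (`tokenize`, `build_vocab`, `encode`), the special tokens, and the `min_count` threshold are all assumptions chosen for illustration, and the actual code may tokenize or index differently.

```python
from collections import Counter

# Special tokens are an assumption; the repository may use different ones.
PAD, START, END, UNK = "<pad>", "<start>", "<end>", "<unk>"


def tokenize(caption):
    """Lowercase a caption, replace punctuation with spaces, split on whitespace."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in caption.lower()
    )
    return cleaned.split()


def build_vocab(captions, min_count=1):
    """Map each word seen at least min_count times to an integer index."""
    counts = Counter(tok for cap in captions for tok in tokenize(cap))
    words = [PAD, START, END, UNK]
    # Sort so that the mapping is deterministic across runs.
    words += sorted(w for w, n in counts.items() if n >= min_count)
    return {w: i for i, w in enumerate(words)}


def encode(caption, word_to_index):
    """Convert a caption to indexes, wrapped in start and end markers."""
    unk = word_to_index[UNK]
    ids = [word_to_index[START]]
    ids += [word_to_index.get(tok, unk) for tok in tokenize(caption)]
    ids.append(word_to_index[END])
    return ids


# Toy captions, made up for this example.
captions = ["A dog runs on the grass.", "A child plays with a dog."]
word_to_index = build_vocab(captions)
encoded = encode("A cat runs", word_to_index)
print(encoded)  # "cat" is not in the vocabulary, so it maps to <unk>
```

Words that never appear in the training captions fall back to the unknown token, so the model always receives a valid index. Raising `min_count` trims rare words and keeps the vocabulary small.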