Image and Video Captioning

Technologies Used

TensorFlow, Python, Computer Vision, NLP, Deep Learning, Transformers, CNNs, RNNs

Project Overview

I worked on this project during my participation in the Artificial Intelligence and Multi-Agent Systems Summer School at the AI-MAS Laboratory, University Politehnica of Bucharest. The focus was on improving an existing image and video captioning model through curriculum learning.

Key Features

  • Exploring different curriculum learning strategies for training the captioning model
  • Ordering training samples by increasing difficulty, measured via caption length, syntactic structure, and vocabulary difficulty (see the sketch after this list)
  • Evaluating comprehensively on standard datasets (Flickr8k, MSVD)
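
As an illustration, here is a minimal sketch of how training examples might be ordered by such a difficulty score. The scoring heuristics and weights below are assumptions for demonstration, not the exact criteria used in the project:

```python
import re

def caption_difficulty(caption, vocab_freq):
    """Heuristic difficulty score for a caption.

    Combines caption length, a crude proxy for syntactic complexity,
    and vocabulary rarity. The weights and heuristics are illustrative
    assumptions, not the project's actual criteria.
    """
    tokens = re.findall(r"[a-z']+", caption.lower())
    length_score = len(tokens)
    # Commas and subordinating words as a rough proxy for syntactic structure.
    syntax_score = caption.count(",") + sum(t in {"and", "which", "while"} for t in tokens)
    # Rare words (low corpus frequency) make a caption harder.
    rarity_score = sum(1.0 / max(vocab_freq.get(t, 1), 1) for t in tokens)
    return length_score + 2.0 * syntax_score + 5.0 * rarity_score

def curriculum_order(samples, vocab_freq):
    """Sort (image_id, caption) pairs from easiest to hardest."""
    return sorted(samples, key=lambda s: caption_difficulty(s[1], vocab_freq))

if __name__ == "__main__":
    # Toy word-frequency table and samples (hypothetical data).
    vocab_freq = {"a": 1000, "dog": 350, "runs": 200, "terrier": 5, "leaps": 8}
    samples = [
        ("img1", "A dog runs."),
        ("img2", "A small terrier leaps over a weathered fence, chasing a ball."),
    ]
    # Feed the sorted samples to the trainer in stages, easiest first.
    for image_id, caption in curriculum_order(samples, vocab_freq):
        print(image_id, caption)
```

In a curriculum schedule, the trainer would start with the lowest-scoring captions and gradually admit harder ones as training progresses.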

Results

The implemented models achieved competitive performance on standard benchmarks, with particular improvements in BLEU scores on the Flickr8k dataset. However, because the model spent longer training on simpler examples, the predicted captions tended to be more generic and less descriptive.

Learning Outcomes

This project provided valuable hands-on experience with cutting-edge deep learning research, combining computer vision and natural language processing. It enhanced my understanding of both fields.