Generating a description for a given image using the Flickr8K dataset

Dibyanshu-gtm/ImageCaptioning

Image Caption Generation

Image caption generation is a classic AI problem that draws on both natural language processing (NLP) and computer vision (CV), which makes it an interesting project. The objective of the system is to generate a caption (a one-line description) for an image that is as accurate as possible. The task is challenging because it needs computer vision to understand the content and features of the image, and natural language processing to turn that understanding into words in the right order. Recently, deep learning methods have achieved state-of-the-art results on examples of this problem.

Please read the complete description and README to be clear about the implementation.

Technical Report is saved here with all the references

Requirements

Minimum Requirements

  1. Python with Keras and the other required libraries, including TensorFlow and NumPy
  2. 4 GB RAM
  3. Any operating system
  4. A notebook (.ipynb) editor such as Jupyter or IPython
  5. Intel i3 7th Gen or above

Dataset Requirements

We use the Flickr8K dataset. As the name suggests, it contains around 8,000 images with around 5 captions per image. We chose it because it is realistic and relatively small, so you can download it and build models on your workstation using a CPU.

The dataset can be downloaded through the Dataset Request Form. Download the archives and unzip them into your current working directory. You will have two directories:

  • Flickr8k_Dataset: Contains 8092 photographs in JPEG format.
  • Flickr8k_text: Contains a number of files containing different sources of descriptions for the photographs.
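As a rough illustration of working with the description files, here is a minimal sketch that groups captions by image. It assumes each line of the captions file has the form `<image file name>#<index><TAB><caption>`; that layout, the helper name `parse_captions`, and the sample lines are assumptions made for this example, so check them against the files you actually download.

```python
from collections import defaultdict


def parse_captions(text):
    """Map each image file name to its list of captions.

    Assumed line format (not confirmed by this README):
    '<image file name>#<index><TAB><caption>'
    """
    captions = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        key, _, caption = line.partition("\t")
        if not caption:
            continue  # skip lines that do not match the assumed format
        image_name = key.split("#")[0]  # drop the '#<index>' suffix
        captions[image_name].append(caption.strip())
    return dict(captions)


# Made-up sample lines, used only to demonstrate the helper.
sample = (
    "example_1.jpg#0\tA dog runs across the grass .\n"
    "example_1.jpg#1\tA brown dog is running outside .\n"
    "example_2.jpg#0\tTwo children play on a beach .\n"
)
captions = parse_captions(sample)
print(len(captions), "images,", len(captions["example_1.jpg"]), "captions for example_1.jpg")
```

With the real dataset you would pass the contents of the captions file from `Flickr8k_text` instead of `sample`.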

The dataset has a pre-defined training dataset (6,000 images), development dataset (1,000 images), and test dataset (1,000 images).
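The predefined splits can be applied by filtering the captions down to the images listed for each split. The sketch below assumes each split is given as a plain text file with one image file name per line; the helper names `load_split` and `select_captions` and the sample data are hypothetical and only illustrate the idea.

```python
def load_split(text):
    """Return the set of image file names listed one per line (assumed format)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def select_captions(captions, split_names):
    """Keep only the caption entries whose image belongs to the given split."""
    return {name: caps for name, caps in captions.items() if name in split_names}


# Made-up data standing in for the parsed captions and the split lists.
all_captions = {
    "example_1.jpg": ["A dog runs across the grass ."],
    "example_2.jpg": ["Two children play on a beach ."],
    "example_3.jpg": ["A man rides a bike ."],
}
train_names = load_split("example_1.jpg\nexample_3.jpg\n")
test_names = load_split("example_2.jpg\n")

train_captions = select_captions(all_captions, train_names)
test_captions = select_captions(all_captions, test_names)

# The splits should not share images; a quick sanity check.
assert train_names.isdisjoint(test_names)
print(sorted(train_captions), sorted(test_captions))
```

Keeping the split as a set makes the membership test cheap even for the 6,000-image training list.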

Main Architecture

Model summary (check the report for a detailed description):

[Image: model summary]

Steps to implement this locally on your system

You can run the project locally with the following steps:

  • Download the dataset linked in this README
  • Clone this repository to your local system
  • Before running anything, make sure all the path variables are set correctly
  • Once everything is set up, run the notebook cells
  • The program will run and generate the outputs
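Before running the cells, it can help to confirm that the dataset directories are where the notebook expects them. The following is a minimal sketch, assuming the two directories sit in the current working directory under the names given in the dataset section; the helper `missing_paths` is hypothetical and not part of this repository.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the paths (as strings) that do not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


# Directory names taken from the dataset section; adjust them if you
# unzipped the archives somewhere else.
required = ["Flickr8k_Dataset", "Flickr8k_text"]

missing = missing_paths(required)
if missing:
    print("Missing paths:", ", ".join(missing))
else:
    print("All dataset paths found.")
```

If anything is reported as missing, fix the corresponding path variable in the notebook before continuing.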

Results

[Image: results]

Check out my other repos as well. Enjoy and be safe.
