
Scene_Explainer using AI

"A picture is worth a thousand words" is an idiom meaning that seeing something teaches you more than having it described. But what if you see something (an image or scene) for the first time and your brain can't work out what it is?

Don't worry! An AI model that automatically generates a caption, or explains the scene, is all you need to analyze something you are seeing for the first time.

This project is all about generating captions: extracting features from the images and predicting captions with the model.

Dataset

The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. It consists of 328K images. The 2017 training/validation split is 118K/5K images, and the test set is a subset of 41K images. The dataset includes captioning annotations: natural-language descriptions of the images. You can download it here: https://cocodataset.org/#download
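COCO captions ship as JSON annotation files in which each entry pairs an `image_id` with one caption string. As a minimal sketch (assuming the structure of the official `captions_train2017.json` file, not code from this repository), captions can be grouped per image like this:

```python
import json
from collections import defaultdict

def load_captions(annotation_path):
    """Group COCO caption strings by their image_id."""
    with open(annotation_path) as f:
        data = json.load(f)
    captions = defaultdict(list)
    # Each annotation record holds one caption for one image;
    # an image typically has around five captions.
    for ann in data["annotations"]:
        captions[ann["image_id"]].append(ann["caption"])
    return captions
```

Grouping by image makes it easy to pair every caption with the same extracted image features during training.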


Model Overview

ResNet-50, pretrained for image classification, is used as the encoder to extract visual features from each image.
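For attention-based captioning, the useful output is not the classification logits but the last convolutional feature map. As an illustrative sketch (the 7×7×2048 shape is the usual ResNet-50 output for a 224×224 input, assumed here rather than taken from this repository's code), the map is flattened into a set of spatial locations the decoder can attend over:

```python
import numpy as np

def flatten_features(feature_map):
    """Reshape a (H, W, C) conv feature map into (H*W, C) attention locations."""
    h, w, c = feature_map.shape
    return feature_map.reshape(h * w, c)

# Stand-in for ResNet-50's final conv block output on a 224x224 image.
features = np.random.rand(7, 7, 2048)
locations = flatten_features(features)  # (49, 2048): 49 spatial locations
```

Each of the 49 rows corresponds to one region of the image, which is what lets the attention mechanism weight image regions independently.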


An attention-based mechanism is used for caption generation. I have used Bahdanau's attention (also called additive attention). The attention mechanism focuses on the relevant parts of the image, so at each decoding step the decoder uses only specific regions of the feature map.
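The core of Bahdanau attention is an additive score: each image location is scored against the decoder's hidden state, the scores are softmaxed into weights, and the weighted sum of locations becomes the context vector. A minimal NumPy sketch of that computation (shapes and weight matrices are illustrative, not the repository's trained parameters):

```python
import numpy as np

def bahdanau_attention(features, hidden, W1, W2, v):
    """Additive (Bahdanau) attention over image feature locations.

    features: (L, D) -- L spatial locations from the CNN encoder
    hidden:   (H,)   -- current decoder hidden state
    W1: (D, U), W2: (H, U), v: (U,) -- learned projections
    """
    # score_i = v^T tanh(W1 f_i + W2 h)
    scores = np.tanh(features @ W1 + hidden @ W2) @ v   # (L,)
    # Softmax over locations: where should the decoder look?
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                            # (L,), sums to 1
    # Context vector: attention-weighted sum of feature locations.
    context = weights @ features                        # (D,)
    return context, weights
```

The context vector is concatenated with the word embedding at each decoder step, so words like "dog" are generated while the weights concentrate on the dog's region of the image.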

Result

Sample outputs: generated captions for example images (see the screenshots in the repository).

Note:

This is a phase-1 model; I will be improving it using Transformers or a GPT-3-based model.
