A Robust Approach to Open Vocabulary Image Retrieval with Deep Convolutional Neural Networks and Transfer Learning

Enabling computer systems to respond to conversational human language is a challenging problem with wide-ranging applications in robotics and human-computer interaction. In image search specifically, humans tend to describe objects in fine-grained detail, such as color or company, for which conventional retrieval algorithms have shown poor performance. In this paper, we present a novel approach for open vocabulary image retrieval, capable of selecting the correct candidate image from among a set of distractors given a query in natural language form. Our methodology focuses on generating a robust set of image-text projections capable of accurately representing any image, with the objective of achieving high recall. To this end, an ensemble of classifiers is trained to generate category-based projections: on ImageNet for high-resolution objects, on CIFAR-100 for lower-resolution images of objects, and on Caltech-256 for challenging views of everyday objects. In addition to category-based projections, we also make use of an image captioning model trained on MS COCO and Google Image Search (GISS) to capture additional semantic/latent information about the candidate images. To facilitate image retrieval, the natural language query and the projection results are converted to a common vector representation using word embeddings, with which query-image similarity is computed. When benchmarked on the RefCOCO dataset, the proposed model achieved an accuracy of 68.8% while retrieving semantically meaningful candidate images.
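
The retrieval step described above (mapping the query and each image's text projections into a shared embedding space, then comparing them) can be illustrated with a minimal sketch. It is not the paper's implementation: the toy 3-dimensional vectors in EMBEDDINGS, the image identifiers, the averaging of word vectors, and the use of cosine similarity are all assumptions made for illustration, since the abstract does not specify them. In practice the labels would come from the classifier ensemble and the captions from the captioning model.

```python
import numpy as np

# Hypothetical toy word vectors standing in for pretrained word embeddings.
EMBEDDINGS = {
    "red": np.array([1.0, 0.0, 0.0]),
    "car": np.array([0.0, 1.0, 0.0]),
    "dog": np.array([0.0, 0.0, 1.0]),
    "brown": np.array([0.5, 0.0, 0.5]),
}
DIM = 3


def embed_text(texts):
    """Average the vectors of all known words in a list of strings (assumed pooling)."""
    tokens = [tok for text in texts for tok in text.lower().split()]
    vectors = [EMBEDDINGS[tok] for tok in tokens if tok in EMBEDDINGS]
    if not vectors:
        return np.zeros(DIM)
    return np.mean(vectors, axis=0)


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def rank_candidates(query, projections):
    """Rank image ids by similarity between the query and each image's text projections.

    `projections` maps an image id to a list of strings (category labels and captions).
    """
    query_vec = embed_text([query])
    scores = {
        image_id: cosine(query_vec, embed_text(texts))
        for image_id, texts in projections.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical projections for two candidate images.
candidates = {
    "img_a": ["car", "a red car on a road"],
    "img_b": ["dog", "a brown dog"],
}
ranking = rank_candidates("a red car", candidates)
print(ranking[0][0])  # id of the best-matching candidate
```

Words missing from the embedding table are simply skipped here; a real system would need a policy for out-of-vocabulary terms, which matters for an open vocabulary setting.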

 
 

Outline/Structure of the Talk

Introduction and why it is required

Previous work done in these fields

The approach

The results obtained

Learning Outcome

A new perspective on making speech more efficiently understood by machines, so that they produce reasonable outputs for us.

Target Audience

Anyone

Prerequisites for Attendees

A basic understanding of NLP and CV is sufficient.

Submitted 5 months ago

Public Feedback

  • Dr. Vikas Agrawal  ~  4 months ago

    Dear Srivalya: Good to see your IEEE paper. Is there a way you could include a video of yourself introducing the topic and explaining what people can expect to learn, please?

    Warm Regards

    Vikas

    • Srivalya Elluru  ~  3 months ago

      Hello Sir, 
      I have uploaded a video covering the introduction and expected learning, along with a complete project explanation, in the links section.

      Please do check it out.

      Thank You,

      Srivalya Elluru

    • Srivalya Elluru  ~  4 months ago

      Thank you, Sure :)

      Should the video explain the complete working of the paper, or is a brief overview plus what people can learn from it enough?