A Robust Approach to Open Vocabulary Image Retrieval with Deep Convolutional Neural Networks and Transfer Learning

Enabling computer systems to respond to conversational human language is a challenging problem with wide-ranging applications in robotics and human-computer interaction. In image search specifically, humans tend to describe objects using fine-grained details such as color or company, for which conventional retrieval algorithms have shown poor performance. In this paper, a novel approach for open vocabulary image retrieval is presented, capable of selecting the correct candidate image from among a set of distractors given a query in natural language form. Our methodology focuses on generating a robust set of image-text projections capable of accurately representing any image, with the objective of achieving high recall. To this end, an ensemble of classifiers is trained to generate category-based projections: on ImageNet for representing high-resolution objects, on CIFAR-100 for lower-resolution images of objects, and on Caltech-256 for challenging views of everyday objects. In addition to category-based projections, we also make use of an image captioning model trained on MS COCO and Google Image Search (GISS) to capture additional semantic/latent information about the candidate images. To facilitate image retrieval, the natural language query and the projection results are converted to a common vector representation using word embeddings, with which query-image similarity is computed. When benchmarked on the RefCOCO dataset, the proposed model achieved an accuracy of 68.8% while retrieving semantically meaningful candidate images.
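
The final retrieval step described above (mapping the query and each image's projections into a common embedding space and comparing them) can be illustrated with a minimal sketch. This is not the paper's implementation: the toy three-dimensional vectors, the averaging of word vectors, the use of cosine similarity, and the names `TOY_EMBEDDINGS`, `embed`, `cosine` and `retrieve` are all assumptions made for illustration. A real system would load pretrained word embeddings and use the labels and captions produced by the classifier ensemble and captioning model.

```python
import math

# Toy 3-dimensional word vectors with hypothetical values, for illustration only.
# A real system would load pretrained word embeddings instead.
TOY_EMBEDDINGS = {
    "red": [0.9, 0.1, 0.0],
    "car": [0.1, 0.9, 0.1],
    "truck": [0.2, 0.8, 0.2],
    "dog": [0.0, 0.1, 0.9],
    "grass": [0.1, 0.0, 0.8],
}


def embed(words, table=TOY_EMBEDDINGS):
    """Average the vectors of the known words; return None if none are known."""
    vectors = [table[w] for w in words if w in table]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, candidates):
    """Return the id of the candidate whose projection labels best match the query.

    `candidates` maps an image id to the list of words (category labels or
    caption tokens) predicted for that image.
    """
    query_vec = embed(query.lower().split())
    if query_vec is None:
        return None
    best_id, best_score = None, -2.0  # cosine similarity is always >= -1
    for image_id, labels in candidates.items():
        image_vec = embed(labels)
        if image_vec is None:
            continue
        score = cosine(query_vec, image_vec)
        if score > best_score:
            best_id, best_score = image_id, score
    return best_id


if __name__ == "__main__":
    candidates = {"img_a": ["dog", "grass"], "img_b": ["red", "truck"]}
    print(retrieve("red car", candidates))
```

In this toy setup the query "red car" is matched to the image labelled "red truck" even though "car" never appears among its labels, which is the open vocabulary behaviour that a shared embedding space is meant to provide.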


Outline/Structure of the Talk

Introduction and why it is required

Previous work done in these fields

The approach

The results obtained

Learning Outcome

A new perspective on making speech more efficiently understood by machines, so that they produce reasonable outputs for us.

Target Audience


Prerequisites for Attendees

Basic NLP and CV understanding would do.

Submitted 9 months ago

Public Feedback

  • Dr. Vikas Agrawal
    By Dr. Vikas Agrawal  ~  8 months ago

    Dear Srivalya: Good to see your IEEE paper. Is there a way you could include a video of yourself introducing the topic and what people can expect to learn please? 

    Warm Regards


    • Srivalya  Elluru
      By Srivalya Elluru  ~  7 months ago

      Hello Sir, 
      I have uploaded the intro and expected learning video section and also a complete project explanation in the links section.

      Please do check it out.

      Thank You,

      Srivalya Elluru

    • Srivalya  Elluru
      By Srivalya Elluru  ~  8 months ago

      Thank you, Sure :)

      Should the video walk through the complete working of the paper, or is a brief overview of what people can learn from it enough?

  • Liked Suvro Shankar Ghosh

    Suvro Shankar Ghosh - Learning Entity Embeddings from Knowledge Graph

    45 Mins
    Case Study
    • Over time, many knowledge bases have evolved. A knowledge base is a structured way of storing information, typically as (Subject, Predicate, Object) triples.
    • Such knowledge bases are an important resource for question answering and other tasks. However, they are often incomplete, since they cannot capture all the data in the world, and they therefore lack the ability to reason over their discrete entities and the unknown relationships between them. Here we can introduce an expressive neural tensor network that is suitable for reasoning over known relationships between two entities.
    • With such a model in place, we can ask questions: the trained model tries to predict the missing links and answer questions about finding similar entities, reasoning over them, and predicting relationship types between two entities that are not connected in the knowledge graph.
    • Knowledge Graph infoboxes were added to Google's search engine in May 2012
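
As a rough illustration of how a neural tensor network can score a (Subject, Predicate, Object) triple, the sketch below uses the commonly cited formulation in which each relation has its own parameters and the score is u^T tanh(e1^T W e2 + V [e1; e2] + b). Whether the talk uses exactly this form is an assumption, and the function name `ntn_score`, the parameter names, and the tiny hand-set values are hypothetical. In practice the entity vectors and relation parameters are learned from the known triples.

```python
import math


def ntn_score(e1, e2, W, V, b, u):
    """Score one triple for a single relation.

    e1, e2: subject and object entity vectors (length d).
    W: list of k slices, each a d x d matrix (bilinear tensor).
    V: list of k rows, each of length 2 * d (standard linear layer).
    b: bias vector (length k).
    u: output weights (length k).
    A higher score means the relation is considered more plausible.
    """
    d = len(e1)
    concat = e1 + e2  # [e1; e2]
    score = 0.0
    for k in range(len(u)):
        # Bilinear term e1^T W[k] e2 relates the two entities multiplicatively.
        bilinear = sum(
            e1[i] * W[k][i][j] * e2[j] for i in range(d) for j in range(d)
        )
        # Linear term V[k] . [e1; e2] plus bias.
        linear = sum(V[k][m] * concat[m] for m in range(2 * d))
        score += u[k] * math.tanh(bilinear + linear + b[k])
    return score


if __name__ == "__main__":
    # Hypothetical 2-dimensional entities and a single tensor slice (k = 1).
    subject = [1.0, 0.0]
    obj = [0.0, 1.0]
    W = [[[0.0, 1.0], [0.0, 0.0]]]
    V = [[0.0, 0.0, 0.0, 0.0]]
    b = [0.0]
    u = [2.0]
    print(ntn_score(subject, obj, W, V, b, u))
```

To predict a missing link, candidate triples can be scored this way and ranked, with the highest-scoring ones treated as the most likely facts.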

    What is the knowledge graph?

    ▶Knowledge in graph form!

    ▶Captures entities, attributes, and relationships

    More specifically, the “knowledge graph” is a database that collects millions of pieces of data about keywords people frequently search for on the World Wide Web and the intent behind those keywords, based on the already available content.

    ▶In most cases, KGs are based on Semantic Web standards and have been generated by a mixture of automatic extraction from text or structured data and manual curation work.

    ▶Structured Search & Exploration
    e.g. Google Knowledge Graph, Amazon Product Graph

    ▶Graph Mining & Network Analysis
    e.g. Facebook Entity Graph

    ▶Big Data Integration
    e.g. IBM Watson

    ▶Diffbot, GraphIQ, Maana, ParseHub, Reactor Labs, SpazioDati

  • 45 Mins

    Today more than 3 billion people use social media as a medium to express their real feelings, which makes social media platforms such as Facebook and Twitter an ideal source for capturing users' interests. However, data mined from social media alone cannot achieve the target, i.e. predicting a user's interests; it needs some form of supervision.
    Our talk proposes how the Semantic Web, a.k.a. knowledge bases, adds supervision to the system and can help predict a user's interests from social media data. Once a user's interests are captured, they can be used for many purposes, such as recommendation systems, campaigning, and analytics over user interests.

    Keywords: Knowledge systems, linked data, OpenIE, NLP, Semantic Web, User Interest, SPARQL.

  • Liked Aakash Goel

    Aakash Goel / Ankit Kalra - Detect Workout Pose for Virtual Gym using CNN

    45 Mins

    Approximately 80% of people across the globe do not use the gym, yet they pay $30 to $125/month. Attrition from the gym is linked with discouraging results and a lack of engagement. Traditional gym users don’t know the proper exercise regimen, and users prefer workout regimens that are fun, customizable and social.

    To combat this problem, we came up with the idea of providing customized fitness solutions using Artificial Intelligence. In this talk, we showcase how Deep Learning architectures such as CNNs can be used to develop "Workout pose detection", which tracks the user's movement, classifies it as one of the trained workouts, and determines whether the performed pose is correct or wrong.

    Keywords: CNN, Deep Learning, Image classification Model, Computer Vision.

  • Liked Debjyoti Paul

    Debjyoti Paul - Transfer Learning in Unsupervised text processing

    Debjyoti Paul
    Assistant Lead Data Scientist
    9 months ago
    Sold Out!
    45 Mins

    Today we are facing an enormous amount of unstructured textual data. Given a text processing problem, how do we start? What models should we build a language model with? Can models trained in similar domains be exploited? These are some of the open questions.

    1. When and how to use Transfer Learning - new vocabulary?
    2. Challenges in text processing and Transfer Learning
    3. Effective method selection for transfer learning
    4. Applications
    5. How to validate your model?

    Presentation on Aspect detection in unsupervised domain using Transfer Learning from structure prediction.

  • Liked Dr Hari Krishna Maram

    Dr Hari Krishna Maram - Future of Technology

    Dr Hari Krishna Maram
    Vision Digital
    9 months ago
    Sold Out!
    20 Mins

    Future of Technology covers trends in technology across the globe and the innovation changing the future.