A Robust Approach to Open Vocabulary Image Retrieval with Deep Convolutional Neural Networks and Transfer Learning

Enabling computer systems to respond to conversational human language is a challenging problem with wide-ranging applications in robotics and human-computer interaction. In image search specifically, humans tend to describe objects in fine-grained detail, such as color or company, for which conventional retrieval algorithms have shown poor performance. This paper presents a novel approach to open vocabulary image retrieval that selects the correct candidate image from a set of distractors given a query in natural language form. Our methodology focuses on generating a robust set of image-text projections capable of accurately representing any image, with the objective of achieving high recall. To this end, an ensemble of classifiers is trained to generate category-based projections: on ImageNet for high-resolution objects, on CIFAR-100 for lower-resolution images of objects, and on Caltech-256 for challenging views of everyday objects. In addition to the category-based projections, we also use an image captioning model trained on MS COCO and Google Image Search (GISS) to capture additional semantic/latent information about the candidate images. To facilitate image retrieval, the natural language query and the projection results are converted to a common vector representation using word embeddings, with which query-image similarity is computed. The proposed model, when benchmarked on the RefCOCO dataset, achieved an accuracy of 68.8% while retrieving semantically meaningful candidate images.
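
The query-to-image matching step described in the abstract can be illustrated with a minimal sketch. It assumes that each candidate image has already been reduced to a short text projection (category labels or a caption), that a text is embedded by averaging its word vectors, and that similarity is cosine similarity. The abstract does not specify any of these choices, and the toy 3-dimensional vectors and all function names below are hypothetical.

```python
import math

# Toy 3-dimensional word vectors (hypothetical values, for illustration only).
# A real system would load pretrained word embeddings instead.
EMBEDDINGS = {
    "red": [0.9, 0.1, 0.0],
    "car": [0.1, 0.9, 0.1],
    "truck": [0.2, 0.8, 0.2],
    "dog": [0.0, 0.1, 0.9],
    "brown": [0.7, 0.0, 0.3],
}
DIM = 3


def embed(text):
    """Average the vectors of all known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_images(query, projections):
    """Sort image ids by similarity between the query and each text projection.

    projections: dict mapping image id -> projection text (labels or caption).
    """
    q = embed(query)
    return sorted(
        projections,
        key=lambda image_id: cosine(q, embed(projections[image_id])),
        reverse=True,
    )


if __name__ == "__main__":
    candidates = {"img_a": "red truck", "img_b": "brown dog"}
    print(rank_images("red car", candidates))
```

Averaging word vectors is only one way to obtain a common representation; it is used here because it keeps the sketch short.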


Outline/Structure of the Talk

Introduction and why it is required

Previous work done in these fields

The approach

The results obtained

Learning Outcome

A new perspective on helping machines understand speech more efficiently and produce reasonable outputs for us.

Target Audience


Prerequisites for Attendees

A basic understanding of NLP and CV is sufficient.

Submitted 1 year ago


    • 90 Mins

      Machine learning and deep learning have been rapidly adopted in various spheres of medicine, such as drug discovery, disease diagnosis, genomics, medical imaging and bioinformatics, for translating biomedical data into improved human healthcare. Healthcare applications based on machine learning and deep learning help physicians make faster, cheaper and more accurate diagnoses.

      We have successfully developed three deep learning based healthcare applications and are currently working on two more healthcare projects. In this workshop, we will discuss one of these applications, "Deep Learning based Craniofacial Distance Measurement for Facial Reconstructive Surgery", which we developed using TensorFlow. Craniofacial distances play an important role in providing information about facial structure. They are measurements of the head and face that are taken from an image. They are used in facial reconstructive surgery, for example in cephalometry, treatment planning for various malocclusions, craniofacial anomalies, facial contouring, facial rejuvenation and different forehead surgeries, where reliable and accurate data are very important and cannot be compromised.

      Our discussion of this application will cover the precise problem statement, the major steps of the solution (deep learning based face detection and facial landmarking, followed by craniofacial distance measurement), the data set, the experimental analysis, and the challenges we faced and overcame to achieve this success. Subsequently, we will provide hands-on exposure to implementing this healthcare solution using TensorFlow. Finally, we will briefly discuss possible extensions of our work and the future scope of research in the healthcare sector.
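
The final measurement step can be sketched as follows, assuming the landmarking stage has already produced 2D pixel coordinates for each facial landmark. The landmark names, the coordinates and the `mm_per_pixel` calibration factor are hypothetical and are not taken from the workshop material.

```python
import math


def landmark_distance(p, q, mm_per_pixel=1.0):
    """Euclidean distance between two (x, y) landmarks, scaled to millimetres.

    mm_per_pixel is an assumed calibration factor; with the default of 1.0
    the result is simply the distance in pixels.
    """
    return math.hypot(q[0] - p[0], q[1] - p[1]) * mm_per_pixel


if __name__ == "__main__":
    # Hypothetical landmark coordinates in pixels.
    landmarks = {
        "left_eye_outer": (120, 200),
        "right_eye_outer": (240, 200),
        "nasion": (180, 190),
        "chin": (180, 350),
    }
    width = landmark_distance(
        landmarks["left_eye_outer"], landmarks["right_eye_outer"], mm_per_pixel=0.5
    )
    height = landmark_distance(landmarks["nasion"], landmarks["chin"], mm_per_pixel=0.5)
    print(f"outer eye width: {width:.1f} mm, nasion to chin: {height:.1f} mm")
```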

    • Suvro Shankar Ghosh - Learning Entity Embeddings from Knowledge Graph

      45 Mins
      Case Study
      • Over time, many knowledge bases have evolved. A knowledge base is a structured way of storing information, typically as (Subject, Predicate, Object) triples.
      • Such knowledge bases are an important resource for question answering and other tasks. But they are often incomplete, since they cannot capture all the data in the world, and they therefore lack the ability to reason over their discrete entities and the unknown relationships between them. Here we can introduce an expressive neural tensor network that is suitable for reasoning over known relationships between two entities.
      • With such a model in place, we can ask questions: the model tries to predict the missing links in the trained model and answers questions about finding similar entities, reasoning over them, and predicting relationship types between two entities that are not connected in the Knowledge Graph.
      • Knowledge Graph infoboxes were added to Google's search engine in May 2012.
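
As a rough illustration of how a neural tensor layer scores a candidate (Subject, Predicate, Object) triple, here is a minimal plain-Python sketch of one common formulation: a bilinear tensor term plus a linear term on the concatenated entity vectors, passed through tanh and projected to a scalar. The talk does not state which variant it uses, so the exact form, the parameter names and the tiny dimensions are assumptions, and the weights here are set by hand rather than trained.

```python
import math


def ntn_score(e1, e2, W, V, b, u):
    """Score one (e1, relation, e2) triple with a neural tensor layer.

    e1, e2: entity embeddings, lists of length d
    W: k slices, each a d x d matrix (bilinear term e1^T W[s] e2)
    V: k x 2d matrix (linear term applied to the concatenation [e1; e2])
    b: bias vector of length k
    u: output weights of length k
    All parameters belong to a single relation type.
    """
    concat = e1 + e2  # list concatenation, length 2d
    hidden = []
    for s in range(len(W)):
        bilinear = sum(
            e1[i] * W[s][i][j] * e2[j]
            for i in range(len(e1))
            for j in range(len(e2))
        )
        linear = sum(V[s][m] * concat[m] for m in range(len(concat)))
        hidden.append(math.tanh(bilinear + linear + b[s]))
    return sum(u[s] * hidden[s] for s in range(len(u)))


if __name__ == "__main__":
    # d = 2, k = 1, with hand-set (untrained) parameters.
    W = [[[1.0, 0.0], [0.0, 1.0]]]
    V = [[0.0, 0.0, 0.0, 0.0]]
    b = [0.0]
    u = [1.0]
    print(ntn_score([1.0, 0.0], [1.0, 0.0], W, V, b, u))  # similar entities
    print(ntn_score([1.0, 0.0], [0.0, 1.0], W, V, b, u))  # orthogonal entities
```

In a trained model, a higher score for a triple that is absent from the graph would be read as evidence for a missing link.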

      What is the knowledge graph?

      ▶Knowledge in graph form!

      ▶Captures entities, attributes, and relationships

      More specifically, the “knowledge graph” is a database that collects millions of pieces of data about keywords people frequently search for on the World Wide Web and the intent behind those keywords, based on the already available content.

      ▶In most cases, KGs are based on Semantic Web standards and have been generated by a mixture of automatic extraction from text or structured data, and manual curation work.

      ▶Structured Search & Exploration
      e.g. Google Knowledge Graph, Amazon Product Graph

      ▶Graph Mining & Network Analysis
      e.g. Facebook Entity Graph

      ▶Big Data Integration
      e.g. IBM Watson

      ▶Diffbot, GraphIQ, Maana, ParseHub, Reactor Labs, SpazioDati

    • Favio Vázquez - Complete Data Science Workflows with Open Source Tools

      90 Mins

      Cleaning, preparing, transforming, exploring and modeling data is what we hear about all the time in data science, and these steps may be the most important ones. But that is not all there is to data science. In this talk you will learn how the combination of Apache Spark, Optimus, the Python ecosystem and Data Operations can form a complete framework for data science that allows you and your company to go further, beyond common sense and intuition, to solve complex business problems.

    • Aakash Goel / Ankit Kalra - Detect Workout Pose for Virtual Gym using CNN

      45 Mins

      Approximately 80% of people across the globe who pay $30 to $125 per month for a gym do not use it. Attrition from gyms is linked with discouraging results and a lack of engagement. Traditional gym users do not know the proper exercise regimen, and users prefer workout regimens that are fun, customizable and social.

      To address this problem, we came up with the idea of providing customized fitness solutions using Artificial Intelligence. In this talk, we showcase how deep learning architectures such as CNNs can be used to build "workout pose detection" that tracks a user's movement, classifies it against a specific trained workout, and determines whether the performed pose is correct or wrong.

      Keywords: CNN, Deep Learning, Image classification Model, Computer Vision.

    • 45 Mins

      Today more than 3 billion people use social media as a medium to express their real feelings, which makes social media platforms such as Facebook and Twitter an ideal source for capturing the interests of users. However, data mined from social media alone cannot be used to achieve the target, i.e. predicting a user's interests; it needs some form of supervision.
      Our talk proposes how the Semantic Web, a.k.a. knowledge bases, can add supervision to the system and help predict a user's interests from social media data. Once a user's interests are captured, they can be used for many purposes, such as recommendation systems, campaigning and analytics over user interests.

      Keywords: Knowledge systems, linked data, OpenIE, NLP, Semantic Web, User Interest, SPARQL.

    • Pankaj Kumar / Abinash Panda / Usha Rengaraju - Quantitative Finance: Global macro trading strategy using Probabilistic Graphical Models

      90 Mins

      { This is a hands-on workshop on the pgmpy package. Abinash Panda, the creator of pgmpy, will do the code demonstration. }

      Crude oil plays an important role in macroeconomic stability and heavily influences the performance of global financial markets. Unexpected fluctuations in the real price of crude oil are detrimental to the welfare of both oil-importing and oil-exporting economies. Global macro hedge funds view the forecast price of oil as one of the key variables in generating macroeconomic projections, and it also plays an important role for policy makers in predicting recessions.

      Probabilistic Graphical Models can help improve the accuracy of existing quantitative models for crude oil price prediction, as they take into account many different macroeconomic and geopolitical variables.

      Hidden Markov Models are used to detect the underlying regimes of the time-series data by discretising the continuous time-series data. In this workshop we use the Baum-Welch algorithm to learn the HMMs, and the Viterbi algorithm to find the sequence of hidden states (i.e. the regimes) given the observed states (i.e. monthly differences) of the time-series.
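
The decoding step can be sketched in plain Python as below. This is not pgmpy code, and it skips Baum-Welch entirely: the two regimes, the "up"/"down" discretisation rule and all probabilities are hypothetical, hand-set values chosen only to show how Viterbi recovers a regime sequence from discretised monthly differences.

```python
def discretise(monthly_differences):
    """Map each monthly price difference to a discrete observation symbol."""
    return ["up" if diff > 0 else "down" for diff in monthly_differences]


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               state at time t-1 on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            prob, back = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (prob * emit_p[s][obs], back)
        best.append(column)

    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


if __name__ == "__main__":
    states = ["bull", "bear"]  # hypothetical regimes
    start_p = {"bull": 0.5, "bear": 0.5}
    trans_p = {
        "bull": {"bull": 0.8, "bear": 0.2},
        "bear": {"bull": 0.2, "bear": 0.8},
    }
    emit_p = {
        "bull": {"up": 0.9, "down": 0.1},
        "bear": {"up": 0.1, "down": 0.9},
    }
    observations = discretise([1.2, 0.4, -0.9, -1.5])
    print(viterbi(observations, states, start_p, trans_p, emit_p))
```

For long series, the products of probabilities should be replaced with sums of log-probabilities to avoid numerical underflow.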

      Belief Networks are used to analyse the probability of a regime in crude oil, given as evidence a set of different regimes in the macroeconomic factors. The Greedy Hill Climbing algorithm is used to learn the structure of the Belief Network, and the parameters are then learned by Bayesian Estimation with a K2 prior. Inference is then performed on the Belief Network to obtain a forecast of the crude oil markets, and the forecast is tested on real data.
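
To make the parameter-learning step concrete, the sketch below estimates one conditional probability table in plain Python. It assumes the usual reading of a K2 prior as a Dirichlet prior with every pseudo-count equal to 1, so each estimate is (count + 1) / (row total + number of child states). It is an illustration of the idea rather than pgmpy's API, and the variable names and sample data are hypothetical.

```python
from collections import Counter


def estimate_cpd_k2(samples, child_states, parent_states):
    """Estimate P(child | parent) from (parent_value, child_value) samples.

    Uses a K2 prior, i.e. one pseudo-count per (parent, child) combination,
    so unseen combinations still receive a non-zero probability.
    """
    counts = Counter(samples)
    cpd = {}
    for parent in parent_states:
        total = sum(counts[(parent, child)] for child in child_states)
        total += len(child_states)  # one pseudo-count per child state
        cpd[parent] = {
            child: (counts[(parent, child)] + 1) / total for child in child_states
        }
    return cpd


if __name__ == "__main__":
    # Hypothetical regimes: a macro factor (parent) and crude oil (child).
    samples = (
        [("high", "up")] * 3
        + [("high", "down")]
        + [("low", "down")] * 2
    )
    cpd = estimate_cpd_k2(samples, ["up", "down"], ["high", "low"])
    for parent, row in cpd.items():
        print(parent, row)
```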

    • Dr Hari Krishna Maram - Future of Technology

      Vision Digital
      20 Mins

      Future of Technology covers technology trends across the globe and the innovations changing the future.

    • Debjyoti Paul - Transfer Learning in Unsupervised text processing

      45 Mins

      Today we face an enormous amount of unstructured textual data. Given a text processing problem, where do we start? Which models should a language model be built with? Can models trained in similar domains be exploited? These are some of the questions that arise.

      1. When and how to use Transfer Learning: new vocabulary?
      2. Challenges in text processing and Transfer Learning
      3. Effective method selection for transfer learning
      4. Applications
      5. How to validate your model?

      Presentation on aspect detection in an unsupervised domain using Transfer Learning from structure prediction.