Deep learning has significantly improved state-of-the-art performance for natural language processing (NLP) tasks, but each task is typically studied in isolation. The Natural Language Decathlon (decaNLP) is a new benchmark for studying general NLP models that can perform a variety of complex natural language tasks. By requiring a single system to perform ten disparate natural language tasks, decaNLP offers a unique setting for multitask, transfer, and continual learning. decaNLP is maintained by Salesforce and is publicly available on GitHub, where it can be used for tasks such as Question Answering, Machine Translation, Summarization, and Sentiment Analysis.

 
 

Outline/Structure of the Talk

  • Introduction to decaNLP
  • Objectives
  • Motivation
  • Innovativeness
  • Targeted NLP Tasks
  • Impact
  • Open Source Collaboration on GitHub
  • Patents / Publications in NLP, Computer Vision, and AI

Learning Outcome

Attendees will be able to understand different NLP problems, such as:

1. Question Answering

2. Machine Translation

3. Summarization

4. Natural Language Inference

5. Sentiment Analysis

6. Semantic Role Labeling

7. Relation Extraction

8. Goal-Oriented Dialogue

9. Semantic Parsing

10. Commonsense Reasoning

Attendees will also learn about the unified framework that decaNLP provides for solving the NLP tasks listed above.
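One way to picture the unified framework: the decaNLP paper casts every task as question answering over a context, so each example becomes a (question, context, answer) triple. The sketch below illustrates that framing only. The `Example` class, the sample sentences, and the `to_model_input` string layout are hypothetical and are not taken from the decaNLP code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One task instance expressed as question answering over a context."""
    task: str
    question: str
    context: str
    answer: str


# Illustrative examples only; the wording is an assumption, not decaNLP data.
EXAMPLES = [
    Example("sentiment analysis",
            "Is this review negative or positive?",
            "A gripping, well-acted film.",
            "positive"),
    Example("machine translation",
            "What is the translation from English to German?",
            "Good morning.",
            "Guten Morgen."),
    Example("summarization",
            "What is the summary?",
            "The city council met on Monday and voted to extend the park's opening hours.",
            "Council extends park hours."),
]


def to_model_input(example: Example) -> str:
    """Hypothetical serialization: every task reaches the model in one shape."""
    return f"question: {example.question} context: {example.context}"


for ex in EXAMPLES:
    print(ex.task, "->", to_model_input(ex))
```

Because every task shares the same input and output shape, a single model can in principle be trained on all of them together, which is what makes the benchmark useful for multitask and transfer learning.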

Target Audience

People with basic knowledge of NLP, Machine Learning, and Deep Learning.

Prerequisites for Attendees

Read up on the basics of NLP, Machine Learning, and Deep Learning.

Submitted 1 year ago

  • Liked Kabir Rustogi

    Kabir Rustogi - Generation of Locality Polygons using Open Source Road Network Data and Non-Linear Multi-classification Techniques

    Kabir Rustogi
    Head - Data Sciences
    Delhivery
    1 year ago
    Sold Out!
    45 Mins
    Case Study
    Intermediate

    One of the principal problems in the developing world is the poor localization of its addresses. This inhibits discoverability of local trade, reduces availability of amenities such as creation of bank accounts and delivery of goods and services (e.g., e-commerce), and delays emergency services such as fire brigades and ambulances. In general, people in the developing world identify an address based on neighbourhood/locality names and points of interest (POIs), which are not standardized and for which no official records exist that could help locate them systematically. In this paper, we describe an approach to build accurate geographical boundaries (polygons) for such localities.

    As training data, we are provided with two pieces of information for millions of address records: (i) a geocode, which is captured by a human for the given address, and (ii) the set of localities present in that address. The latter is determined either by manual tagging or by an algorithm that takes a raw address string as input and outputs the meaningful locality information present in that address. For example, for the address “A-161 Raheja Atlantis Sector 31 Gurgaon 122002”, the geocode is given as (28.452800, 77.045903), and the set of localities present in that address is given as (Raheja Atlantis, Sector 31, Gurgaon, Pin-code 122002). The development of this algorithm is part of another project we are working on; details about it can be found here.

    Many industries, such as food delivery, courier delivery, and KYC (know-your-customer) data collection, are likely to have huge amounts of such data. Such crowdsourced data usually contains a large amount of noise, introduced either by machine/human error in capturing the geocode or by errors in identifying the correct set of localities from a poorly written address. For example, for the address “Plot 1000, Sector 31 opposite Sector 40 road, Gurgaon 122002”, a machine may output the set of localities present in this address as (Sector 31, Sector 40, Gurgaon, Pin-code 122002), even though it is clear that the address does not lie in Sector 40.

    The solution described in this paper is expected to consume the provided data and output a polygon for each of the localities identified in the address data. We assume that the localities for which we must build polygons are non-overlapping; for example, this assumption holds for pin-codes. The problem is solved in two phases.

    In the first phase, we separate the noisy points from the points that lie within a locality. This is done by formulating the problem as a non-linear multi-class classification problem. In the training data, the latitudes and longitudes of all non-overlapping localities act as features, and the corresponding locality names act as labels. The classifier is expected to partition the 2D space containing the latitudes and longitudes of the union of all non-overlapping localities into disjoint regions, one per locality. These partitions are defined by non-linear boundaries, which are obtained by optimizing for two objectives: (i) the area enclosed by a boundary should maximize the number of points of the corresponding locality and minimize the number of points of other localities, and (ii) the separation boundary should be smooth. We compare two algorithms, decision trees and neural networks, for creating such partitions.
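    The idea of the first phase can be illustrated with a much simpler classifier than the decision trees and neural networks compared in the talk. The sketch below swaps in a k-nearest-neighbours vote, which is an assumption made purely for brevity: a record is treated as noisy when its tagged locality disagrees with the majority label of its nearest neighbours. The coordinates are made up, and latitude/longitude are treated as planar coordinates, which is only reasonable over small areas.

```python
import math
from collections import Counter


def knn_label(point, labelled_points, k=3):
    """Majority label among the k labelled points nearest to `point`."""
    nearest = sorted(labelled_points, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def split_noise(records, k=3):
    """Split records into (clean, noisy) by comparing each tag with its neighbours' vote."""
    clean, noisy = [], []
    for i, (point, label) in enumerate(records):
        others = records[:i] + records[i + 1:]  # leave the record itself out
        predicted = knn_label(point, others, k)
        (clean if predicted == label else noisy).append((point, label))
    return clean, noisy


# Hypothetical (lat, lon) records tagged with a locality name.
RECORDS = [
    ((28.450, 77.040), "Sector 31"),
    ((28.451, 77.041), "Sector 31"),
    ((28.452, 77.042), "Sector 31"),
    ((28.453, 77.043), "Sector 31"),
    ((28.470, 77.060), "Sector 40"),
    ((28.471, 77.061), "Sector 40"),
    ((28.472, 77.062), "Sector 40"),
    ((28.473, 77.063), "Sector 40"),
    ((28.4515, 77.0415), "Sector 40"),  # tagged Sector 40 but sits inside Sector 31
]

clean, noisy = split_noise(RECORDS)
print("noisy records:", noisy)
```

    Only the mislabelled record is flagged here. A real system would need a classifier that also enforces the smooth-boundary objective described above, which a plain neighbour vote does not.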

    In the second phase, we extract all the points that satisfy the partition constraints, i.e., lie within the boundary of a locality L, as candidate points, for generating the polygon for locality L. The resulting polygon must contain all candidate points and should have the minimum possible area while maintaining the smoothness of the polygon boundary. This objective can be achieved by algorithms such as concave hull. However, since localities are always bounded by roads, we have further enhanced our locality polygons by leveraging open source data of road networks. To achieve this, we solve a non-linear optimisation problem which decides the set of roads to be selected, so that the enclosed area is minimized, while ensuring that all the candidate points lie within the enclosed area. The output of this optimisation problem is a set of roads, which represents the boundary of a locality L.
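    As a baseline for the second phase, the sketch below computes a convex hull of the candidate points with Andrew's monotone chain algorithm. This is a deliberate simplification: a convex hull contains every candidate point but generally encloses more area than the concave hull mentioned above, and it does not use road network data at all. The sample coordinates are hypothetical.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the convex hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    lower = []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


# Hypothetical candidate points for one locality, in arbitrary planar units.
candidates = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(candidates))
```

    The interior point (1, 1) is dropped and the four corners remain as the polygon boundary. Replacing this step with a concave hull, or with the road-selection optimisation described above, would tighten the polygon.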

  • Liked Ishita Mathur

    Ishita Mathur - How GO-FOOD built a Query Semantics Engine to help you find food faster

    Ishita Mathur
    Data Scientist
    Gojek Tech
    1 year ago
    Sold Out!
    45 Mins
    Case Study
    Beginner

    Context: The Search problem

    GOJEK is a SuperApp: 19+ apps within an umbrella app. One of these is GO-FOOD, the first food delivery service in Indonesia and the largest food delivery service in Southeast Asia. There are over 300 thousand restaurants on the platform with a total of over 16 million dishes between them.

    Over two-thirds of those who order food online using GO-FOOD do so through text search. Search engines are so essential to our everyday digital experience that we don’t think twice when using them anymore. Search engines involve two primary tasks: retrieving documents and ranking them in order of relevance. While improving that ranking is an extremely important part of improving the search experience, actually understanding the query helps give searchers exactly what they’re looking for. This talk will show you what we are doing to make it easy for users to find what they want.

    GO-FOOD uses the ElasticSearch stack with restaurant and dish indexes to search for what the user types. However, this returns only exact text matches and, at most, fuzzy matches. We wanted to create a holistic search experience that not only personalised search results, but also retrieved restaurants and dishes that were more relevant to what the user was looking for. We are doing this not only by taking advantage of ElasticSearch features, but also by developing a Query Semantics Engine.

    Query Understanding: What & Why

    This is where Query Understanding comes into the picture: it’s about using NLP to correctly identify the search intent behind the query and return more relevant search results. It’s about the interpretation process that happens before the results are even retrieved and ranked. The semantic neighbours of the query itself become the focus of the search process: after all, if I don’t understand what you’re trying to ask for, how will I give you what you want?

    During this talk, you will learn how we are taking advantage of word embeddings to build a Query Understanding Engine that is holistically designed to make the customer’s experience as smooth as possible. I will go over the techniques we used to build each component of the engine, the data and algorithmic challenges we faced, and how we solved each problem we came across.
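    To make the notion of semantic neighbours concrete, the sketch below ranks terms by the cosine similarity of their embedding vectors. It is a toy illustration, not the engine described in the talk: the three-dimensional vectors and the vocabulary are made up, whereas real word embeddings are learned from data and have many more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical embeddings; the values are invented for illustration.
EMBEDDINGS = {
    "nasi goreng": (0.9, 0.1, 0.0),
    "fried rice": (0.8, 0.2, 0.1),
    "mie goreng": (0.6, 0.5, 0.1),
    "iced coffee": (0.0, 0.1, 0.9),
}


def semantic_neighbours(query, embeddings, top_n=2):
    """Return the top_n terms whose vectors are closest to the query's vector."""
    q = embeddings[query]
    scored = [(cosine(q, vec), term) for term, vec in embeddings.items() if term != query]
    return [term for _, term in sorted(scored, reverse=True)[:top_n]]


print(semantic_neighbours("nasi goreng", EMBEDDINGS))
```

    With these toy vectors, a query for "nasi goreng" surfaces "fried rice" and "mie goreng" even though neither shares an exact text match with the query, which is the gap that exact and fuzzy matching leave open.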

  • Liked Joy Mustafi

    Joy Mustafi / Aditya Bhattacharya - Person Identification via Multi-Modal Interface with Combination of Speech and Image Data

    90 Mins
    Workshop
    Intermediate

    Multi-Modalities

    Having multiple modalities in a system gives more affordance to users and can contribute to a more robust system. Having more modalities also allows for greater accessibility for users who work more effectively with certain modalities. Multiple modalities can be used as a backup when certain forms of communication are not possible. This is especially true of redundant modalities, in which two or more modalities are used to communicate the same information. Certain combinations of modalities can add to the expressiveness of a computer-human or human-computer interaction, because each modality may be more effective than the others at expressing one form or aspect of information. For example, MUST researchers are working on a personalized humanoid equipped with various types of input devices and sensors that allow it to receive information from humans. These inputs are interchangeable and provide a standardized method of communication with the computer, affording practical adjustments to the user, providing a richer interaction depending on the context, and supporting a robust system with features such as keyboard, pointing device, touchscreen, computer vision, speech recognition, motion, and orientation.

    There are six types of cooperation between modalities, and they help define how a combination or fusion of modalities works together to convey information more effectively.

    • Equivalence: information is presented in multiple ways and can be interpreted as the same information
    • Specialization: when a specific kind of information is always processed through the same modality
    • Redundancy: multiple modalities process the same information
    • Complementarity: multiple modalities take separate information and merge it
    • Transfer: a modality produces information that another modality consumes
    • Concurrency: multiple modalities take in separate information that is not merged

    Computer - Human Modalities

    Computers utilize a wide range of technologies to communicate and send information to humans:

    • Vision - computer graphics typically through a screen
    • Audition - various audio outputs

    Project Features

    Adaptive: They MUST learn as information changes, and as goals and requirements evolve. They MUST resolve ambiguity and tolerate unpredictability. They MUST be engineered to feed on dynamic data in real time.

    Interactive: They MUST interact easily with users so that those users can define their needs comfortably. They MUST interact with other processors, devices, services, as well as with people.

    Iterative and Stateful: They MUST aid in defining a problem by asking questions or finding additional source input if a problem statement is ambiguous or incomplete. They MUST remember previous interactions in a process and return information that is suitable for the specific application at that point in time.

    Contextual: They MUST understand, identify, and extract contextual elements such as meaning, syntax, time, location, appropriate domain, regulation, user profile, process, task and goal. They may draw on multiple sources of information, including both structured and unstructured digital information, as well as sensory inputs (visual, gestural, auditory, or sensor-provided).

    Project Demos

    Multi-Modal Interaction: https://www.youtube.com/watch?v=jQ8Gq2HWxiA

    Gesture Detection: https://www.youtube.com/watch?v=rDSuCnC8Ei0

    Speech Recognition: https://www.youtube.com/watch?v=AewM3TsjoBk

    Assignment (Hands-on Challenge for Attendees)

    Real-time multi-modal access control system for authorized access to the work environment: all the key concepts and individual steps will be demonstrated and explained in this workshop, and attendees will need to customize the generic code or approach for this assignment or hands-on challenge.
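    One possible starting point for such a system is late fusion: each modality produces its own match score, and the scores are combined before a decision is made. The sketch below is a hypothetical illustration and not the workshop's code. It assumes that a face model and a voice model each return a score in [0, 1] per enrolled person; the weight, the threshold, and the names are made up.

```python
def fuse_scores(face_score, voice_score, face_weight=0.5):
    """Weighted average of the two per-modality match scores."""
    return face_weight * face_score + (1.0 - face_weight) * voice_score


def identify(scores_by_person, threshold=0.7):
    """Return the best-matching enrolled person, or None if no one clears the threshold.

    `scores_by_person` maps a name to a (face_score, voice_score) pair.
    """
    best_name, best_score = None, 0.0
    for name, (face_score, voice_score) in scores_by_person.items():
        fused = fuse_scores(face_score, voice_score)
        if fused > best_score:
            best_name, best_score = name, fused
    return best_name if best_score >= threshold else None


# Hypothetical scores from the two models for one access attempt.
attempt = {"alice": (0.9, 0.8), "bob": (0.4, 0.3)}
print(identify(attempt))
```

    Requiring a high fused score means a strong face match alone is not enough when the voice score is low, which reflects the redundancy between modalities described earlier.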