This talk focuses on querying industry-grade big data systems. Enterprises have vast amounts of information spread across structured data stores (relational databases, data warehouses, etc.). Descriptive analytics over this data is limited to experts who are familiar with complex query languages (e.g., Structured Query Language, SQL) as well as the metadata and schema associated with such large data stores. The ability to convert natural language questions into SQL statements would make descriptive analytics and reporting much easier and more widespread. The problem of automatically converting natural language questions to SQL is well studied under the name Natural Language Interface to Databases (NLIDB). We present our work on an end-to-end (E2E) system focused on NLIDB.

We describe two main aspects of E2E NLIDB systems: i) converting natural language to a structured query language and ii) understanding natural language. Such E2E systems have many applications across domains such as healthcare, finance, and logistics.
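
To make the two aspects concrete, here is a minimal rule-based sketch in Python. It is not the system presented in the talk: the `orders` table, its columns, and the helper names `understand`, `to_sql`, and `answer` are all hypothetical, and a real NLIDB would likely replace the keyword rules with learned models.

```python
import re
import sqlite3


def understand(question):
    """Aspect ii (toy version): extract an aggregate and a region filter."""
    q = question.lower()
    if q.startswith("how many"):
        aggregate = "COUNT(*)"
    elif "total" in q:
        aggregate = "SUM(amount)"
    else:
        raise ValueError("unsupported question")
    # Treat the word following "in" as a region name (illustrative rule only).
    match = re.search(r"\bin (\w+)", q)
    region = match.group(1) if match else None
    return {"aggregate": aggregate, "region": region}


def to_sql(parsed):
    """Aspect i (toy version): build a parameterized SQL statement."""
    sql = f"SELECT {parsed['aggregate']} FROM orders"
    params = ()
    if parsed["region"] is not None:
        sql += " WHERE lower(region) = ?"
        params = (parsed["region"],)
    return sql, params


def answer(conn, question):
    """Run the generated query and return the single aggregate value."""
    sql, params = to_sql(understand(question))
    return conn.execute(sql, params).fetchone()[0]


# Hypothetical in-memory data, used only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Europe", 10.0), ("Europe", 5.5), ("Asia", 7.0)],
)

print(answer(conn, "How many orders in Europe?"))
print(answer(conn, "What is the total amount in Asia?"))
```

Passing the user-supplied value as a query parameter, rather than splicing it into the SQL string, keeps even this toy translator safe from injection.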


Outline/Structure of the Talk

  • Background (1 min)
  • Problem Statement (2 mins)
    • Specific questions
      • Unstructured to Structured language
      • Understanding natural language
  • Challenges (2 mins)
    • Main challenges, e.g.
      • Problem complexity
      • Ambiguous human language
  • Solutions/Methodology (6 mins)
    • Pipeline
    • Different Models
  • Experiments (4 mins)
  • Findings (3 mins)
  • Summary and Conclusion (2 mins)

Learning Outcome

The talk first covers the basics and background, then the problem statement, and then the core topics in an incremental manner. Participants can expect the following learning outcomes:

  • Participants will learn about querying industry-grade big data systems
  • Participants will get an overview of an end-to-end system that translates unstructured language into a structured query
  • Participants will see how natural language processing and understanding are applied to handle and interpret user queries effectively
  • Participants will learn about the findings from the system built within our enterprise to query industry-grade big data systems

Target Audience

This talk is aimed at two main types of audience. The first is participants who want to learn more about i) how to query databases and records (from huge structured data stores) effectively and ii) how to handle user-generated natural language queries for industry-grade big data systems. The second is participants interested in, or working on, natural language processing (NLP) and understanding and their applications to big data systems.

Prerequisites for Attendees

We will start with the basics and then move towards the core aspects of the talk. All interested people are welcome to attend. There is no specific prerequisite for attendees; however, some knowledge of natural language processing and databases will be helpful.


Submitted 1 year ago
