An Empirical Approach for tackling NLP Tasks
Language is one of the most abundant and least structured forms of data available today. Solving any NLP task, such as classification, sequence labelling, topic modelling, or saliency detection, needs an approach that is structured and iterative. NLP pipelines are sensitive to small changes in design decisions, and preprocessing is expensive and time-consuming, so each change is costly to redo.
We present the problem of Entity Saliency in NLP through a case study: building a news/product recommendation system and using saliency to improve its recommendations. We showcase the practice of breaking a complex problem into smaller sub-problems and making continuous, measurable progress without leaving pesky bugs behind. Carefully breaking down a problem statement and solving it in chunks is important for building a consistent, production-grade NLP pipeline.
Entity Saliency is chosen as the use case because it can enhance various downstream NLP tasks such as clustering, similarity measurement, text classification, information retrieval, text summarization, content recommendation, SEO tools, etc. It is an under-researched topic to which we would like to draw the community's attention.
Outline/Structure of the Talk
Part 1: Understanding NLP for Business (5 min)
- Defining upstream/downstream tasks, and their applications for Business Problems.
- Establishing data reliability, treatment of data inconsistencies.
- Defining success criteria and user groups for evaluation.
- Key takeaway – NLP problem framing for Business.
Part 2: Saliency in NLP (8 min)
- Understanding Entity Saliency as an NLP task and its importance.
- Discussing the current research and datasets available for it.
- Use Case: A Recommendation Engine with Entity Salience for news/products.
- Dividing the work into sub-tasks and solving each (NER, Saliency, NEL, KB).
- Key takeaways – Leveraging Entity Saliency for your applications.
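As a rough illustration of the task division in Part 2, the sketch below chains three small, separately testable stages. Everything in it is an assumption made for illustration, not the speakers' method: `TOY_KB` is a hypothetical three-entry knowledge base, NER is replaced by a dictionary lookup, linking happens before scoring so that aliases share a score, and the salience score is a simple mix of mention frequency and how early the entity first appears.

```python
import re
from collections import Counter

# Hypothetical toy knowledge base (assumption): surface form -> canonical entity ID.
TOY_KB = {
    "apple": "ORG:Apple_Inc",
    "iphone": "PRODUCT:iPhone",
    "tim cook": "PERSON:Tim_Cook",
}


def recognise_entities(text):
    """NER stand-in: find KB surface forms and return (surface, offset) pairs."""
    lowered = text.lower()
    mentions = []
    for surface in TOY_KB:
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, lowered):
            mentions.append((surface, match.start()))
    return sorted(mentions, key=lambda mention: mention[1])


def link_entities(mentions):
    """NEL stand-in: map each surface form to its KB identifier."""
    return [(TOY_KB[surface], offset) for surface, offset in mentions]


def score_salience(linked, text_length):
    """Heuristic salience: half mention share, half earliness of first mention."""
    counts = Counter(entity for entity, _ in linked)
    first_offset = {}
    for entity, offset in linked:
        first_offset.setdefault(entity, offset)  # keeps the earliest offset
    total = sum(counts.values())
    scores = {}
    for entity, count in counts.items():
        frequency = count / total
        earliness = 1.0 - first_offset[entity] / max(text_length, 1)
        scores[entity] = 0.5 * frequency + 0.5 * earliness
    return scores


def salient_entities(text):
    """Run the stages in sequence and rank entities by descending salience."""
    linked = link_entities(recognise_entities(text))
    scores = score_salience(linked, len(text))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


doc = (
    "Apple unveiled a new iPhone. Tim Cook said Apple expects "
    "the iPhone to sell well. Apple shares rose."
)
ranking = salient_entities(doc)
for entity, score in ranking:
    print(f"{entity}: {score:.2f}")
```

Because each stage has a plain input and output, a stage can be swapped for a trained model and re-evaluated on its own without touching the rest of the pipeline.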
Part 3: Modelling | Evaluation | Continuous Improvement (7 min)
- Benchmarking SOTA and iterative baseline modelling for tasks on custom data.
- Defining intrinsic/extrinsic evaluation.
- Measuring data shift & model performance over time.
- Key Takeaway – Structured Modelling & Evaluation for the Problem Statement.
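Two of the measurements in Part 3 can be sketched in a few lines. The block below is a minimal example under stated assumptions, not the evaluation used in the talk: intrinsic evaluation is taken to mean set-based precision, recall and F1 of predicted salient entities against a gold annotation, and data shift is approximated by the Jensen-Shannon divergence between the token distributions of two time windows. Extrinsic evaluation (for example, the downstream effect on recommendations) is not covered here.

```python
import math
from collections import Counter


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 for one document's salient entities."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def jensen_shannon_divergence(tokens_a, tokens_b):
    """Base-2 Jensen-Shannon divergence between two token samples (0 = same, 1 = disjoint)."""
    if not tokens_a or not tokens_b:
        raise ValueError("both token samples must be non-empty")
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    divergence = 0.0
    for token in set(counts_a) | set(counts_b):
        p = counts_a[token] / total_a
        q = counts_b[token] / total_b
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


# Hypothetical example data (assumption): one document and two time windows.
precision, recall, f1 = precision_recall_f1(
    predicted=["Apple", "iPhone", "Samsung"],
    gold=["Apple", "iPhone", "Tim Cook", "WWDC"],
)
drift = jensen_shannon_divergence(
    "apple iphone launch apple".split(),
    "election vote apple result".split(),
)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f} drift={drift:.2f}")
```

Tracking the divergence between a reference window and each new window gives a single number to plot next to model performance over time; the threshold at which to retrain is a project-specific choice.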
Link to Entity Salience: https://confengine.com/odsc-india-2020/proposal/14379/nlp-modelling-of-salience-in-literature-as-humans
Key takeaways from this presentation
- An approach to framing & handling NLP problem statements with respect to Business.
- Understanding Entity Saliency and its importance in the NLP Pipeline.
- Incremental Modelling & Continuous Improvement for production systems.
Data Scientists, AI Enthusiasts, NLP Practitioners, Tech Leads/Managers of NLP Applications, NLP Enthusiasts & Engineers
Prerequisites for Attendees
Basic understanding of Machine Learning and NLP is required. Being familiar with some standard NLP problem statements like NER, Text Classification, Information Retrieval is good to have.
Submitted 1 year ago
People who liked this proposal also liked:
Amit Agarwal / Ishant Wankhede - NLP Modelling of Salience in Literature as Humans
Amit Agarwal, Data Scientist, Jio; Ishant Wankhede, Data Scientist, Abzooba
In language, a document mentions entities and topics that are coherent with one another, building a context around the document's main topic or idea. This rich correlation between entities makes it hard to identify the right concepts/topics within a document. Salient entities are those that human readers deem most relevant to the document. [Dunietz et al]
Entity Saliency refers to the importance of an entity in a document. Humans can easily distinguish the words that contribute to the meaning of a sentence (i.e. content words) from words that serve only a grammatical function (i.e. function words).
By accurately computing the salience of words, we can develop better text representations for downstream NLP tasks such as similarity measurement, text classification, information retrieval, text summarization, content recommendation, SEO tools, etc.
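One way to picture a salience-aware representation is a sparse vector that maps each entity to its salience score, compared with cosine similarity. The sketch below assumes the salience scores already exist; the document IDs, entity names and scores are hypothetical, and `recommend` is an illustrative helper rather than part of any library.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse {entity: salience} vectors."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[key] * vec_b[key] for key in shared)
    norm_a = math.sqrt(sum(value * value for value in vec_a.values()))
    norm_b = math.sqrt(sum(value * value for value in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query_vector, candidates, top_k=2):
    """Rank candidate documents by similarity to the query document."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical salience-weighted representations (assumption).
just_read = {"Apple": 0.9, "iPhone": 0.6, "Tim Cook": 0.2}
candidates = {
    "a1": {"Apple": 0.8, "iPhone": 0.7},
    "a2": {"Samsung": 0.9, "Apple": 0.1},
    "a3": {"Tim Cook": 0.9, "Tesla": 0.3},
}
recommendations = recommend(just_read, candidates)
for doc_id, score in recommendations:
    print(f"{doc_id}: {score:.3f}")
```

Article `a2` mentions Apple only in passing, so its low salience weight keeps it out of the top results, whereas an unweighted bag of entities would treat that passing mention the same as a central one.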
We present a conceptual overview of Entity Saliency as a pre-task and its use in NLP-based solutions, followed by domain-specific approaches to solving it and their challenges. We also share novel approaches to evaluation and show how it works in conjunction with Entity Linking, Entity Disambiguation & Domain Knowledge bases.