NLP Modelling of Salience in Literature as Humans
A document mentions many entities and topics that cohere around its main topic or idea. These rich interrelations between entities make it hard to identify which concepts the document is actually about. Salient entities are those that human readers deem most relevant to the document [Dunietz et al.].
Entity Saliency refers to the importance of an entity in a document. Humans can easily distinguish the words that contribute to the meaning of a sentence (i.e. content words) from words that serve only a grammatical function (i.e. function words).
By accurately computing the salience of words, we can build better text representations for downstream NLP tasks such as similarity measurement, text classification, information retrieval, text summarization, content recommendation, SEO tools, etc.
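As a rough illustration of what "computing salience" can mean, the sketch below scores candidate entities by how often they are mentioned and how early they first appear. This is only a simple heuristic under stated assumptions, not the approach presented in the talk: the function name `salience_scores`, the scoring formula, and the example sentences are all hypothetical.

```python
import re


def salience_scores(sentences, entities):
    """Score each candidate entity by mention count plus an early-mention bonus.

    Hypothetical heuristic for illustration only:
    score = number of mentions + (1 - index of first mentioning sentence / number of sentences).
    Entities that are never mentioned get a score of 0.0.
    """
    scores = {}
    total = len(sentences)
    lowered = [s.lower() for s in sentences]
    for entity in entities:
        # Whole-word, case-insensitive match of the entity's surface form.
        pattern = re.compile(r"\b" + re.escape(entity.lower()) + r"\b")
        counts = [len(pattern.findall(s)) for s in lowered]
        mentions = sum(counts)
        if mentions == 0:
            scores[entity] = 0.0
            continue
        first = next(i for i, c in enumerate(counts) if c > 0)
        scores[entity] = mentions + (1.0 - first / total)
    return scores


# Made-up example document, split into sentences.
sentences = [
    "Acme launched a new phone.",
    "The phone ships in May.",
    "Analysts at Globex praised the phone.",
]
scores = salience_scores(sentences, ["Acme", "phone", "Globex"])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A heuristic like this ignores coreference ("it", "the device") and entity linking, which is part of why learned, domain-specific models are usually needed in practice.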
We present a conceptual overview of Entity Saliency as a pre-task and its use in NLP-based solutions, followed by domain-specific approaches to solving it and the challenges involved. We also share novel approaches to evaluation and discuss how Entity Saliency works in conjunction with Entity Linking, Entity Disambiguation, and domain knowledge bases.
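The abstract does not say which evaluation measures the talk proposes. As one possible illustration of looking beyond a single set-based score, the sketch below compares F1 over a predicted entity set with precision at k over a ranked list. The function names and the example data are assumptions made for illustration.

```python
def f1_score(predicted, gold):
    """Set-based F1 between predicted and gold salient entities."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def precision_at_k(ranked, gold, k):
    """Fraction of the top-k ranked entities that are in the gold set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for entity in ranked[:k] if entity in gold)
    return hits / k


# Made-up example: a model's ranking and the human-annotated salient entities.
ranked = ["phone", "Acme", "Globex"]
gold = {"phone", "Acme"}

print(f1_score(ranked, gold))
print(precision_at_k(ranked, gold, 2))
```

F1 treats every predicted entity equally, while a rank-based measure rewards placing the most salient entities first, which can matter when only the top few entities are shown to a user.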
Outline/Structure of the Talk
Part 1: Understanding: Entity Saliency in NLP (5 min)
- Defining the problem of Saliency in NLP
- Exploring Entity Saliency as an NLP task
- Discussing current research and the datasets available for it
- Key takeaway – Importance of Saliency in NLP
Part 2: Application of Entity Saliency in Real World (5 min)
- Why & How to use Entity Saliency?
- Market Leaders & applications of Entity Saliency.
- How can you use it to improve your product?
- Key takeaways – Leveraging Entity Saliency for your applications
Part 3: Modelling & Evaluation (6 min)
- Creating baseline models for your domain task.
- Removing the dependency on Knowledge Bases & Entity Linking/Disambiguation
- Moving beyond F1 to evaluate your model
- Ensuring a lightweight model for user-facing interfaces
- Ensuring explainability
- Key Takeaway – Custom Modelling as per task requirement.
Part 4: Questions (4 min)
- Time to answer any queries from the audience.
Key takeaways from this talk
- Understanding the importance of Entity Saliency as an NLP problem
- Real-world applications of Entity Saliency
- Custom Modelling & Evaluation to improve downstream NLP Solutions
Data Scientists, AI Enthusiasts, NLP Practitioners, Tech Leads/Managers of NLP Applications, NLP Enthusiasts & Engineers
Prerequisites for Attendees
A basic understanding of Machine Learning and NLP is required. Familiarity with standard NLP problems such as NER, Text Classification, and Information Retrieval is good to have.
Submitted 1 year ago
People who liked this proposal, also liked:
Ishant Wankhede / Amit Agarwal - An Empirical Approach for tackling NLP Tasks
Ishant Wankhede, Data Scientist, Abzooba; Amit Agarwal, Data Scientist, Jio
Language is one of the most unstructured forms of data available in abundance today. Solving any NLP task, such as classification, sequence labelling, topic modelling, or saliency, needs an approach that is structured and iterative. NLP pipelines are sensitive to small changes and design decisions, and preprocessing is expensive and time-consuming.
We present the problem of Entity Saliency in NLP through a case study on building a news/product recommendation system and the role of saliency in improving recommendations. We showcase the practice of breaking a complex problem into smaller sub-problems and making continuous, measurable progress without leaving any pesky bugs in the process. It’s important to break an NLP problem statement down carefully and solve it in chunks to create a consistent, production-grade pipeline.
Entity Saliency is chosen as the use case because it can enhance various downstream NLP tasks such as clustering, similarity measurement, text classification, information retrieval, text summarization, content recommendation, SEO tools, etc. It’s an under-researched topic on which we would like to bring the community’s focus.