Topic modeling is the art of extracting the latent topics, or themes, that exist in a set of documents. In this talk we will discuss the use cases of topic modeling, particularly Latent Dirichlet Allocation (LDA), and the implementation work by the Data Science Applications team at Meredith to design auto-taggers: classifiers for the topics in the custom enterprise taxonomy, applied against hundreds of thousands of documents. We will cover best practices for choosing the number of topics for a corpus of that size, how named entity extraction is employed to derive context in the feature space, how machine learning techniques are aligned to support the work of taxonomists, and how the integration with the enterprise architecture supports a population of expert assessors who curate training data for Google’s AutoML and other deep learning capabilities.
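As a rough illustration of choosing the number of topics, the sketch below fits LDA for several candidate values and compares held-out perplexity. It assumes scikit-learn is available. The toy corpus, the candidate values, and the helper name `pick_num_topics` are hypothetical, and perplexity is only one possible criterion (topic coherence and review by taxonomists are others). It is not a description of the team's actual procedure.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Toy corpus standing in for the real document collection (illustrative only).
docs = [
    "grill the chicken with garlic and lemon marinade",
    "slow cooker beef stew recipe with carrots",
    "bake chocolate cake with butter and sugar",
    "easy weeknight pasta recipe with tomato sauce",
    "plant tomatoes in full sun with rich soil",
    "prune rose bushes in early spring garden",
    "water the lawn early in the morning",
    "compost improves garden soil for vegetables",
    "paint the living room walls a warm color",
    "small bathroom remodel ideas with new tile",
    "arrange furniture around the living room fireplace",
    "kitchen remodel with new cabinets and tile",
]


def pick_num_topics(texts, candidates, seed=0):
    """Return (best_k, scores), where scores maps k to held-out perplexity."""
    counts = CountVectorizer(stop_words="english").fit_transform(texts)
    train, held_out = train_test_split(counts, test_size=0.25, random_state=seed)
    scores = {}
    for k in candidates:
        lda = LatentDirichletAllocation(n_components=k, random_state=seed)
        lda.fit(train)
        # Lower perplexity on unseen documents suggests a better fit.
        scores[k] = lda.perplexity(held_out)
    best_k = min(scores, key=scores.get)
    return best_k, scores


best_k, scores = pick_num_topics(docs, candidates=[2, 3, 4])
for k, value in sorted(scores.items()):
    print(f"k={k}: held-out perplexity {value:.1f}")
print("lowest perplexity at k =", best_k)
```

On a corpus of hundreds of thousands of documents the same loop applies, but each fit is expensive, so a coarse grid of candidates followed by a finer search around the best value is one reasonable approach.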
Latent semantic analysis (LSA) has proven effective for quickly clustering the document space. Applying it hierarchically, on top-level clusters to derive child clusters, and informing it with inputs from subject matter experts and taxonomists (namely taxonomy terms and synonyms) makes it possible to get a sense of how well the content space covers the enterprise taxonomy model.
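A minimal sketch of hierarchical clustering in an LSA space follows, assuming scikit-learn. Here LSA is approximated as TF-IDF followed by truncated SVD, and k-means is applied first to the whole corpus and then again within each top-level cluster. The function names, the toy corpus, and the cluster counts are hypothetical, and the taxonomy-informed step described above is left out for brevity.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer


def lsa_clusters(texts, n_clusters, n_components=2, seed=0):
    """Project texts into an LSA space and cluster them with k-means."""
    lsa = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        TruncatedSVD(n_components=n_components, random_state=seed),
        Normalizer(copy=False),  # unit-length vectors, so distance tracks cosine
    )
    vectors = lsa.fit_transform(texts)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(vectors)


def two_level_clusters(texts, n_top=2, n_child=2):
    """Return one (parent, child) label pair per text."""
    top = lsa_clusters(texts, n_top)
    labels = {}
    for parent in sorted(set(top)):
        idx = [i for i, t in enumerate(top) if t == parent]
        if len(idx) <= n_child:
            # Too few documents to split further; keep a single child cluster.
            for i in idx:
                labels[i] = (int(parent), 0)
            continue
        # Re-fit LSA on the subset so child clusters use cluster-specific terms.
        child = lsa_clusters([texts[i] for i in idx], n_child)
        for i, c in zip(idx, child):
            labels[i] = (int(parent), int(c))
    return [labels[i] for i in range(len(texts))]


corpus = [
    "grilled chicken recipe with lemon and garlic",
    "chocolate cake recipe with butter and sugar",
    "beef stew recipe for the slow cooker",
    "pasta recipe with fresh tomato sauce",
    "garden soil and compost for tomato plants",
    "prune roses in the spring garden",
    "water garden plants early in the morning",
    "garden mulch keeps soil moist around plants",
]
hierarchy = two_level_clusters(corpus)
for text, label in zip(corpus, hierarchy):
    print(label, text)
```

Comparing the dominant terms of each (parent, child) cluster against taxonomy terms and synonyms is one way to see which taxonomy categories have little or no content behind them.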
Where coverage falls short, additional training data must be obtained in order to build effective auto-tagging solutions. One technique for data augmentation is query formulation: again using entity extraction from owned content, along with the taxonomy categories and synonyms, to construct social listening streams that surface new off-property content for the training corpus.
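The query formulation idea can be sketched in plain Python as follows. This assumes the entities have already been extracted and ranked, and it emits a generic Boolean query string. The function name `build_query`, the example terms, and the query syntax are all hypothetical, since real social listening tools each define their own query language.

```python
def build_query(category, synonyms, entities, max_entities=3):
    """Combine taxonomy terms (OR) with extracted entities (OR), joined by AND."""

    def quote(term):
        # Quote multi-word terms so they are matched as phrases.
        return f'"{term}"' if " " in term else term

    # The category itself plus its synonyms, skipping case-insensitive repeats.
    topic_terms = [category] + [s for s in synonyms if s.lower() != category.lower()]
    topic_clause = " OR ".join(quote(t) for t in topic_terms)

    top_entities = entities[:max_entities]
    if not top_entities:
        return f"({topic_clause})"
    entity_clause = " OR ".join(quote(e) for e in top_entities)
    return f"({topic_clause}) AND ({entity_clause})"


query = build_query(
    category="grilling",
    synonyms=["barbecue", "BBQ"],
    entities=["Weber", "cast iron skillet"],  # assumed output of entity extraction
)
print(query)
# (grilling OR barbecue OR BBQ) AND (Weber OR "cast iron skillet")
```

Requiring both a taxonomy term and an entity keeps the stream narrower than the taxonomy terms alone, at the cost of missing content that mentions neither the chosen entities nor close variants.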

Outline/Structure of the Case Study

  1. Brief introduction to Meredith and the media industry
  2. Problem Statement - Importance of creating a detailed taxonomy in digital media
  3. Why Topic Modeling
  4. Agenda for Analysis
    1. Objective
    2. Data Description and preprocessing
    3. Model Building
    4. Model Evaluation
    5. Model Improvement
    6. Production
  5. Outcome

Learning Outcome

  1. Build broad-level topics for creating a taxonomy
  2. Overview of LDA
  3. Methods to improve topic modeling results
  4. Classifiers to label articles

Target Audience

Data Scientists, Media Firms, NLP Researchers, Taxonomists, Digital Media Executives

Prerequisites for Attendees

The presentation assumes a general awareness of text mining, topic modeling, and LDA. It will touch on the basics of LDA and topic modeling before moving into more in-depth analysis, so an attendee with some exposure to these areas should be able to follow the material as well as a seasoned data scientist.

Submitted 1 week ago
