Word Vectors and Language Models: Using Counting to Derive Actionable Insights

One of the perennial problems of NLP is how best to encode the meanings of words. Recent advances in deep learning have yielded a variety of neural-net-based approaches, the simplest and perhaps most famous of which is the word2vec algorithm by Mikolov et al.

Most people think of these approaches, or language models, as simple black-box techniques that improve the performance of other black-box techniques like RNNs or LSTMs. However, these language models have far richer uses than we give them credit for. Because they are interpretable and explainable, models like word2vec can be combined with more classical techniques, like topic modeling, to derive powerful, actionable insights from bodies of text.

This talk will be a quick exploration of word2vec, its implications, and its varied applications in real-world NLP problems.


Outline/Structure of the Talk

* Introduction to Word2Vec and LDA.

* Demonstration of how word2vec is essentially just storing probabilities of a word being surrounded by other words, similar to the count vectors or co-occurrence matrices of the past.

* How to combine LDA and word2vec, using LDA to tag topics and `cluster` documents and word2vec to find similar parts of various documents, exposing simple but powerful insights into what the documents are talking about.

* Uses of word2vec as a generic sequence or similarity detector - using it as a simple recommendation system for music or books.
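
The counting idea in the outline above can be illustrated with a minimal sketch. It is plain co-occurrence counting, not word2vec itself: each word is represented by how often other words appear near it, and two words are compared by the cosine of their count vectors. The toy corpus, the window size, and the helper names `cooccurrence_vectors` and `cosine` are all assumptions made for this illustration and are not taken from the talk.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, invented purely for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]


def cooccurrence_vectors(sentences, window=2):
    """For each word, count how often every other word occurs
    within `window` positions of it (hypothetical helper)."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
print(dict(vectors["cat"]))
print(round(sim_cat_dog, 3))
```

The same counting applies to any sequence data: replacing sentences with playlists or reading lists, and words with song or book identifiers, gives a crude similarity score between items.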

Learning Outcome

Attendees will hopefully come away with an idea of how word2vec works and be inspired to enrich their work and projects by applying a simple algorithm to common everyday tasks, like figuring out what songs to listen to or what books to read.

Target Audience

Anyone interested in text analytics, NLP, or even recommender systems.

Prerequisites for Attendees

Some Python, for those who want to follow along with the code examples. Otherwise, no real prior knowledge is required, though any prior experience with NLP would be helpful.

Submitted 1 year ago