End-to-end project on predicting collective sentiment for programming languages using StackOverflow answers
With a plethora of programming languages and a diverse population of developers working with them, an interesting question arises: “How happy are the developers of any given language?”
A user's sentiment toward a language often creeps into the StackOverflow answers they write. By performing sentiment analysis on those answers, we can aggregate the results into an average sentiment per language, which gives a convenient answer to our question of interest.
The presenters build an end-to-end project that begins with pulling data from the StackOverflow API, continues with building the collective sentiment prediction model, and ends with deploying it as an API on GCP Compute Engine.
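The aggregation step described above can be sketched in a few lines. This is only an illustration, not the presenters' implementation: the function name `average_sentiment_by_language`, the score range, and the sample values are all assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean


def average_sentiment_by_language(scored_answers):
    """Group (language, sentiment_score) pairs and average the scores per language."""
    scores = defaultdict(list)
    for language, score in scored_answers:
        scores[language].append(score)
    return {language: mean(values) for language, values in scores.items()}


# Hypothetical per-answer sentiment scores in [-1, 1]; real scores would
# come from the sentiment model applied to each answer.
sample = [("python", 0.5), ("python", 0.25), ("java", -0.5), ("java", 0.0)]
averages = average_sentiment_by_language(sample)
```

A plain mean is the simplest choice; weighting by answer score or by answer length would be equally plausible variations.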
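As a rough sketch of the data collection step, the snippet below builds a request for answer bodies from the public Stack Exchange API and flattens the response into records. The helper names (`build_answers_url`, `fetch_json`, `extract_answers`) and the choice of fields are assumptions for illustration; the tutorial's actual collection code may differ, and real usage would also need paging and quota handling.

```python
import gzip
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.stackexchange.com/2.3"


def build_answers_url(question_ids, pagesize=50):
    """Build the URL for the answers to the given question ids (hypothetical helper)."""
    ids = ";".join(str(qid) for qid in question_ids)
    params = {
        "site": "stackoverflow",
        "order": "desc",
        "sort": "votes",
        "pagesize": pagesize,
        "filter": "withbody",  # built-in filter that includes the answer body
    }
    return f"{API_ROOT}/questions/{ids}/answers?{urlencode(params)}"


def fetch_json(url):
    """Download and decode one API response, decompressing it if it is gzipped."""
    with urlopen(url) as response:
        raw = response.read()
        if response.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return json.loads(raw)


def extract_answers(payload, language):
    """Flatten an API response into records tagged with the language of interest."""
    return [
        {
            "language": language,
            "answer_id": item["answer_id"],
            "score": item.get("score", 0),
            "body": item.get("body", ""),
        }
        for item in payload.get("items", [])
    ]
```

A call such as `extract_answers(fetch_json(build_answers_url([123, 456])), "python")` would then produce rows ready to be written to a .csv file; the question ids here are placeholders.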
Outline/Structure of the Tutorial
- Scraping data from StackOverflow using the StackOverflow API
- Investigating the data and preprocessing it as necessary
- Serializing the data into a usable format (such as .csv) for further usage
- Discussion of the basics of NLP and some classical NLP techniques such as count vectorization and TF-IDF
- Preparing the collective sentiment prediction model
- Evaluating the model
- Deploying the model as a REST API on GCP Compute Engine
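To make the count vectorization and TF-IDF items in the outline concrete, here is a minimal pure-Python sketch. It uses the textbook weighting tf × log(N / df); libraries such as scikit-learn apply a smoothed variant, so their numbers will differ. The function names and sample sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_vectorize(documents):
    """Return a sorted vocabulary and one term-count vector per document."""
    tokenized = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*tokenized))
    counts = [[doc_counts[term] for term in vocabulary] for doc_counts in tokenized]
    return vocabulary, counts


def tfidf(counts):
    """Weight each count by term frequency times log(N / document frequency)."""
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weighted = []
    for row in counts:
        total = sum(row) or 1  # avoid dividing by zero for empty documents
        weighted.append([(count / total) * weight for count, weight in zip(row, idf)])
    return weighted


# Made-up answers standing in for scraped StackOverflow text.
documents = ["I love Python", "I hate the Java build", "Python is fine"]
vocabulary, counts = count_vectorize(documents)
weights = tfidf(counts)
```

Rare words such as "love" and "hate" receive higher weights than words shared across documents, which is what makes these vectors useful as input features for a sentiment classifier.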
This tutorial presents the typical workflow of a full-stack data science project. The learning outcomes, in brief:
- Collecting data for a given problem statement when the data is not directly available
- Investigating the data from a Data Scientist's perspective
- Building simple NLP models
- Deploying a model as an API on the web
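The last outcome, serving a model over HTTP, can be sketched with only the Python standard library. Everything here is an assumption made for illustration: `score_text` is a toy placeholder standing in for the trained model, and the port, request shape, and names are hypothetical. A real deployment would more likely use a web framework behind a production server.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score_text(text):
    """Toy placeholder for the trained model: counts words from tiny made-up lexicons."""
    positive = {"love", "great", "nice"}
    negative = {"hate", "awful", "broken"}
    words = text.lower().split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)


def handle_prediction(raw_body):
    """Turn a JSON request body into a (status_code, JSON response bytes) pair."""
    try:
        text = json.loads(raw_body)["text"]
    except (ValueError, KeyError, TypeError):
        text = None
    if not isinstance(text, str):
        error = {"error": 'expected a JSON object with a string "text" field'}
        return 400, json.dumps(error).encode()
    return 200, json.dumps({"sentiment": score_text(text)}).encode()


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        status, body = handle_prediction(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="0.0.0.0", port=8080):
    """Create the HTTP server without starting it."""
    return HTTPServer((host, port), PredictionHandler)
```

Calling `make_server().serve_forever()` on a virtual machine starts the service; clients can then POST a body like `{"text": "I love Python"}` to it, provided the chosen port is open in the machine's firewall rules.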
Target Audience
Machine Learning Enthusiasts, Aspiring Data Scientists
Prerequisites for Attendees
- StackOverflow Ninja
- Familiarity with NumPy, Pandas, NLTK, spaCy
- Understanding of basic web concepts