End-to-end project on predicting collective sentiment for programming languages using StackOverflow answers

With a plethora of programming languages and a diverse population of developers working in them, an interesting question arises: “How happy are the developers of any given language?”

A developer's sentiment toward a language often creeps into the StackOverflow answers they write. By performing sentiment analysis on individual answers, we can aggregate the results into an average sentiment per language, which gives a first answer to our question of interest.
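
The aggregation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `average_sentiment_by_language`, the (language, score) input format, and the sample scores are all hypothetical and not taken from the tutorial, and each answer is assumed to already carry a numeric sentiment score.

```python
from collections import defaultdict
from statistics import mean


def average_sentiment_by_language(scored_answers):
    """Group per-answer sentiment scores by language and average each group.

    `scored_answers` is an iterable of (language, score) pairs; this input
    format is an assumption made for the sketch.
    """
    scores = defaultdict(list)
    for language, score in scored_answers:
        scores[language].append(score)
    return {language: mean(values) for language, values in scores.items()}


# Made-up scores, for illustration only (not real StackOverflow data).
sample = [("python", 0.5), ("python", 0.25), ("java", -0.5), ("java", 0.0)]
averages = average_sentiment_by_language(sample)
```

A plain mean treats every answer equally; weighting by answer length or votes would be a different design choice that this sketch does not attempt.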

The presenters build an end-to-end project that begins with pulling data from the StackOverflow API, continues with building the collective sentiment prediction model, and ends with deploying it as an API on GCP Compute Engine.
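
As a rough idea of what the data-pulling step might look like, the sketch below builds a request URL for the public Stack Exchange API and decodes a JSON response using only the standard library. The helper names `build_questions_url` and `fetch_json`, the API version in `API_ROOT`, and the chosen query parameters are assumptions for illustration, not the presenters' code. It lists questions for a tag; retrieving the answers themselves would be a further call that is not shown here.

```python
import gzip
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed API root and version for this sketch.
API_ROOT = "https://api.stackexchange.com/2.3"


def build_questions_url(tag, page=1, pagesize=50):
    """Build a URL that lists StackOverflow questions carrying `tag`."""
    params = {
        "site": "stackoverflow",
        "tagged": tag,
        "order": "desc",
        "sort": "activity",
        "page": page,
        "pagesize": pagesize,
    }
    return f"{API_ROOT}/questions?{urlencode(params)}"


def fetch_json(url):
    """Download `url` and parse the body as JSON (hypothetical helper).

    The Stack Exchange API compresses its responses, so the body is
    decompressed when the server reports gzip encoding.
    """
    with urlopen(url) as response:
        raw = response.read()
        if response.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Building the URL needs no network access; fetch_json(url) would.
url = build_questions_url("python")
```

Unauthenticated requests are subject to rate limits, so a real scraper would likely page through results slowly and cache what it downloads.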


Outline/Structure of the Tutorial

  • Scraping data from StackOverflow using the StackOverflow API
  • Investigating the data and preprocessing it as necessary
  • Serializing the data into a usable format (such as .csv) for further usage
  • Discussing the basics of NLP and some classical NLP techniques such as count vectorization and TF-IDF
  • Preparing the collective sentiment prediction model
  • Evaluating the model
  • Deploying the model as a REST API on GCP Compute Engine
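
To make the TF-IDF item in the outline concrete, here is a minimal pure-Python sketch of the textbook formulation (term frequency multiplied by the log of inverse document frequency). The function names, the whitespace tokenizer, and the three toy documents are assumptions for illustration; libraries such as scikit-learn use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * log(N / document frequency)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Toy documents, made up for illustration.
docs = ["python is great", "java is verbose", "python is fun"]
vectors = tf_idf(docs)
```

In this toy corpus the word "is" occurs in every document and therefore gets weight zero, while a word unique to one document, such as "great", is weighted more heavily than "python", which appears in two of the three.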

Learning Outcome

This tutorial presents the typical workflow of a full-stack data science project. The learning outcomes, in brief:

  • Collecting data for a given problem statement when the data is not directly available
  • Investigating the data from a Data Scientist's perspective
  • Building simple NLP models
  • Deploying a model as an API on the web

Target Audience

Machine Learning Enthusiasts, Aspiring Data Scientists

Prerequisites for Attendees

  • StackOverflow Ninja
  • Python 3
  • Familiarity with NumPy, Pandas, NLTK, spaCy
  • Understanding of basic web concepts

Submitted 10 months ago
