Importing and cleaning data with R

We are experiencing a tremendous explosion in big data, yet a significant share of this data is unfit for direct analysis or machine learning. This presentation focuses on web scraping with powerful R packages such as httr and tools like XPath. The session will also introduce the principles of data cleaning. By the end of the session, you will be able to import raw data from most websites and transform it into a proper, robust dataset. Over the course of the session, we will apply these concepts to build a robust dataset that is ready for analysis.

 
 

Outline/Structure of the Workshop

  • Importing data from the web
  • Introduction to packages and tools like httr and XPath
  • Tidying imported data
  • Cleaning survey data
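
As a rough illustration of the first two outline items, the sketch below fetches a page with httr and pulls values out of it with an XPath query. It assumes the xml2 package for HTML parsing, which the proposal does not name; the helper names `fetch_html` and `extract_text` and the sample HTML are hypothetical, made up for this example.

```r
library(httr)
library(xml2)  # assumption: xml2 is used here for parsing and XPath queries

# Download a page and return its body as text (hypothetical helper).
fetch_html <- function(url) {
  resp <- GET(url)
  stop_for_status(resp)  # raise an R error on HTTP 4xx/5xx responses
  content(resp, as = "text", encoding = "UTF-8")
}

# Return the trimmed text of every node matching an XPath expression
# (hypothetical helper).
extract_text <- function(html, xpath) {
  doc <- read_html(html)
  xml_text(xml_find_all(doc, xpath), trim = TRUE)
}

# Offline stand-in for a downloaded page; the content is invented.
sample_html <- paste0(
  "<html><body><table>",
  "<tr><td class='name'> Alpha </td><td class='score'>10</td></tr>",
  "<tr><td class='name'> Beta </td><td class='score'>20</td></tr>",
  "</table></body></html>"
)

item_names <- extract_text(sample_html, "//td[@class='name']")
scores <- as.integer(extract_text(sample_html, "//td[@class='score']"))
```

With a real site, the string returned by `fetch_html(url)` would take the place of `sample_html`; the XPath expressions depend entirely on that page's markup.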

Learning Outcome

  • Learn to import data from the web using APIs
  • Learn to import raw data from the web
  • Learn to clean the imported raw data by removing unwanted HTML tags and other data impurities
  • Learn to tidy up survey data and transform any survey data to a proper dataset
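
The cleaning and tidying outcomes above could look something like the following base R sketch. The survey data, the column names, and the `strip_tags` helper are all hypothetical, and a regular expression is only a crude way to remove tags; it is meant as a starting point under those assumptions, not as the workshop's actual method.

```r
# Hypothetical raw survey responses as they might look after scraping.
raw <- data.frame(
  respondent = c("r1", "r2"),
  q1 = c("<b>Yes</b>", " no "),
  q2 = c("<span>4</span>", "N/A"),
  stringsAsFactors = FALSE
)

# Remove HTML tags, then trim surrounding whitespace (hypothetical helper).
strip_tags <- function(x) trimws(gsub("<[^>]+>", "", x))

clean <- raw
clean$q1 <- tolower(strip_tags(clean$q1))  # normalise the case of answers
clean$q2 <- strip_tags(clean$q2)
clean$q2[clean$q2 == "N/A"] <- NA          # recode the missing-value marker
clean$q2 <- as.integer(clean$q2)

# Tidy: go from one column per question to one row per answer.
long <- data.frame(
  respondent = rep(clean$respondent, times = 2),
  question = rep(c("q1", "q2"), each = nrow(clean)),
  answer = c(clean$q1, as.character(clean$q2)),
  stringsAsFactors = FALSE
)
```

The long layout keeps one observation per row, which makes it easier to filter or summarise by question later on.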

Target Audience

Aspiring Data Analysts, Aspiring Data Scientists

Prerequisites for Attendees

Basic knowledge of R syntax

Laptops with R installed

Submitted 1 year ago

Public Feedback

  • Venkatraman J
    By Venkatraman J  ~  1 year ago

    I too feel this is a beginner-level talk or a workshop session for beginners, as it covers basic packages and tools like httr and XPath. Languages like Java and Python have done this for ages and engineers are used to those concepts. R as a language is famous for its statistical packages; if we could add a nice statistical package, that would be great. I don't want to disappoint you, as I see you are a Bachelor's student, but just push a little bit beyond. How about something like a random forest package?

    Just try your best, but this is a good start with your approach here.

  • Prachi Saraph
    By Prachi Saraph  ~  1 year ago

    Hi Hariraj,

    Do you think it is possible to cover an E2E use case, apart from the data cleaning piece, if you wish to conduct a 45 min workshop?

    Since the abstract mentions Big Data, web scraping and R packages, please provide high-level comments on how this approach works for Big Data.

    • Hariraj K
      By Hariraj K  ~  1 year ago

      Thank you Prachi Saraph. I have updated my proposal.

  • Vishal Gokhale
    By Vishal Gokhale  ~  1 year ago

    Thanks for the proposal, Hariraj! :-)

    I am assuming this would be a hands-on session and you would expect people to try out some examples themselves.
    Is that right? If so, you might want to update the prerequisites with any installations/setup that every participant must have in order to have a smooth and quick start.

    • Hariraj K
      By Hariraj K  ~  1 year ago

      Thank you for drawing my attention to the issue. I have updated the prerequisites column with relevant details.


  • Liked Venkatraman J

    Venkatraman J - Detection and Classification of Fake news using Convolutional Neural networks

    20 Mins
    Talk
    Intermediate

    The proliferation of fake news or rumours in traditional news media sites, social media, feeds, and blogs has made it extremely difficult and challenging to trust any news in day-to-day life. False information has wide implications for both individuals and society. Even though humans can identify and classify fake news through heuristics, common sense and analysis, there is a huge demand for an automated computational approach to achieve scalability and reliability. This talk explains how neural probabilistic models using deep learning techniques are used to classify and detect fake news.

    This talk will start with an introduction to deep learning, TensorFlow (Google's deep learning framework), dense vectors (word2vec model), feature extraction, data preprocessing techniques, feature selection and PCA, and move on to explain how a scalable machine learning architecture for fake news detection can be built.

  • Liked Hariraj K

    Hariraj K - Big Data and Open data: as tools for empowering people

    Hariraj K
    Co-Founder
    FOSSMEC
    1 year ago
    Sold Out!
    20 Mins
    Talk
    Beginner

    With limited transparency, governments tend to become less accessible to the public. While data science plays a dominant role in almost every industry that touches day-to-day life, its possibilities in administration and governance are yet to be exploited. In this presentation, I address how emerging concepts such as open data and big data can be used to strengthen democracies and help governments serve the public better. We will explore the various ways big data and open data can be used to bridge income inequalities and implement proper resource and service allocation. We will also look at different initiatives taken by individuals and communities and see the impact those initiatives have had on aiding governance. We will also emphasize the concept of open governance and government open data.

  • Liked Akshay Balakrishnan

    Akshay Balakrishnan - Does Blockchain have any application in the field of Machine Learning?

    20 Mins
    Talk
    Beginner

    Blockchain is one of the most hyped as well as the most misunderstood technologies to come out in recent years. What has caused this sudden surge in interest is the massive success of cryptocurrencies like Bitcoin, which is a genuine alternative to the banking system we have known for so long. The success of cryptocurrencies has more to do with the economic aspect of currencies than with the blockchain itself. Yet, that has not stopped enthusiasts from applying the concept of blockchain to many other fields, including Machine Learning. This keynote speech intends to give a brief overview of what has been done in this regard, and whether blockchain actually has much practical relevance in the field of Machine Learning.

  • Liked Hariraj K

    Hariraj K - Recommendation engine: Theory and mathematical implementation

    Hariraj K
    Co-Founder
    FOSSMEC
    1 year ago
    Sold Out!
    10 Mins
    Talk
    Beginner

    From our Tinder matches to the movies we watch on Netflix, we encounter recommendation engines on a day-to-day basis, and with the data explosion under way, the number of recommendation engines at play is set to increase dramatically. In this talk, we look into the underlying principles of recommendation engines. You will learn about the main types of recommendation engine approaches. By the end of this session, you will have ideas on how each of these approaches can be implemented. You will also be able to understand the pros and cons of these approaches.