We are experiencing a tremendous explosion in big data, yet a significant share of this data is unfit for direct analysis or machine learning. This presentation focuses on web scraping with powerful R packages such as httr and tools like XPath. The session will also introduce the principles of data cleaning. By the end of the session, you will be able to import raw data from most websites and transform it into a proper, robust dataset. During the session, we will apply these concepts to build a robust dataset that is ready for analysis.

 
 

Outline/Structure of the Workshop

  • Importing data from the web
  • Introduction to packages and tools like httr and XPath
  • Tidying imported data
  • Cleaning survey data

Learning Outcome

  • Learn to import data from the web using APIs
  • Learn to import raw data from the web
  • Learn to clean the imported raw data by removing unwanted HTML tags and other data impurities
  • Learn to tidy survey data and transform it into a proper dataset
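
As a rough illustration of the outcomes above, the following minimal sketch selects elements with XPath and strips the surrounding HTML tags. It assumes the xml2 package for parsing (the proposal itself names only httr and XPath), and the HTML fragment, the class names and the `survey` data frame are hypothetical stand-ins. In a live session the HTML would more likely come from something like httr::GET() followed by httr::content(resp, as = "text").

```r
library(xml2)  # assumption: xml2 is used here for HTML parsing and XPath queries

# Hypothetical HTML fragment standing in for a downloaded web page
raw_html <- '<html><body>
  <ul>
    <li class="answer"> <b>Yes</b> </li>
    <li class="answer">No<br/></li>
    <li class="note">ignore me</li>
  </ul>
</body></html>'

# Parse the raw HTML into a document tree
page <- read_html(raw_html)

# XPath: keep only the list items marked as answers
nodes <- xml_find_all(page, "//li[@class='answer']")

# Extract the text, which drops the HTML tags, and trim stray whitespace
answers <- xml_text(nodes, trim = TRUE)

# Normalise case so that equivalent answers compare equal
answers <- tolower(answers)

# Store the cleaned values in a data frame for analysis
survey <- data.frame(answer = answers, stringsAsFactors = FALSE)
print(survey)
```

Working from a parsed document tree rather than regular expressions keeps the tag removal robust to nested markup such as the <b> element above.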

Target Audience

Aspiring Data Analysts, Aspiring Data Scientists

Prerequisites for Attendees

Basic knowledge of R syntax

Laptops with R installed

Submitted 2 years ago

Public Feedback


    • Venkatraman J

      Venkatraman J - Detection and Classification of Fake news using Convolutional Neural networks

      20 Mins
      Talk
      Intermediate

      The proliferation of fake news and rumours on traditional news media sites, social media, feeds, and blogs has made it extremely difficult to trust any news in day-to-day life. False information has wide implications for both individuals and society. Even though humans can identify and classify fake news through heuristics, common sense, and analysis, there is a huge demand for an automated computational approach that achieves scalability and reliability. This talk explains how neural probabilistic models built with deep learning techniques are used to detect and classify fake news.

      This talk will start with an introduction to deep learning, TensorFlow (Google's deep learning framework), feature extraction with dense vectors (the word2vec model), data preprocessing techniques, feature selection, and PCA, and then move on to explain how a scalable machine learning architecture for fake news detection can be built.

    • Hariraj K

      Hariraj K - Big Data and Open data: as tools for empowering people

      Hariraj K
      Co-Founder
      FOSSMEC
      2 years ago
      Sold Out!
      20 Mins
      Talk
      Beginner

      With limited transparency, governments tend to become less accessible to the public. While data science has become a dominant force in almost every industry, its possibilities in administration and governance are yet to be exploited. In this presentation, I address how emerging concepts such as open data and big data can be used to strengthen democracies and help governments serve the public better. We will explore the ways big data and open data can be used to reduce income inequality and to allocate resources and services properly. We will also look at different initiatives taken by individuals and communities and the impact those initiatives have had on aiding governance. Finally, we will emphasize the concepts of open governance and government open data.

    • Akshay Balakrishnan

      Akshay Balakrishnan - Does Blockchain have any application in the field of Machine Learning?

      20 Mins
      Talk
      Beginner

      Blockchain is one of the most hyped and most misunderstood technologies to emerge in recent years. This sudden surge in interest was caused by the massive success of cryptocurrencies like Bitcoin, which is a genuine alternative to the banking system we have known for so long. The success of cryptocurrencies has more to do with the economic aspect of currencies than with the blockchain itself. Yet that has not stopped enthusiasts from applying the concept of blockchain to many other fields, including Machine Learning. This keynote speech intends to give a brief overview of what has been done in this regard, and whether blockchain actually has much practical relevance in the field of Machine Learning.

    • Hariraj K

      Hariraj K - Recommendation engine: Theory and mathematical implementation

      Hariraj K
      Co-Founder
      FOSSMEC
      2 years ago
      Sold Out!
      10 Mins
      Talk
      Beginner

      From our Tinder matches to the movies we watch on Netflix, we encounter recommendation engines on a day-to-day basis, and as the data explosion continues, the number of recommendation engines at play will increase dramatically. In this talk, we look into the underlying principles of recommendation engines. You will learn about the main types of recommendation engine approaches. By the end of this session, you will have ideas on how each of these approaches can be implemented. You will also be able to understand the pros and cons of these approaches.