Machine data: how to handle it better?

The rise of IoT and smart infrastructure has led to the generation of massive amounts of complex data. Traditional solutions struggle to cope with this shift, which results in degraded performance and higher costs. In this session, I will talk about time-series data and machine data, the challenges of working with this kind of data, how to ingest it (using the NYC cab dataset as an example), and how to run real-time queries to visualise the data and gather insights. By the end of this session, you will be able to set up a highly scalable data pipeline for complex time-series data with real-time query performance.
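Time-series platforms usually answer dashboard queries by downsampling raw events into fixed time buckets (for example, trips per minute). The sketch below illustrates that idea in plain Python. It is not the pipeline presented in the talk: the `(pickup_time, fare)` record shape, the function names and the sample trips are hypothetical values chosen for illustration, not real NYC cab records.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def bucket_start(ts: datetime, width: timedelta) -> datetime:
    """Floor a naive timestamp to the start of its fixed-width bucket."""
    whole_buckets = (ts - EPOCH) // width  # timedelta // timedelta -> int
    return EPOCH + whole_buckets * width


def downsample(records, width=timedelta(minutes=1)):
    """Aggregate hypothetical (pickup_time, fare) records per time bucket.

    Returns {bucket_start: (trip_count, average_fare)} ordered by time.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for ts, fare in records:
        b = bucket_start(ts, width)
        counts[b] += 1
        totals[b] += fare
    return {b: (counts[b], totals[b] / counts[b]) for b in sorted(counts)}


# Hypothetical sample trips (made-up values, not real NYC data).
trips = [
    (datetime(2017, 1, 1, 0, 0, 12), 8.5),
    (datetime(2017, 1, 1, 0, 0, 47), 11.5),
    (datetime(2017, 1, 1, 0, 1, 5), 6.0),
]

for bucket, (n, avg_fare) in downsample(trips).items():
    print(bucket, n, avg_fare)
```

Here the first two trips fall into the 00:00 bucket and the third into 00:01. A time-series database would typically run the equivalent aggregation server-side, as a GROUP BY over a time bucket, so that only the small aggregated result travels to the dashboard.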


Outline/Structure of the Talk

High-level outline of the topics that will be covered in this presentation:

1. Growth of IoT and Sensor Data

2. Time-series data

3. Challenges posed by large volumes of time-series data

4. Showcasing and overcoming the problem: a case study

5. Demo time: geospatial queries on machine data (2017 NYC cab data) and visualisation in Grafana
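Geospatial queries like those in the demo often boil down to "which points lie within r km of a location?". A minimal sketch, assuming pickups are plain `(latitude, longitude)` tuples and using the haversine great-circle formula; the function names and the coordinates (approximate positions of Times Square and JFK Airport) are illustrative assumptions, not part of the demo.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; haversine assumes a spherical Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def pickups_within(points, center, radius_km):
    """Return the (lat, lon) points no farther than radius_km from center."""
    lat0, lon0 = center
    return [p for p in points if haversine_km(lat0, lon0, p[0], p[1]) <= radius_km]


times_square = (40.7580, -73.9855)  # approximate
pickups = [
    (40.7585, -73.9850),  # under 100 m from Times Square
    (40.6413, -73.7781),  # approximately JFK Airport, roughly 20 km away
]

print(pickups_within(pickups, times_square, radius_km=1.0))
```

Scanning every point is fine for a sketch; a database serving real-time geospatial queries would normally rely on a spatial index (for example a geohash or R-tree) to avoid the full scan.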

Learning Outcome

By the end of this session, attendees will be able to set up a highly scalable data pipeline for complex time-series data with real-time query performance.

Target Audience

Developers, Managers, IoT Enthusiasts

Prerequisites for Attendees

Some knowledge of databases, data pipelines and containers will help attendees follow along and make the most of this talk.

Submitted 11 months ago

Public Feedback

  • Kuldeep Jiwani
    By Kuldeep Jiwani  ~  10 months ago

    Hi Tanay,

    IoT is an interesting space for ML enthusiasts, good that you are focusing on it.

    Just to understand your talk better: will it focus mainly on the ETL and data query/visualisation side of IoT data, or will it also cover sensor event streams / time-series analysis and showcase the use of ML techniques on them?

    • Tanay Pant
      By Tanay Pant  ~  10 months ago

      Hi Kuldeep,

      While applying ML techniques is out of scope for this talk, it will definitely cover sensor data streams and time-series analysis of massive amounts of data while keeping the system highly available. It will also cover some aspects of visualising the data.

      • Kuldeep Jiwani
        By Kuldeep Jiwani  ~  9 months ago

        Thanks for the info, sounds good.

  • Liked Dat Tran

    Dat Tran - Image ATM - Image Classification for Everyone

    Dat Tran
    Head of AI
    Axel Springer AI
    1 year ago
    45 Mins

    At we store and display millions of images. Our gallery contains pictures of all sorts: there you'll find vacuum cleaners and bike helmets as well as hotel rooms. Working with such a huge volume of images brings some challenges: How do we organize the galleries? What exactly is in there? Do we actually need all of it?

    To tackle these problems, you first need to label all the pictures. In 2018 our Data Science team completed four projects in the area of image classification, and many more were expected in 2019. We therefore decided to automate this process by creating a piece of software we called Image ATM (Automated Tagging Machine). With the help of transfer learning, Image ATM enables users to train a Deep Learning model without any knowledge of or experience in Machine Learning. All you need is data and a spare couple of minutes!

    In this talk we will discuss the state-of-the-art technologies available for image classification and present Image ATM in the context of these technologies. We will then give a crash course on our product, guiding you through different ways of using it: in the shell, in a Jupyter Notebook and in the cloud. We will also talk about our roadmap for Image ATM.

  • Liked Dipanjan Sarkar

    Dipanjan Sarkar - Explainable Artificial Intelligence - Demystifying the Hype

    45 Mins

    The field of Artificial Intelligence, powered by Machine Learning and Deep Learning, has gone through some phenomenal changes over the last decade. Having started as a purely academic and research-oriented domain, it has seen widespread industry adoption across diverse domains including retail, technology, healthcare, science and many more. More often than not, the standard toolbox of machine learning, statistical or deep learning models remains the same. New models such as Capsule Networks do come into existence, but their industry adoption usually takes several years. Hence, in industry, data science and machine learning are more ‘applied’ than theoretical, and applying these models effectively to the right data to solve complex real-world problems is of paramount importance.

    A machine learning or deep learning model by itself consists of an algorithm that tries to learn latent patterns and relationships from data without hard-coding fixed rules. Hence, explaining how a model works to the business always poses its own set of challenges. In some industry domains, especially in finance (for example insurance or banking), data scientists often end up having to use more traditional machine learning models (linear or tree-based), because model interpretability is essential for the business to explain each and every decision the model takes. However, this often comes at the cost of performance. Complex models like ensembles and neural networks typically give better and more accurate performance (since true relationships are rarely linear in nature), but we then end up unable to properly interpret their decisions.

    To address these gaps, I will take a conceptual yet hands-on approach: we will explore in depth some of the challenges around explainable artificial intelligence (XAI) and human-interpretable machine learning, and even showcase some examples using state-of-the-art model interpretation frameworks in Python!
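The interpretability argument above can be made concrete: a linear model's score decomposes exactly into one additive contribution per feature (weight times value), so every individual decision can be explained. The sketch below is a hypothetical illustration with made-up weights and feature names for a credit-style model; it is not one of the interpretation frameworks the talk will showcase.

```python
def explain_linear(weights, bias, x, names):
    """Decompose a linear score into bias + sum of per-feature contributions w_i * x_i."""
    contributions = {name: w * xi for name, w, xi in zip(names, weights, x)}
    score = bias + sum(contributions.values())
    return score, contributions


# Hypothetical model and applicant (illustrative values only).
names = ["income", "debt_ratio", "years_employed"]
weights = [0.8, -1.5, 0.3]
bias = -0.2
applicant = [2.0, 1.0, 0.5]

score, contributions = explain_linear(weights, bias, applicant, names)
print(f"score = {score:.2f}")
# List features from most to least influential for this particular decision.
for name, c in sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:>15}: {c:+.2f}")
```

Ensembles and neural networks offer no such exact decomposition, which is the gap the model-interpretation frameworks mentioned in the abstract try to fill.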

  • Liked Rahee Walambe

    Rahee Walambe / Vishal Gokhale - Processing Sequential Data using RNNs

    480 Mins

    Data that forms the basis of many of our daily activities, like speech, text and videos, has sequential/temporal dependencies. Traditional deep learning models are inadequate for modelling these dependencies; they needed to be made recurrent before technologies such as voice assistants (Alexa, Siri) or video-based speech translation (Google Translate) reached a practically usable form through a significant reduction in Word Error Rate (WER). RNNs solve this problem by adding internal memory. This addition bolsters the capacity of traditional neural networks, and the results outperform conventional ML techniques wherever temporal dynamics are important.
    In this full-day immersive workshop, participants will develop an intuition for sequence models through hands-on learning along with the mathematical premise of RNNs.
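The "internal memory" mentioned above is the hidden state an RNN carries from one time step to the next. A minimal sketch of a single-unit (scalar) Elman-style recurrence, h_t = tanh(w_x * x_t + w_h * h_{t-1} + b); the weights are arbitrary illustrative values rather than trained parameters, and real RNNs use vectors, weight matrices and learned weights.

```python
import math


def rnn_forward(xs, w_x, w_h, b, h0=0.0):
    """Run a scalar RNN over a sequence and return the hidden state at every step."""
    h = h0
    states = []
    for x in xs:
        # The new state mixes the current input with the previous state (the "memory").
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Two sequences that end with the same inputs but start differently.
seq_a = [1.0, 0.0, 0.0]
seq_b = [-1.0, 0.0, 0.0]

with_memory_a = rnn_forward(seq_a, w_x=1.0, w_h=0.9, b=0.0)[-1]
with_memory_b = rnn_forward(seq_b, w_x=1.0, w_h=0.9, b=0.0)[-1]
no_memory_a = rnn_forward(seq_a, w_x=1.0, w_h=0.0, b=0.0)[-1]  # w_h = 0: no recurrence

print(with_memory_a, with_memory_b, no_memory_a)
```

With w_h = 0.9 the final states have opposite signs because they still reflect the first input; with w_h = 0 the cell forgets everything, and both sequences end in the same state, 0.0.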

  • Liked Anant Jain

    Anant Jain - Adversarial Attacks on Neural Networks

    Anant Jain
    Compose Labs, Inc.
    11 months ago
    20 Mins

    Since 2014, research on adversarial examples for Deep Neural Networks has come a long way. This talk aims to be a comprehensive introduction to adversarial attacks, covering various threat models (black box/white box) and approaches to creating adversarial examples, with demos. The talk will dive deep into the intuition behind why adversarial examples exhibit the properties they do: in particular, transferability across models and training data, as well as high confidence in incorrect labels. Finally, we will go over various approaches to mitigate these attacks (Adversarial Training, Defensive Distillation, Gradient Masking, etc.) and discuss what seems to have worked best over the past year.
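As a concrete white-box example, the Fast Gradient Sign Method (FGSM) perturbs an input by eps in the direction of the sign of the loss gradient with respect to the input. The sketch below applies it to a tiny logistic-regression model instead of a deep network, so the gradient can be written by hand; the weights, input and eps are made-up values, and this is not necessarily one of the talk's demos.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_loss(p, y):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sign(g):
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, eps):
    """FGSM: x_adv = x + eps * sign(dLoss/dx). White-box: needs the model's w and b."""
    p = predict(w, b, x)
    # For log loss on a logistic model, dLoss/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


# Hypothetical model and a correctly classified input (true label y = 1).
w, b = [2.0, -3.0], 0.5
x, y = [1.0, 0.5], 1

x_adv = fgsm(w, b, x, y, eps=0.3)
p_clean, p_adv = predict(w, b, x), predict(w, b, x_adv)
print(f"clean: p={p_clean:.3f} loss={log_loss(p_clean, y):.3f}")
print(f"adv:   p={p_adv:.3f} loss={log_loss(p_adv, y):.3f}")
```

With these values the prediction flips from class 1 (p of about 0.73) to class 0 (p of about 0.38), even though each feature moved by only 0.3.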