Image ATM - Image Classification for Everyone

At idealo.de we store and display millions of images. Our gallery contains pictures of all sorts: you'll find vacuum cleaners, bike helmets as well as hotel rooms. Working with huge volumes of images brings some challenges: How do we organize the galleries? What exactly is in there? Do we actually need all of it?

To tackle these problems you first need to label all the pictures. In 2018 our Data Science team completed four projects in the area of image classification, and in 2019 there were many more to come. Therefore, we decided to automate this process by creating a piece of software we call Image ATM (Automated Tagging Machine). With the help of transfer learning, Image ATM enables the user to train a Deep Learning model without any knowledge or experience in the area of Machine Learning. All you need is data and a spare couple of minutes!

In this talk we will discuss the state-of-the-art technologies available for image classification and present Image ATM in the context of these technologies. We will then give a crash course on our product, guiding you through the different ways of using it - in the shell, in Jupyter notebooks and in the cloud. We will also talk about our roadmap for Image ATM.
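
To give a flavour of the workflow, here is a minimal sketch of a full run using the project's Python API (the DataPrep/Training/Evaluation component names follow the imageatm README; treat the exact file paths and signatures as assumptions):

    from imageatm.components import DataPrep, Training, Evaluation

    # Validate the labelled samples and resize images into a job directory
    dp = DataPrep(
        samples_file='data/samples.json',  # hypothetical paths
        image_dir='data/images/',
        job_dir='jobs/demo/',
    )
    dp.run(resize=True)

    # Fine-tune a pre-trained CNN via transfer learning
    trainer = Training(image_dir=dp.image_dir, job_dir=dp.job_dir)
    trainer.run()

    # Evaluate the trained model on the held-out split
    evaluation = Evaluation(image_dir=dp.image_dir, job_dir=dp.job_dir)
    evaluation.run()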

 

Outline/Structure of the Talk

  • Motivation: why use Image ATM
  • Introduction to the image classification problem
    • Deep learning/transfer learning
    • Keras, TensorFlow, PyTorch, MXNet etc. can solve it, but they remain low-level
  • Image ATM
    • Installation
    • CLI
    • Working with the cloud
    • Further roadmap
  • Conclusion

Learning Outcome

- Learn how to use Image ATM, e.g. what kind of input is needed and how preprocessing, training and evaluation work

- Learn how you can contribute to it as well

- Learn about our image classification problems

Target Audience

data scientists, machine learning engineers, software engineers, data analysts

Prerequisites for Attendees

- Experience with deep learning, in particular CNNs

- Experience with Jupyter notebooks & cloud training

- Basic familiarity with image classification

Submitted 2 months ago

Public Feedback

  • Naresh Jain  ~  1 week ago

    Hi Dat,

    Thanks for your proposal. Since the conference is targeted at Data Science practitioners, an overview session will not really be useful. Attendees are looking for specific deep-dive sessions where they can learn something concrete from the speaker's first-hand experience, which is hard to find online.

    Just a suggestion: will you be able to take a specific use-case from your online price comparison service and go deep into the nuts and bolts of it, to help data science practitioners from other domains learn something unique and specific that you and your team implemented?

    • Dat Tran  ~  1 week ago

      It's Dat and not Dan. As I said, I can present a deeper dive into a specific use case instead of giving an overall overview. Shall I change the abstract here, or should I hand in a different talk?

      • Naresh Jain  ~  1 week ago

        I'm really sorry about misspelling your name, Dat. Request you to please update this proposal itself. Thank you.

        • Dat Tran  ~  1 week ago

          Done!

  • Dipanjan Sarkar  ~  2 weeks ago

    Hey Dat, this is quite good, but you folks have done so much recently around building excellent products/models leveraging ML and DL - would you maybe want to cover some of those real-world case studies as well?

    • Dat Tran  ~  2 weeks ago

      Sure, I can either speak about all our use cases in general, or I have a new talk around our new project imageatm, which we also used for a specific project. Depending on what you guys are interested in, I can change the talk.


  • Anant Jain - Adversarial Attacks on Neural Networks

    Co-Founder, Compose Labs, Inc.
    45 Mins
    Talk
    Intermediate

    Since 2014, adversarial examples in Deep Neural Networks have come a long way. This talk aims to be a comprehensive introduction to adversarial attacks, covering various threat models (black box/white box) and approaches to creating adversarial examples, and will include demos. The talk will dive deep into the intuition behind why adversarial examples exhibit the properties they do - in particular, transferability across models and training data, as well as high confidence in incorrect labels. Finally, we will go over various approaches to mitigate these attacks (Adversarial Training, Defensive Distillation, Gradient Masking, etc.) and discuss what seems to have worked best over the past year.
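
    To make the attack concrete, here is a minimal sketch of the classic Fast Gradient Sign Method (FGSM) in TensorFlow; the model, labels and epsilon value are illustrative assumptions, not material from the talk:

        import tensorflow as tf

        def fgsm(model, x, y, eps=0.01):
            # Perturb x one step in the direction that increases the loss,
            # using only the sign of the input gradient.
            x = tf.convert_to_tensor(x, dtype=tf.float32)
            with tf.GradientTape() as tape:
                tape.watch(x)
                loss = tf.keras.losses.sparse_categorical_crossentropy(y, model(x))
            grad = tape.gradient(loss, x)
            # Keep the adversarial image in the valid pixel range
            return tf.clip_by_value(x + eps * tf.sign(grad), 0.0, 1.0)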

  • Dipanjan Sarkar - Explainable Artificial Intelligence - Demystifying the Hype

    Data Scientist, Red Hat
    45 Mins
    Tutorial
    Intermediate

    The field of Artificial Intelligence powered by Machine Learning and Deep Learning has gone through some phenomenal changes over the last decade. Starting off as just a pure academic and research-oriented domain, we have seen widespread industry adoption across diverse domains including retail, technology, healthcare, science and many more. More often than not, the standard toolbox of machine learning, statistical or deep learning models remains the same. New models do come into existence, like Capsule Networks, but industry adoption of them usually takes several years. Hence, in the industry, the main focus of data science or machine learning is more 'applied' than theoretical, and the effective application of these models on the right data to solve complex real-world problems is of paramount importance.

    A machine learning or deep learning model by itself consists of an algorithm which tries to learn latent patterns and relationships from data without hard-coding fixed rules. Hence, explaining how a model works to the business always poses its own set of challenges. There are some domains in the industry, especially in the world of finance like insurance or banking, where data scientists often end up having to use more traditional machine learning models (linear or tree-based), because model interpretability is very important for the business to explain each and every decision being taken by the model. However, this often leads to a sacrifice in performance. This is where complex models like ensembles and neural networks typically give us better and more accurate performance (since true relationships are rarely linear in nature). We, however, end up being unable to have proper interpretations for model decisions.

    To address these gaps, I will take a conceptual yet hands-on approach, exploring some of these challenges of explainable artificial intelligence (XAI) and human-interpretable machine learning in depth, and showcasing examples using state-of-the-art model interpretation frameworks in Python!
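
    For a taste of such frameworks, here is a minimal sketch using SHAP to explain a tree-based model; the dataset and model are illustrative assumptions, not the tutorial's actual examples:

        import shap
        from sklearn.datasets import load_breast_cancer
        from sklearn.ensemble import RandomForestClassifier

        # Train an opaque ensemble model on a toy dataset
        X, y = load_breast_cancer(return_X_y=True)
        model = RandomForestClassifier(n_estimators=100).fit(X, y)

        # TreeExplainer computes per-feature contributions (SHAP values)
        explainer = shap.TreeExplainer(model)
        shap_values = explainer.shap_values(X[:100])

        # Visualise which features drive the model's predictions
        shap.summary_plot(shap_values, X[:100])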

  • Favio Vázquez - Complete Data Science Workflows with Open Source Tools

    90 Mins
    Tutorial
    Beginner

    Cleaning, preparing, transforming, exploring and modeling data is what we hear about all the time in data science, and these steps may be the most important ones. But that's not all there is to data science. In this talk you will learn how the combination of Apache Spark, Optimus, the Python ecosystem and Data Operations can form a whole framework for data science that will allow you and your company to go further, beyond common sense and intuition, to solve complex business problems.
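
    For a flavour of that stack, here is a minimal PySpark sketch of a cleaning-and-exploration step; the file name and columns are illustrative assumptions:

        from pyspark.sql import SparkSession, functions as F

        spark = SparkSession.builder.appName("workflow-demo").getOrCreate()

        # Load raw data, drop incomplete rows, normalise a column's type
        df = spark.read.csv("products.csv", header=True, inferSchema=True)
        clean = (df.dropna(subset=["price"])
                   .withColumn("price", F.col("price").cast("double")))

        # Explore: average price per category
        clean.groupBy("category").agg(F.avg("price").alias("avg_price")).show()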

  • Tanuj Jain - Taming the Spark beast for Deep Learning predictions at scale

    45 Mins
    Talk
    Intermediate

    Predicting at scale is a challenging pursuit, especially when working with Deep Learning models. This is because Deep Learning models tend to have high inference time. At idealo.de, Germany's biggest price comparison platform, the Data Science team was tasked with carrying out image tagging to improve our product galleries.

    One of the biggest challenges we faced was to generate predictions for more than 300 million images within a short time while keeping the costs low. Moreover, a resolution for the scaling problem became critical since we intended to apply other Deep Learning models on the same big dataset. We ended up formulating a batch-prediction solution by employing an Apache Spark setup that ran on an AWS EMR cluster.

    Spark is notorious for being difficult to configure and tune. As a result, we had to carry out several optimisation steps in order to meet the scale requirements while adhering to our time and financial constraints. In this talk, I will present our Spark setup and focus on the journey of optimising the Spark tagging solution. Additionally, I will briefly cover the underlying deep learning model which was used to predict the image tags.
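
    A common pattern for this kind of batch prediction is to load the model once per Spark partition and predict in batches; here is a sketch of that general approach (not the team's actual code; paths, schema and batch size are assumptions):

        import numpy as np
        from pyspark.sql import SparkSession

        def predict_partition(rows, batch_size=128):
            # Load the model once per partition, not once per image,
            # to amortise the expensive initialisation.
            from tensorflow.keras.models import load_model
            model = load_model("/mnt/models/tagger.h5")  # hypothetical path

            def flush(batch):
                images = np.stack([r["image"] for r in batch])
                preds = model.predict(images).argmax(axis=1)
                return [(r["id"], int(p)) for r, p in zip(batch, preds)]

            batch = []
            for row in rows:
                batch.append(row)
                if len(batch) == batch_size:
                    yield from flush(batch)
                    batch = []
            if batch:
                yield from flush(batch)

        spark = SparkSession.builder.getOrCreate()
        df = spark.read.parquet("s3://bucket/preprocessed-images/")  # hypothetical
        predictions = df.rdd.mapPartitions(predict_partition)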

  • Pushker Ravindra - Data Science Best Practices for R and Python

    20 Mins
    Talk
    Intermediate

    How many times did you feel that you were not able to understand someone else's code, or sometimes not even your own? It's mostly because of bad or missing documentation and not following best practices. Here I will be demonstrating some of the best practices in Data Science for R and Python, the two most important programming languages in the world for Data Science, which would help in building sustainable data products.

    - Integrated Development Environment (RStudio, PyCharm)

    - Coding best practices (Google’s R Style Guide and Hadley’s Style Guide, PEP 8)

    - Linter (lintR, Pylint)

    - Documentation – Code (Roxygen2, reStructuredText), README/Instruction Manual (RMarkdown, Jupyter Notebook)

    - Unit testing (testthat, unittest)

    - Packaging

    - Version control (Git)

    These best practices significantly reduce technical debt in the long term, foster more collaboration and promote the building of more sustainable data products in any organization.
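
    As a small illustration of the unit-testing item above, a minimal sketch with Python's built-in unittest (the function under test is a made-up example):

        import unittest

        def normalise(prices):
            """Scale a list of prices to the [0, 1] range."""
            lo, hi = min(prices), max(prices)
            return [(p - lo) / (hi - lo) for p in prices]

        class TestNormalise(unittest.TestCase):
            def test_endpoints(self):
                result = normalise([10.0, 20.0, 30.0])
                self.assertEqual(result[0], 0.0)
                self.assertEqual(result[-1], 1.0)

        if __name__ == "__main__":
            unittest.main()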

  • Rahee Walambe / Aditya Sonavane - Can AI replace Traditional Control Algorithms?

    45 Mins
    Case Study
    Beginner

    As technology progresses, control tasks are getting increasingly complex. Employing targeted algorithms for such control tasks and manually tuning them by trial and error (as in the case of PID) is a cumbersome and lengthy process. Additionally, methods such as PID are designed for linear systems; however, real-world control tasks are inherently non-linear in nature. With such complex tasks, using conventional linear control methods approximates the nonlinear system with a linear model, and in effect the required performance is difficult to achieve.

    The new advances in the field of AI have presented us with techniques which may help replace traditional control algorithms. The use of AI may allow us to achieve a higher quality of control of the nonlinear process, with minimum human interaction, thus eliminating the requirement for a skilled person to perform the menial task of tuning control algorithms by trial and error.

    Here we consider a simple case study of a beam balancer, where the controller is used for balancing a beam on a pivot to stabilize a ball at the center of the beam. We aim to implement a Reinforcement Learning based controller as an alternative to PID. We analyze the quality and compare the performance of a PID-based controller vs. an RL-based controller to better understand their suitability for real-world control tasks.
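
    For reference, the PID baseline being compared against fits in a few lines; a minimal sketch (the gains, time step and measurement values are illustrative assumptions):

        class PID:
            """Classic proportional-integral-derivative controller."""

            def __init__(self, kp, ki, kd, setpoint=0.0):
                self.kp, self.ki, self.kd = kp, ki, kd
                self.setpoint = setpoint
                self.integral = 0.0
                self.prev_error = 0.0

            def step(self, measurement, dt):
                # Combine current error, accumulated error and the
                # error's rate of change into one control output.
                error = self.setpoint - measurement
                self.integral += error * dt
                derivative = (error - self.prev_error) / dt
                self.prev_error = error
                return (self.kp * error
                        + self.ki * self.integral
                        + self.kd * derivative)

        # e.g. driving the ball position toward the centre of the beam
        controller = PID(kp=1.2, ki=0.1, kd=0.05, setpoint=0.0)
        control_signal = controller.step(measurement=0.3, dt=0.01)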

  • Rahee Walambe / Vishal Gokhale - Processing Sequential Data using RNNs

    480 Mins
    Workshop
    Beginner

    Data that forms the basis of many of our daily activities, like speech, text and videos, has sequential/temporal dependencies. Traditional deep learning models, being inadequate to model this connectivity, needed to be made recurrent before they brought technologies such as voice assistants (Alexa, Siri) or video-based speech translation (Google Translate) to a practically usable form by reducing the Word Error Rate (WER) significantly. RNNs solve this problem by adding internal memory. The capacities of traditional neural networks are bolstered by this addition, and the results outperform conventional ML techniques wherever the temporal dynamics are more important.
    In this full-day immersive workshop, participants will develop an intuition for sequence models through hands-on learning, along with the mathematical premise of RNNs.
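
    To illustrate the idea of internal memory, here is a minimal recurrent model in Keras; the task, vocabulary size and layer sizes are illustrative assumptions:

        import tensorflow as tf

        # A small sequence classifier: the LSTM layer carries hidden state
        # across time steps, which is the "internal memory" of the network.
        model = tf.keras.Sequential([
            tf.keras.layers.Embedding(input_dim=10000, output_dim=64),
            tf.keras.layers.LSTM(128),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam",
                      loss="binary_crossentropy",
                      metrics=["accuracy"])
        model.summary()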

  • Tanay Pant - Machine data: how to handle it better?

    Developer Advocate, Crate.io
    45 Mins
    Talk
    Intermediate

    The rise of IoT and smart infrastructure has led to the generation of massive amounts of complex data. Traditional solutions struggle to cope with this shift, leading to a decrease in performance and an increase in cost. In this session, I will talk about time-series data, machine data and the challenges of working with this kind of data, demonstrate ingestion using data from NYC cabs, and run real-time queries to visualise the data and gather insights. By the end of this session, you will be able to set up a highly scalable data pipeline for complex time-series data with real-time query performance.
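
    As a sketch of what the ingestion step of such a pipeline can look like from Python (using CrateDB's crate client package; the host, table and schema are illustrative assumptions):

        from crate import client  # CrateDB's Python DB-API client

        conn = client.connect("http://localhost:4200")  # hypothetical host
        cursor = conn.cursor()

        # A simple table for time-series cab data
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS taxi_rides (
                ride_id TEXT,
                pickup_ts TIMESTAMP,
                fare DOUBLE
            )
        """)

        # Ingest one record; bulk loads work the same way via executemany
        cursor.execute(
            "INSERT INTO taxi_rides (ride_id, pickup_ts, fare) VALUES (?, ?, ?)",
            ("ride-001", "2019-01-01T12:00:00", 9.5),
        )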