Image ATM - Image Classification for Everyone
At idealo.de we store and display millions of images. Our galleries contain pictures of all sorts: vacuum cleaners, bike helmets, hotel rooms. Working with such a huge volume of images brings some challenges: How do we organize the galleries? What exactly is in there? Do we actually need all of it?
To tackle these problems you first need to label all the pictures. In 2018 our Data Science team completed four projects in the area of image classification, and in 2019 many more were to come. Therefore, we decided to automate this process by creating a tool we called Image ATM (Automated Tagging Machine). With the help of transfer learning, Image ATM enables the user to train a Deep Learning model without knowledge or experience in the area of Machine Learning. All you need is data and a couple of spare minutes!
In this talk we will discuss the state-of-the-art technologies available for image classification and present Image ATM in the context of these technologies. We will then give a crash course on our product, guiding you through the different ways of using it: in the shell, in a Jupyter Notebook and in the cloud. We will also talk about our roadmap for Image ATM.
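The transfer-learning idea behind Image ATM can be illustrated with a minimal sketch (this is not Image ATM's actual code): a frozen, pretrained feature extractor is reused as-is, and only a small classification head is trained on the new labels. Here a fixed random projection stands in for the pretrained CNN backbone, and a NumPy logistic-regression head for the trainable part — both are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "pretrained" feature extractor: in Image ATM this would be a CNN
# backbone; here a fixed random projection stands in for it.
W_frozen = rng.normal(size=(64, 16)) / 8.0

def extract_features(images):
    # Weights are never updated: this is the frozen part of transfer learning.
    return np.tanh(images @ W_frozen)

# Tiny synthetic "image" dataset: two well-separated classes.
X = np.vstack([rng.normal(loc=-1.0, size=(100, 64)),
               rng.normal(loc=+1.0, size=(100, 64))])
y = np.array([0] * 100 + [1] * 100)

feats = extract_features(X)

# Trainable head: plain logistic regression, the only part we fit.
w, b, lr = np.zeros(16), 0.0, 0.3
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))   # predicted P(class 1)
    w -= lr * feats.T @ (p - y) / len(y)          # gradient step on the head
    b -= lr * np.mean(p - y)

accuracy = np.mean(((feats @ w + b) > 0).astype(int) == y)
print(f"head accuracy on frozen features: {accuracy:.2f}")
```

Because only the small head is trained, a few hundred labelled examples and a couple of minutes suffice — which is the premise Image ATM builds on.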
Outline/Structure of the Talk
- Motivation: why use Image ATM
- Introduction to the image classification problem
- Deep learning / transfer learning
- Keras, TensorFlow, PyTorch, MXNet etc. can solve it, but they remain low-level
- Image ATM
- Working with the cloud
- Further roadmap
- Learn how to use Image ATM: which kind of input is needed, preprocessing, training and then evaluation
- Learn how you can contribute to it as well
- Learn about our image classification problems
Target Audience
Data scientists, machine learning practitioners, software engineers, data analysts
Prerequisites for Attendees
- Experience with deep learning, in particular CNNs
- Experience with Jupyter notebooks & Cloud training
- Image classification
Submitted 2 months ago
People who liked this proposal also liked:
Anant Jain - Adversarial Attacks on Neural Networks (Co-Founder, Compose Labs, Inc.)
Since 2014, adversarial examples for Deep Neural Networks have come a long way. This talk aims to be a comprehensive introduction to adversarial attacks, including various threat models (black box/white box) and approaches to creating adversarial examples, and will include demos. The talk will dive deep into the intuition behind why adversarial examples exhibit the properties they do — in particular, transferability across models and training data, as well as high confidence in incorrect labels. Finally, we will go over various approaches to mitigate these attacks (Adversarial Training, Defensive Distillation, Gradient Masking, etc.) and discuss what seems to have worked best over the past year.
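The core mechanic behind many of these attacks can be sketched in a few lines. The Fast Gradient Sign Method (FGSM) perturbs an input in the direction of the sign of the loss gradient with respect to that input. A minimal white-box illustration on a hand-set logistic "model" — a stand-in for a deep network, chosen so the gradient can be written by hand:

```python
import numpy as np

# Hand-set logistic "model": a stand-in for a deep network. In the
# white-box threat model the attacker knows these weights.
w = np.array([2.0, -3.0, 1.0])
b = 0.5

def predict_prob(x):
    """P(label = 1) under the logistic model."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def fgsm(x, y_true, eps):
    """Perturb x by eps in the sign direction that increases the loss."""
    p = predict_prob(x)
    grad_x = (p - y_true) * w   # d(cross-entropy)/dx for a logistic model
    return x + eps * np.sign(grad_x)

x = np.array([1.0, -1.0, 0.5])        # clean example, true label 1
print(predict_prob(x))                 # confidently class 1
x_adv = fgsm(x, y_true=1.0, eps=2.0)
print(predict_prob(x_adv))             # pushed toward class 0
```

For a real network the input gradient comes from autodiff rather than a closed form, but the perturbation rule is the same — which is also why such examples transfer between models with similar decision boundaries.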
Dipanjan Sarkar - Explainable Artificial Intelligence - Demystifying the Hype (Data Scientist, Red Hat)
The field of Artificial Intelligence, powered by Machine Learning and Deep Learning, has gone through some phenomenal changes over the last decade. Starting off as a purely academic and research-oriented domain, it has seen widespread industry adoption across diverse domains including retail, technology, healthcare, science and many more. More often than not, the standard toolbox of machine learning, statistical or deep learning models remains the same. New models do come into existence, like Capsule Networks, but industry adoption of them usually takes several years. Hence, in the industry, the main focus of data science or machine learning is more 'applied' than theoretical, and effective application of these models on the right data to solve complex real-world problems is of paramount importance.
A machine learning or deep learning model by itself consists of an algorithm which tries to learn latent patterns and relationships from data without hard-coding fixed rules. Hence, explaining how a model works to the business always poses its own set of challenges. In some industry domains, especially in the world of finance such as insurance or banking, data scientists often end up having to use more traditional machine learning models (linear or tree-based), because model interpretability is very important for the business: it must be able to explain each and every decision the model takes. However, this often leads to a sacrifice in performance. Complex models like ensembles and neural networks typically give us better and more accurate performance (since true relationships are rarely linear in nature), yet we end up being unable to properly interpret their decisions.
To address these gaps, I will take a conceptual yet hands-on approach: we will explore some of these challenges in depth, discuss explainable artificial intelligence (XAI) and human-interpretable machine learning, and showcase examples using state-of-the-art model interpretation frameworks in Python!
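One of the simplest model-agnostic interpretation techniques such frameworks offer is permutation importance: shuffle one feature column and measure how much a metric drops. A self-contained sketch with a toy "black box" model — illustrative only, not any specific framework's API:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: only feature 0 drives the target; feature 1 is pure noise.
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)

def model(X):
    """Toy 'black box' we want to interpret: thresholds feature 0."""
    return (X[:, 0] > 0).astype(int)

def permutation_importance(model, X, y, col):
    """Accuracy drop when one column is shuffled — the same idea implemented
    by libraries such as eli5 or sklearn.inspection."""
    base = np.mean(model(X) == y)
    Xp = X.copy()
    Xp[:, col] = rng.permutation(Xp[:, col])
    return base - np.mean(model(Xp) == y)

imp0 = permutation_importance(model, X, y, 0)
imp1 = permutation_importance(model, X, y, 1)
print(imp0, imp1)  # feature 0 matters, feature 1 does not
```

The appeal for the business case above is that this works identically for a linear model, an ensemble or a neural network: only predictions are needed, not model internals.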
Favio Vázquez - Complete Data Science Workflows with Open Source Tools (Sr. Data Scientist, Raken Data Group)
Cleaning, preparing, transforming, exploring and modeling data is what we hear about all the time in data science, and these steps may be the most important ones. But that's not all there is to data science. In this talk you will learn how the combination of Apache Spark, Optimus, the Python ecosystem and Data Operations can form a whole framework for data science that will allow you and your company to go further, beyond common sense and intuition, to solve complex business problems.
Tanuj Jain - Taming the Spark beast for Deep Learning predictions at scale (Senior Data Scientist, idealo Internet GmbH)
Predicting at scale is a challenging pursuit, especially when working with Deep Learning models. This is because Deep Learning models tend to have high inference time. At idealo.de, Germany's biggest price comparison platform, the Data Science team was tasked with carrying out image tagging to improve our product galleries.
One of the biggest challenges we faced was to generate predictions for more than 300 million images within a short time while keeping the costs low. Moreover, a resolution for the scaling problem became critical since we intended to apply other Deep Learning models on the same big dataset. We ended up formulating a batch-prediction solution by employing an Apache Spark setup that ran on an AWS EMR cluster.
Spark is notorious for being difficult to configure and tune. As a result, we had to carry out several optimisation steps in order to meet the scale requirements within our time and financial constraints. In this talk, I will present our Spark setup and focus on the journey of optimising the Spark tagging solution. Additionally, I will briefly cover the underlying deep learning model that was used to predict the image tags.
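Stripped of the Spark and EMR machinery, the batch-prediction pattern is simple: stream items through the model in fixed-size batches so inference cost is amortised (in Spark this logic typically lives inside `mapPartitions`). A plain-Python sketch, where `predict_batch` is a hypothetical stand-in for the Deep Learning model's batched inference:

```python
from itertools import islice

def batches(iterable, size):
    """Yield successive fixed-size batches from any iterable of items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def predict_batch(batch):
    """Hypothetical stand-in for batched model inference; here it just
    returns the length of each item id."""
    return [len(item) for item in batch]

# Simulated image ids; at idealo scale this would be 300M+ entries
# streamed from storage rather than held in a Python list.
image_ids = [f"img_{i:03d}" for i in range(10)]

predictions = []
for batch in batches(image_ids, size=4):
    predictions.extend(predict_batch(batch))

print(len(predictions))
```

The real solution distributes exactly this loop across executors; the tuning work described in the talk is about batch size, partition count and executor memory, not about the loop itself.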
Pushker Ravindra - Data Science Best Practices for R and Python (Data Analytics Lead, Monsanto/Bayer)
How many times have you felt that you could not understand someone else's code, or sometimes not even your own? That is mostly due to bad (or no) documentation and to not following best practices. Here I will demonstrate some of the best practices in Data Science for R and Python, the two most important programming languages in the world for Data Science, which help in building sustainable data products.
- Integrated Development Environment (RStudio, PyCharm)
- Coding best practices (Google’s R Style Guide and Hadley’s Style Guide, PEP 8)
- Linter (lintR, Pylint)
- Documentation – Code (Roxygen2, reStructuredText), README/Instruction Manual (RMarkdown, Jupyter Notebook)
- Unit testing (testthat, unittest)
- Version control (Git)
In the long term, these best practices significantly reduce technical debt, foster more collaboration and promote the building of more sustainable data products in any organization.
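To make the documentation and unit testing bullets concrete, here is a minimal Python example combining a documented function (a docstring with parameter descriptions, reStructuredText-style) and `unittest`-based tests. The `normalize` function itself is a made-up example:

```python
import unittest

def normalize(values):
    """Scale a list of numbers so they sum to 1.

    :param values: non-empty sequence of non-negative numbers with a positive sum
    :return: list of floats summing to 1.0
    """
    total = sum(values)
    return [v / total for v in values]

class TestNormalize(unittest.TestCase):
    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(normalize([1, 2, 7])), 1.0)

    def test_preserves_proportions(self):
        self.assertEqual(normalize([1, 1]), [0.5, 0.5])

# Run the suite programmatically (equivalent to `python -m unittest`).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNormalize)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The R equivalents would use Roxygen2 comments for the docstring and `testthat` for the test cases, as listed above.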
Rahee Walambe / Aditya Sonavane - Can AI replace Traditional Control Algorithms? (Rahee Walambe: Research and Teaching Faculty, Symbiosis Institute of Technology; Aditya Sonavane: --)
As technology progresses, control tasks are getting increasingly complex. Employing targeted algorithms for such control tasks and manually tuning them by trial and error (as in the case of PID) is a cumbersome and lengthy process. Additionally, methods such as PID are designed for linear systems; however, real-world control tasks are inherently non-linear in nature. For such complex tasks, conventional linear control methods approximate the nonlinear system with a linear model, and in effect the required performance is difficult to achieve.
Recent advances in the field of AI have presented us with techniques that may help replace traditional control algorithms. Using AI may allow us to achieve a higher quality of control of the nonlinear process with minimum human interaction, eliminating the need for a skilled person to perform the menial task of tuning control algorithms by trial and error.
Here we consider a simple case study of a beam balancer, where the controller balances a beam on a pivot to stabilize a ball at the center of the beam. We aim to implement a Reinforcement Learning based controller as an alternative to PID. We analyze the quality and compare the performance of a PID-based controller vs. an RL-based controller to better understand their suitability for real-world control tasks.
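For reference, the PID baseline in such a comparison is only a few lines: the control signal is a weighted sum of the error, its integral and its derivative. A discrete-time sketch against a toy first-order plant — the plant model and gains are assumptions for illustration, not the beam system from the talk:

```python
class PID:
    """Discrete PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy first-order plant: state relaxes toward the control signal.
pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
pos = 0.0
for _ in range(500):
    u = pid.update(setpoint=1.0, measurement=pos)
    pos += (u - pos) * 0.1   # simple assumed plant dynamics

print(f"final position: {pos:.3f}")  # settles near the setpoint 1.0
```

The three gains `kp`, `ki`, `kd` are exactly the parameters that must be hand-tuned per plant — the trial-and-error step that an RL-based controller aims to remove.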
Rahee Walambe / Vishal Gokhale - Processing Sequential Data using RNNs (Rahee Walambe: Research and Teaching Faculty, Symbiosis Institute of Technology; Vishal Gokhale: Sr. Consultant, XNSIO)
Data that forms the basis of many of our daily activities, like speech, text and video, has sequential/temporal dependencies. Traditional deep learning models are inadequate for modeling this connectivity; they had to be made recurrent before they could bring technologies such as voice assistants (Alexa, Siri) or video-based speech translation (Google Translate) to a practically usable form by reducing the Word Error Rate (WER) significantly. RNNs solve this problem by adding internal memory. This addition bolsters the capacity of traditional neural networks, and the results outperform conventional ML techniques wherever temporal dynamics matter most.
In this full-day immersive workshop, participants will develop an intuition for sequence models through hands-on learning along with the mathematical premise of RNNs.
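The "internal memory" mentioned above is simply a hidden state that is fed back in at the next time step. A minimal vanilla (Elman-style) RNN forward pass in NumPy, with toy dimensions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Dimensions of a tiny vanilla RNN cell.
n_in, n_hidden = 3, 5

W_xh = rng.normal(scale=0.1, size=(n_in, n_hidden))      # input -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden: the "memory"
b_h = np.zeros(n_hidden)

def rnn_forward(sequence):
    """Process a sequence step by step; h carries information across time."""
    h = np.zeros(n_hidden)
    states = []
    for x_t in sequence:
        # The same weights are reused at every step (recurrence).
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.array(states)

seq = rng.normal(size=(7, n_in))   # 7 time steps of 3-dimensional input
states = rnn_forward(seq)
print(states.shape)  # one hidden state per time step
```

Training such a cell (backpropagation through time, and the LSTM/GRU gates that stabilize it) is what the workshop's mathematical sessions build toward.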
Tanay Pant - Machine data: how to handle it better? (Developer Advocate, Crate.io)
The rise of IoT and smart infrastructure has led to the generation of massive amounts of complex data. Traditional solutions struggle to cope with this shift, leading to a decrease in performance and an increase in cost. In this session, I will talk about time-series and machine data and the challenges of working with this kind of data, demonstrate ingestion using NYC cab data, and run real-time queries to visualise the data and gather insights. By the end of this session, you will be able to set up a highly scalable data pipeline for complex time-series data with real-time query performance.
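A typical first query on such data is downsampling: averaging raw readings into fixed time buckets, the kind of aggregation a time-series store like CrateDB runs server-side (e.g. with `date_trunc`). A tiny Python sketch of the same idea, with made-up readings:

```python
from collections import defaultdict

# Hypothetical machine-data readings: (unix_timestamp, sensor_value)
readings = [(0, 1.0), (10, 3.0), (35, 2.0), (42, 4.0), (70, 5.0)]

def downsample(readings, bucket_seconds):
    """Average readings into fixed time buckets, keyed by bucket start time."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: sum(vals) / len(vals)
            for start, vals in sorted(buckets.items())}

print(downsample(readings, bucket_seconds=30))
# {0: 2.0, 30: 3.0, 60: 5.0}
```

At IoT scale the point of a dedicated store is that this aggregation runs over the distributed dataset without pulling raw readings to the client.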