DevOps for Data Science: Experiences from building a cloud-based data science platform

Bengaluru | Aug 31st, 04:30 - 05:15 PM | Jupiter | 90 Interested

Productionizing data science applications is nontrivial. Suboptimal practices, people-heavy traditional approaches, and developers' love of complex solutions for the sake of using cool technologies make the situation even worse.

There are two key ingredients required to streamline this: “the cloud” and “the right level of devops abstractions”.

In this talk, I’ll share the experiences of building a cloud-based platform for streamlining data science and how such solutions can greatly simplify building and deploying data science and machine learning applications.

 
 

Outline/Structure of the Case Study

* Why is productionizing data science hard?
* What are the hurdles?
* Why Cloud?
* The DevOps challenges
* The power of abstractions
* Optimizing for Developer Experience (DX)
* Case studies from the rorodata platform
* Summary

Learning Outcome

  • Understand the inherent challenges in productionizing data science applications
  • Common pitfalls of ML in production
  • Tips & tricks for deploying ML applications in production

Target Audience

Data Scientists, Architects, CTOs

Prerequisites for Attendees

Participants should have some experience using machine learning in production.

Submitted 2 years ago


    • Dr. Tom Starke

      Dr. Tom Starke - Intelligent Autonomous Trading Systems - Are We There Yet?

      Dr. Tom Starke
      CEO
      AAAQuants
      2 years ago
      Sold Out!
      45 Mins
      Talk
      Intermediate

      Over the last two decades, trading has seen a remarkable evolution from open outcry in the Wall Street pits to screen trading, all the way to current automation and high-frequency trading (HFT). The success of machine learning and artificial intelligence (AI) seems like a natural progression for the evolution of trading. However, unlike other fields of AI, trading has some domain-specific problems that keep the dream of set-it-and-forget-it money-making machines some way in the future. This talk will describe the current challenges for intelligent autonomous trading systems and provide some practical examples where machine learning is already being used in financial applications.

    • Vincenzo Tursi

      Vincenzo Tursi - Puzzling Together a Teacher-Bot: Machine Learning, NLP, Active Learning, and Microservices

      Vincenzo Tursi
      Data Scientist
      KNIME
      2 years ago
      Sold Out!
      45 Mins
      Demonstration
      Beginner

      Hi! My name is Emil and I am a Teacher Bot. I was built to answer your initial questions about using KNIME Analytics Platform. Well, actually, I was built to point you to the right training materials to answer your questions about KNIME.

      Puzzling together all the pieces to implement me wasn't that difficult. All you need are:

      • A user interface - web or speech based - for you to ask questions
      • A text parser for me to understand
      • A brain to find the right training materials to answer your question
      • A user interface to send you the answer
      • A feedback option - nice to have but not a must - on whether my answer was of any help

      The most complex part was, of course, my brain. Building my brain required: a clear definition of the problem, a labeled data set, a class ontology, and finally the training of a machine learning model. The labeled data set in particular was lacking. So, we relied on active learning to incrementally make my brain smarter over time. Some parts of the final architecture, such as understanding and resource searching, were deployed as microservices.

    • Atin Ghosh

      Atin Ghosh - AR-MDN - Associative and Recurrent Mixture Density Network for e-Retail Demand Forecasting

      45 Mins
      Case Study
      Intermediate

      Accurate demand forecasts can help online retail organizations better plan their supply-chain processes. The challenge, however, is the large number of associative factors that result in large, non-stationary shifts in demand, which traditional time series and regression approaches fail to model. In this paper, we propose a neural network architecture called AR-MDN that simultaneously models associative factors, time-series trends and the variance in the demand. We first identify several causal features and use a combination of feature embeddings, MLP and LSTM to represent them. We then model the output density as a learned mixture of Gaussian distributions. The AR-MDN can be trained end-to-end without the need for additional supervision. We experiment on a dataset of a year's worth of data over tens of thousands of products from Flipkart. The proposed architecture yields a significant improvement in forecasting accuracy when compared with existing alternatives.
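
      To make the "mixture of Gaussian distributions" idea concrete, here is a minimal sketch of how a mixture density output can be scored and summarized. It is not the AR-MDN implementation: the function names and all numbers are hypothetical, and in a real mixture density network the weights, means and standard deviations would be produced by the network (here they are fixed by hand).

```python
import math


def gaussian_pdf(y, mean, std):
    """Density of a univariate Gaussian at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_density(y, weights, means, stds):
    """Density of a Gaussian mixture at y; weights are assumed to sum to 1."""
    return sum(w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, means, stds))


def negative_log_likelihood(y, weights, means, stds):
    """Loss commonly minimized when training a mixture density output."""
    return -math.log(mixture_density(y, weights, means, stds))


def mixture_mean(weights, means):
    """Point forecast: expected value of the mixture."""
    return sum(w * m for w, m in zip(weights, means))


def mixture_variance(weights, means, stds):
    """Forecast uncertainty: variance of the mixture."""
    mu = mixture_mean(weights, means)
    second_moment = sum(w * (s * s + m * m) for w, m, s in zip(weights, means, stds))
    return second_moment - mu * mu


# Hypothetical two-regime demand forecast: a "normal" day and a "sale" day.
weights = [0.8, 0.2]
means = [100.0, 400.0]
stds = [15.0, 60.0]

forecast = mixture_mean(weights, means)
uncertainty = mixture_variance(weights, means, stds)
loss_typical = negative_log_likelihood(105.0, weights, means, stds)
loss_unusual = negative_log_likelihood(250.0, weights, means, stds)
```

      An observed demand close to one of the component means gives a lower loss than one that falls between the regimes, which is what lets a single model express both the expected demand and its variance.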

    • Asha Saini

      Asha Saini - Using Open Data to Predict Market Movements

      20 Mins
      Talk
      Intermediate

      As companies progress on their digital transformation journeys, technology becomes a strategic business decision. In this realm, consulting firms such as Gartner exert tremendous influence on technology purchasing decisions. The ability of these firms to predict the movement of market players will provide vendors with competitive benefits.

      We will explore how, with the use of publicly available data sources, IT industry trends can be mimicked and predicted.

      Big Data enthusiasts learned quickly that there are caveats to making Big Data useful:

      • Data source availability
      • Producing meaningful insights from publicly available sources

      Working with large data sets that change frequently can become expensive and frustrating. The learning curve is steep and the discovery process long. Challenges range from selecting efficient tools for parsing unstructured data to developing a vision for interpreting and utilizing the data for competitive advantage.

      We will describe how the archive of billions of web pages, captured monthly since 2008 and available for free analysis on AWS, can be used to mimic and predict trends reflected in industry-standard consulting reports.

      There may also be an opportunity in this process to apply machine learning to tune the models so that they learn and optimize automatically. Gartner publishes over 70 topic area reports. An automated tool that can analyze all of those topic areas, helping us quickly understand major trends across today's landscape and plan for those to come, would be invaluable to many organizations.

    • Dr. Ravi Vijayaraghavan

      Dr. Ravi Vijayaraghavan / Dr. Sidharth Kumar - Analytics and Science for Customer Experience and Growth in E-commerce

      20 Mins
      Experience Report
      Advanced

      In our talk, we will cover the broad areas where Flipkart leverages Analytics and Sciences to drive both human and machine-driven decisions. We will go deeper into one use case related to pricing in e-commerce.

    • Janakiram MSV

      Janakiram MSV - Accelerate Machine Learning Adoption with AutoML

      20 Mins
      Demonstration
      Beginner

      One emerging trend that's going to fundamentally change the face of ML is AutoML. It enables business analysts and developers to evolve machine learning models that can address complex scenarios. From platform companies such as Google and Microsoft to early-stage startups, AutoML is fast gaining traction. This session demonstrates how AutoML accelerates building machine learning models.