DevOps for Data Science: Experiences from building a cloud-based data science platform

Schedule: Aug 31st, 04:30 PM - 05:15 PM. Place: Jupiter. 89 interested.

Productionizing data science applications is non-trivial. Suboptimal practices, the people-heavy nature of traditional approaches, and developers' love of complex solutions for the sake of using cool technologies make the situation even worse.

Two key ingredients are required to streamline this: “the cloud” and “the right level of DevOps abstractions”.

In this talk, I’ll share my experience of building a cloud-based platform for streamlining data science and show how such solutions can greatly simplify building and deploying data science and machine learning applications.

 
 

Outline/structure of the Session

* Why is productionizing data science hard?
* What are the hurdles?
* Why Cloud?
* The DevOps challenges
* The power of abstractions
* Optimizing for Developer Experience (DX)
* Case studies from the rorodata platform
* Summary

Learning Outcome

  • Understand the inherent challenges in productionizing data science applications
  • Common pitfalls of ML in production
  • Tips & tricks for deploying ML applications in production

Target Audience

Data Scientists, Architects, CTOs

Prerequisite

Participants should have some experience using machine learning in production.

Submitted 4 months ago

Comments

  • Sarah Masud  ~  4 months ago

    Hey Anand,

    Thanks for the well-structured submission. I have one question, though: looking at the sample presentation, that is a lot of content. Do you think 20 mins will be sufficient for the topics you want to cover?

    • Naresh Jain  ~  4 months ago

      Anand, thank you for your proposal. It's great to see that you guys are trying to solve the challenges around productizing machine learning in the cloud.

      Currently, the way your proposal is structured, it appears to be a product pitch. I'm sure that is not your intention.

      Would it be possible for you to take a couple of core challenges in productizing machine learning, share how you've solved that problem, what challenges you ran into while solving those problems and how you overcame them? If someone is not able to use your product, they can still learn how to approach the problem and hence everyone attending the session will gain value from it.

      If you agree, request you to please update the proposal accordingly.

      • Anand Chitipothu  ~  4 months ago

        Naresh, while I'm speaking from my experience of building the product, the intention is not to make a product pitch, and I don't see anything in the proposal that sounds like one.

        Please look at the draft slides that I've shared. I've already made sure that there are no direct references to the product.

        • Naresh Jain  ~  4 months ago

          Hi Anand, thanks for the clarification. Appreciate it. I've updated your proposal to 45 mins. Can you please share the details of the case-study that you plan to cover? That is left blank in the slides.

          Also can you please give a time-break of how you plan to utilize the 45 mins?

          • Anand Chitipothu  ~  4 months ago

            Thanks Naresh. I'll share the updated slides and detailed time-break in a week.

            • Naresh Jain  ~  4 months ago

              Hi Anand, request you to please update us on the same.

              • Anand Chitipothu  ~  4 months ago

                Hi Naresh, it already has one case study. I've added another case study and updated the same URL.

    • Anand Chitipothu  ~  4 months ago

      I didn't see any option other than a 20-minute slot for talks. If that is the only option available, I'll plan accordingly. If a longer slot is available, I would love to consider it.


  • Dr. Tom Starke - Intelligent Autonomous Trading Systems - Are We There Yet?

    Dr. Tom Starke
    CEO
    AAAQuants
    4 months ago
    Sold Out!
    45 Mins
    Talk
    Intermediate

    Over the last two decades, trading has seen a remarkable evolution from open outcry in the Wall Street pits to screen trading, all the way to current automation and high-frequency trading (HFT). The success of machine learning and artificial intelligence (AI) seems like a natural progression in the evolution of trading. However, unlike other fields of AI, trading has some domain-specific problems that keep the dream of set-it-and-forget-it money-making machines still some way off. This talk will describe the current challenges for intelligent autonomous trading systems and provide some practical examples where machine learning is already being used in financial applications.

  • Vincenzo Tursi - Puzzling Together a Teacher-Bot: Machine Learning, NLP, Active Learning, and Microservices

    Vincenzo Tursi
    Data Scientist
    KNIME
    4 months ago
    Sold Out!
    45 Mins
    Demonstration
    Beginner

    Hi! My name is Emil and I am a Teacher Bot. I was built to answer your initial questions about using KNIME Analytics Platform. Well, actually, I was built to point you to the right training materials to answer your questions about KNIME.

    Puzzling together all the pieces to implement me wasn't that difficult. All you need are:

    • A user interface - web or speech based - for you to ask questions
    • A text parser for me to understand
    • A brain to find the right training materials to answer your question
    • A user interface to send you the answer
    • A feedback option - nice to have but not a must - on whether my answer was of any help

    The most complex part was, of course, my brain. Building my brain required: a clear definition of the problem, a labeled data set, a class ontology, and finally the training of a machine learning model. The labeled data set in particular was lacking. So, we relied on active learning to incrementally make my brain smarter over time. Some parts of the final architecture, such as understanding and resource searching, were deployed as microservices.

  • Atin Ghosh - AR-MDN - Associative and Recurrent Mixture Density Network for e-Retail Demand Forecasting

    45 Mins
    Case Study
    Intermediate

    Accurate demand forecasts can help online retail organizations better plan their supply-chain processes. The challenge, however, is the large number of associative factors that result in large, non-stationary shifts in demand, which traditional time series and regression approaches fail to model. In this paper, we propose a neural network architecture called AR-MDN that simultaneously models associative factors, time-series trends and the variance in the demand. We first identify several causal features and use a combination of feature embeddings, MLP and LSTM to represent them. We then model the output density as a learned mixture of Gaussian distributions. The AR-MDN can be trained end-to-end without the need for additional supervision. We experiment on a dataset of a year’s worth of data over tens of thousands of products from Flipkart. The proposed architecture yields a significant improvement in forecasting accuracy when compared with existing alternatives.

  • Asha Saini - Using Open Data to Predict Market Movements

    20 Mins
    Talk
    Intermediate

    As companies progress on their digital transformation journeys, technology becomes a strategic business decision. In this realm, consulting firms such as Gartner exert tremendous influence on technology purchasing decisions. The ability of these firms to predict the movement of market players will provide vendors with competitive benefits.

    We will explore how, with the use of publicly available data sources, IT industry trends can be mimicked and predicted.

    Big Data enthusiasts learned quickly that there are caveats to making Big Data useful:

    • Data source availability
    • Producing meaningful insights from publicly available sources

    Working with large, frequently changing data sets can become expensive and frustrating. The learning curve is steep and the discovery process is long. Challenges range from selecting efficient tools to parse unstructured data to developing a vision for interpreting and utilizing the data for competitive advantage.

    We will describe how the archive of billions of web pages, captured monthly since 2008 and available for free analysis on AWS, can be used to mimic and predict trends reflected in industry-standard consulting reports.

    There may be an opportunity in this process to apply machine learning to tune the models so that they learn and optimize automatically. Gartner publishes reports on over 70 topic areas. An automated tool that can analyze all of those topic areas, helping us quickly understand major trends across today’s landscape and plan for those to come, would be invaluable to many organizations.

  • Dr. Ravi Vijayaraghavan / Dr. Sidharth Kumar - Analytics and Science for Customer Experience and Growth in E-commerce

    20 Mins
    Experience Report
    Advanced

    In our talk, we will cover the broad areas where Flipkart leverages Analytics and Sciences to drive both human and machine-driven decisions. We will go deeper into one use case related to pricing in e-commerce.

  • Janakiram MSV - Accelerate Machine Learning Adoption with AutoML

    20 Mins
    Demonstration
    Beginner

    One emerging trend that's going to fundamentally change the face of ML is AutoML. It enables business analysts and developers to evolve machine learning models that can address complex scenarios. From platform companies such as Google and Microsoft to early-stage startups, AutoML is fast gaining traction. This session demonstrates how AutoML accelerates building machine learning models.