Low Latency Polyglot Model Scoring using Apache Apex
Data science is fast becoming a complementary approach to solving business challenges. The explosion of frameworks that help data scientists build models bears testimony to this. However, when a model needs to be turned into a production version for very low latency, enterprise grade environments, there are very few choices, each with its own strengths and weaknesses. Adding to this is the current disconnect between a data scientist's world, which is all about modelling, and an engineer's world, which is about SLAs and service guarantees. A framework like Apache Apex can complement each of these roles and provide constructs for both worlds. This can help enterprises drastically cut the cost of deploying models to production environments.
The talk will present Apache Apex as a framework that enables engineers and data scientists to build low latency, enterprise grade applications. We will first cover the foundations of Apex that contribute to the platform's low latency processing capabilities. We will then discuss the aspects of the platform that qualify it as enterprise grade. Finally, we will cover the main theme of the talk: how models developed in Java, R and Python can co-exist in the same scoring application, thus enabling a true polyglot framework.
Outline/Structure of the Talk
The session is logically divided into three sections:
- A general overview of the Apache Apex platform and the features that make it a low latency processing framework
- Features of the framework that make it enterprise ready
- Features of the framework that can accommodate models developed in R, Java and Python to enable a true polyglot platform for low latency scoring
Learning Outcome
The learning outcomes are as follows:
- Basic understanding of the Apache Apex platform as a low latency processing framework
- Alternative approaches to building low latency machine learning model scoring applications with true enterprise grade capabilities
Target Audience
Software Engineers, Data Scientists, and Architects
Prerequisites for Attendees
- A general idea of true streaming vs mini-batch vs batch processing models
- Familiarity with the typical process involved today in turning a model into a production version
Links
http://www.atrato.io/blog/2017/05/28/apex-kudu-output/ is a blog post that discusses some of the recent integration work between Apache Apex and Apache Kudu.
I am also presenting a session titled "Low latency high throughput streaming using Apache Apex and Apache Kudu" at the DataWorks Summit 2017 in Sydney: https://dataworkssummit.com/sydney-2017/speakers/