Sydney  ·  Sep 18th, 02:00 - 02:30 PM  ·  Grand Lodge  ·  83 Interested

This presentation explains the concept of Kappa and Lambda architectures and showcases how useful business knowledge can be extracted from the constantly flowing river of data.
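To make the distinction concrete, here is a hypothetical, in-memory sketch of the Lambda idea: a batch layer recomputes a view over the full historical dataset, a speed layer keeps an incremental view of recent events, and the serving layer merges the two at query time. (Kappa, by contrast, drops the batch layer entirely and recomputes by replaying the event log through the same streaming code.) All names here are illustrative, not from the actual demo:

```python
from collections import defaultdict

def batch_view(events):
    """Batch layer: recompute counts from the full (immutable) master dataset."""
    view = defaultdict(int)
    for key in events:
        view[key] += 1
    return view

class SpeedLayer:
    """Speed layer: incrementally update a real-time view as events arrive."""
    def __init__(self):
        self.view = defaultdict(int)

    def ingest(self, key):
        self.view[key] += 1

def serve(batch, speed, key):
    """Serving layer: merge the batch and real-time views at query time."""
    return batch.get(key, 0) + speed.view.get(key, 0)

master = ["click", "click", "view"]   # events already absorbed by the batch layer
batch = batch_view(master)

speed = SpeedLayer()
speed.ingest("click")                 # a fresh event, not yet in the batch view

print(serve(batch, speed, "click"))   # 2 from the batch view + 1 from the speed layer = 3
```

The trade-off the talk explores falls out of this shape: Lambda pays for two code paths (batch and streaming) in exchange for easy recomputation, while Kappa keeps one code path and leans on log replay instead.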

It also demonstrates how a simple POC can be built in a day, without getting more than your toes wet, by leveraging Docker and other technologies like Kafka, Spark and Cassandra.


Outline/Structure of the Demonstration

After a brief introduction to Kappa/Lambda, a live demo will be performed. It will include a short explanation of each component involved (Web Service, Kafka, Spark Streaming and Cassandra) and their setup (using Docker-Compose). Additionally, it will illustrate the data flow using a modified version of the Kaggle Expedia data set as an example. Finally, it will discuss the pros and cons for several business scenarios.
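The demo's data flow can be sketched roughly like this, with plain Python objects standing in for the real components: a list plays the Kafka topic, a micro-batch function plays Spark Streaming, and a dict plays the Cassandra table. The field names below are illustrative, not the actual Kaggle Expedia schema:

```python
from collections import defaultdict

# "Kafka": events published by the web service onto a topic.
topic = [
    {"hotel_cluster": 41, "is_booking": 1},
    {"hotel_cluster": 41, "is_booking": 0},
    {"hotel_cluster": 64, "is_booking": 1},
]

# "Cassandra": bookings aggregated per hotel cluster.
table = defaultdict(int)

def process_micro_batch(events):
    """'Spark Streaming': aggregate one micro-batch and upsert the results."""
    for event in events:
        if event["is_booking"]:
            table[event["hotel_cluster"]] += 1

process_micro_batch(topic)
print(dict(table))  # {41: 1, 64: 1}
```

In the actual demo each stand-in is a real container wired together with Docker-Compose, which is what makes the whole pipeline reproducible on a single laptop.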

Learning Outcome

The audience will learn the concepts of Kappa and Lambda architectures and how to identify the business cases best suited to those types of architectures. Additionally, they will walk away with functional POC code (GitHub repository) that they can extend and adapt for their own use.

Target Audience

Developers and technical managers interested in discovering how to easily get business value from their real-time data.

Prerequisites for Attendees

Generic knowledge of data processing and Big Data technologies.

Submitted 2 years ago

Public Feedback

Suggest improvements to the Speaker
  • Josh Graham  ~  2 years ago

    Sure you can fit it all in 30 mins? Also, demo should work without Internet ... don't rely on conference wifi ;-)

    • Radek Ostrowski  ~  2 years ago

      Hi Josh, yes, I'll fit it in. It's less complicated than it seems :) I have the whole demo running on Docker on my laptop, so no need for wifi. Cheers 

  • Liked: Davor Bonaci - Realizing the Promise of Portable Data Processing with Apache Beam

    Davor Bonaci, Sr. Software Engineer, Google Inc.
    2 years ago  ·  Sold Out!  ·  30 Mins

    The world of big data involves an ever-changing field of players. Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. In a way, Apache Beam is the glue that can connect the Big Data ecosystem together; it enables users to "run-anything-anywhere".

    This talk will briefly cover the capabilities of the Beam model for data processing, as well as the current state of the Beam ecosystem. We'll discuss the Beam architecture and dive into the portability layer. We'll offer a technical analysis of Beam's powerful primitive operations that enable true and reliable portability across diverse environments. Finally, we'll demonstrate a complex pipeline running on multiple runners in multiple deployment scenarios (e.g. Apache Spark on Amazon Web Services, Apache Flink on Google Cloud, Apache Apex on-premise), and give a glimpse at some of the challenges Beam aims to address in the future.