Dipping into the Big Data River: Stream Analytics at Scale
This presentation explains the concepts behind the Kappa and Lambda architectures and shows how useful business knowledge can be extracted from a constantly flowing river of data.
It also demonstrates how a simple proof of concept (POC) can be built in a day, without getting more than your toes wet, by leveraging Docker together with technologies such as Kafka, Spark and Cassandra.
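As a rough illustration of the difference between the two architectures, the following minimal Python sketch computes the same view both ways. It is not the code from the talk: the event log, the `batch_cutoff` parameter and every function name are hypothetical, and plain in-memory lists stand in for real batch and streaming engines.

```python
from collections import Counter

# Hypothetical event log: (event_id, hotel_id) booking events in arrival order.
EVENTS = [(1, "h1"), (2, "h2"), (3, "h1"), (4, "h3"), (5, "h1")]


def count_bookings(events):
    """Business logic shared by both layers: number of bookings per hotel."""
    return Counter(hotel for _, hotel in events)


def lambda_query(events, batch_cutoff):
    """Lambda: merge a batch view over older data with a speed view over recent data."""
    batch_view = count_bookings(e for e in events if e[0] <= batch_cutoff)
    speed_view = count_bookings(e for e in events if e[0] > batch_cutoff)
    return batch_view + speed_view


def kappa_query(events):
    """Kappa: one streaming job folds over the replayable log, one event at a time."""
    view = Counter()
    for _, hotel in events:
        view[hotel] += 1
    return view


lambda_result = lambda_query(EVENTS, batch_cutoff=3)
kappa_result = kappa_query(EVENTS)
print(lambda_result == kappa_result)
```

Both queries yield the same counts. The practical difference is that Lambda maintains two code paths (batch and speed) that must be kept consistent, while Kappa keeps a single streaming path and reprocesses history by replaying the log.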
Outline/structure of the Session
After a brief introduction to Kappa/Lambda, a live demo will be performed. It will include a short explanation of each component involved (Web Service, Kafka, Spark Streaming and Cassandra) and of their setup using Docker Compose. It will also trace the data flow, using a modified version of the Kaggle Expedia data set as an example. Finally, it will discuss the pros and cons for several business scenarios.
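The data flow described above (Web Service to Kafka to Spark Streaming to Cassandra) can be sketched in a few lines of plain Python. This is only an in-memory analogy under stated assumptions, not the demo code: a `deque` stands in for a Kafka topic, a function stands in for a Spark Streaming micro-batch, a dictionary stands in for a Cassandra table, and the field names `hotel_cluster` and `is_booking` are assumed for illustration, since the modified data set used in the demo is not described here.

```python
import json
from collections import defaultdict, deque

# In-memory stand-ins (assumptions, not the real systems).
topic = deque()            # plays the role of a Kafka topic
table = defaultdict(int)   # plays the role of a Cassandra table keyed by hotel cluster


def web_service_publish(event):
    """Web service stand-in: serialize an incoming event and append it to the topic."""
    topic.append(json.dumps(event))


def process_micro_batch(max_records):
    """Streaming stand-in: drain one micro-batch, count bookings, upsert into the table."""
    batch_counts = defaultdict(int)
    for _ in range(min(max_records, len(topic))):
        event = json.loads(topic.popleft())
        if event["is_booking"]:
            batch_counts[event["hotel_cluster"]] += 1
    for cluster, count in batch_counts.items():
        table[cluster] += count
    return dict(batch_counts)


# Hypothetical events: (hotel_cluster, is_booking) pairs.
for cluster, booked in [(7, 1), (7, 0), (42, 1), (7, 1)]:
    web_service_publish({"hotel_cluster": cluster, "is_booking": booked})

first = process_micro_batch(max_records=3)
second = process_micro_batch(max_records=3)
print(first, second, dict(table))
```

Each call to `process_micro_batch` handles at most `max_records` events, so the running totals in `table` are updated incrementally as data arrives rather than recomputed from scratch.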
Attendees will learn the concepts of the Kappa and Lambda architectures and how to identify the business cases best suited to them. They will also walk away with functional POC code (GitHub repository) that they can extend and adapt for their own use.
The session is aimed at developers and technical managers interested in discovering how to easily get business value from their real-time data.
Attendees should have general knowledge of data processing and Big Data technologies.
Submitted 11 months ago