Sydney  ·  Sep 18th, 02:00 - 02:30 PM  ·  Grand Lodge  ·  83 Interested

This presentation explains the concept of Kappa and Lambda architectures and showcases how useful business knowledge can be extracted from the constantly flowing river of data.
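To make the distinction concrete, here is a hypothetical, in-memory sketch of the Lambda idea: a batch layer recomputes a view over the full historical dataset, a speed layer keeps an incremental view of recent events, and the serving layer merges the two at query time. (Kappa, by contrast, drops the batch layer entirely and recomputes by replaying the event log through the same streaming code.) All names here are illustrative, not from the actual demo:

```python
from collections import defaultdict

def batch_view(events):
    """Batch layer: recompute counts from the full (immutable) master dataset."""
    view = defaultdict(int)
    for key in events:
        view[key] += 1
    return view

class SpeedLayer:
    """Speed layer: incrementally update a real-time view as events arrive."""
    def __init__(self):
        self.view = defaultdict(int)

    def ingest(self, key):
        self.view[key] += 1

def serve(batch, speed, key):
    """Serving layer: merge the batch and real-time views at query time."""
    return batch.get(key, 0) + speed.view.get(key, 0)

master = ["click", "click", "view"]   # events already absorbed by the batch layer
batch = batch_view(master)

speed = SpeedLayer()
speed.ingest("click")                 # a fresh event, not yet in the batch view

print(serve(batch, speed, "click"))   # 2 from the batch view + 1 from the speed layer = 3
```

The trade-off the talk explores falls out of this shape: Lambda pays for two code paths (batch and streaming) in exchange for easy recomputation, while Kappa keeps one code path and leans on log replay instead.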

It also demonstrates how a simple POC can be built in a day, without getting more than your toes wet, by leveraging Docker and other technologies like Kafka, Spark and Cassandra.


Outline/Structure of the Demonstration

After a brief introduction to Kappa/Lambda, a live demo will be performed. It will include a short explanation of each component involved (Web Service, Kafka, Spark Streaming and Cassandra) and their setup (using Docker-Compose). Additionally, it will illustrate the data flow using a modified version of the Kaggle Expedia data set as an example. Finally, it will discuss the pros and cons for several business scenarios.
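The demo's data flow can be sketched roughly like this, with plain Python objects standing in for the real components: a list plays the Kafka topic, a micro-batch function plays Spark Streaming, and a dict plays the Cassandra table. The field names below are illustrative, not the actual Kaggle Expedia schema:

```python
from collections import defaultdict

# "Kafka": events published by the web service onto a topic.
topic = [
    {"hotel_cluster": 41, "is_booking": 1},
    {"hotel_cluster": 41, "is_booking": 0},
    {"hotel_cluster": 64, "is_booking": 1},
]

# "Cassandra": bookings aggregated per hotel cluster.
table = defaultdict(int)

def process_micro_batch(events):
    """'Spark Streaming': aggregate one micro-batch and upsert the results."""
    for event in events:
        if event["is_booking"]:
            table[event["hotel_cluster"]] += 1

process_micro_batch(topic)
print(dict(table))  # {41: 1, 64: 1}
```

In the actual demo each stand-in is a real container wired together with Docker-Compose, which is what makes the whole pipeline reproducible on a single laptop.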

Learning Outcome

The audience will learn the concepts of Kappa and Lambda architectures and how to identify the business cases best suited to those types of architectures. Additionally, they will walk away with functional POC code (GitHub repository) that they can extend and adapt for their own use.

Target Audience

Developers and technical managers interested in discovering how to easily get business value from their real-time data.

Prerequisites for Attendees

Generic knowledge of data processing and Big Data technologies.

Submitted 2 years ago

Public Feedback

Suggest improvements to the Speaker
  • Josh Graham  ~  2 years ago

    Sure you can fit it all in 30 mins? Also, demo should work without Internet ... don't rely on conference wifi ;-)

    • Radek Ostrowski  ~  2 years ago

      Hi Josh, yes, I'll fit it in. It's less complicated than it seems :) I have the whole demo running on Docker on my laptop, so no need for wifi. Cheers 

  • Liked: Davor Bonaci - Realizing the Promise of Portable Data Processing with Apache Beam

    Davor Bonaci, Sr. Software Engineer, Google Inc.
    2 years ago  ·  Sold Out!  ·  30 Mins

    The world of big data involves an ever-changing field of players. Much as SQL stands as a lingua franca for declarative data analysis, Apache Beam aims to provide a portable standard for expressing robust, out-of-order data processing pipelines in a variety of languages across a variety of platforms. In a way, Apache Beam is the glue that can connect the Big Data ecosystem together; it enables users to "run-anything-anywhere".

    This talk will briefly cover the capabilities of the Beam model for data processing, as well as the current state of the Beam ecosystem. We'll discuss the Beam architecture and dive into the portability layer. We'll offer a technical analysis of Beam's powerful primitive operations that enable true and reliable portability across diverse environments. Finally, we'll demonstrate a complex pipeline running on multiple runners in multiple deployment scenarios (e.g. Apache Spark on Amazon Web Services, Apache Flink on Google Cloud, Apache Apex on-premise), and give a glimpse at some of the challenges Beam aims to address in the future.