Batch as a Special Case of Streaming

Sep 19th, 12:00 PM - 12:30 PM, Grand Lodge

In this talk I will share my team's gruelling journey in attempting to migrate a batch-like system to a streaming framework.

Walking through the various solutions we tested with Flink, I'll discuss each one's performance characteristics and bring to light the misconceptions in their designs.

Outline/structure of the Session

  1. Description of the problem
  2. Why we chose streaming
  3. Solution: Naive Flink
    1. Overview of watermarks
    2. How Flink handles window state
  4. Solution: Flink all-in-one window
    1. Limitations in Flink's windowing interface
  5. Solution: Flink with external datastore
  6. Solution: Flink with state on Kafka
    1. In-depth overview of solution
    2. Illustration of why this doesn't work
  7. Solution: Flink with stateful map
    1. Advantages over other solutions
  8. Conclusion
    1. Discussion of why the solutions did not meet our objectives

Learning Outcome

Understand how to approach problems involving large aggregation windows in Flink.

Be able to identify batch solutions that will not work in a streaming system.

Understand the difference in performance characteristics between streaming and batch systems.

Target Audience

Anyone interested in migrating batch systems to use a streaming framework.

Prerequisite

Reading the following blog article could be helpful:

https://data-artisans.com/blog/batch-is-a-special-case-of-streaming