Batch as a Special Case of Streaming
I will be presenting my team's grueling journey of migrating a batch-like system to a streaming framework. Walking through the various solutions we tested with Flink, I'll discuss each one's performance characteristics and highlight the misconceptions behind their designs.
Outline/structure of the Session
- Description of the problem
- Why we chose streaming
- Solution: Naive Flink
- Overview of watermarks
- How Flink handles window state
- Solution: Flink with an all-in-one window
- Limitations in Flink's windowing interface
- Solution: Flink with external datastore
- Solution: Flink with state on Kafka
- In-depth overview of solution
- Illustration of why this doesn't work
- Solution: Flink with stateful map
- Advantages over other solutions
- Discussion on why the solutions did not meet objectives
Learning objectives
Understand how to approach problems involving large aggregation windows in Flink.
Be able to identify batch solutions that will not work in a streaming system.
Understand the difference in performance characteristics between streaming and batch systems.
Target audience
Anyone interested in migrating batch systems to a streaming framework.
Reading the following blog article could be helpful: