Online | Mar 26th, 03:00 - 03:45 PM IST | Zoom

At Zeotap, we manage over 1000 data pipelines, many of them interlinked, across our first- and third-party data assets, in both batch and streaming mode. These pipelines were written using various compute engines such as Apache Spark, Dataflow (Apache Beam), and BigQuery. At times, pipelines ran on clusters that hit performance bottlenecks or stalled because resources (spot nodes) were unavailable. In those moments we wished we could take code written for one framework and run it on another, say, move a Spark job onto BigQuery. But the coupling of domain and platform at the code level prevented this. We saw value in a unifying language, more precisely a DSL, that would combine all these data processes and make them interoperable.

Zeoflow is the result: a unified data processing DSL. It is based on Free Monads and acts as a high-level programming model, so compute engines like Spark, Beam, or any other become plug-and-play interpreters. Data pipelines also have other requirements to address, such as reading and writing data, data quality and metric reporting, and keeping code easy to test and debug. So we went on to build a suite of applications, based on Free Monads and Applicatives, that manages and decouples each of these concerns.
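To make the "plug-and-play interpreters" idea concrete, here is a minimal sketch in Python (chosen for brevity; it is not Zeoflow's code or API). The `Read`/`Write` instructions, the `active_users` pipeline, and both interpreters are hypothetical. The pipeline is built as plain data by a Free monad, and only the handler passed to `run` decides whether it touches real tables or merely records a plan.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical instruction set (algebra) for a tiny pipeline DSL.
@dataclass
class Read:
    source: str

@dataclass
class Write:
    sink: str
    rows: list

# Free monad: a finished value (Pure) or one instruction plus a continuation (Suspend).
@dataclass
class Pure:
    value: Any

    def flat_map(self, f):
        return f(self.value)

@dataclass
class Suspend:
    instr: Any
    k: Callable[[Any], Any]  # takes the instruction's result, returns the rest of the program

    def flat_map(self, f):
        return Suspend(self.instr, lambda x: self.k(x).flat_map(f))

def read(source):
    return Suspend(Read(source), Pure)

def write(sink, rows):
    return Suspend(Write(sink, rows), Pure)

# Business logic: a value describing the pipeline; nothing executes yet.
def active_users(src, dst):
    return read(src).flat_map(lambda rows: write(dst, [r for r in rows if r["active"]]))

# Generic runner: feeds each instruction to an engine-specific handler.
def run(program, handle):
    while isinstance(program, Suspend):
        program = program.k(handle(program.instr))
    return program.value

# Interpreter 1: in-memory tables, standing in for a real engine.
def in_memory(tables):
    def handle(instr):
        if isinstance(instr, Read):
            return tables[instr.source]
        if isinstance(instr, Write):
            tables[instr.sink] = instr.rows
            return len(instr.rows)
        raise TypeError(f"unknown instruction: {instr!r}")
    return handle

# Interpreter 2: a dry run that records the plan and touches no data.
def dry_run(plan):
    def handle(instr):
        if isinstance(instr, Read):
            plan.append(f"read {instr.source}")
            return []
        if isinstance(instr, Write):
            plan.append(f"write {instr.sink}")
            return len(instr.rows)
        raise TypeError(f"unknown instruction: {instr!r}")
    return handle

tables = {"users": [{"id": 1, "active": True}, {"id": 2, "active": False}]}
written = run(active_users("users", "active"), in_memory(tables))

plan = []
planned = run(active_users("users", "active"), dry_run(plan))
```

Swapping `in_memory` for `dry_run` changes the "engine" without touching `active_users`; in a real system, Spark- or BigQuery-backed handlers would plug in at the same point. The sketch omits the static typing a production Free implementation would normally provide.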

This presentation traces the journey of a project that started as a small library, built with reusability and easy modeling of business rules in mind, and covers the design principles, requirements, and functional domain modeling that guided our choice of constructs like Free and State Monads. Those choices helped it grow into an extensible, DSL-based application suite that can run on anything from simple SQL engines to complex Beam-based pipelines. It is a production system at Zeotap, and the project is in the final stages of being open-sourced. We aim to show the beauty of pure FP constructs applied to a complex real-world problem: data pipelines and data processing in general.


Outline/Structure of the Talk

The presentation will cover the following topics:

  • Free Monads & Applicatives
    • Base Intuition
  • Data IO
    • A framework for common IO tasks such as reading and writing data, and much more, with an easy builder-like syntax to support and extend your organization's functionality
    • A common DSL that abstracts reader/writer functionality without tying it to an execution engine like Spark or Beam
  • Data Expectations
    • A Free Applicative-based DSL for modeling all kinds of expectations on data: common assertions that answer the questions we ask about our data
  • State Monads
    • Base Intuition
  • ZeoFlow
    • A Free Monad-based DSL to better manage your data processing pipelines
    • Complete separation of domain and platform, so SQL written once can run on any of the supported platforms/engines, like Apache Spark, Apache Beam, etc. We follow the write-once, run-anywhere ideology
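The Data Expectations DSL rests on a property that separates Free Applicatives from Free Monads: because no step depends on an earlier step's result, the whole program can be inspected before it runs. The minimal Python sketch below illustrates that idea with a naive syntax-tree encoding of a free applicative; the `NotNull` and `RowCountAtLeast` checks, `all_of`, and the `fold` helper are hypothetical, not part of the libraries described here.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical expectation instructions.
@dataclass
class NotNull:
    column: str

@dataclass
class RowCountAtLeast:
    n: int

# Naive free applicative: pure values, lifted instructions, and Map2 nodes
# that combine two independent sub-programs with a function.
@dataclass
class Pure:
    value: Any

@dataclass
class Lift:
    instr: Any

@dataclass
class Map2:
    left: Any
    right: Any
    f: Callable[[Any, Any], Any]

def all_of(a, b):
    return Map2(a, b, lambda x, y: x and y)

# Interpret by supplying the target applicative's pure, map2, and a per-instruction handler.
def fold(prog, pure, handle, map2):
    if isinstance(prog, Pure):
        return pure(prog.value)
    if isinstance(prog, Lift):
        return handle(prog.instr)
    left = fold(prog.left, pure, handle, map2)
    right = fold(prog.right, pure, handle, map2)
    return map2(left, right, prog.f)

# Interpreter 1 (static, Const-style): list every check without reading any data.
def describe(prog):
    return fold(prog, lambda _v: [], lambda i: [i], lambda a, b, _f: a + b)

def check(instr, rows):
    if isinstance(instr, NotNull):
        return all(r.get(instr.column) is not None for r in rows)
    if isinstance(instr, RowCountAtLeast):
        return len(rows) >= instr.n
    raise TypeError(f"unknown expectation: {instr!r}")

# Interpreter 2 (Reader-style): build a function from rows to the combined verdict.
def evaluate(prog, rows):
    runner = fold(
        prog,
        lambda v: (lambda _rows: v),
        lambda i: (lambda rs: check(i, rs)),
        lambda ga, gb, f: (lambda rs: f(ga(rs), gb(rs))),
    )
    return runner(rows)

suite = all_of(Lift(NotNull("id")), Lift(RowCountAtLeast(2)))
```

A Free Monad cannot offer an equivalent of `describe`, since its next instruction is only known once the previous one has produced a value; that is the kind of trade-off behind choosing one construct over the other.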

Time-wise breakdown of topics:

  1. Intro - 1min
  2. Motivation - 2min
  3. Problem Statement - 2min
  4. DSL Design Considerations - 3min
  5. Free Monad as a DSL - 15min
    1. Use Data-IO as an implementation example
  6. Free Applicative - 5min
    1. Use Data Expectation as an implementation example
  7. Free Monads vs. Applicatives: when to choose which - 2min
  8. Note on Designing Interpreters - 2min
    1. Show examples of why we used Readers, Writers & State Monads
  9. Elaborate on State Monad - 5min
  10. Zeoflow Design - 5min
    1. Why we used the State Monad as an interpreter
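One way to picture a State monad acting as an interpreter is sketched below in Python, under our own assumptions: the `Load`/`Save` instructions, the table catalog, and `interpret` are hypothetical and not Zeoflow's design. Each DSL instruction is translated into a State action, so intermediate datasets are threaded through the pipeline as an explicit state value instead of being mutated in place.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

# State monad: wraps a function from state to (result, new_state).
@dataclass
class State:
    run: Callable[[Any], Tuple[Any, Any]]

    def flat_map(self, f):
        def step(s):
            a, s2 = self.run(s)
            return f(a).run(s2)
        return State(step)

def pure_state(a):
    return State(lambda s: (a, s))

# Hypothetical DSL instructions.
@dataclass
class Load:
    table: str

@dataclass
class Save:
    table: str
    rows: list

# Minimal Free monad over those instructions.
@dataclass
class Done:
    value: Any

@dataclass
class Step:
    instr: Any
    k: Callable[[Any], Any]  # continuation: instruction result -> rest of the program

def bind(prog, f):
    if isinstance(prog, Done):
        return f(prog.value)
    return Step(prog.instr, lambda x: bind(prog.k(x), f))

def load(table):
    return Step(Load(table), Done)

def save(table, rows):
    return Step(Save(table, rows), Done)

# Interpreter: each instruction becomes a State action over a catalog of tables.
def to_state(instr):
    if isinstance(instr, Load):
        return State(lambda cat: (cat[instr.table], cat))
    if isinstance(instr, Save):
        # Returns a new catalog rather than mutating the old one.
        return State(lambda cat: (len(instr.rows), {**cat, instr.table: instr.rows}))
    raise TypeError(f"unknown instruction: {instr!r}")

def interpret(prog):
    if isinstance(prog, Done):
        return pure_state(prog.value)
    return to_state(prog.instr).flat_map(lambda x: interpret(prog.k(x)))

# Example pipeline: deduplicate and sort a table into a new one.
def dedupe(src, dst):
    return bind(load(src), lambda rows: save(dst, sorted(set(rows))))

catalog = {"raw": [3, 1, 3, 2]}
count, final = interpret(dedupe("raw", "clean")).run(catalog)
```

Because `interpret` only builds a State value, the same pipeline can be run against different starting catalogs, which keeps tests simple. The nested `run` calls are not stack-safe, so this sketch only suits short programs.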

Learning Outcome

  • Designing DSLs in FP using Free Monads & Applicatives
  • Understand the usage of State Monad in a real-world use case of data processing
  • Attendees will see code and examples of all the libraries we present (most of which would be open-sourced by the time we give this talk)

Target Audience

Anybody interested in building DSLs with FP, and specifically in applying Free Monads & Applicatives

Prerequisites for Attendees

A basic understanding of FP concepts like Monads and Applicatives would be helpful


