Functional Programming with Spark
Spark is a general-purpose distributed computing platform designed to handle both batch and streaming applications. It extends the MapReduce paradigm introduced by Google in its 2004 research paper, and it uses the functional programming paradigm to express transformations on datasets held in the cluster's memory.
Matei Zaharia, the creator of Spark, has explained why a functional programming language mattered: "At the time we started, I really wanted a PL that supports a language-integrated interface (where people write functions inline, etc) because I thought that was the way people would want to program these applications after seeing research systems that had it .."
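To make the "functions written inline" idea concrete, here is a minimal sketch of a word count expressed as a chain of higher-order functions. It runs on an ordinary local Scala collection, not on a cluster; the object name WordCountSketch and the function wordCount are hypothetical names chosen for illustration. Spark's RDD API offers similarly named operations such as flatMap, map and reduceByKey, so the shape of the pipeline should look familiar, but this is not Spark code.

```scala
// Illustrative sketch only: a local Scala collection stands in for a
// distributed dataset. WordCountSketch and wordCount are hypothetical names.
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))             // one line -> many words
      .filter(_.nonEmpty)                   // drop empty tokens from leading whitespace
      .map(word => (word.toLowerCase, 1))   // "map" step: emit (key, 1) pairs
      .groupBy(_._1)                        // gather the pairs for each key
      .map { case (word, pairs) =>          // "reduce" step: sum the counts per key
        (word, pairs.map(_._2).sum)
      }
}
```

Each step takes a small function as an argument and returns a new collection, which is the style the quote refers to.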
Outline/structure of the Session
- Introduction to Spark
- How Spark builds and manages distributed datasets as Scala collections
- Higher-order functions in Spark vs Scala
- Transformations on RDD in Spark using functions
- DataFrame API in Spark
- Typed transformations on Datasets
- Get an overview of Spark, the big data processing engine
- Understand why lazy collections need to be functional in nature
- How to optimize iterative algorithms in the big data world
People interested in big data and functional programming
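The points about lazy collections and iterative algorithms can be sketched with a plain Scala view, used here as a local stand-in for a lazily evaluated dataset. The object name LazyViewSketch and the method demo are hypothetical. The sketch counts how often the mapping function runs: a lazy pipeline re-runs its functions on every traversal, so those functions should be free of side effects, and materialising the result once avoids repeated work when the data is read many times. Spark exposes cache() and persist() on its datasets for a comparable purpose, though the mechanism there is different from a local toList.

```scala
// Illustrative sketch only: a Scala view stands in for a lazy dataset.
// LazyViewSketch and demo are hypothetical names.
object LazyViewSketch {
  // Returns the number of times the mapping function has run
  // (before any traversal, after two lazy traversals, after two cached reads).
  def demo(): (Int, Int, Int) = {
    var calls = 0

    // Building the view runs nothing yet; the function is only recorded.
    val doubled = (1 to 3).view.map { x => calls += 1; x * 2 }
    val beforeTraversal = calls

    // Each traversal re-runs the function for every element.
    val first = doubled.toList
    val second = doubled.toList
    val afterTwoTraversals = calls

    // Materialise once, then reuse the stored result without recomputation.
    val cached = doubled.toList
    val total = cached.sum + cached.sum
    val afterCachedReads = calls

    (beforeTraversal, afterTwoTraversals, afterCachedReads)
  }
}
```

Because x * 2 is pure, both lazy traversals yield the same list; only the counter, a deliberate side effect, reveals that the work was done twice.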
Submitted 1 year ago
People who liked this proposal also liked:
Eric Torreborre - Streams, effects and beautiful folds, a winning trilogy
Eric Torreborre, Senior Software Engineer, Zalando
Most applications just read data, transform it and write it somewhere else. There are great libraries in the Scala ecosystem to support these use cases: Akka-Stream, fs2, Monix, ... But if you look under the hood and try to understand how those libraries work, you might be a bit scared by their complexity!
In this talk you will learn how to build a very minimal "streaming library" where all the difficult concerns are left to other libraries: eff for asynchronous computations and resource management, origami for extracting useful data out of the stream. Then you will decide how to spend your complexity budget and when you should pay for more powerful abstractions.