Functional Programming with Spark

Spark is a general-purpose distributed computing platform designed to handle both batch and streaming applications. It extends the MapReduce paradigm introduced by Google in its 2004 research paper, and it leverages the functional programming paradigm to express transformations on datasets residing in the cluster's memory.
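As a rough illustration of that style, here is a minimal Scala sketch of a functional transformation pipeline over a distributed dataset; the local SparkSession and the small in-memory sample are assumptions made only for this sketch:

```scala
import org.apache.spark.sql.SparkSession

object WordPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("fp-with-spark")
      .master("local[*]")              // local run, for illustration only
      .getOrCreate()

    // A distributed dataset built from a small in-memory collection.
    // map and filter take ordinary Scala functions shipped to the executors.
    val words = spark.sparkContext.parallelize(Seq("spark", "scala", "", "fp"))
    val lengths = words
      .filter(_.nonEmpty)
      .map(w => (w, w.length))

    // Transformations are lazy; collect() is the action that runs the pipeline.
    lengths.collect().foreach(println)
    spark.stop()
  }
}
```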

Matei Zaharia, the creator of Spark, has mentioned the importance of using a functional programming language: "At the time we started, I really wanted a PL that supports a language-integrated interface (where people write functions inline, etc) because I thought that was the way people would want to program these applications after seeing research systems that had it .."

 

 
 

Outline/Structure of the Session

  • Introduction to Spark
  • How Spark builds and manages distributed datasets as Scala collections
  • Higher-order functions in Spark vs. Scala
  • Demonstration (a comparison sketch follows this list)
    • Transformations on RDDs in Spark using functions
    • DataFrames API in Spark
    • Typed transformations on Datasets
  • Questions
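The sketch below outlines the kind of comparison the demonstration covers across the three APIs; the Person record, the sample data, and the local SparkSession are assumptions made only for this sketch:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical record type used only for this sketch
case class Person(name: String, age: Int)

object ApiComparison {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-api-demo")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val people = Seq(Person("Ada", 36), Person("Grace", 45), Person("Alan", 41))

    // 1. RDD: transformations are plain Scala higher-order functions
    val namesRdd = spark.sparkContext
      .parallelize(people)
      .filter(_.age >= 40)
      .map(_.name)

    // 2. DataFrame: untyped, column-expression API
    val namesDf = people.toDF()
      .filter(col("age") >= 40)
      .select(col("name"))

    // 3. Dataset: typed transformations over the case class
    val namesDs = people.toDS()
      .filter(_.age >= 40)
      .map(_.name)

    namesRdd.collect().foreach(println)
    namesDf.show()
    namesDs.show()
    spark.stop()
  }
}
```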

Learning Outcome

  • Get an overview of Spark, the big data processing engine
  • Understand why lazy collections need to be functional in nature
  • Learn how to optimize iterative algorithms in the big data world (a minimal sketch follows this list)
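One common optimization for iterative algorithms is keeping the working dataset cached in cluster memory across iterations; a minimal sketch of that idea, assuming a locally generated numeric dataset:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object IterativeCaching {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("iterative-caching")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical numeric dataset; a real job would load it from storage.
    val points = spark.sparkContext
      .parallelize(1 to 1000000)
      .map(_.toDouble)
      .persist(StorageLevel.MEMORY_ONLY)   // keep the data in cluster memory

    // Without persist(), each iteration would recompute the whole lineage;
    // with it, every pass after the first reads the cached partitions.
    var total = 0.0
    for (_ <- 1 to 5) {
      total = points.map(math.sqrt).sum()
    }

    println(s"sum of square roots = $total")
    points.unpersist()
    spark.stop()
  }
}
```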

Target Audience

People interested in big data and functional programming

Submitted 2 months ago

Comments

  • By Naresh Jain  ~  1 month ago

    Thanks for the proposal, Shad. Can you please share videos from any of your past presentations? It will help the program committee see your presentation style.

