Applying functional programming principles to large scale data processing

Schedule: Oct 9th 12:15 PM - Jan 1st 12:00 AM. Place: Grand Ball Room 1
At Indix, we deal with a stream of unstructured and constantly changing data. This data is processed through a series of systems before being fed as structured input to our analytics system. In this talk, I will walk through our experience of building a large scale data processing system using Hadoop that's focused on immutability, composition and other functional programming principles.
 
 

Outline/structure of the Session

  • challenges involved in dealing with constantly changing data
  • our first "naive" attempt at building a data processing system and the issues we faced
  • our transition from vanilla Hadoop jobs to Scalding, which effectively converted our map-reduce jobs into mini functional programs that were composable, compact and easy to reason about
  • a walk through of our current architecture which relies on immutability to provide stability and reliability
  • overview of the Lambda architecture (coined by Nathan Marz of Storm fame), which helps us deal with the reconciliation of data from the batch and real-time systems
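
As a rough illustration of the batch/real-time reconciliation mentioned in the last point, the sketch below combines a batch view and a real-time view at query time. It is a toy model under stated assumptions, not the system described in this talk: the event shape, the `product` field and all function names are hypothetical, and in-memory lists stand in for Hadoop and a stream processor.

```python
from collections import Counter

def batch_view(event_log):
    """Batch layer: recompute counts from the full, immutable event log."""
    return Counter(event["product"] for event in event_log)

def realtime_view(recent_events):
    """Speed layer: count only events that arrived after the last batch run."""
    return Counter(event["product"] for event in recent_events)

def query(product, batch, realtime):
    """Serving layer: answer a query by combining both views."""
    return batch[product] + realtime[product]

# Hypothetical data: three events already covered by the batch run, one newer event.
event_log = [{"product": "a"}, {"product": "b"}, {"product": "a"}]
recent_events = [{"product": "a"}]

batch = batch_view(event_log)
realtime = realtime_view(recent_events)
count_a = query("a", batch, realtime)
```

Because the log is never mutated, the batch view can always be rebuilt from scratch, and the real-time view only has to cover the window since the last batch run.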

Learning Outcome

  • how immutability brings sanity and stability when handling ever-changing data
  • how monoids provide a helpful abstraction while aggregating data and how their associativity helps in parallelism
  • how we deal with reconciliation of data from a batch processing system (Hadoop) and a real-time stream (Storm or Spark)
  • dealing with updates using joins, timestamps and batch-processing, instead of writes to a random key-value store that does not scale well
  • hopefully convince you to adopt Scalding over the vanilla Hadoop API for writing map-reduce jobs
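
To make the point about monoids and associativity concrete, here is a minimal sketch in plain Python (not Scalding code). The `(count, total)` pair, `ZERO`, `plus` and `summarize` are illustrative names chosen for this example. Because `plus` is associative and `ZERO` is its identity, the data can be split into chunks, each chunk summarized independently (for example on different machines), and the partial results combined without changing the answer.

```python
from functools import reduce

# A monoid: an identity element plus an associative combine function.
ZERO = (0, 0.0)  # (count, total)

def plus(a, b):
    """Combine two partial aggregates; the grouping of calls does not matter."""
    return (a[0] + b[0], a[1] + b[1])

def summarize(prices):
    """Fold a sequence of prices into a single (count, total) aggregate."""
    return reduce(plus, ((1, price) for price in prices), ZERO)

prices = [10.0, 20.0, 30.0, 40.0]

# Aggregate everything in one pass...
whole = summarize(prices)

# ...or aggregate two chunks separately and combine the partial results.
parts = plus(summarize(prices[:1]), summarize(prices[1:]))
```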

Target Audience

Anyone who has an interest in dealing with data at scale. A high-level knowledge of Hadoop will help, but is not necessary.

Submitted 3 years ago


  • Venkat Subramaniam - Keynote: The Joy of Functional Programming

    60 Mins
    Keynote
    Intermediate

    It's been around for a long time, but everyone's talking about it all of a sudden. But why and why now? We've been programming in languages like Java for a while, quite well. Now we're asked to change and the languages themselves are changing towards this style of programming. In this keynote, a passionate polyglot programmer and author of "Functional Programming in Java: Harnessing the Power of Java 8 Lambda Expressions" will share the reasons we need to make the paradigm shift and the pure joy (the benefits) we will reap from it.

  • Thomas Gazagnaire - Compile your own cloud with Mirage OS v2.0

    60 Mins
    Talk
    Intermediate

    Most applications running in the cloud are not optimized to do so. They make assumptions about the underlying operating system, resulting in larger footprints with increased costs and risks.  The open source Mirage OS represents a new approach where the application code is combined with the specific components of the operating system it needs into a single-purpose unikernel appliance. With Mirage OS, developers can create lean and efficient unikernels for secure, cost-effective and high-performance network applications. Mirage OS unikernels run directly on the Xen Project hypervisor, which allows them to be quickly deployed to many leading cloud platforms.

    Mirage OS is fully written in OCaml, from the device drivers and network stack to higher-level synchronisation protocols and databases. In this presentation I will explain how we developed Mirage OS and why we chose to do so in a strongly typed functional language with a powerful module language. I will then present some of the new features of Mirage OS v2.0, such as support for ARM devices; Irmin, a Git-like distributed database; and OCaml-TLS, a comprehensive implementation of the TLS protocol in pure OCaml.

  • Ryan Lemmer - Realtime Distributed computing: dealing with Time and Failure in the wild

    45 Mins
    Experience Report
    Intermediate

    There is a growing need for scalable, realtime business systems that are continuously running and highly-available. Two very different frameworks/approaches you could use to build such systems are Storm and Akka.

    Systems created with Storm or Akka are distributed, at runtime, on as many machines as you choose. The inherent concurrency implied by this brings the issues of State, Time and Failure into sharp focus. Functional programming has much to say about dealing with state and time; not surprisingly, both Storm and Akka have strong roots in functional languages (for Storm it is Clojure, and for Akka, Scala).

    In this talk we'll explore the core concepts and challenges of distributed computation and the role of functional programming in concurrent distributed computing. We'll take a look at Storm and Akka, by example, and see that, as different as these two approaches are, the underlying difficulties of distributed computation remain evident in both: dealing with time, and dealing with failure.

  • Dhaval Dalal / Ryan Lemmer - Code Jugalbandi

    60 Mins
    Demonstration
    Beginner

    In Indian classical music, we have Jugalbandi, where two lead musicians or vocalists engage in a playful competition. For example, there can be a jugalbandi between a flutist and a percussionist (say, playing the tabla). The percussionist listens to a composition rendered by the flutist and replays the same notes on the tabla; the reverse is also possible.

    In a similar way, we will perform Code Jugalbandi to see how the solution looks using different programming languages and paradigms.

    During the session, Dhaval and Ryan will take turns at coding the same problem using different languages and paradigms. There will be multiple such rounds during the Jugalbandi.

  • Mushtaq Ahmed - Demystify the Reactive Jargons

    Mr Scala
    ThoughtWorks
    Submitted 3 years ago
    Sold Out!
    60 Mins
    Demonstration
    Intermediate

    Sync, Async, Blocking, Non-Blocking and Streaming are the buzzwords in the reactive programming world. This talk will attempt to attach some meaning to them. It will also demo the performance and resource consumption patterns of blocking I/O, Scala Futures and RxJava Observables for comparable programs. Finally, a command line application that consumes the Twitter streams API will demo what is possible using the new reactive abstractions.

  • Ramakrishnan Muthukrishnan - An introduction to Continuation Passing Style (CPS)

    60 Mins
    Tutorial
    Intermediate

    Traditionally, functions return some value. Someone is waiting for that value and does some computation with it. This "someone" is called the continuation of this value. In a normal function call, the continuation is "implicit". In "continuation passing style" (henceforth abbreviated as CPS), we make the continuations explicit. In this style, function definitions take an extra argument called the "continuation" and never return. The "return value" of the function 'continues' by being passed as an argument to the continuation. Continuations are sometimes called "gotos with arguments".

    CPS is used as an intermediate stage while compiling a program since it makes the control structure of the program explicit and hence can be converted easily to machine code. Another feature of a CPS-transformed function is that it is tail-recursive even if the original function was not written in a tail-recursive style.

    Continuations enable a programmer to build new control operators (if the language's built-in operators do not already provide the control operators the programmer needs).
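
    As a small illustration of the idea (a sketch written for this page, not material from the tutorial), here is factorial in direct style and in CPS. The names `factorial`, `factorial_cps` and `k` are illustrative. Note that Python still uses `return` to hand control to the continuation and does not eliminate tail calls, so this shows the shape of CPS rather than its performance characteristics.

```python
def factorial(n):
    """Direct style: the caller is the implicit continuation of the result."""
    return 1 if n == 0 else n * factorial(n - 1)

def factorial_cps(n, k):
    """CPS: the continuation k is an explicit argument that receives the result."""
    if n == 0:
        return k(1)
    # The recursive call is in tail position; the pending multiplication
    # lives inside the new continuation instead of on the call stack frame.
    return factorial_cps(n - 1, lambda result: k(n * result))

# The identity function as continuation simply hands the value back.
direct = factorial(5)
via_cps = factorial_cps(5, lambda result: result)

# Any function can serve as the continuation, e.g. one that records the value.
collected = []
factorial_cps(3, collected.append)
```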

  • 90 Mins
    Tutorial
    Beginner

    Code used during the talk: https://github.com/shashi/fuconf-talk
    Slides: https://docs.google.com/presentation/d/16Xfqd-xU8y2JEN0TIcacDoYnp0b5-W7ESDB5v1SmcXs/edit#slide=id.p

     

    Elm is a strongly typed functional reactive programming (FRP) language that compiles to HTML, CSS, and Javascript. In Elm, the Signal type represents a time-varying value--things like mouse position, keys pressed, current time are signals. With Signals, one can write terse code that is isomorphic to a dataflow diagram of the app. The code hence feels natural and is 100% callback free. All this, with powerful type inference.

    This talk is an introduction to FRP. It explores functionally composing graphics and UIs, and creating interactions and animations with the Signal type. There will also be an overview of Elm’s execution mechanism and the time traveling debugger: a consequence of Elm's purely functional approach.

    While instructive, it will be good fun too, in the spirit of Elm.

  • Vagmi Mudumbai - Clojurescript and Om - Pragmatic functional programming in the Javascript Land

    Submitted 3 years ago
    Sold Out!
    60 Mins
    Demonstration
    Beginner

    JavaScript programmers have had a lot of choices when it comes to programming. There were the days of MooTools, Scriptaculous and jQuery, and now there are the days of Angular, Ember, Knockout and the like. As a JavaScript programmer myself, I find that ClojureScript with React, in the form of Om, offers a fresh perspective on building performant JavaScript UIs that are easy to write.

    The talk will introduce the concepts of React and immutable data structures in Clojure, and live-code an application that demonstrates these concepts.

     

  • Premanand Chandrasekaran - Functional Programming in Java

    60 Mins
    Workshop
    Beginner

    Functional programming has started (re)gaining prominence in recent years, and with good reason too. Functional programs lend an elegant solution to the concurrency problem, result in more modular systems, are more concise and are easier to test. While modern languages like Scala and Clojure have embraced the functional style whole-heartedly, Java has lagged a bit behind in its treatment of functions as first-class citizens. With the advent of Java 8 and its support for lambdas, however, Java programmers can finally start reaping the power of functional programs as well. Even without Java 8, it is possible to adopt a functional style with the aid of excellent libraries such as Guava.

    This talk will explore how to apply functional concepts using the Java programming language and demonstrate how it can result in simpler, more elegant designs. We will conduct this as a hands-on workshop, with attendees encouraged to code along. So bring your favorite Java 8 aware IDE and an open mind, and prepare to have a lot of fun.

  • Rahul Goma Phulore - Object-functional programming: Beautiful unification or a kitchen sink?

    60 Mins
    Talk
    Advanced

    Scala began its life as an experiment to “unify” object-oriented programming and functional programming. Martin Odersky believed that the differences between FP and OO are more cultural than technical, and that there was room to beautifully unify various ideas from the two into one simple core.

    How successful has Scala been in its goals? Is it like “the grand unified theory of universe” or like the infamous “vegetarian ham”? [1]

    In this talk, we will see just how Scala unifies various ideas (such as type-classes, algebraic data types, first-class modules and functions) under one simple core comprising traits, objects, implicits, and open recursion. We will see how this unification unintentionally subsumes many concepts that require separate features in other languages, such as functional dependencies, type families and GADTs in Haskell. We will see how this has given rise to a new “implicit calculus”, which could lay a foundation for the next generation of generic programming techniques.

    We will see that this unification comes at a certain cost, wherein it leads to some compromises on both sides. However many of these trade-offs are particular to Scala (largely due to the JVM imposed restrictions). The goal of unification is still noble, and we need not throw the baby out with the bathwater.

    [1]: https://twitter.com/bos31337/status/425524860345778176

  • Venkat Subramaniam - Haskell for Everyday Programmers

    90 Mins
    Talk
    Intermediate

    I learn different languages not to make use of them, but to program in my current languages in a better way. As we adopt a functional style of programming in mainstream languages, like Java, C#, and C++, we can learn a great deal from a language that is touted as a purely functional language.

    Haskell is statically typed, but not in a way like Java, C#, or C++. Its static typing does not get in the way of productivity. Haskell quietly does lazy evaluation and enforces functional purity for greater good. Everyday programmers, like your humble speaker, who predominantly code in mainstream languages, can greatly benefit from learning the idioms and style of this elegant language. The next time we sit down to crank out some code in just about any language, we can make use of some of those styles, within the confines of the languages, and move towards a better, functional style.

  • Aditya Godbole - Learning (from) Haskell - An experience report

    CTO
    Vertis Microsystems
    Submitted 3 years ago
    Sold Out!
    45 Mins
    Experience Report
    Beginner

    Functional programming as a programming style and discipline is useful even in languages which are not pure functional languages. By practising programming in a pure functional language like Haskell, programmers can drastically improve the quality of code when coding in other languages as well.

    The talk is based on first hand experience of using Haskell in internal courses in our organisation to improve code quality.

    This talk will cover Gofer (one of the earliest variants of Haskell) as a teaching tool, including the choice of the language, the features from Haskell that should (and shouldn't) be covered and the obstacles and benefits of the exercise.

     

  • Rahul Goma Phulore - Promise of a better future

    45 Mins
    Talk
    Intermediate

    Futures and promises are no strangers to most programmers. They have made their way into major mainstream languages, including but not limited to, Java, C#, JavaScript, and C++.

    These abstractions, however, are often dreaded because of the way they are typically implemented. Most manifestations lead to the insidious callback hell, and some, like Java’s, are incredibly limited.

    Why can’t we keep our simple, straightforward code, and still have the concurrency benefits of futures and promises?! Turns out we can. Functional programming is here to the rescue! We will see how seemingly obscure abstractions like monads and delimited continuations allow us to do just that.

    We will also go over some approaches based on syntactic transformation (basically, macros), such as C#’s async-await, and see how they compare with the functional approaches discussed earlier.
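
    For a flavour of the async-await style mentioned above, here is a small sketch using Python's `asyncio` rather than C#. It is an illustration only, not code from the talk: `fetch_price`, `total` and the price table are hypothetical, and `asyncio.sleep(0)` stands in for real non-blocking I/O. The code reads top to bottom like ordinary sequential code, with no nested callbacks.

```python
import asyncio

# Hypothetical data standing in for a remote service.
PRICES = {"a": 10, "b": 20}

async def fetch_price(product):
    """Pretend to do non-blocking I/O, then produce a price."""
    await asyncio.sleep(0)  # yields control to the event loop
    return PRICES[product]

async def total(products):
    """Start all lookups concurrently and sum the results once they finish."""
    prices = await asyncio.gather(*(fetch_price(p) for p in products))
    return sum(prices)

result = asyncio.run(total(["a", "b"]))
```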

  • Anuj - Embedded Erlang : Experiments with the BeagleBone Black

    45 Mins
    Demonstration
    Intermediate

    There is a close similarity between the way embedded systems are developed and the paradigms involved in functional programming.

    Errors will ALWAYS occur & the system must NEVER go down

    With the rise of homebrewed 3D printing, UAVs and IoT, the need for a robust programming environment is greater than ever. It becomes close to impossible for casual dabblers to take their ideas to a level beyond the "maker"ware.

    In this talk, I will be talking about different efforts under embedded Erlang. There will be a demo on the process involved in building a simple application i.e. LED blink. I will then show the key highlights of a slightly more complicated application, and then how one would go about designing an embedded application.

    Perhaps it's too early to tell, but I think something like the Nerves Project is the answer here. We need a community driven platform for creating great embedded applications using Erlang.

  • Shakthi Kannan - Lambdaaaaaaaaaa Calculus

    DevOps Engineer
    Self
    Submitted 3 years ago
    Sold Out!
    60 Mins
    Talk
    Beginner

    This talk is an introduction to lambda calculus and will address the foundations of functional programming languages.

    We will learn the building blocks of lambda calculus - syntax, rules, and application.
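
    As a taste of those building blocks, the sketch below encodes natural numbers as Church numerals using nothing but single-argument functions. It is an illustration written for this page, not material from the talk; `to_int` is a helper added here only so the results can be inspected.

```python
# Church numerals: the number n is "apply f to x, n times".
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

# Addition: apply f n times, then m more times.
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    """Helper (not part of the calculus): convert a Church numeral to an int."""
    return n(lambda i: i + 1)(0)

two = succ(succ(zero))
three = succ(two)
five = add(two)(three)
```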

  • Mushtaq Ahmed - Typeclasses as objects and implicits

    Mr Scala
    ThoughtWorks
    Submitted 3 years ago
    Sold Out!
    60 Mins
    Tutorial
    Intermediate

    Haskell has popularized typeclasses as a principled way to add ad-hoc extensions to existing data types. They allow you to 1) add new operations on existing data types and 2) support new data types on existing operations, and thus solve the famous "expression problem".

    There is a lot of similarity between typeclasses and the Java good practices of programming to interfaces and preferring composition over inheritance. The missing link is the implicit dictionary passing which allows Haskell to be much more concise and expressive.

    In this tutorial, we will look at how Scala adopts typeclasses by adding the missing link of implicits.
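
    The "dictionary passing" idea can be sketched without any language support for implicits. In the Python sketch below (an illustration, not code from the tutorial), a typeclass is an ordinary object holding the operations, and instances are passed explicitly; this explicit argument is what Haskell, and Scala's implicits, would supply automatically. `Show`, `show_int`, `show_str`, `show_list` and `describe` are names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")

@dataclass(frozen=True)
class Show(Generic[A]):
    """A 'typeclass' as a plain object: a dictionary of operations for type A."""
    show: Callable[[A], str]

# Instances for existing types, defined without modifying int or str.
show_int = Show(lambda n: f"Int({n})")
show_str = Show(lambda s: f'Str("{s}")')

def show_list(element_instance):
    """Derive an instance for lists from an instance for their elements."""
    return Show(lambda xs: "[" + ", ".join(element_instance.show(x) for x in xs) + "]")

def describe(value, instance):
    """Generic code: the instance is passed explicitly, where Scala would pass it implicitly."""
    return instance.show(value)

rendered = describe([1, 2], show_list(show_int))
```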

  • Mushtaq Ahmed - Functions as objects

    Mr Scala
    ThoughtWorks
    Submitted 3 years ago
    Sold Out!
    60 Mins
    Tutorial
    Beginner

    Object-oriented languages like Scala support the paradigm of programming with functions. How does it work? The language has to conceptually map functions to objects. This tutorial will explain this idea by starting with simple Java-like code and progressively refactoring it to make use of higher-order functions. As a result, you will learn about a few syntax sugars and also the cost implications of using objects to represent functions.
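
    The mapping from functions to objects can be sketched in Python (used here only for illustration; the tutorial itself concerns Scala). The class name `Function1` and the method `apply` echo Scala's naming, while `AddN` and `map_all` are hypothetical. A function value is modelled as an object with a single method, and call syntax is sugar for invoking that method.

```python
class Function1:
    """A one-argument function represented as an object."""

    def apply(self, x):
        raise NotImplementedError

    def __call__(self, x):
        # Call syntax f(x) is sugar for f.apply(x).
        return self.apply(x)

class AddN(Function1):
    """An object-encoded function; its captured state lives in a field."""

    def __init__(self, n):
        self.n = n

    def apply(self, x):
        return x + self.n

def map_all(f, xs):
    """A higher-order function: works with anything callable."""
    return [f(x) for x in xs]

with_object = map_all(AddN(2), [1, 2, 3])
with_lambda = map_all(lambda x: x + 2, [1, 2, 3])
```

    Each function value here is an allocated object, which hints at the cost implications the tutorial refers to.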