Experience Report: Building Shin - A Typed Functional Compiler For Computational Linear Algebra Problems.

schedule Dec 15th 10:45 AM - 11:30 AM place Crystal 1 people 18 Interested

Abstract: I wrote a distributed (mostly) Functional Compiler in Scheme, OCaml and Elixir that incorporates knowledge of Computational Linear Algebra and domain specific knowledge to generate highly optimized linear algebra code from specification of problems. This talk is about lessons learned in the process.

The problem:
In every domain that uses computational linear algebra (which is all of engineering and science), we encounter the 'how to optimize a linear algebra expression into an optimized sequence of BLAS (or LAPACK or $linear_algera library) kernel calls' problem.

Example: (if the math equations make you want to tear your hair out and go jump off a cliff, don't worry, it is just an example, you don't have to grok it. Just skim the equations The basic problem being addressed here is that solving such equations with code takes up a lot of effort and time from experts in computational linear algebra)


Here is a linear algebra expression from a genetics problem , specifically GWAS -Genome Wide Association Studies, looking for significant associations for millions of genetic markers- where the essence of the problem [1] comes down to generating the most efficient algorithm possible that solves these equations

This in turn involves solving a 2 dimensional sequence of Generalized Least Squared Problems of the form

The algorithms to solve these can be directly coded up in Matlab or Julia. But there are problems with this approach, with this specific problem.


1. For different input sizes, different algorithms give the most optimal performance. Which algorithm do you code up?


2. Even for a given input size, there are multiple algorithms that compute the same result, but have differing computational characteristics depending on the hardware etc. How do you generate the optimal algorithm for your hardware ?


3. Most importantly the structure of *this* specific problem allows optimizations that are specific to the problem which are not built into generic linear algebra routines. (Obviously, one can't expect MATLAB to incorporate problem specific information for every scientific/engineering problem ever). The GLS problems are connected to others, thus saving intermediate results can save hours of computation vs calculating every GLS problem from scratch

In practice, one needs to be an expert in Computational Linear Algebra to come up with the optimized algorithm for a domain specific problem, and then write (say Fortran) code to use BLAS, LAPACK etc optimally to actualize this algorithm, often with much iteration, often consuming 100s of hours.

The Solution:


Incorporating this 'expert knowledge' into a compiler speeds up the time taken to arrive at the best solution (often by a factor of 100 or 1000), and allows Computational Linear Algebra experts to do more interesting things, like focus on their research.

For this particular problem, the above equations, and additional knowledge of the problem domain are the input into an expression compiler. The output is highly efficient and 'proved correct' code

In compiler terms, incorporating domain knowldege into the compilation process results in being able to apply optimizations to the generated Syntax Trees/Graphs, resulting in optimal algorithms. (note: the output of the compiler is a program in another language- say Matlab).

In essence, "Domain Specific Compilers" consume knowledge about the structure of a problem and generate optimized code that solves that problem.

Shin is one such compiler. It consumes a problem description and outputs highly efficient Julia code that solves the problem.

This talk focuses on the engineering challenges I faced in building this compiler, with a special focus on the approaches that failed [5]

Trivia:

"Shin" is the Hebrew letter, not the English word meaning 'front of the leg between knee and ankle' ;-).

Every company uses names from a common theme to name their servers and components - Athena, Zeus, Hercules , or Thor, Loki, Odin, or Jedi, Sith, Skywalker etc. We use Hebrew words, so we have Ruach, Melekh, Malkuth etc..

 
6 favorite thumb_down thumb_up 0 comments visibility_off  Remove from Watchlist visibility  Add to Watchlist
 

Outline/structure of the Session

(This is tentative and will probably change before the conf)

1. The Problem of Computing with Linear Algebra Expressions

2. Solving the problem with building Domain Knowledge into the compilation process

3. Existing Solutions

4. Solving with Functional Programming.

5. Iteration 1 - Prototyping with Scheme. Macros vs Types. Choose One.

6. Iteration 2 - Addressing the ' all lisp programs eventually need types" problem. Transitioning from Scheme to ML.

6. Iteration 3 - Addressing the "all programs eventually need to scale" problem. Enter Elixir.

7. Iteration 4. Julia as target language (vs C/Fortran). Challenges and Opportunities.

8. Successes - What worked.

9. Failures and Lessons Learned - What didn't work and why

10. What remains to be done. Future work

Learning Outcome

- Understanding a deep *problem* in Computational Linear Algebra. (Understanding a problem pays more dividends over the long term than understanding specific solutions)

- Understanding the challenges of building an production grade (vs toy) compiler, and integrating into an industrial (vs academic) product pipeline

- The joys and sorrows of using Scheme, OCaml and Elixir to build compilers. How to (attempt to) combine the strengths of each. The costs of such an approach.

- (Time permitting) Interesting subproblems in computation graph processing, tree pattern matching algorithms, symbolic computing, formally specifying [6] compiler sub components etc

Target Audience

Anyone interested in Linear Algebra, Compilers, or FP

Prerequisite

A few years of programming experience (not necessarily in FP)

If you have built a compiler, or work in a domain where Computational Linear Algebra is important, you'll appreciate some of the nuances in more depth than if you don't, but I'll be presenting all necessary background.

schedule Submitted 6 months ago

Comments Subscribe to Comments

comment Comment on this Submission

  • Liked Michael Snoyman
    keyboard_arrow_down

    Michael Snoyman - Functional Programming for the Long Haul

    45 Mins
    Keynote
    Beginner

    How do you decide whether a programming language is worth using or not? By necessity, such decisions are usually based on assessments that can be made relatively quickly: the ease of using the language, how productive you feel in the first week, and so on. Unfortunately, this tells us very little about the costs involved in continuing to maintain a project past that initial phase. And in reality, the vast majority of time spent on most projects is spent in those later phases.

    I'm going to claim, based on my own experience and analysis of language features, that functional programming in general, and Haskell in particular, are well suited for improving this long tail of projects. We need languages and programming techniques that allow broad codebase refactorings, significant requirements changes, improving performance in hotspots of the code, and reduced debug time. I believe Haskell checks these boxes.

  • Liked Raghu Ugare
    keyboard_arrow_down

    Raghu Ugare / Vijay Anant - (Why) Should You know Category Theory ?

    45 Mins
    Talk
    Intermediate

    Category Theory has been found to have a vast field of applications not limited to programming alone.

    In this fun-filled talk (Yes! We promise!) , we want to make the audience fall in love with Math & Category Theory in general, and Haskell in particular.

    We will address questions such as below:

    • What is the mysterious link between the abstract mathematical field of Category Theory and the concrete world of real-world Programming ? And why is it relevant especially in Functional Programming?
    • Most of all, how can You benefit knowing Category Theory ? (Examples in Haskell)

  • Liked Harendra Kumar
    keyboard_arrow_down

    Harendra Kumar - High Performance Haskell

    45 Mins
    Talk
    Intermediate

    Haskell can and does perform as well as C, sometimes even better. However,
    writing high performance software in Haskell is often challenging especially
    because performance is sensitive to strictness, inlining and specialization.
    This talk focuses on how to write high performance code using Haskell. It is
    derived from practical experience writing high performance Haskell libraries. We
    will go over some of the experiences from optimizing the "unicode-transforms"
    library whose performance rivals the best C library for unicode normalization.
    From more recent past, we will go over some learnings from optimizing and
    benchmarking "streamly", a high performance concurrent streaming library. We
    will discuss systematic approach towards performances improvement, pitfalls and
    the tools of the trade.

  • Liked Tony Morris
    keyboard_arrow_down

    Tony Morris - Parametricity, Functional Programming, Types

    45 Mins
    Talk
    Intermediate

    In this talk, we define the principle of functional programming, then go into
    detail about what becomes possible by following this principle. In particular,
    parametricity (Wadler, 1989) and exploiting types in API design are an essential
    property of productive software teams, especially teams composed of volunteers
    as in open-source. This will be demonstrated.

    Some of our most important programming tools are neglected, often argued away
    under a false compromise. Why then, are functional programming and associated
    consequences such as parametricity so casually disregarded? Are they truly so
    unimportant? In this talk, these questions are answered thoroughly and without
    compromise.

    We will define the principle of functional programming, then go into
    detail about common problems to all of software development. We will build the
    case from ground up and finish with detailed practical demonstration of a
    solution to these problems. The audience should expect to walk away with a
    principled understanding and vocabulary of why functional programming and
    associated techniques have become necessary to software development.

  • Liked Michael Ho
    keyboard_arrow_down

    Michael Ho - Making the Switch: How We Transitioned from Java to Haskell

    Michael Ho
    Michael Ho
    Sr. Software Engineer
    SumAll
    schedule 5 months ago
    Sold Out!
    45 Mins
    Case Study
    Intermediate

    In this case study presentation, SumAll's CTO, Todd Sundsted, and Senior Software Engineer, Michael Ho, will discuss the move from Java to Haskell along two parallel paths. First, the business/political story — how SumAll convinced the decision makers, fought the nay-sayers, and generally managed the people impacted by the transition. Second, the technical story — how they actually replaced their Java code with Haskell code. Along the way, they will address their hopes and expectations from transitioning from Java to Haskell, and will conclude with the results they've gained and seen to date.

  • Liked Debasish Ghosh
    keyboard_arrow_down

    Debasish Ghosh - Managing Effects in Domain Models - The Algebraic Way

    45 Mins
    Talk
    Intermediate

    When we talk about domain models, we talk about entities that interact with each other to accomplish specific domain functionalities. We can model these behaviors using pure functions. Pure functions compose to build larger behaviors out of smaller ones. But unfortunately the real world is not so pure. We need to manage exceptions that may occur as part of the interactions, we may need to write stuff to the underlying repository (that may again fail), we may need to log audit trails and there can be many other instances where the domain behavior does not guarantee any purity whatsoever. The substitution model of functional programming fails under these conditions, which we call side-effects.

    In this session we talk about how to manage such impure scenarios using the power of algebraic effects. We will see how we can achieve function composition even in the presence of effects and keep our model pure and referentially transparent. We will use Scala as the implementation language.

    In discussing effects we will look at some patterns that will ensure a clean separation between the algebra of our interface and the implementation. This has the advantage that we can compose algebras incrementally to build richer functionalities without committing to specific implementations. This is the tagless final approach that offers modularity and extensibility in designing pure and effectful domain models.

  • Liked George Wilson
    keyboard_arrow_down

    George Wilson - Laws!

    45 Mins
    Talk
    Beginner

    Laws, laws, laws. It seems as though whenever we learn about a new abstraction in functional programming, we hear about its associated laws. Laws come up when we learn about type classes like Functors, Monoids, Monads, and more! Usually laws are mentioned and swiftly brushed past as we move on to examples and applications of whatever structure we're learning about. But not today.

    In this talk, we'll learn about Functors and Monoids, paying close attention to their laws. Why should our abstractions have laws? We'll answer this question both by seeing powers we gain by having laws, and by seeing tragedies that can befall us without laws.

  • Liked Michael Snoyman
    keyboard_arrow_down

    Michael Snoyman - Applied Haskell Workshop

    Michael Snoyman
    Michael Snoyman
    Vice President of Engineering
    FP Complete
    schedule 6 months ago
    Sold Out!
    480 Mins
    Workshop
    Intermediate

    This full day workshop will focus on applying Haskell to normal, everyday programming. We'll be focusing on getting comfortable with common tasks, libraries, and paradigms, including:

    • Understanding strictness, laziness, and evaluation
    • Data structures
    • Structuring applications
    • Concurrency and mutability
    • Library recommendations

    By the end of the workshop, you should feel confident in working on production Haskell codebases. While we obviously cannot cover all topics in Haskell in one day, the goal is to empower attendees with sufficient knowledge to continue developing their Haskell skillset through writing real applications.

  • Liked Luka Jacobowitz
    keyboard_arrow_down

    Luka Jacobowitz - Testing in the world of Functional Programming

    Luka Jacobowitz
    Luka Jacobowitz
    Software Engineer
    codecentric AG
    schedule 6 months ago
    Sold Out!
    45 Mins
    Demonstration
    Intermediate

    Testing is one of the most fundamental aspects of being a software developer. There are several movements and communities based on different methodologies with regards to testing such as TDD, BDD or design by contract. However, in the FP community testing is often not a large topic and is often glossed over. While it’s true that testing in functional programming tends to be less important, there should still be more resources on how to create tests that add actual value.

    This talks aims to provide exactly that, with good examples on how to leverage property based testing, refinement types and the most difficult part: figuring out how to test code that interacts with the outside world.

  • Liked Tony Morris
    keyboard_arrow_down

    Tony Morris - Let's Lens

    Tony Morris
    Tony Morris
    Sr. Software Engineer
    Queensland FP Lab, Data61
    schedule 4 months ago
    Sold Out!
    480 Mins
    Workshop
    Intermediate

    Let's Lens presents a series of exercises, in a similar format to the Data61 functional programming course material. The subject of the exercises is around the concept of lenses, initially proposed by Foster et al., to solve the view-update problem of relational databases.

    The theories around lenses have been advanced significantly in recent years, resulting in a library, implemented in Haskell, called lens.

    This workshop will take you through the basic definition of the lens data structure and its related structures such as traversals and prisms. Following this we implement some of the low-level lens library, then go on to discuss and solve a practical problem that uses all of these structures.

  • Liked Debasish Ghosh
    keyboard_arrow_down

    Debasish Ghosh - Power of Functions in a Typed World on the JVM

    90 Mins
    Tutorial
    Beginner

    John Carmac once mentioned on twitter that "Sometimes, the elegant implementation is just a function. Not a method. Not a class. Not a framework. Just a function." In this talk we will discuss the power of functions in modeling real world applications on the JVM. When we say functions, we mean "pure" functions as in the world of mathematics.

    Functions model behaviors, functions compose to build larger functions, and combined with a powerful type system allow us to abstract over the generalities in defining real world domain models. The combination of functions along with algebraic data types has proven to be extremely useful in designing large scale systems that are modular and extensible.

    Scala is a typed functional (well, almost) language on the JVM. In this session we will discuss how the functional features of Scala offer many of the above benefits in designing large scale systems. If you are coming from an OO background, you will appreciate how an alternative approach to programming can make your systems simpler to design, implement and maintain.

  • Liked Tanmai Gopal
    keyboard_arrow_down

    Tanmai Gopal - Using Haskell to build a performant GraphQL to SQL compiler

    Tanmai Gopal
    Tanmai Gopal
    Founder
    Hasura
    schedule 4 months ago
    Sold Out!
    45 Mins
    Case Study
    Intermediate
    1. Motivation/Problem statement: Lifecycle of a GraphQL query
    2. Design Goals
    3. Why Haskell
    4. Compiler implementation details:
      1. Fast GraphQL parsing with parser combinators
      2. Modelling and manipulating the GraphQL AST with algebraic data types
      3. Software Transactional Memory: Concurrency constructs for scaling GraphQL subscriptions
    5. Summary with performance benchmarks
  • Liked Manoj Govindan
    keyboard_arrow_down

    Manoj Govindan - Going Forth to Erlang

    Manoj Govindan
    Manoj Govindan
    Director, Engineering
    Perfios
    schedule 6 months ago
    Sold Out!
    45 Mins
    Case Study
    Intermediate

    Forth is a classic imperative stack-based programming language that fills a very specific niche. Erlang is a concurrent, functional, fault-tolerant programming language that occupies another, albeit wider, niche. There is very little in common between the two except for a shared respect for Alan Turing. This session is a case study that narrates the story of building a bridge between the two languages. The bridge allows for Forth nodes to run in a Erlang controlled world, with the ability to talk to Erlang nodes, send messages, invoke processes and so on.

    In particular we'll discuss the various threading techniques used in Forth, listed below. We'll discuss how the choice of threading model impacts the design of the Erlang-Forth bridge.

    Indirect Threaded Code (ITC)

    A classical Forth threading technique. All the other threading schemes are considered "improvements" on this. Consider the following Forth expression:

      : SQUARE DUP * ;

    In a typical ITC Forth this would appear in memory as shown in the image below:

    Figure 1

    Direct Threaded Code (DTC)

    This model is different from ITC in one respect: the Code Field contains actual machine code, rather than the address of some machine code.

    Subroutine Threaded Code (STC)

    This model relies on the following fact: a high-level Forth definition is nothing but a list of subroutines to be executed.

    Token Threaded Code (TTC)

    This model optimizes for size, at the cost of speed. A token-threaded Forth keeps a table of addresses of all Forth words. The token value is then used to index into this table, to find the Forth word corresponding to a given token.

  • Liked Andrew McCluskey
    keyboard_arrow_down

    Andrew McCluskey - Property-based State Machine Testing

    45 Mins
    Talk
    Intermediate

    Automated testing is key to ensuring the ongoing health and well-being of any software project,giving developers and users confidence that their software works as intended. Property based testing is a significant step forward compared to traditional unit tests, exercising code with randomly generated inputs to ensure that key properties hold. However, both of these techniques tend to be used at the level of individual functions. Many important properties of an application only appear at a higher level, and depend on the state of the application under test. The Haskell library hedgehog, a relative newcomer to the property based testing world, includes facilities for property-based state machine testing, giving developers a foundation on which to build these more complicated tests.

    In this talk, Andrew will give you an introduction to state machine property testing using hedgehog. He'll start with a quick refresher on property based testing, followed by a brief introduction to state machines and the sorts of applications they can be used to model. From there, he'll take you on a guided tour of hedgehog's state machine testing facilities. Finally, Andrew will present a series of examples to show off what you can do and hopefully give you enough ideas to start applying this tool to your own projects. The first set of examples will test a web application written in Haskell. These tests will include: content creation and deletion, uniqueness constraints, authentication, and concurrent transactions. A second set of examples will test an application written in a language other than Haskell to demonstrate that this technique is not limited to applications written in Haskell.

    An intermediate knowledge of Haskell and familiarity with property based testing will be beneficial,but not essential. The slides and demo application will be available after the talk for people to study in detail.

  • Liked Gabriel Volpe
    keyboard_arrow_down

    Gabriel Volpe - Cats Effect: The IO Monad for Scala

    Gabriel Volpe
    Gabriel Volpe
    Software Engineer
    Paidy Inc
    schedule 6 months ago
    Sold Out!
    45 Mins
    Talk
    Intermediate

    Since the first introduction of [Cats Effect](https://typelevel.org/cats-effect/) many things have changed taking its design and performance to a whole new level.

    Come join me and learn how to deal with side effects in a pure functional way while abstracting over the effect type and taking composition to the next level. We will review the most important features such as synchronous and asynchronous computations, error handling, safe resource management, concurrency, parallelism and cancellation starting by reviewing the basic concepts.

  • Liked Ram Dayal Vaishnav
    keyboard_arrow_down

    Ram Dayal Vaishnav - WebVR for creating immersive Virtual Reality experience

    Ram Dayal Vaishnav
    Ram Dayal Vaishnav
    Sr. Software Engineer
    Arcesium
    schedule 7 months ago
    Sold Out!
    45 Mins
    Demonstration
    Beginner

    Have you noticed that the development of Virtual Reality experiences has started a major makeover of the computer industry? I believe that Virtual Reality is going to become a primary platform soon, which will completely change the way we work, play and communicate digitally. The Web, being the most open platform, is now a key participant for providing cool Virtual Reality experiences. I would love to talk about Building Virtual Reality websites using A-Frame in this version of FunctionalConf.

    A-Frame is an open-source web framework by Mozilla for easily creating VR experiences using WebVR which work on all platforms. In this session, audience will learn to use various concepts & APIs of A-Frame through demos and live coding few WebVR scenes. With this, you will be able to create interactive and immersive VR websites on the web. A-Frame has incorporated functional programming in various ways, one such implementation was discussed in https://github.com/aframevr/aframe/issues/2012. I will talk about few of these functional programming related aspects related to A-Frame.

    This session will also cover following:

    - How A-Frame is different from ReactVR
    - Keys to make your VR website to have immersive experience
    - How can one get involved with the A-Frame community to contribute in its development.
    - Step by step example towards how a piece of code can be shifted towards being functional in A-Frame.

  • Liked Ankit Rokde
    keyboard_arrow_down

    Ankit Rokde - My Haskell Program does not fail. Proof?

    Ankit Rokde
    Ankit Rokde
    Student
    IIT Bombay
    schedule 5 months ago
    Sold Out!
    20 Mins
    Talk
    Intermediate

    Proving correctness of programs and ensuring they are bug-free has always been a challenging problem.

    Mostly we have relied on manual testing to check the correctness of programs.

    Strong static type systems help us to write bug free programs from the start but many interesting cases can miss out.

    Many tools such as QuickCheck, Liquid Haskell have been developed to address this issue.

    In this talk, we will presenting a different approach, Bounded Model Checking (BMC), which has been very successful in proving correctness of imperative programs by means of tools such as CBMC.

    We will explain how BMC works at high level, how we have adopted it for Haskell and our success with the same.

    We will also present how you can use it to prove correctness of your Haskell programs.

  • Liked Aloïs Cochard
    keyboard_arrow_down

    Aloïs Cochard - The Magnum Opus

    45 Mins
    Talk
    Beginner

    From Ancient Egypt to the Middle Ages humanity lost it's way in the quest to find the philosopher's stone.

    While following the recent advance in machine learning one might think that we are running in that same quest again,
    only differences this time are that our philosopher's stone is deep learning and the promise is general artificial intelligence instead of immortality.

    The current machine learning ecosystem is mainly based on python and pretty much feels like alchemy,
    lot of trial and errors, lack of tooling and good engineering practices, ...

    Let's take a tour of the current ecosystem and see how can we do better and safer high performance machine learning using Haskell!

  • Liked Gabriel Volpe
    keyboard_arrow_down

    Gabriel Volpe - Beautiful Folds

    Gabriel Volpe
    Gabriel Volpe
    Software Engineer
    Paidy Inc
    schedule 6 months ago
    Sold Out!
    45 Mins
    Demonstration
    Intermediate

    A journey from statistics computations to composable streaming folds.

    Come join me and learn by example about the beauty of generic folds and streaming data. I'll be revisiting these concepts, talking about its origins in Haskell and demonstrating the power of them by showing an example of a program that runs statistics computations using the Fs2 and Origami Scala libraries.

    Inevitably, I'll also be talking about performance, concurrency, parallelism and some type classes such as Foldable, Applicative and Monoid, you won't be bored!