Functional Programming in ... SQL?

schedule May 14th 11:40 AM - 12:10 PM place Green Room people 1 Interested

Many real-world lots-of-business-value-providing systems use a relational database. (Even more of them should!) Often, that database is used as a dumb data store – nothing more, logically, than an ACID-compliant coordinator of multiple flat files (tables). We send it basic queries – sometimes even joining multiple tables in one query! – inserts, updates and deletes. But nothing that might strain its little brain. Often, this is a mistake: a modern relational database is the most sophisticated data-munging tool in our toolkit!

We should consider doing more work in the database itself. But that's not easy to code well. How can we make our more complex SQL code easier to reason about, more reliable, and more testable? How can we make the overall system simpler?

These are questions that in not-the-database contexts, we solve with functional programming techniques. Without expecting SQL to out-lambda Haskell, are there techniques that we can borrow from functional programming and apply to improving our SQL?

 
1 favorite thumb_down thumb_up 0 comments visibility_off  Remove from Watchlist visibility  Add to Watchlist
 

Outline/Structure of the Talk

Where are we going?

I would like to convince you that you should consider writing SQL like this:

(Note for reviewers: I'm working on a compelling concrete example which isn't proprietary/specific to Rokt; the below made-up rubbish is purely a "yeah, something vaguely like this".)


select to_json( ( fb.foo
, fb.bar
) :: myapp_types.foobar
),
sum(fb.baz) as count,
q.quux
from myapp.analyze_foobars( id => 12345
, start => '2019-01-01'
, end => '2019-04-01'
, name => 'Captain FooBar'
) fb
left join lateral myapp.aggregate_quuxes(arg => fb.foo_and_bar) q on true
where myapp.is_worth_returning( threshold => 9
, bloop => fb.bloop
)
group by 1, 3;

... and that you can and should build the SQL code required to support this sort of query in a genuinely "capital-FP Functional" style.


Motivation – why code in the DB?

Getting the maximum benefit out of your database requires more than "don't do 'n+1 selects'".

Modern databases are easier to scale (vertically and horizontally) than back in the Java EE era, when it was (apparently) imperative to make sure that the DB did as little work as possible, so that all scaling could happen in the application tier. It's reasonable to consider having them do more work.


Motivation – SQL + FP = ???

Writing complex SQL is hard – like many FP languages, it has a high "thinking time" to "typing time" ratio.

FP techniques are often fairly general approaches to taming complexity, aimed at:

  • reasoning about code ...
  • ... with the minimal possible context

Stored procedures are normally – unfortunately! – written in an extremely imperative style. But that doesn't have to be the case.


What FP techniques can we adopt?

  • function composition (in the "output of this function is passed to / joined with this function" sense)
  • pure functions
  • immutability (!!! – with appropriate modelling)
  • referential transparency (with query planner smarts to avoid redundant work)
  • pass-by-value, not -by-reference (it's not as natural in the DB, but just as valuable)
  • explicit modelling of time (with appropriate modelling)

In fact, pretty much anything you can achieve with map/reduce – that paragon of functional programming – is naturally well-suited to SQL!


Let's not over-promise

There are FP techniques we're not talking about:

  • higher-order functions (you can pass function names and invoke dynamic SQL with exec: please don't!)
  • point-free function composition (in the f = g ⋅ h sense)
  • abstraction (in eg the typeclass sense)
  • aggressive use of recursion (with recursive is really for specialised use cases only)
  • static types (though actually DBs are not bad for this, especially with custom domains)


Typical, non-FP SQL code (stored procedures)

  • extremely imperative style
    • lots of loops / cursors
    • intermediate (table) variables / temporary tables with mutation
  • ... and triggers are even worse


The FP SQL approach

  • define a coherent, minimal API
  • "functional core, imperative shell": pull together the heavy lifting at the outermost level, where possible ...
  • ... and inside that, favour referentially transparent functions
  • write functions of one SQL statement – the SQL equivalent of a single Lisp / lambda calculus expression
    • this means language sql rather than language plpgsql

Technique: function "composition" / factoring

Example-based exposition of function used as "from" relation, with and without lateral join to another function

Example-based exposition of a table-returning function also calling another (factored-out) function


Technique: writing pure SQL functions

Example-based exposition of use of pure functions as query predicates

Example-based exposition of use of pure functions for result transformation


Technique: immutability

Example-based exposition of including timestamps/versions to provide a history-retaining, append-only data store


What do I get out of this?

  • ability to reason about code
  • a well-defined API (that you can migrate with transactional DDL!)
  • testability (with much faster tests)
  • better optimisation from the query planner
  • a smaller codebase


How does testing work?

  • pgTAP
  • property-based tests with "parallelism" via set operations
  • test in transactions, roll-back on test completion to restore predictable state


Downsides & pitfalls

  • reading query plans to debug poor performance
  • difficult to be confident that the planner will consistently pick the plan you want
  • tooling support

Learning Outcome

I hope that after this talk you do more of your data-munging work directly in SQL.

I hope you consider defining an API of SQL functions/procedures, with function names, parameter and return types, rather than having your application send whatever SQL text it feels like dynamically generating over the wire.

Most importantly, I hope you implement each function in that API functionally: using an individual SQL statement, making use of neatly factored-out, referentially-transparent helper functions, to operate on its outputs and predicates in obvious, easy-to-reason-about ways.

I hope you consider property-based testing your database API.

Target Audience

Functional programmers of systems which use a database

Prerequisites for Attendees

This talk will assume a reasonable working knowledge of SQL. I will be using PostgreSQL to demonstrate techniques, and some techniques may not translate directly to other database systems, but users of other database systems should still be able to follow along without difficulty.

schedule Submitted 1 month ago