Functional Programming in ... SQL?
Many real-world lots-of-business-value-providing systems use a relational database. (Even more of them should!) Often, that database is used as a dumb data store – nothing more, logically, than an ACID-compliant coordinator of multiple flat files (tables). We send it basic queries – sometimes even joining multiple tables in one query! – inserts, updates and deletes. But nothing that might strain its little brain. Often, this is a mistake: a modern relational database is the most sophisticated data-munging tool in our toolkit!
We should consider doing more work in the database itself. But that's not easy to code well. How can we make our more complex SQL code easier to reason about, more reliable, and more testable? How can we make the overall system simpler?
These are questions that in not-the-database contexts, we solve with functional programming techniques. Without expecting SQL to out-lambda Haskell, are there techniques that we can borrow from functional programming and apply to improving our SQL?
Outline/Structure of the Talk
Where are we going?
I would like to convince you that you should consider writing SQL like this:
(Note for reviewers: I'm working on a compelling concrete example which isn't proprietary/specific to Rokt; the below made-up rubbish is purely a "yeah, something vaguely like this".)
```sql
select to_json( ( fb.foo
                , q.quux
                ) :: myapp_types.foobar
              ) as foobar,
       sum(fb.baz) as count
from myapp.analyze_foobars( id    => 12345
                          , start => '2019-01-01'
                          , "end" => '2019-04-01'
                          , name  => 'Captain FooBar'
                          ) fb
left join lateral myapp.aggregate_quuxes(arg => fb.foo_and_bar) q on true
where myapp.is_worth_returning( threshold => 9
                              , bloop     => fb.bloop
                              )
group by fb.foo, q.quux;
```
... and that you can and should build the SQL code required to support this sort of query in a genuinely "capital-FP Functional" style.
Motivation – why code in the DB?
Getting the maximum benefit out of your database requires more than "don't do 'n+1 selects'".
Modern databases are easier to scale (vertically and horizontally) than back in the Java EE era, when it was (apparently) imperative to make sure that the DB did as little work as possible, so that all scaling could happen in the application tier. It's reasonable to consider having them do more work.
Motivation – SQL + FP = ???
Writing complex SQL is hard – like many FP languages, it has a high "thinking time" to "typing time" ratio.
FP techniques are often fairly general approaches to taming complexity, aimed at:
- reasoning about code ...
- ... with the minimal possible context
Stored procedures are normally – unfortunately! – written in an extremely imperative style. But that doesn't have to be the case.
What FP techniques can we adopt?
- function composition (in the "output of this function is passed to / joined with this function" sense)
- pure functions
- immutability (!!! – with appropriate modelling)
- referential transparency (with query planner smarts to avoid redundant work)
- pass-by-value, not -by-reference (it's not as natural in the DB, but just as valuable)
- explicit modelling of time (with appropriate modelling)
In fact, pretty much anything you can achieve with map/reduce – that paragon of functional programming – is naturally well-suited to SQL!
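As a sketch of that correspondence (table and column names invented for illustration): the select list is the "map" step, the aggregate is the "reduce".

```sql
-- "map": each row is transformed independently by the select list
-- "reduce": an aggregate folds the mapped rows down to one value per group
-- (events / occurred_at / payload_cents are hypothetical names)
select date_trunc('day', occurred_at) as day,        -- grouping key
       sum(payload_cents) / 100.0     as total_usd   -- the "reduce"
from events                                          -- the source "list"
group by 1;
```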
Let's not over-promise
There are FP techniques we're not talking about:
- higher-order functions (you can pass function names and invoke dynamic SQL with `exec`: please don't!)
- point-free function composition (in the `f = g ⋅ h` sense)
- abstraction (in eg the typeclass sense)
- aggressive use of recursion (`with recursive` is really for specialised use cases only)
- static types (though actually DBs are not bad for this, especially with custom domains)
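On that last point, a Postgres domain can push an invariant into the type system, a little like a refinement type. A minimal sketch, with invented names:

```sql
-- A domain is a base type plus constraints: invalid values are
-- rejected at insert/assignment time, wherever the domain is used.
create domain positive_cents as bigint
  check (value > 0);

create domain email_address as text
  check (value ~ '^[^@]+@[^@]+$');

-- Any table (or function signature) using these domains inherits
-- the invariants for free.
create table payments (
  id     bigserial      primary key,
  payer  email_address  not null,
  amount positive_cents not null
);
```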
Typical, non-FP SQL code (stored procedures)
- extremely imperative style
- lots of loops / cursors
- intermediate (table) variables / temporary tables with mutation
- ... and triggers are even worse
The FP SQL approach
- define a coherent, minimal API
- "functional core, imperative shell": pull together the heavy lifting at the outermost level, where possible ...
- ... and inside that, favour referentially transparent functions
- write functions of one SQL statement – the SQL equivalent of a single Lisp / lambda calculus expression
- this means `language sql` rather than `language plpgsql`
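A single-statement `language sql` function might look like this (the schema and names are invented for illustration):

```sql
-- The whole body is one SQL statement: no variables, no loops,
-- no control flow -- the SQL analogue of a single lambda-calculus
-- expression.
create function myapp.order_total(p_order_id bigint)
returns numeric
language sql
stable                    -- reads the database but never modifies it
as $$
  select coalesce(sum(quantity * unit_price), 0)
  from myapp.order_lines
  where order_id = p_order_id;
$$;
```

Because the body is plain SQL, the planner can inline it into the calling query rather than treating it as an opaque black box.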
Technique: function "composition" / factoring
Example-based exposition of function used as "from" relation, with and without lateral join to another function
Example-based exposition of a table-returning function also calling another (factored-out) function
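One possible shape for the "function as `from` relation, composed via a lateral join" example (function and column names are hypothetical):

```sql
-- analyze_foobars produces a relation; each of its rows is then
-- "passed" to aggregate_quuxes via the lateral join -- the SQL
-- analogue of composing two functions.
select fb.foo,
       q.quux
from myapp.analyze_foobars(id => 12345) fb
left join lateral myapp.aggregate_quuxes(arg => fb.foo_and_bar) q on true;
```

The `left join ... on true` keeps rows of `fb` even when the second function returns nothing, much like mapping an optional-returning function.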
Technique: writing pure SQL functions
Example-based exposition of use of pure functions as query predicates
Example-based exposition of use of pure functions for result transformation
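A sketch of a pure predicate function, assuming the hypothetical names from the opening query:

```sql
-- IMMUTABLE is a promise of referential transparency: same inputs,
-- same output, no side effects -- which licenses the planner to
-- inline the call and evaluate it as an ordinary expression.
create function myapp.is_worth_returning(threshold int, bloop int)
returns boolean
language sql
immutable
as $$
  select bloop >= threshold;
$$;

-- Used as a query predicate:
--   select * from myapp.foobars fb
--   where myapp.is_worth_returning(threshold => 9, bloop => fb.bloop);
```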
Example-based exposition of including timestamps/versions to provide a history-retaining, append-only data store
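The append-only idea might be sketched like this (all names invented): writes insert new versions, and "current state" is just a query over the history.

```sql
-- Every change is a new immutable row; nothing is updated in place.
create table myapp.foobar_history (
  foobar_id  bigint      not null,
  valid_from timestamptz not null default now(),
  payload    jsonb       not null,
  primary key (foobar_id, valid_from)
);

-- The latest version per id, derived rather than stored:
create view myapp.foobars_current as
select distinct on (foobar_id) *
from myapp.foobar_history
order by foobar_id, valid_from desc;
```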
What do I get out of this?
- ability to reason about code
- a well-defined API (that you can migrate with transactional DDL!)
- testability (with much faster tests)
- better optimisation from the query planner
- a smaller codebase
How does testing work?
- property-based tests with "parallelism" via set operations
- test in transactions, roll-back on test completion to restore predictable state
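The transactional-rollback pattern, in miniature (pgTAP and similar tools automate this, but the idea is just):

```sql
-- Everything the test creates exists only inside the transaction,
-- so the database returns to a known state no matter what happens.
begin;

create table scratch (n int);
insert into scratch select generate_series(1, 100);  -- set-based "generator"

select case when sum(n) = 5050 then 'ok'
            else 'FAIL: sum of 1..100'
       end
from scratch;

rollback;   -- table and rows vanish; the next test starts clean
```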
Downsides & pitfalls
- reading query plans to debug poor performance
- difficult to be confident that the planner will consistently pick the plan you want
- tooling support
Learning Outcome
I hope that after this talk you do more of your data-munging work directly in SQL.
I hope you consider defining an API of SQL functions/procedures, with function names, parameter and return types, rather than having your application send whatever SQL text it feels like dynamically generating over the wire.
Most importantly, I hope you implement each function in that API functionally: as a single SQL statement, built from neatly factored-out, referentially transparent helper functions that transform its outputs and express its predicates in obvious, easy-to-reason-about ways.
I hope you consider property-based testing your database API.
Target Audience
Functional programmers of systems which use a database
Prerequisites for Attendees
This talk will assume a reasonable working knowledge of SQL. I will be using PostgreSQL to demonstrate techniques, and some techniques may not translate directly to other database systems, but users of other database systems should still be able to follow along without difficulty.