Snake Wrangling - The journey of designing `hpython`

schedule May 22nd 11:05 - 11:35 AM place Green Room people 87 Interested

`hpython` is a Haskell library implementing a parser, printer, and
syntax tree for Python, written to allow easy validation and refactoring of complete Python programs.

As examples, we will look at functions which:

* Replace tabs with spaces
* Fix mutable default arguments, and
* Turn tail recursion into loops

These functions can also be composed, so a user can mix and match refactor-functions,
or write their own.

The library wasn't always like this, though. We've had ups and downs, and actually ended
up throwing out the entire previous design, which was months old at that point. I'm going
to share this journey: how the project goals evolved, failures and successes, and the lessons
I learned along the way. Key points include:

* The real-world value of property testing
* The shortcomings of advanced type-level techniques and compiler-enforced correctness
* The difference between "making it difficult to do the wrong thing" vs. "making it easy to do the right thing"

1 favorite thumb_down thumb_up 0 comments visibility_off  Remove from Watchlist visibility  Add to Watchlist

Outline/Structure of the Talk

To begin, I will explain why we decided to write this library in the first place
(essentially the situation in the abstract). After that, I will touch on our two initial goals:

* syntactically-correct by construction AST, and
* formatting-preserving round-trip (`parse . print = id`)

and why we considered them important.

Next I'll talk about some technical details.

Firstly, the importance of property testing. Keeping the
whole Python language reference in your head is too difficult, so I wrote a property that hooked into
the Python interpreter to tell me if I made a mistake with the AST. It is such an immense boon to correctness,
and I want to get that across.

Secondly, the "advanced" type
machinery I was using (GADTs, DataKinds and singletons) to achieve correctness-by-construction. I won't try to
teach people how to use these things for themselves; I want to build a feeling of tension. Every time I encountered
a quirk in the Python syntax, the complexity increased.

After this is the "crisis point". I have some examples of things I wanted to be able to do, like the refactoring examples
from the abstract, and the design up to this point makes them too difficult to achieve.
The first big lesson comes out here:

* Model what *should be*, not *is*

Python is poorly designed, and I was hamstringing myself by matching the literal design of Python. I will show
the simplified version, how that better captures the "essence" of Python, and that it enables powerful refactoring
using lenses, prisms and Plated.

Next I point out that the new design is problematic because it sacrifices correctness,
and show how I address this with "opt-out correctness". I modify the syntax tree so that users can express
type-level preconditions on it e.g. "has correct indentation" or "is syntactically correct". It is no longer
correct by construction at the lowest level, but the "easy" way to use the library ensures that the types are
truthful about the correctness of the syntax tree. In other words, the user has to go out of their way to create nonsense.
This leads to the second lesson:

* Instead of making it impossible to do the wrong thing, make it easy to do the right thing

Making it harder to build an incorrect tree makes it harder to build a correct tree. Making it easier to build a correct
tree makes it easier to build an incorrect tree. In the latter case, we can craft an interface that rules out the incorrect
cases, so in practise we still get the benefits of the former with none of the costs.

Finally, I will give some examples of the refactoring functions I mentioned in the abstract.


When I am talking about Haskell code that I wrote, I'll include short code samples. I won't be teaching the audience what
the code does; it is there as a rhetorical device to contrast complexity and simplicity. For example, the audience won't
have to know how GADTs and DataKinds work to be able to see that:

data Test (assign :: Assignability) (context :: ExprContext) :: * -> * where ...
data OrTest (assign :: Assignability) (context :: ExprContext) :: * -> * where ...
data AndTest (assign :: Assignability) (context :: ExprContext) :: * -> * where ...
data OrExpr (assign :: Assignability) (context :: ExprContext) :: * -> * where ...
data AndExpr (assign :: Assignability) (context :: ExprContext) :: * -> * where ...
data ShiftExpr (assign :: Assignability) (context :: ExprContext) :: * -> * where ...
et cetera

is more complex than

data BinOp = Or | And | BitOr | BitAnd | BitShift
data Expr a
= ...
| Operator a BinOp Expr Expr

Learning Outcome

Increased awareness of the challenges involved with designing a good language tools library and how to overcome them.

Target Audience

Programmers who are designing APIs, and particularly language tooling. Programmers who write Python.

schedule Submitted 1 year ago

Public Feedback

comment Suggest improvements to the Speaker