miércoles, 25 de noviembre de 2015

Haskell and Type-Safe Routing

Haskell and Type-Safe Routing

This a reflection on my experiences with different approaches to type-safe routing in haskell. In each section, I will explain a technique and evaluate its merits and its detriments. Some of these (yesod's routing, boomerang, servant) are real libraries and others are mere thought experiments. They are presented roughly in order of increasing complexity. Let us begin.

Conventions

In the explanation of each technique, we will demonstrate how the routing and dispatch for the following routes would be built:

  • AllR: /products/:Bool
  • DetailsR: /product/:ProductId/view
  • BuyR: /buy/:ProductId/as/:UserId

The routes have the following functions:

  • AllR: Show all of the products, and if the boolean is true, only show recently added products.
  • DetailsR: View the details for a specific product
  • BuyR: Purchase a specific product as a specific user

Notice that route names end in an uppercase R. This is not required by any of the techniques I am going to discuss, but it is a common practice in yesod. I have found it to be helpful when working with other routing libraries, so it's a convention I'm going to use in this post.

Notice that the routes do not use Int for the Ids but instead use the following newtype wrappers:

newtype UserId = UserId Int
newtype ProductId = ProductId Int

This means that whenever we discuss a typeclass being used for path pieces, we will used GeneralizedNewtypeDeriving to provide instances for these.

Algebraic Data Types as Routes

The most straightforward approach to type-safe routing is to build an ADT:

data MyRoute
  = AllR Bool
  | DetailsR ProductId
  | BuyR ProductId UserId

Even though this isn't strictly necessary at this point, we're also going to create a typeclass for path pieces. This will help cut down on the noise introduced by the newtype wrappers:

class Piece a where
  toPiece :: a -> String
  fromPiece :: String -> Maybe a

instance Piece Bool where
  toPiece True  = "yes"
  toPiece False = "no"
  fromPiece "yes" = Just True
  fromPiece "no"  = Just False
  fromPiece _     = Nothing

instance Piece Int where
  toPiece = show
  fromPiece = readMaybe

deriving instance Piece UserId
deriving instance Piece ProductId

Next, we need to be able to render a route. So, we would write:

renderMyRoutePieces :: MyRoute -> [String]
renderMyRoutePieces r = case r of
  AllR b -> ["products",toPiece b]
  DetailsR pid -> ["product",toPiece pid,"view"]
  BuyR pid uid -> ["buy",toPiece pid,"as",toPiece uid]

Cool. And if you wanted to put the pieces together, you could build another function on top of it like this:

renderMyRoute :: MyRoute -> String
renderMyRoute = ('/' :) . intercalate "/" . renderMyRoutePieces

And now, in GHCi, we could type:

> renderMyRoute (DetailsR (ProductId 44))
/product/44/view
> renderMyRoute (AllR False)
/products/no

So far, there is nothing particularly painful about this approach. Let's see what happens when we try to handle parsing a route:

parsePieces :: [String] -> Maybe MyRoute
parsePieces ps = case ps of
  ["products",str1]       -> AllR <$> fromPiece str1
  ["product",str1,"view"] -> DetailsR <$> fromPiece str1
  ["buy",str1,"as",str2]  -> BuyR <$> fromPiece str1 <*> fromPiece str2
  _                       -> Nothing

And just like before, we could make a real parser that works on String instead of String. Instead of intersperse, we would use splitOn (from the split package). However, I'm going to skip doing the trivial plumbing:

parse :: String -> Maybe MyRoute
parse = ...

Dispatching is dead easy after we have parse. To keep it simple, I'm just going to have dispatch produce an IO ():

dispatch :: MyRoute -> IO ()
dispatch (BuyR pid uid) = putStrLn $
  "Buying product " ++ toPiece pid ++ " as user " ++ toPiece uid
dispatch (AllR b) = ...
dispatch (DetailsR pid) = ...

And then we can fuse parsing and dispatching together with:

parseAndDispatch :: String -> IO ()
parseAndDispatch s = case parse s of
  Just route -> dispatch route
  Nothing -> putStrLn "Bad route"

Cool. Let's try it out:

> parseAndDispatch "/foo/bar"
Bad route
> parseAndDispatch "/buy/44/as/12"
Buying product 44 as user 12

And that's it. We have now created the most naive solution to type safe routing. This is going to be a sort of baseline for the rest of the discussion. Let's evaluate its merits:

  • Type safety when creating routes
  • Since the Piece typeclass was merely for convenience, it is possible to parse/render the same type of piece differently in different routes. This flexibility is seldom needed, but it is available with the approach.

And now the detriments:

  • Adding a new route requires modifying code in at least four places (add data constructor, add MyRoute pattern matches to renderMyRoutePieces and dispatch, add list pattern match to parsePieces). If the dispatch function delegates work to a specially-named "handler" function for each route (which is almost always needed), it's five places.
  • Logic for parsing/rendering is duplicated.

The last detriment is pretty bad. In fact, it's the main problem that most of haskell's routing libraries try to solve. Let's look into the problem a little more. In renderMyRoutePieces, we have

BuyR pid uid -> ["buy",toPiece pid,"as",toPiece uid]

And in parsePieces, we have:

["buy",str1,"as",str2] -> BuyR <$> fromPiece str1 <*> fromPiece str2

These two pattern matches are eerily similar to one another. In fact, if I showed you only one of them, you should be able to tell me exactly what the other is supposed to look like. That's because they should be inverses of one another. If they don't have this property, then the routes that we render won't be correct. In general, the rendering function and the parsing function should satisfy this property:

parsePieces (renderMyRoutePieces r) 
  =
Just r

But the way that this routing technique works, it's completely possible to get it wrong. What we want is to write a single specification that can be used to generate both the renderer and the parser. Let's bring out the hammer.

Yesod Routes

Yesod's approach to routing is to use template haskell. This is probably the least elegant but most immidiately understandable solution to the problem. Why? Because it does almost the same thing that we just did. The differences are as follows:

  • The parse and dispatch step are unified. The route is parsed and then dispatch is called all in one step, without first recreating the MyRoute data type. This means that with yesod, there is neither an equivalent the parse function nor the dispatch function but only to the parseAndDispatch function.
  • A template haskell splices generates (1) the MyRoute data type declaration, (2) the route rendering function, and (3) the parse/dispatch function.
  • There is a typeclass named PathPiece which is equivalent to our Piece typeclass above. Any value that you capture in a route have a PathPiece instance.

That's basically it. Let's look at an example (we're going to use the same routes as before):

{-# LANGUAGE OverloadedStrings, QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell, TypeFamilies  #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import           Yesod

data MyApp = MyApp

newtype UserId = UserId Int
  deriving (Show,Read,Eq,PathPiece)
newtype ProductId = ProductId Int
  deriving (Show,Read,Eq,PathPiece)

mkYesod "MyApp" [parseRoutes|
/products/#Bool            AllR     GET
/product/#ProductId/view   DetailsR GET
/buy/#ProductId/as/#UserId BuyR     GET
|]

instance Yesod MyApp

getAllR b       = defaultLayout ...
getDetailsR pid = defaultLayout ...
getBuyR pid uid = defaultLayout ...

main = warp 3000 MyApp

There are some other features (there's a PathPieceMulti, there are subsites, etc.), but all that's really happening is something similar to what we wrote earlier. Let's take a look at some of the strengths of this approach:

  • It's now impossible to mess up the inversion issue with routing/parsing. One specification is used to build both. This is the biggest advantage.
  • Adding new routes is easier. Notably, we've gone down from needed to change five places to needing to change two places.

Now, the detriments:

  • We don't have a route parsing function that we can use anymore. Not too bad, since you likely only use it for dispatching anyway.
  • Magic. Magic is seldom good in programming. Things should follow a consistent set of rules, and you shouldn't need to learn new rules when you use new libraries. You do have to learn new rules to use yesod routing. Notice how in the example above, we had to prefix all of the handler names with the word get (getDetailsR, etc.). This is because the dispatch function (generated by template haskell) expects functions with those names to be in scope. If we instead had defined AllR with /products/#Bool AllR POST, then we would have had to define postAllR instead of getAllR. But, if we had defined it as /products/#Bool AllR (with no method at the end), then we would need to define handleAllR. Fortunately, yesod has pretty good documentation in the form of a cookbook, so it's not like these rules are unlearnable, but they are something you have to remember.
  • When you're using a complicated quaiquoters, the normal syntactical rules for haskell go away. For example, if AllR was supposed to take a Maybe Int instead of a Bool, what would you do? You might try /products/#Maybe Int AllR GET, but that doesn't work. You have to use hyphenation to apply a type: /products/#Maybe-Int AllR GET.

Now, I've been really hard on yesod in the detriments section, so I figured I would clarify something. I use yesod (with the template haskell routing) all the time at work in about half a dozen projects. I'm a contributor to yesod. Even though the routing uses template haskell and has tricky rules, if you learn what's going on, it's fine for most things. I know that it "doesn't compose", but for the most part, that hasn't been a problem for me.

Another thing to keep in mind is that yesod was around when GHC 7.2 was released. So a lot of the fancier type level tricks weren't possible back then.

However, returning to my original "tough on crime" stance, using template haskell in the way that yesod does makes things hard to understand and hard to change. Auto-generated documentation like haddock doesn't help because it can't know what the splice is going to bring into scope. There are other ways to use template haskell that are more restricted in scope and easier to understand. Notably, lens and aeson make use of template haskell in a way that reduces broilerplate and remains easy to understand.

I'm going to break this up into multiple parts because this is taking longer than I thought it would. In future installments, expect to see:

  • Routing with Applicative Functors (which don't work, but I figured the mental exercise would be nice).
  • Boomerang (maybe)
  • Servant
  • Serpentine (my own attempt at dependently typed routing)