Friday, February 16, 2007

Understanding Monads. For real.

Again an article about the "what are monads?" topic? Aren't there enough on the net already? Maybe. But since I've read a lot of them and still had problems to really 'grok' the concept, I suspect that other may have those problems too.

I'm not a mathematician. I've studied physics and while I learned a lot mathematics in the process it's a totally different way of learning mathematics than directly studying mathematics: In physics mathematics is an important tool which always have to be backed by reality. If you simply calculate something strictly by using mathematical rules you often get physically nonsensical results. So you always need to have some 'image of reality' back in your mind.

In mathematics on the other hand those 'images' are less important and sometimes even counter-productive. If an image is to much tied to our image of the world, it can be quite limiting. Using such an image can even prevent finding more uses for a certain abstraction. So mathematicians have to learn to work with the abstractions itself without using an image, because every image could limit the use of an abstraction.

But if we want to apply mathematics in some domain we need the image back. For a mathematician listing the monadic rules may be enough to know about the structure which is created by those rules, but for non-mathematicians which haven't been trained to think this way, it's not. And after all there really is an image which shows what a monad (as used in functional programming) really is:

A monad is like a 'macro': All it does is a code-transformation.

I know, this notion has been used in some articles about the topic, but often only casually along the way. And if you're a mathematician this is really nothing more than 'uninteresting applicative stuff', but if you're a programmer which has to use monads in some real application, you simply need something more real, something you can relate to. Just a set of mathematical rules isn't enough. So why not think of a monad as some kind of 'functional-macro'? As programmer this should be a rather familiar picture. Lets elaborate a bit on this:

What is a macro? It's something which get some data and dices and splices them. But before it can do its dicing and splicing, there need to be some slicing - which is done by the macro-processor.

Let's look for example at the C-preprocessor as an example of a very primitive (but widely used) macro facility:

#define m1(x, y) x y y
m1(a, + 1)

Now m1 is read here by the preprocessor which recognizes it as a macro and slices the following text between the parenthesis at the ',' and feeds those text-slices into the macro. This creates the new text ("a + 1 + 1") from it which is then feed back into the preprocessor as new input (so that macro expansion can happen again on the resulting text).

In Lisp it's a bit more sophisticated because the macro processor works on s-exprs and not on plain text (yes, I know about reader macros, but let's concentrate on standard macros here). If we have a piece of code like:

(some-function (m2 x (+ y 1)) z)

and 'm2' is a macro, then the macro-processor does some slicing. The resulting pieces are s-exprs, namely "x" and "(+ y 1)" which are then feed into the macro 'm2'. The result is then put there where the (m2 ...) was and is evaluated again.

And in Haskell? There the 'slicer' is the 'do-statement'. If we write

v1 <- stmt1
v2 <- stmt3

then the compiler slices the content into separate statements and wraps them into closures. It then put function calls around it (using a function called 'bind' or in '>>=' in Haskell). The result of this transformation is than again used as new input.

The difference to the other macro processors is that the evaluation (the 'splicing and dicing') is done by the 'bind' function at runtime now. This has the advantage that the type checker has ran and by using type information the bind-function can be overloaded. So instead of providing an explicit name for the macro like in the C and Lisp examples above, the concrete macro is chosen by the type of the first statement in the do-block now.

And how can it be assured that the whole block is translated by the same macro? That's the task of the type-checker: By giving the bind-function the type

bind(M<a>, A -> M<b>) -> M<b>

the result of a bind-function has to be of the same type as the input. The monad can be parametrized by a parameter and this parameter can be changed by the bind-function. But the monad is still the same. This ensures that all statements are evaluated 'in the same monad' - or in other words: That all statements in a do-block are subject to the same kind of code transformation.

The type 'M' is often also called 'the monad'. And it's reasonable: In Lisp the macro is chosen only by a name and so we would call the above the 'm2-macro'. But in Haskell the choosing is done by type and thus the type of the monad gives the macro/monad its name. Thus List-monad, Maybe-monad, etc. But the monad isn't just the type, its also the implementation of the bind function (and there is also have to be a 'return' function), because without a bind-function it simply wouldn't do anything. The combination of type, bind and return functions together are needed to build a macro - and so all those things together are called 'a monad' (like the Lisp macro is not only the name of the macro but also the code which does the code transformation).

So that's it: A monad is a code-transformer. Like a macro. It's simple to remember: Both names even start with a 'M'.

While the fundamental entity of the C preprocessor is simple text, in Lisp it's a s-expr. And in Haskell this fundamental entity is a function. Everything in pure functional programming can be expressed by it. And so it's no wonder that in Haskell 'macros' get functions as input. That's all the magic. Of course we could also use the bind function directly and write code using '>>=' manually, but 'do' makes it often much more easy to read and write the code.

I put the '' around the word 'macro' because Monads are not really macros by definition. So if you want to be picky you can find reasons why this picture isn't totally correct. But does this really matter? Or isn't it more important to have a useful way of thinking if we deal with an abstraction?

What are those differences? The main one is that input to 'real' macros hasn't to be valid code. The C-preprocessor accepts everything which is text and Lisp macros accepts all kinds of s-exprs. But monads only work on valid Haskell-code. They can only change the runtime semantics of it. And the syntax itself isn't changeable too, because one always has to obey the syntax of the do block. So a monad is not a macro in the sense that you can create a real new syntax (like in Lisp). You can only create new semantics.

Also all kinds of macros have their limitations. Lisp macros sometimes need a code-walker, the C preprocessor is quite limited, too and also Monads have their limits.

And those limits are in fact quite severe which lead to the invention of more able constructs like 'Arrows'. The core of limitation is that a bind-function can't look '
'into' its second argument. It we have a do statement the above:

v1 <- stmt1
v2 <- stmt3

the compiler transforms it into the following code ("\v -> ..." is the syntax for a closures here):

\v1 -> bind(stmt2,
\_ -> bind(stmt3,
\v2 -> stmt4)))

(The '_' parameter is used in 'dummy-assignments' which are created if we used no explicit assignment) [Edit: Corrected error]

If we now look at the first 'bind', it takes 'stmt1' and a closure. Now this bind can do lots of things depending on the result value of stmt1 but if has no clue, what its second parameter (the closure) returns until it evaluates it. And thus it has no possibility to look into the later bind-functions. This is a severe limitation: It's impossible to create a transformation which analyzes the whole do-block before evaluating it.

So it's for example impossible to create an LALR-monad which transforms it's statements into a LALR-parser. It seems possible to simply define those statements as 'actions' which return a value to instruct the bind-functions to build the right code, but this would be quite limited because we can't add semantic actions this way: The result of the evaluation of the second bind-parameter has not only to contain the monad itself, but also the result of the semantic action of the parser. And this is only possible if we evaluate both in one turn.

The next problem is binding values to 'variables'. Monads simply use the normal functional way of doing binding via lambdas. The 'var <- ...' syntax simply creates a closure with 'var' as parameter which is then visible in all the levels 'below'. This works fine if the structure of the resulting code is similar to the structure of the input code, but it makes it for example impossible to transform a do-block into code which executes backwards from the last statement to the first one.

So while monads are quite powerful to create certain kinds of abstractions (= code transformations) they can't do everything. But nonetheless, that's really it: Monads are code transformations. This is also the reason why monads seem to be a bit difficult to understand: Each monad create a new language. The Maybe-monad creates a simple 'first-fail-fails-the-whole-computation'-language, the List-monad creates a simple 'try-out all combinations'-language, the State-monad a 'Haskell-with-mutation'-language, etc.

The transformations necessary from ordinary 'one statement after each other'-form written in a do-block to the resulting code can by quite difficult to comprehend: Simply because such a code transformation can be quite different from the input code in the end.

And because we have to learn new language semantics for every new monad, it's no wonder that the concept seems to be a bit hard to grasp. We may know the semantics of basic Haskell, but for every monad we have to learn a new language. Again and again. And if we don't even know that a monad creates a new language, understanding this becomes much more difficult, too.

But at least this shouldn't be a problem anymore now.

To deepen our image let's now look at some examples now. I will use only basic Haskell syntax here or even 'Algol-like' syntax to make it better understandable.

Some of the most simple monad is the 'Maybe monad'. Lets write something like

x <- func1 ...
y <- func2 ...
func3 ...
return (x + y)

Here all three functions 'func1, ..., func3' should returns a 'Maybe' value which can be 'Just x' or 'Nothing'. Because 'Maybe' is a monad and we've used the 'do' syntax, this code is transformed into something like this:

tmp1 = func1(...)
if isNothing(tmp1) then
return Nothing
x = fromJust(tmp1)
tmp2 = func2(...)
if isNothing(tmp2) then
return Nothing
y = fromJust(tmp2)
tmp3 = func3(...)
if isNothing(tmp3) then
return Nothing
return Just(x + y)

This looks a bit ugly, but shows what's happening: Each statement in the monad is transformed into a if-then-else expression in a way that the first statement which return 'Nothing' aborts the whole block and let it return nothing too.

We could also say 'The maybe monad is an abstraction to represent computations which can fail'. True, thats what macros do: Create new abstractions. Without remembering that a monad is just a kind of macro this sentence would sound quite 'arcane'. But now as we know the secret name of monads, the esoteric flair is gone and we see that they are something which we all know quite well. So the maybe-monad is nothing more than a mechanism which translate those innocent looking statements in a do-bloock above into a nested if-then-else-chain like the one below.

This works for other monads too. The list-monads transforms a linear list of statements into into nested map-functions. Sure, a mathematician may say something like 'The list monad represents non-deterministic-computation'. But in the end all it does is to transform this:

a <- [1, 2, 3]
b <- [4, 5]
return (a + b)

into this:

concatMap (\a -> concatMap (\b -> [a + b]) [4, 5]) [1, 2, 3]

concatMap maps list elements over a function like the normal map, but concatenates the results to a single list. This allows to return any number of values in each invokation. [Edit: fixed error here (mistakenly used fmap instead of concatMap)].

If you're not that familiar with Haskell, the above works like this imperative code:
result = []
foreach a in [1, 2, 3]
foreach b in [4, 5]
result = append(result, [a + b])


But we can do more complex ones, too. One of this 'a bit more complex' transformations is the state-monad. What we want to do is something like this:

value <- get
put (value + 1)
value <- get
return value

Here we have 'commands' which do something which isn't possible in plain functional programming: Reading and mutating a 'state'. With 'get' we get the actual state and with 'put' we can store a new value as the current state. The state in the above example is a simple numeric value, but since we can use arbitrary structures as values too, it allows that a state consists of multiple values.

But how does this work? We all know that pure functional programming don't allow something like mutation. But to store a new state we need exactly this. To make it possible we need to chain our state thru all relevant function calls. This can be done like this:

function get(state)

function put(state, value)

let (state', value) = get(state) in
let (state'', _) = put(state', value + 1) in
let (state''', value') = get(state'') in

Wow, this looks ugly. We need a new 'fresh' name for each new version of 'state' and 'value' and we also have to chain it manually thru the calls of put and get. But this method allows the simulation of mutation without requiring real mutation.

To make this more readable we can now use a monad to transform the original code to the code above. We can't do the calculation directly this time because the in the example below, the code first needs some initial 'state'-value which is than chained thru the calls. So instead of calculating the result directly, we let the monad create a new function. This function can then take the initial state as a parameter and will call the generated code. And this creates the result then. So this is the first case of a monad doing real code-transformation.

To create a new monad we first need a new type. We simply call the type 'State':

data State st t = State (st -> (st, t))

This is out monad. It takes two type parameters: The type 'st' of the state to carry and the return type 't'. This return type can vary for each statement which is a constraint of the monad type.

The inhteresting part here is that the monad don't carries the state itself around, but a closure which takes the old state and returns a tuple of the new state and a result value. This closure is also called an 'action', because it encapsulates the action defined by the statement.

Now we create the bind and return functions for this type:

getCode (State code) = code

instance Monad (State st) where
(>>=) prev next = State $ \st -> let (st_next, res) = (getCode prev) st
in (getCode (next res)) st_next

return v = State $ \st -> (st, v)

The 'getCode' function simply returns the actual 'code' with is stored in our monad. 'return' is simple: It creates an action which takes a state and returns the same state and the return value. The bind function (here named '>>=') takes the previous monad 'prev' and a function 'next' which will return the next monad. It now creates an action which first evaluates the code in the prev-value with the actual state. Then it uses the result to call the 'next'-function which in turn creates the next monad in the chain. This next monad is then again evauluated, but this time with the new state, the prev-monad returned.

This chains the state thru the actions. First the initial state thru the 'first' monad creating a new state. And then this new state thru the result of the 'next'-monad, createing the final state (which is then evaluated by the calling bind-function etc.).

Now we can build out 'set' and 'get' functions. This is quite straight-forward. The 'get' simply uses the actual state-value as return value:

get :: State st st
get = State $ \st -> (st, st)

And the 'set'-function ignores the previous state and creates a new one. It also returns the state in turn to allow assignments like 'x <- set (x + 1)'. This isn't necessary but convenient.

set :: t -> State t t
set val = State $ \_ -> (val, val)

That's it. Out state-monad. Now lets create a simple do-block to use it:

test1 = do
x <- get
set (x + 4)
x <- get
set (x * 3)
x <- get
return (x-5)

Here we can see, that the first statement is a get. But where does the state come from which is returned by 'get'? Simple: If we call 'test1', we don't get the real return value, but a closure we have to evaluate first with the initial state. Let's do this:

main = (getCode test1) 0

'test1' returns a State-monad. We first have to get the code to call out of the monad by calling 'getCode' again. This code can now simply be evaluated by calling it with our initial state (a 0 in this case). As the result we will get a tupel with the value (12,7) in this case. The first value is our last state, the second is the real result (as returned by 'return (x - 5)'). Both values make sense, so our monad seems to work correctly.

Now lets take a look under the hood of the above:

The do-block above first creates the follwing code:

bind(get, \x ->
bind(set (x + 4), \_ ->
bind(get, \x ->
bind(set (x * 3), \_ ->
bind(get, \x ->
return (x-5))))))

The bind function now does the real code-transformation. I've tried to write down the resulting closure if we only expand calls to bind, return, get and set, but it was simply to long and crumbersome. Lets do it instead for a simplified version of the above:

x <- get
return x*2

this is rewritten into

bind(get, x -> return(x*2))

which if we evaluate bind, get and return and pull out the code from the resulting monad, creates the following closure:

\st ->
let (st', x) = \st -> (st, st) -- our 'get' statement
in (x -> (\st -> (st, x*2))) st' -- our 'return' statement

Again we see, that the monad simply does code transformation in the end, so the image of looking at monads as code transformations holds. So even if it looks a bit wierd, in the end the state monad really does the transformation we started with.


Cale Gibbard said...

I'd just like to point out that pure mathematicians really do make use of quite of a lot of imagery and intuition regarding the abstractions they work with. It's pretty essential for finding one's way around in order to prove theorems. (And I feel it's one of the things which makes humans so much better than machines at general theorem proving for the time being.)

Setting that tangent aside, yes, this is basically the whole idea of monads in programming. Capture a wide variety of functional programming idioms, and abstract them away behind a common interface. The interface is specific enough to be meaningful, and yet general enough to include many interesting examples. This is also the idea behind Hughes' Arrows, comonads, applicative functors, and so on. :)

Another similar way to look at it is that for a long time, functional programmers have been writing combinator libraries, which are basically domain specific languages. Monads (and these other abstractions) take a few of the more common components in those libraries, and abstract over them so that code can be reused across the libraries.

There's a reason why one of the early papers on monads referred to them as "The Essence of Functional Programming".

Good work, you got it. :)

Karsten Wagner said...

Thanks :)

About the 'tangent': Yes, I know that mathematicians like to draw little images too and the names of certain mathematical concepts are also quite 'pictographic' (think of 'fiber-bundle' for example). But I remember that in university I was much more successful in mathematics after I stopped thinking in images and started to simply apply the rules to build proofs from them (and of course being awfully pedantic about applying those rules). Mathematical imagery can really lead someone astray sometimes.

In physics its generally quite the opposite. Of course it depends on the field you're working in: In geometrics it's much simpler to create useful images than in number-theory.

Unknown said...

Thanks a lot for this post. It was the best explanation of Monads I have read so far.

Barry Kelly said...

Finally, a cogent explanation. Excellent work. I understood already that functional programming needs to thread state through a program by calling functions recursively, to "update" the parameter values, and that this can work through e.g. tail call so that it doesn't use unbounded stack. Now I understand the State monad here as implementing this pattern, along with 'do' being a simple code transformation, and the implicit monadic bind at the end of each line being an overloadable operator. Everything's becoming clear :)

Anonymous said...

I've been trying to grok monads for two weeks and the comparison to C/Lisp macros is by far the best I've come across.


Anonymous said...

ghc doesn't like the main function in this article, claims its mistyped..

Anonymous said...

Thank you very much for this. I finally get it.

Well, I think I do :)

Anonymous said...

Thanks. It is the most clear article I found in the internet.

Anonymous said...

Thanks. It is the most clear article I found in the internet.

James Pike Friend of Cats said...

Best monad tutorial I've read so far.

I don't get to use Haskell often and when I finally get some time I always have to re-remind myself what monads are, so I've read quite a few of these.

I'll book this one for next time, well just in case ;)

Oh and Thanks!