Friday, February 16, 2007

Understanding Monads. For real.

Again an article about the "what are monads?" topic? Aren't there enough on the net already? Maybe. But since I've read a lot of them and still had problems to really 'grok' the concept, I suspect that other may have those problems too.

I'm not a mathematician. I've studied physics and while I learned a lot mathematics in the process it's a totally different way of learning mathematics than directly studying mathematics: In physics mathematics is an important tool which always have to be backed by reality. If you simply calculate something strictly by using mathematical rules you often get physically nonsensical results. So you always need to have some 'image of reality' back in your mind.

In mathematics on the other hand those 'images' are less important and sometimes even counter-productive. If an image is to much tied to our image of the world, it can be quite limiting. Using such an image can even prevent finding more uses for a certain abstraction. So mathematicians have to learn to work with the abstractions itself without using an image, because every image could limit the use of an abstraction.

But if we want to apply mathematics in some domain we need the image back. For a mathematician listing the monadic rules may be enough to know about the structure which is created by those rules, but for non-mathematicians which haven't been trained to think this way, it's not. And after all there really is an image which shows what a monad (as used in functional programming) really is:

A monad is like a 'macro': All it does is a code-transformation.

I know, this notion has been used in some articles about the topic, but often only casually along the way. And if you're a mathematician this is really nothing more than 'uninteresting applicative stuff', but if you're a programmer which has to use monads in some real application, you simply need something more real, something you can relate to. Just a set of mathematical rules isn't enough. So why not think of a monad as some kind of 'functional-macro'? As programmer this should be a rather familiar picture. Lets elaborate a bit on this:

What is a macro? It's something which get some data and dices and splices them. But before it can do its dicing and splicing, there need to be some slicing - which is done by the macro-processor.

Let's look for example at the C-preprocessor as an example of a very primitive (but widely used) macro facility:

#define m1(x, y) x y y
m1(a, + 1)

Now m1 is read here by the preprocessor which recognizes it as a macro and slices the following text between the parenthesis at the ',' and feeds those text-slices into the macro. This creates the new text ("a + 1 + 1") from it which is then feed back into the preprocessor as new input (so that macro expansion can happen again on the resulting text).

In Lisp it's a bit more sophisticated because the macro processor works on s-exprs and not on plain text (yes, I know about reader macros, but let's concentrate on standard macros here). If we have a piece of code like:

(some-function (m2 x (+ y 1)) z)

and 'm2' is a macro, then the macro-processor does some slicing. The resulting pieces are s-exprs, namely "x" and "(+ y 1)" which are then feed into the macro 'm2'. The result is then put there where the (m2 ...) was and is evaluated again.

And in Haskell? There the 'slicer' is the 'do-statement'. If we write

v1 <- stmt1
v2 <- stmt3

then the compiler slices the content into separate statements and wraps them into closures. It then put function calls around it (using a function called 'bind' or in '>>=' in Haskell). The result of this transformation is than again used as new input.

The difference to the other macro processors is that the evaluation (the 'splicing and dicing') is done by the 'bind' function at runtime now. This has the advantage that the type checker has ran and by using type information the bind-function can be overloaded. So instead of providing an explicit name for the macro like in the C and Lisp examples above, the concrete macro is chosen by the type of the first statement in the do-block now.

And how can it be assured that the whole block is translated by the same macro? That's the task of the type-checker: By giving the bind-function the type

bind(M<a>, A -> M<b>) -> M<b>

the result of a bind-function has to be of the same type as the input. The monad can be parametrized by a parameter and this parameter can be changed by the bind-function. But the monad is still the same. This ensures that all statements are evaluated 'in the same monad' - or in other words: That all statements in a do-block are subject to the same kind of code transformation.

The type 'M' is often also called 'the monad'. And it's reasonable: In Lisp the macro is chosen only by a name and so we would call the above the 'm2-macro'. But in Haskell the choosing is done by type and thus the type of the monad gives the macro/monad its name. Thus List-monad, Maybe-monad, etc. But the monad isn't just the type, its also the implementation of the bind function (and there is also have to be a 'return' function), because without a bind-function it simply wouldn't do anything. The combination of type, bind and return functions together are needed to build a macro - and so all those things together are called 'a monad' (like the Lisp macro is not only the name of the macro but also the code which does the code transformation).

So that's it: A monad is a code-transformer. Like a macro. It's simple to remember: Both names even start with a 'M'.

While the fundamental entity of the C preprocessor is simple text, in Lisp it's a s-expr. And in Haskell this fundamental entity is a function. Everything in pure functional programming can be expressed by it. And so it's no wonder that in Haskell 'macros' get functions as input. That's all the magic. Of course we could also use the bind function directly and write code using '>>=' manually, but 'do' makes it often much more easy to read and write the code.

I put the '' around the word 'macro' because Monads are not really macros by definition. So if you want to be picky you can find reasons why this picture isn't totally correct. But does this really matter? Or isn't it more important to have a useful way of thinking if we deal with an abstraction?

What are those differences? The main one is that input to 'real' macros hasn't to be valid code. The C-preprocessor accepts everything which is text and Lisp macros accepts all kinds of s-exprs. But monads only work on valid Haskell-code. They can only change the runtime semantics of it. And the syntax itself isn't changeable too, because one always has to obey the syntax of the do block. So a monad is not a macro in the sense that you can create a real new syntax (like in Lisp). You can only create new semantics.

Also all kinds of macros have their limitations. Lisp macros sometimes need a code-walker, the C preprocessor is quite limited, too and also Monads have their limits.

And those limits are in fact quite severe which lead to the invention of more able constructs like 'Arrows'. The core of limitation is that a bind-function can't look '
'into' its second argument. It we have a do statement the above:

v1 <- stmt1
v2 <- stmt3

the compiler transforms it into the following code ("\v -> ..." is the syntax for a closures here):

\v1 -> bind(stmt2,
\_ -> bind(stmt3,
\v2 -> stmt4)))

(The '_' parameter is used in 'dummy-assignments' which are created if we used no explicit assignment) [Edit: Corrected error]

If we now look at the first 'bind', it takes 'stmt1' and a closure. Now this bind can do lots of things depending on the result value of stmt1 but if has no clue, what its second parameter (the closure) returns until it evaluates it. And thus it has no possibility to look into the later bind-functions. This is a severe limitation: It's impossible to create a transformation which analyzes the whole do-block before evaluating it.

So it's for example impossible to create an LALR-monad which transforms it's statements into a LALR-parser. It seems possible to simply define those statements as 'actions' which return a value to instruct the bind-functions to build the right code, but this would be quite limited because we can't add semantic actions this way: The result of the evaluation of the second bind-parameter has not only to contain the monad itself, but also the result of the semantic action of the parser. And this is only possible if we evaluate both in one turn.

The next problem is binding values to 'variables'. Monads simply use the normal functional way of doing binding via lambdas. The 'var <- ...' syntax simply creates a closure with 'var' as parameter which is then visible in all the levels 'below'. This works fine if the structure of the resulting code is similar to the structure of the input code, but it makes it for example impossible to transform a do-block into code which executes backwards from the last statement to the first one.

So while monads are quite powerful to create certain kinds of abstractions (= code transformations) they can't do everything. But nonetheless, that's really it: Monads are code transformations. This is also the reason why monads seem to be a bit difficult to understand: Each monad create a new language. The Maybe-monad creates a simple 'first-fail-fails-the-whole-computation'-language, the List-monad creates a simple 'try-out all combinations'-language, the State-monad a 'Haskell-with-mutation'-language, etc.

The transformations necessary from ordinary 'one statement after each other'-form written in a do-block to the resulting code can by quite difficult to comprehend: Simply because such a code transformation can be quite different from the input code in the end.

And because we have to learn new language semantics for every new monad, it's no wonder that the concept seems to be a bit hard to grasp. We may know the semantics of basic Haskell, but for every monad we have to learn a new language. Again and again. And if we don't even know that a monad creates a new language, understanding this becomes much more difficult, too.

But at least this shouldn't be a problem anymore now.

To deepen our image let's now look at some examples now. I will use only basic Haskell syntax here or even 'Algol-like' syntax to make it better understandable.

Some of the most simple monad is the 'Maybe monad'. Lets write something like

x <- func1 ...
y <- func2 ...
func3 ...
return (x + y)

Here all three functions 'func1, ..., func3' should returns a 'Maybe' value which can be 'Just x' or 'Nothing'. Because 'Maybe' is a monad and we've used the 'do' syntax, this code is transformed into something like this:

tmp1 = func1(...)
if isNothing(tmp1) then
return Nothing
x = fromJust(tmp1)
tmp2 = func2(...)
if isNothing(tmp2) then
return Nothing
y = fromJust(tmp2)
tmp3 = func3(...)
if isNothing(tmp3) then
return Nothing
return Just(x + y)

This looks a bit ugly, but shows what's happening: Each statement in the monad is transformed into a if-then-else expression in a way that the first statement which return 'Nothing' aborts the whole block and let it return nothing too.

We could also say 'The maybe monad is an abstraction to represent computations which can fail'. True, thats what macros do: Create new abstractions. Without remembering that a monad is just a kind of macro this sentence would sound quite 'arcane'. But now as we know the secret name of monads, the esoteric flair is gone and we see that they are something which we all know quite well. So the maybe-monad is nothing more than a mechanism which translate those innocent looking statements in a do-bloock above into a nested if-then-else-chain like the one below.

This works for other monads too. The list-monads transforms a linear list of statements into into nested map-functions. Sure, a mathematician may say something like 'The list monad represents non-deterministic-computation'. But in the end all it does is to transform this:

a <- [1, 2, 3]
b <- [4, 5]
return (a + b)

into this:

concatMap (\a -> concatMap (\b -> [a + b]) [4, 5]) [1, 2, 3]

concatMap maps list elements over a function like the normal map, but concatenates the results to a single list. This allows to return any number of values in each invokation. [Edit: fixed error here (mistakenly used fmap instead of concatMap)].

If you're not that familiar with Haskell, the above works like this imperative code:
result = []
foreach a in [1, 2, 3]
foreach b in [4, 5]
result = append(result, [a + b])


But we can do more complex ones, too. One of this 'a bit more complex' transformations is the state-monad. What we want to do is something like this:

value <- get
put (value + 1)
value <- get
return value

Here we have 'commands' which do something which isn't possible in plain functional programming: Reading and mutating a 'state'. With 'get' we get the actual state and with 'put' we can store a new value as the current state. The state in the above example is a simple numeric value, but since we can use arbitrary structures as values too, it allows that a state consists of multiple values.

But how does this work? We all know that pure functional programming don't allow something like mutation. But to store a new state we need exactly this. To make it possible we need to chain our state thru all relevant function calls. This can be done like this:

function get(state)

function put(state, value)

let (state', value) = get(state) in
let (state'', _) = put(state', value + 1) in
let (state''', value') = get(state'') in

Wow, this looks ugly. We need a new 'fresh' name for each new version of 'state' and 'value' and we also have to chain it manually thru the calls of put and get. But this method allows the simulation of mutation without requiring real mutation.

To make this more readable we can now use a monad to transform the original code to the code above. We can't do the calculation directly this time because the in the example below, the code first needs some initial 'state'-value which is than chained thru the calls. So instead of calculating the result directly, we let the monad create a new function. This function can then take the initial state as a parameter and will call the generated code. And this creates the result then. So this is the first case of a monad doing real code-transformation.

To create a new monad we first need a new type. We simply call the type 'State':

data State st t = State (st -> (st, t))

This is out monad. It takes two type parameters: The type 'st' of the state to carry and the return type 't'. This return type can vary for each statement which is a constraint of the monad type.

The inhteresting part here is that the monad don't carries the state itself around, but a closure which takes the old state and returns a tuple of the new state and a result value. This closure is also called an 'action', because it encapsulates the action defined by the statement.

Now we create the bind and return functions for this type:

getCode (State code) = code

instance Monad (State st) where
(>>=) prev next = State $ \st -> let (st_next, res) = (getCode prev) st
in (getCode (next res)) st_next

return v = State $ \st -> (st, v)

The 'getCode' function simply returns the actual 'code' with is stored in our monad. 'return' is simple: It creates an action which takes a state and returns the same state and the return value. The bind function (here named '>>=') takes the previous monad 'prev' and a function 'next' which will return the next monad. It now creates an action which first evaluates the code in the prev-value with the actual state. Then it uses the result to call the 'next'-function which in turn creates the next monad in the chain. This next monad is then again evauluated, but this time with the new state, the prev-monad returned.

This chains the state thru the actions. First the initial state thru the 'first' monad creating a new state. And then this new state thru the result of the 'next'-monad, createing the final state (which is then evaluated by the calling bind-function etc.).

Now we can build out 'set' and 'get' functions. This is quite straight-forward. The 'get' simply uses the actual state-value as return value:

get :: State st st
get = State $ \st -> (st, st)

And the 'set'-function ignores the previous state and creates a new one. It also returns the state in turn to allow assignments like 'x <- set (x + 1)'. This isn't necessary but convenient.

set :: t -> State t t
set val = State $ \_ -> (val, val)

That's it. Out state-monad. Now lets create a simple do-block to use it:

test1 = do
x <- get
set (x + 4)
x <- get
set (x * 3)
x <- get
return (x-5)

Here we can see, that the first statement is a get. But where does the state come from which is returned by 'get'? Simple: If we call 'test1', we don't get the real return value, but a closure we have to evaluate first with the initial state. Let's do this:

main = (getCode test1) 0

'test1' returns a State-monad. We first have to get the code to call out of the monad by calling 'getCode' again. This code can now simply be evaluated by calling it with our initial state (a 0 in this case). As the result we will get a tupel with the value (12,7) in this case. The first value is our last state, the second is the real result (as returned by 'return (x - 5)'). Both values make sense, so our monad seems to work correctly.

Now lets take a look under the hood of the above:

The do-block above first creates the follwing code:

bind(get, \x ->
bind(set (x + 4), \_ ->
bind(get, \x ->
bind(set (x * 3), \_ ->
bind(get, \x ->
return (x-5))))))

The bind function now does the real code-transformation. I've tried to write down the resulting closure if we only expand calls to bind, return, get and set, but it was simply to long and crumbersome. Lets do it instead for a simplified version of the above:

x <- get
return x*2

this is rewritten into

bind(get, x -> return(x*2))

which if we evaluate bind, get and return and pull out the code from the resulting monad, creates the following closure:

\st ->
let (st', x) = \st -> (st, st) -- our 'get' statement
in (x -> (\st -> (st, x*2))) st' -- our 'return' statement

Again we see, that the monad simply does code transformation in the end, so the image of looking at monads as code transformations holds. So even if it looks a bit wierd, in the end the state monad really does the transformation we started with.

Friday, February 02, 2007

Why monads are 'evil'

This is an heavily updated version of a previous article with a somehow similar name. From the comments to this article I learned where the original article was misleading and partly even wrong. So I try to correct that with this updated version.

What is functional programming? Many people tend to think that having closures in a language makes that language a functional one. But by this definition almost every modern language would qualify as 'functional'. Even Java (because of Javas anonymous inner classes which are closures too). So if this can't be the qualifying property of a language, what else is it?

Of course it's "referential transparency": A 'function' is an special kind of relation which assigns values of a domain-set to values of a result-('codomain') set. To qualify as a function this mapping has to be unambiguous: For every element of the domain-set the function always gives the single, same result.

If a 'function' isn't referential transparent, this property isn't fulfilled and it's simply not a function anymore. It's something which is often called 'procedure'. We can argue if this property really has to be fulfilled rigorously by a language to qualify as 'functional', but I would say 'Yes, it has!'. The reason is that we can use procedures to create functions in every language (just by making sure that there are no side-effects), but to really call a language 'functional' it has to be assured by the compiler that every function is really a function. BTW: we can make every procedure a function by simply including the whole 'environment' as input to the procedure. With this little trick every procedure would now be a function and every programming language would be functional. Yes, this is nonsense - remember that for later.

But with this rigorous definition of the notion 'functional', there aren't many functional languages lest. Ocaml for example is clearly non-functional now. But even Haskell isn't. The reason is the IO-monad.

Do to I/O, Haskell uses something which is called 'I/O-monad'. If we write

main = do
x <- getStr
putStr ("you entered " ++ x)

the following happens: First the 'do' statement transforms the code by using a function called '>>=' (pronounced as 'bind').

getStr >>= (\x -> putStr ("you entered " ++ x))

(If we use the name 'bind' instead of '>>=' and the prefix form of function calls this would look like this:

bind(getStr, (\x -> putStr ("you entered " ++ x)))

The getStr function (which is in fact a constant and not even a function because it don't takes any parameters) is just a parameter for the 'bind'-function. It returns an 'action', a value of type 'IO String'. 'IO' is a special type here which simply encapsulates something and is parametrized with the type 'String' (in Java/C++ we would write this as IO). But if 'getStr' always gives the same value, how can it be used to input Strings from the console?

The answer is that 'getStr' doesn't do this. Its only a command to do it. And this command goes as first input into the 'bind'-function which executes it. The second paramter of the call is the 'to-do'-function. Its the code which is associated with the action and has to be called with the result of the action. The bind-function returns a value itself which is also a action. This allows us to use the result of a bind-function as first parameter in another bind-function. So those functions can be chained arbitrarily - and this is what the 'do'-syntax does (just in a more convenient way).

Back to our example: The bind-operator received the 'getStr'-action as input. This action instructs it to fetch a String from the console and call the to-do-function with it. Now this function again returns an action, this time it's a 'putStr' action. This 'putStr' action is again a constant, but it was created 'on the fly' from the putStr function which takes one parameter: The String to write out. The next bind operation is invisible, because it happens outside the main function in the REPL or compiler. But it's executed like the first bind and it uses the 'putStr' action to write the data out.

So it's the bind-function which isn't really referential transparent: If you apply it to the same action twice, it can call it's 'to-do' function with a different value. Now Haskell clever hides that because it don't allows anybody to examine the content of the actions: Because 'bind' always returns such an opaque action, its (theoretically) impossible for a program to see that two results are in fact different. And because Haskell allows no mutation, the to-do-function can't write this result somewhere outside the I/O-monad. But isn't that enough to ensure referential transparency? I would say no.

The reason is the same that I don't consider ordinary procedures as referential transparent: It's just a trick. The operational semantics of the whole mechanism are simply non-referential transparent, if Haskell hides it well or not. We can in principle write the whole program in the I/O monad and there is no difference to an ordinary imperative language anymore. So we should go with the 'duck-paradigmization': If it mutates like the imperative paradigm, is non-referential transparent like the imperative paradigm and has a fixed execution order like the imperative paradigm, I would call the it the imperative paradigm.

Lets look at an alternative approach to the functional I/O problem: We create a 'world'-value which contains the relevant data from the 'outside' (likes files, console input etc) and feed this value into a function. This function can now create some output by processing informations from this 'world'. By evaluating the 'world'-value lazily we can even create interactive programs, because the function simply stops the evaluation in the moment there are no new input values and continues evaluation in the moment we provide them (for example by typing something on the keyboard). With this concept a main function would look like this:

main :: world -> output

In a simple console application both 'world' and 'output' would simply be lists of characters. But for a more complex applications the 'world' could contain all kinds of mouse/keyboard/etc-events while the 'output' would contain for example drawing-commands.

Whats the difference to the monadic-I/O concept of Haskell? Couldn't we simply use this approach to use it to implement our I/O-monad? The interpreter of the I/O-actions would simply use such 'world'- and 'output'-values and use them by the actions the program provides. Aren't both concepts now simply identically, but easier to use in Haskell because the difficult stuff is well hidden inside the I/O-monad?

While this is true 'in principle' it's only true on the same level that all Turing-complete languages are identically 'in principle':

What the I/O monads does is creating a new sub-language with imperative semantics. Every code which runs 'in' the I/O-monad is in fact evaluated imperatively. This transformation is done by the 'bind' functions and hidden from view by the 'do'-construct. The 'do'-construct slices all the statements in the body into small pieces and feeds them into those bind-functions. Now their execution order isn't anymore in the statement-order but is the bind functions which choose to evaluate them (in any order, multiple times, or not at all). And the values those statements give and take is also controlled by those bind-functions because they provide them (as long as they are assigned with the '<-' operation).

So every code we have inside such a do-block can have nearly arbitrary semantics. It's a bit similar to Lisp macros but the transformation happens at runtime. And because of the chaining of bind-functions, semantics of such a block only depends on the type of the first statement in the do-block.

Think about this: By writing the right 'bind' functions we can create in principle every semantics we want. For example we can create a language with all the semantics of the Java language right into Haskell - we 'only' have to create the right monad. Sure, the syntax would be different from Java because the code is still parsed by the Haskell parser and needs to follow certain rules, but the semantics of this code could be identically to Java. With this 'Java-monad' we're now able to program in Haskell like we're using Java. But Java is an imperative, object-oriented language so nobody would say that we're still writing code in a functional programming language.

Using the I/O-monad is similar: It provides a new language with new semantics by doing runtime code-rewriting. It's not a functional language anymore, even if it's implemented inside a functional language. We simply have left 'functional land' if we use the I/O monad - and we can never return from it because the I/O-monad is grounded in the compiler, the outermost layer of every program. We can only call functional code from this imperative layer but this functional code can't do any I/O anymore.

But whats the difference to the explicit way of doing I/O? It's that we still have full control about what we're doing: We're working on the level of functions instead of creating actions which are somehow evaluated by an invisible interpreter. We have to supply the input values manually and we call real functions instead of building some action-values which are evaluated somehow. If we want we can slice the 'world' into pieces, supply only parts of it to functions and the result of those functions can be something arbitrary. And we can use all the normal functions instead of creating new 'monadized' ones.

Sure, we have to think more about certain things - but this is part of doing work in a certain paradigm. If we want to have the advantages of the paradigm we can't simply create a new sub-language in a different paradigm and expect to still have the advantages which are the reason why we used this paradigm in the first hand. If we want do do I/O with the I/O-monad we have to switch programming languages. We stop using to program in a shiny new functional way and are back in the boring old land of the imperative. Even worse: Because Haskell don't provides a different way of doing I/O, it's like a confession that functional programming can't do I/O. And all this only because of the 'evil' I/O-monad.

And there are other problems which apply to monads in general:

  • The performance seem to lack because of the runtime-code-translation: The translation costs time and memory and can sometimes even kill important properties like tail recursion (because the program we thought we wrote is not the program which is executed because of the monadic translation). If you compare Haskell with the Clean-language (which the direct state-chaining approach instead of monads based code-translation to do I/O), Clean wins hands down in many benchmarks.

  • Code reuse gets more difficult: This is a common problems with domain specific languages: Because we create parts of code with different language semantics, it's hard to fit those parts of code together later. We've not only to worry about different interfaces - we have to consider even different semantics! In Haskell we can put monads into other monads (and create some kind of translation-chaining) but this won't works always and so sometimes code reuse gets impossible. And after we left 'plain functional land' and entered the land of some arbitrary new semantics we need special versions of our well known functional tools and need specialized ones.

  • The real semantics are much more difficult to understand: The sheer number of 'how do monads work'-articles speak volumes. And many are still missing the real point: Monads are code transformers. Because this they can do nearly 'everything' - and to understand them you have to understand every concrete monad on each own, because each of them creates a new language! This is it what makes monads so hard to grasp.

  • It can hurt readability: A concrete monad is choosen by the return type of a function. For example a simple 'x <- get' can switch the rest of the do-block into 'state-monad'-land. This is quite easy to overlook because the type of a function isn't always obvious. In Lisp macros often have at least lengthy, descriptive names, in Haskell it's far less obvious. Explicit type annotations are a must here to see whats really happening.

As more I understand the concept of monads the more I'm becoming skeptical about them. Like Lisps macros they simply are to powerful. Instead of creating tools to build new language inside a language, why not directly create a powerful language?

I know that many people will see this differently because they like to use languages as 'construction kits' for new languages. Yes, this this is a valid reason, but only in a very limited domain. In most areas we don't need a language to create another language but to solve a certain, concrete problem: Create a web-application, a word processor, a computer-game or something else. I prefer to have a language with fixed semantics, which I only need to learn once and which don't change (at least until the next language revision). This makes code much more easy to understand and to reuse and this enhances productivity.

As Haskell being some kind of 'research language' originally, monads are surly helpful in this domain. But for a language directed to build applications we need different properties.