Wednesday, December 27, 2006

My Haskell experiences

I've known Haskell for some time now. The first time I used it was with a system named Gofer, after I had read a book about the implementation of a compiler for the Miranda language (no Miranda system was available at the time because it was a commercial product). Gofer was quite complex for its day but also quite useless for practical applications (especially considering the performance of PCs back then; I used it on an 8MHz Atari ST). So I went back to C (and later C++) but tried out functional languages again from time to time. The one I used most was Caml (later called OCaml), but I had certain problems with it and abandoned it after some frustrating experiences.

But now Haskell seems to be en vogue, and after playing around with it, doing some toy programming and trying things out, I decided that I had to do something 'real' in it. For some time now I have casually worked on a system for compiler creation, based on a customizable parser front end, a template engine for output, a rule-based language with forward inference for the semantic work, and a graphical browser to work with the system comfortably. Because I had reached a point where small incremental changes weren't enough anymore, I decided to do a bigger rewrite. It's written in Java, but I thought it would be a good idea to rewrite it in Haskell to learn more about the language.

So for some weeks I've done quite a bit of Haskell programming, produced about 2000 lines of working code, and gathered some experiences with Haskell I'd like to share.

Where Haskell really shines is pattern matching: creating tree walkers or subexpression matchers is a piece of cake, and if you're writing a compiler, this is really useful. The easy and powerful list handling is quite an asset too. Things which would require tens of lines in Java are done in 2-3 lines of Haskell code. No more clumsy visitor patterns, no tedious list-object creation with complicated insertions, etc. All this is really smooth going, and where you can put it to good use, it buys a lot of productivity.
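As a small illustration (the Expr type here is made up for the example, not taken from my compiler project), a constant-folding tree walker is just a handful of pattern equations:

```haskell
-- Hypothetical expression type, for illustration only.
data Expr = Num Int
          | Add Expr Expr
          | Mul Expr Expr
          deriving (Show, Eq)

-- Walk the tree bottom-up and fold constant subexpressions.
simplify :: Expr -> Expr
simplify (Add l r) = combine Add (+) (simplify l) (simplify r)
simplify (Mul l r) = combine Mul (*) (simplify l) (simplify r)
simplify e         = e

-- If both children reduced to numbers, compute; otherwise rebuild the node.
combine :: (Expr -> Expr -> Expr) -> (Int -> Int -> Int) -> Expr -> Expr -> Expr
combine _   op (Num a) (Num b) = Num (a `op` b)
combine con _  l       r       = con l r
```

The Java equivalent would be a visitor interface plus an accept method in every node class.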

Lazy evaluation is the icing on the cake, because it allows you to write code in a very natural way which would lead to rather bad performance in other languages, where a computation could easily produce lots of values that are never actually needed. To avoid this you often have to create quite artificial structures instead of simply reusing old code. In Haskell you just create an infinite (or big finite) list, even if you may only need the first two elements. So in a non-lazy language it's always necessary to have some criterion to abort the evaluation, while in Haskell you simply write it the way that is most natural and then let the function which requests the data choose how much of it it wants. Caching included.
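A minimal sketch of this style: the producer is defined with no termination criterion at all, and the consumer alone decides how much is computed.

```haskell
-- An infinite list of primes via trial division; nothing here says when to stop.
primes :: [Integer]
primes = sieve [2..]
  where
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p /= 0]

main :: IO ()
main = print (take 5 primes)  -- laziness: only the first five are ever computed
```

In a strict language the sieve would need an explicit cut-off parameter threaded through it.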

Using a fast REPL (read-eval-print loop) to test out new code can also be quite nice (I used Hugs, because of its good integration into the Windows environment). That's a big plus for languages like Lisp, and with Haskell it's much the same. One bothersome thing about Hugs is that it doesn't allow you to test any code as long as it finds a single error in one of your modules, even if your test doesn't touch the offending part of the code. That's a result of the compilation model and type inference, and partly understandable. But compared to using the REPL with Lisp it's still a bit annoying.

In contrast to OCaml, the Haskell type system is also much more powerful and allows easy overloading of functions. In OCaml I always had to invent lots of names or use qualified names for functions because no overloading was possible (but I don't know for sure whether this is still true, because the last time I used OCaml was in 2002 and maybe it has advanced since then).

Haskell has type inference, and while this can really make code more compact because of the lack of obligatory type annotations, it also has its downside: a new line somewhere in your code can easily create compiler errors somewhere else. Type inference works by creating lots of constraints which the inferencer then tries to solve. So if an error occurs, the compiler flags it somewhere along the dependency chain, often at places where everything is totally correct while the real error is elsewhere. If you develop code incrementally this is not so bad, because you know what you have written since the last run and can look at the right spot to figure out the error. But if you write longer stretches of code without testing in between, it can take you quite a bit of time to find out where the error really is.

Haskell has a solution for this: type annotations. Simply declare the type you want a function to have and things work much more smoothly. So if you want to do something bigger than 20 lines in Haskell: use type annotations, at least for all top-level functions. Luckily it's easy to add annotations to your code later, because every time your code compiles you can query the inferred type of a function and simply paste it into your program (though sometimes you should rewrite it a bit to allow a more general typing). The code isn't as small as without annotations, but that's better than a lot of hunting around for the source of compilation errors.
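For example (the function and the quoted interpreter output are hypothetical): once the module compiles, Hugs' :type (or GHCi's :t) shows the inferred type, ready to be pasted in as the annotation.

```haskell
-- Without the signature, the interpreter reports the inferred type
-- (something like countNonEmpty :: [[a]] -> Int); pasting it in documents
-- the intent and anchors future type errors at this definition.
countNonEmpty :: [[a]] -> Int
countNonEmpty = length . filter (not . null)
```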

The real problems with Haskell start to show up the moment you require some kind of state. And yes, you will need it: many things can be done in a purely functional style, but I've often encountered situations where a single function deep down in the call hierarchy suddenly needed some information from the outside. And in some cases it even needed to modify that information. Then things start to get ugly.

In OCaml you can simply declare global, mutable variables, so the problem is a non-issue. But in Haskell this is impossible, because the language doesn't allow mutation. Even if our small block of code only needs to read some state from the outside, that's no longer directly possible in Haskell. In my project I often found a nice functional solution to a problem, but later on, after I had put the (tested and well-working) code into action, I discovered that some function 'deep down' suddenly required access to outside data, and I had to do lots of rewrites because of this.

Haskell has a solution to this 'outside context problem': extensive use of monads. While a bit strange in the beginning, after using monads for a while you get accustomed to them quite quickly. But in more complex situations, the usage patterns also get more and more complicated.

To use outside state I often used the StateT monad transformer. It allows you to create your own state data and modify it like in an imperative language, and it can also wrap another monad (for example the IO monad). So with this monad you can access your state at will, even modify it, and also have access to IO functionality to write out information while your program is executing.
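A minimal sketch of such a setup, assuming the mtl-style Control.Monad.State module (the Int counter state and the function names are made up for the example):

```haskell
import Control.Monad.State

-- The state is an Int counter; IO is wrapped underneath, so we can both
-- modify the state and print while the program runs.
countItem :: String -> StateT Int IO ()
countItem name = do
    modify (+ 1)                                 -- imperative-style update
    n <- get
    lift (putStrLn (show n ++ ": " ++ name))     -- reach the wrapped IO monad

main :: IO ()
main = do
    (_, final) <- runStateT (mapM_ countItem ["lexer", "parser"]) 0
    print final                                  -- final counter value: 2
```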

(This is another big issue I had with Haskell: simply creating some debugging output. In most languages you can just write to some logging stream to see what your program has done. This is often even more useful than using a debugger, because you can examine the history of the program execution and see what led to a certain error. But in Haskell you need access to the IO monad to write data out, and this requires you to change the way you write your programs.)

The conciseness of Haskell also stops the moment you want to access or change state. Even Java code is more concise than Haskell code which has to change things. I know that this is probably intended to force programmers to write programs in a functional style, but compared to the otherwise clear and compact syntax of Haskell it's really a pain.

Monads can encapsulate state in some kind of data. In fact, a monad is simply a data constructor with a few required functions, and most monads work by returning themselves from functions. It requires a bit of backwards thinking: the state monad, for example, creates a chain of closures which are executed later by evaluating the value of the monad. So to use state you don't have to pass this state to a function; the monad shows up only in the result type. While this is quite clever (and is also the reason why things like backtracking and continuations can be implemented as monads), from a practical point of view it means you have to rewrite your code:
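To make the 'chain of closures' point concrete, here is a hand-rolled state monad, a simplified sketch of what Control.Monad.State provides: each bind builds a closure over the rest of the computation, and nothing runs until an initial state is supplied.

```haskell
newtype St s a = St { runSt :: s -> (a, s) }

instance Functor (St s) where
    fmap f m = St $ \s ->
        let (a, s') = runSt m s
        in (f a, s')

instance Applicative (St s) where
    pure a = St $ \s -> (a, s)
    mf <*> ma = St $ \s ->
        let (f, s')  = runSt mf s
            (a, s'') = runSt ma s'
        in (f a, s'')

instance Monad (St s) where
    return = pure
    m >>= k = St $ \s ->
        let (a, s') = runSt m s   -- build a closure now...
        in runSt (k a) s'         -- ...the rest runs only when forced

getSt :: St s s
getSt = St $ \s -> (s, s)

putSt :: s -> St s ()
putSt s = St $ \_ -> ((), s)

-- The state never appears as a parameter here, only in the result type:
tick :: St Int Int
tick = do
    n <- getSt
    putSt (n + 1)
    return n
```

Evaluating runSt tick 5 yields (5, 6): the old counter as result, the incremented one as new state.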

Let's take two simple, small functions:

giveElements :: (a -> Bool) -> [[a]] -> [[a]]
giveElements condition = filter (detectElement condition)

detectElement :: (a -> Bool) -> [a] -> Bool
detectElement = any

The above may look a bit useless, but I've extracted it from its surrounding code to show the problem. The 'giveElements' function selects certain elements from a list and returns the filtered list. The 'detectElement' function does the job of deciding which elements are in and which are not, by checking whether any element of a list fulfills a certain condition.

So far so nice, but after some work on your program you discover that your 'condition' function requires access to outside state to do its job. Maybe it has to make a small modification to some data, or should simply write a short message into a log. To allow this, you have to rewrite the above functions to use a monad. For this example let's simply use the IO monad. Now the functions become:

giveElements :: (a -> IO Bool) -> [[a]] -> IO [[a]]
giveElements condition = filterM (detectElement condition)

detectElement :: (a -> IO Bool) -> [a] -> IO Bool
detectElement c els = do
      r <- mapM c els
      return $ or r

The first function still looks quite similar; only 'filter' is replaced by 'filterM' now. 'filterM' is the monadic version of filter and propagates the monad from the condition outward. Without filterM, the monadic effects couldn't get out of the detectElement function (and of course there would be a type error).

detectElement looks a bit more different. The reason is that there is no monadic version of 'any'. So I use a monadic map (the mapM function) to create a list of Bools by applying the condition to each element, and then check whether at least one of the results is true. This doesn't require more evaluations of c than the first version because of lazy evaluation, but it still required a complete rewrite of my code. It's still readable, but not as readable as before.

And there are some other dreadful details: the '$' after the 'return' is necessary because if you simply wrote "return or r", the compiler would treat or and r as two parameters to return ('return' is not part of the Haskell syntax but an ordinary function), so we need the '$' to change this. I could also write "return (or r)", but that's a bit longer, and with more complex expressions it can easily look like you're programming in Lisp. And you can't easily inline the 'r' into the expression below, because '<-' is not an assignment but special syntax which creates a closure with r as its parameter.

All this has to be considered when you start to use monads. Only because I wanted to access the IO state from my condition function, I had to rewrite my little program substantially. And if there were no filterM, it would get even more difficult. I had, for example, to use similar code with sets instead of lists - but there is no filterM for sets, only a normal filter. And you can't 'lift' (Haskell slang for 'transform a function by using a combinator') a filter function into a monadic one. Map and fold functions can be lifted to monadic forms because they take a function which returns an arbitrary value, which can be used to carry the monad. But a filter simply returns a Bool, and then there is no way to return the monad from within. So I had to convert the set to a list, filter it with filterM and then build a set again - which has worse runtime characteristics than filtering it directly.
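That round trip looks roughly like this (a sketch; filterSetM is my own name for the helper, it's not in the library):

```haskell
import Control.Monad (filterM, liftM)
import qualified Data.Set as Set

-- Round-trip through a list just to get a monadic filter for sets.
filterSetM :: (Ord a, Monad m) => (a -> m Bool) -> Set.Set a -> m (Set.Set a)
filterSetM p = liftM Set.fromList . filterM p . Set.toList

main :: IO ()
main = do
    -- The predicate can now do IO (e.g. logging) while filtering.
    evens <- filterSetM (\x -> putStrLn ("checking " ++ show x) >> return (even x))
                        (Set.fromList [1 .. 5 :: Int])
    print (Set.toList evens)
```

The Set.toList/Set.fromList detour is exactly the runtime cost mentioned above: Data.Set's own filter works in place on the tree, but it only accepts a pure predicate.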

And with more complex functions, the above translation from a non-monadic version into a monadic one is much more difficult, and IMO it also destroys the clarity of the code. State monads also have bad runtime characteristics because of their 'inside-out' evaluation, which can fill up the stack quickly. You can use IORefs instead - but then you can't simply transform your program to use continuations anymore, for example. And if you put an IO monad into a state monad, you have to lift all IO-accessing functions via liftIO.

And what's the point of having a pure functional language, going through lots of trouble to maintain the purity, only to end up with things like IORef which you tried to get rid of in the first place? If you design a program to use the IO monad everywhere from the beginning, you have fewer problems maintaining and expanding it - but then you're using Haskell like a normal imperative language. What's the point of Haskell then?

Another bad thing about Haskell is documentation: while you find lots of introductions all over the net, it gets quite difficult to find information once you're beyond the basics and start to do real work. The 'official' docs are simply horrible: no examples, no real descriptions of how things work or should be used. And finding tutorials for the advanced stuff is quite difficult. You may get lucky and find a solution to a problem buried in some page of the Haskell wiki, a newsgroup or a research paper - but it will take time to find it, and sometimes there is no such solution. And this is only about 'language stuff', not about how to solve some very specific situation in an obscure framework.

In retrospect: Haskell is a nice language, but when you need state handling it gets ugly. That alone kills the language for me. What's the use of concise and compact code if you have to rewrite it later into a much uglier form once you discover that you need state? This is no problem for small programs, which is why I never ran into it with my earlier experiments. But with a bigger program things changed. So my program still isn't implemented completely, and I will probably abandon the Haskell version and do the rewrite in Java again.

It's not a problem of functional programming in itself; it's a Haskell thing: while monads are quite powerful in principle, IMO they are a pain to use in practice. Because monads show up in the return type of functions, the easy composition of functions which is the prime reason for the power of functional programming only works as long as all those functions use (compatible) monads, and sadly many functions simply don't exist in monadic form.

With OCaml, for example, you won't get this kind of problem, but there you have unrestricted mutation, which has its own problems. There are other pure concepts, though, like uniqueness typing (Clean) or transparently chaining state parameters through calls (Mercury). And with more time to get accustomed to the use of monads, maybe I would see things differently. But since I don't have unlimited time and need to get things done in a programming language, I consider my Haskell experiment a failure - but only a partial one, because it was nonetheless fun and brought lots of insights.

So what to do with Haskell? Maybe it would be an idea to put monads automatically in every return type. If I write something like:

func :: a -> b

then the compiler would internally create

func :: Monad m => a -> m b

and also rewrite all bodies into do/return syntax. I don't know if this would really always work and solve all the problems, but I suspect that by hiding the use of monads better, and letting the compiler decide where it can safely drop the monad constructors, you would get your 'clear and normal' looking programs back even if they use monads everywhere. I'm far from a Haskell expert, so take this with a grain of salt. But since Haskell already provides some syntactic support for monads, why not go one step further? If it weren't for the problem with state (and documentation - but that's only a question of effort), Haskell could be a very nice and maybe even the 'next big' thing.


Anonymous said...

Have you looked at Scala at all?

Hans Schmid said...

Your assessment of Haskell and OCaml in comparison is very helpful. Thanks a lot.

Tony Morris said...

"Even Java code is more concise than Haskell code which has to change things."

Care to show an example? I have searched wide and far having used both Haskell and Java for some time now, to no avail. Enlighten me.

Anonymous said...

I probably have much less experience in Haskell than you, but I think there is a better way than doing everything in a monad. That effectively turns Haskell into an imperative language.

If you need outside "state" for the predicate in giveElements, you could partially apply the predicate to this state.

pred :: State -> a -> Bool
giveElements (pred myState) myData

If the outside state needs to be written, you are effectively computing a new state. With an imperative monad this is implicit. You could make it explicit using foldl/foldr, giving you a new state in the end.

I agree with you about the lack of good reading on the higher-level design of programs, as it is very different from OO design. There are lots of low-level texts about every detail, but design in the large is lacking.

Hope you still enjoy Haskell.

Anonymous said...

I forgot to mention: there is a trace function somewhere in the library. It uses unsafePerformIO to print a value passed through it.

It is not clean, but OK for dumping some debug output. It might appear at some unexpected time because of lazy evaluation.


Anonymous said...

A few small comments:
1) The idiom
x <- foo
return (f x)

is equivalent to
fmap f foo

which would make your code a lot more readable.

2) Most probably, your any-inside-of-a-Monad would have been neatly accomplished by
liftM any

There are a lot of these tricks and neat little helper functions to help you handle the monad enclosures. Enough that the suggestions you make sound... at the very least not necessarily very interesting to me.

Peter Arrenbrecht said...

Have you looked at Oleg Kiselyov's proposal for threading state?

Anonymous said...

The problem with updating state comes from Haskell's clumsy way of modifying a record. Say I define a record containing fields "foo" and "bar". In Haskell foo and bar are automatically defined as getter functions. But there is no equivalent setter function. Instead you have to use a clumsy bit of special case syntax. To increment the foo field in a record "state" you would have to write

state1 = state { foo = foo state + 1 }

The workaround is to write your own setter functions using this syntax.

The other thing about state and monads is that monadic actions are "first class", meaning that they can be manipulated and passed around like any other value. I once wrote a parser where the parser actions had to do IO (it was parsing a config file). Rather than try to create some hybrid IO/parser monad I caused the parser to assemble the appropriate IO action as a return value, embedded in a structure that also contained the other configuration data.

Debugging trace messages can be obtained by using the "Debug.Trace" module, which in turn uses the evil "unsafePerformIO" to print strings to stderr from within a pure function. It works fine, but be aware that lazy evaluation can cause stuff to be printed at odd-looking times.
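A minimal sketch of that (fib is just a stand-in example): trace takes a message and a value, prints the message when the value is forced, and needs no IO in the type.

```haskell
import Debug.Trace (trace)

-- A pure function; trace sneaks its message out to stderr via unsafePerformIO.
fib :: Int -> Integer
fib n = trace ("fib " ++ show n) $
        if n < 2 then fromIntegral n else fib (n - 1) + fib (n - 2)

main :: IO ()
main = print (fib 4)   -- the "fib ..." lines interleave with evaluation order
```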

Karsten Wagner said...

Combined answer:
Yes, I know about Scala, but I wanted to work with something 'purely functional'. I'm not a big believer in the multi-paradigm approach to languages.
For example, I have to add elements to a map. In Java I can write

map.add(key, value)

In Haskell the code is something like

modify $ \st -> st { st_map = Map.insert key value (st_map st) }

('modify' is from a state monad).

If I want to access those data:

value = map.get(key)

in Haskell:

st <- get
let value = Map.lookup name $ st_map st

This is not the real problem, but it's rather lengthy, especially compared with the otherwise concise code you can write in Haskell.
Chaining state through function calls is an alternative to monads and will work - but without syntactic support it gets quite lengthy. Also, your example only works 'one-way'; if you want to change state somewhere deep within, you also have to return it somehow. This would also require rewriting most basic functions like map, for example (which is simple only if you use lists - if you use maps, sets or other more complicated data structures, you can forget it).

Monads are also more powerful, because by using closures instead of direct state data they can transform the execution order very freely.
@Michi: I know that it is often possible to transform code into a more compact form. But it also makes the code more 'rigid' and sometimes even unreadable. Also, the examples in the article are 'small on purpose', to show the core of my problems with the language.

And I believe you that there is much more for me to learn about using Haskell. But if I ran into those problems (with experience in other functional languages), what do you think a 'pure imperative programmer' would think? If you have much time on your hands it's one thing, but I do programming for a living and also like to try out things in my free time - but this time is limited, and even there I'm 'result oriented'.

I also mentioned my problems with the available documentation, which makes things worse. While I found multiple pieces of information about Kleisli arrows or backtracking monads, I never encountered Debug.Trace, for example.

Take my post as a hint of which problems a Haskell beginner (even one with experience in functional programming) has to face, and maybe improve the language or at least the documentation a bit.
@Peter Arrenbrecht: Not yet. I've put it onto my 'to-read' list.

Anonymous said...

Yes, the Java code is shorter:

> map.add(key, value)
> value = map.get(key)
> modify $ \st -> st { st_map = Map.insert key value (st_map st) }
> st <- get
> let value = Map.lookup name $ st_map st

But you never write that Haskell code more than once. If you find yourself doing it more than once, you define new functions for those operations (one line each), and then the Haskell is equally short.

I agree that refactoring non-monadic code into monadic code is a bit of a hassle. But using liftM and/or ap, it's not that painful, actually.
I also like that it forces me to think about what I'm doing. You can often redo things slightly to make them pure, and I usually find the pure code nicer.

Anonymous said...

Have you considered using Python? It has many of the functional aspects of Haskell and OO as good as Smalltalk's.

Karsten Wagner said...

Sure, you can write something like

addValue key value = modify $ \st -> st { st_map = Map.insert key value (st_map st) }

But chances are that you only need it once in exactly this form. The problem here is that you can't simply write a generic version of 'addValue' which works not only with 'st_map' but also with some other value in the state, for example 'st_props'. If you only need a getter, it's simple:

getValue prop key = do
    st <- get
    return $ Map.lookup key $ prop st

Now you can simply write "value <- getValue st_props key" (but you still need the '<-' syntax of course and can't simply insert (getValue ...) into some expression).

But this solution isn't possible for setters, because "st { st_map = ... }" is special syntax and not a function. So you first have to create a 'setter function' (like \st v -> st { st_map = v }) and then use it in a generic setter function like the one above:

setValue setter getter key value = do
    modify $ \st -> setter st (Map.insert key value (getter st))

This can now be used like this:

setValue (\st v -> st { st_tables = v }) st_tables key value

But it isn't much shorter or more readable than simply using the full code at the top, so I haven't used this.

Anonymous said...

I am interested in your approach to testing while working on your application's rewrite in Haskell.

In OO programs, I delegate most of the work to other objects that are composed into the object under test. Using dynamic mocks, I can specifically test my object's behavior and how it uses its components.

I am struggling with how to do something similar in functional programs.

If the function I want to test relies on another function, I have to test the independent function first. This forces me into bottom-up design, which I have found to be a bad idea in my OO experience.

Anonymous said...

When using Data.Map with MonadState, I always create some helper functions:

getMapState k = liftM (! k) get
setMapState k a = modify (insert k a)

These two go a long way towards relieving the syntax bloat you were speaking of.

Anonymous said...

I think most of the issue here is that you're trying *so* hard to use Haskell in an imperative style.

Flipping this the other way, consider the pain of writing C++ in a functional style.

Sure, you can use or write all kinds of helpers (Boost!) to make it less painful, but when it comes down to it, you've got to program in a style that fits the language.

Anonymous said...

Also, just to get debug output you can use Debug.Trace. It breaks referential transparency by using unsafePerformIO (or something), but if you are just debugging it's great!