Saturday, August 02, 2008

The most misunderstood book in CS

After reading yet another blog post which got it totally wrong, I decided to write a bit about design patterns. In my opinion, the 'GOF' book 'Design Patterns' is probably the most misunderstood book in computer science.

The main misunderstandings are:

  • The authors of the book thought they invented the patterns.
  • The book says that you have to use the patterns from the book to write good software.
  • The book says that the patterns in the book are the only and most important design patterns.
All this is totally, absolutely, completely, outright and utterly wrong. The intent of the book was totally different from what many people seem to think today.

To make my point clear, let me quote directly from the book:

"The purpose of this book is to record experience in designing object-oriented software as design patterns. Each design pattern systematically names, explains, and evaluates an important and recurring design in object-oriented systems. Our goal is to capture design experience in a form that people can use effectively. To this end we have documented some of the most important design patterns and present them as a catalog."

So Gamma et al. obviously knew that they hadn't invented the patterns. They simply collected and named them. They also explicitly stated that they documented patterns of "design in object-oriented systems". They never talked about functional, procedural or logic programming. The book is explicitly about OOP, and that's stated right in the introduction.

Another quote:

"We all know the value of design experience. How many times have you had design déjà-vu—that feeling that you've solved a problem before but not knowing exactly where or how? If you could remember the details of the previous problem and how you solved it, then you could reuse the experience instead of rediscovering it."

That's the reason why the authors thought that collecting the patterns was a good idea. They simply wanted to show some successful ways to solve certain problems, to teach novices in the field concepts they would otherwise have to discover later by themselves. In no way does this mean that programmers are required to use certain patterns! The intent was the other way around: since you will stumble over the patterns soon enough anyway, why not ease the way and show them right from the start?

And one more:

"Despite the book's size, the design patterns in it capture only a fraction of what an expert might know."

This should remove the notion that the authors thought that the presented patterns are exhaustive.

So what was the real intent and achievement of the book? Mainly two things:

  • Collect frequently used 'patterns' and give them names to make communication between developers easier.

    Programming already had names for certain concepts: "procedures", "variables", "statements" etc. But before the book there were no names for higher-level constructs. For example, if we use a wrapper object to lazily create and maintain a single instance of another object, we can now simply say "we use a singleton". That's the main advantage of giving things names: making communication easier by using a short name instead of having to blurt out lengthy descriptions. This says NOTHING about whether using singletons is a good idea or not. But obviously programmers have used this pattern, so the book gave it a name.

  • Present newbies with the 'intuitive' knowledge of experienced programmers by making it explicit.

    The presented patterns were common knowledge for most experienced programmers already. When I first read the book, more than 10 years ago, I quickly recognized most of the presented patterns from my previous experience in OOP. But the nice and exciting thing about the book was that the patterns were presented extracted and isolated from the programs they're used in. Most books at that time concentrated on concrete solutions and algorithms, and you had to extract the implicitly used patterns yourself, intuitively. By doing this extraction work, Gamma et al. were the first to describe (for a wider audience) high-level concepts in an easy to understand and also abstract way. There simply wasn't anything like it before. Today all this may seem rather boring, because the concept the book introduced is common knowledge now. But at that time, it was truly revolutionary.
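
To make the naming advantage concrete, here is a minimal sketch of the lazy-creating singleton wrapper described above (the `Config` class and its contents are hypothetical, purely for illustration):

```python
class Config:
    """Hypothetical application-wide configuration object."""

    _instance = None  # the single, lazily created instance

    def __init__(self):
        self.settings = {"debug": False}

    @classmethod
    def instance(cls):
        # Lazily create the instance on first access, then reuse it:
        # this is the whole 'Singleton' pattern in a few lines.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

a = Config.instance()
b = Config.instance()
print(a is b)  # True: both names refer to the very same object
```

Instead of describing the lazy-creation machinery every time, we can now just say "`Config` is a singleton" - which is exactly the communication shortcut the book was after.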

Another common mistake is to think that design patterns exist only in languages like Java. But in fact patterns are everywhere: in every language, in every paradigm. The book just concentrated on OO design! It primarily used C++ for its code examples because C++ was the 'next big thing' back then. C++ didn't become as successful as expected, but a rather similar language (Java) did. And so it's no surprise that the presented patterns are now commonly used in Java.

Norvig once wrote that the existence of design patterns is a sign of weakness in a language. I think he was wrong there. Sure, certain patterns are only necessary because a language doesn't allow you to achieve a certain goal directly with some built-in concept. But if you built it in, you would quickly discover other patterns 'a level above' instead of getting rid of those pesky design patterns. So we always have patterns; it's simply unavoidable.

A concrete example: Java uses the "Iterator pattern" quite frequently. In functional programming this pattern is quite useless: we simply write map and fold functions and don't need an "iterator". Hah! But hey, isn't "writing a map and a fold function" also a design pattern? All we need to do is call it the "map and fold" pattern. Using this pattern is common knowledge for every user of functional programming languages - so it's good to teach this pattern to novices in functional programming right from the beginning. And to give it a name so we can talk about it. And exactly this was the intent of the book. Nothing more, nothing less.
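
To illustrate, here is the same little computation written both ways - once with an explicit iterator and once in 'map and fold' style (the word-length example is made up for illustration):

```python
from functools import reduce

def total_length_iterator(words):
    # Iterator pattern: pull elements out of an explicit iterator, one by one.
    it = iter(words)
    total = 0
    while True:
        try:
            word = next(it)
        except StopIteration:
            return total
        total += len(word)

def total_length_map_fold(words):
    # 'Map and fold' pattern: map each word to its length,
    # then fold the lengths together with addition.
    return reduce(lambda acc, n: acc + n, map(len, words), 0)

print(total_length_iterator(["map", "and", "fold"]))  # 10
print(total_length_map_fold(["map", "and", "fold"]))  # 10
```

Both compute the same thing; the difference is only which named pattern structures the code.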

There are lots of other patterns. A small list:
  • Lisp, for example, often uses the 'macro pattern' (move often-used code into macros instead of repeating it). Common Lisp uses a certain pattern ('gensym') to make macros safe (which led to Scheme's 'hygienic macros' as a way to make this explicit pattern unnecessary).
  • In Haskell we have the 'monad pattern', which was so successful that it even got syntactic support (similar to Java, which later got syntactic support for the iterator pattern). And the 'parameter-function' pattern is quite commonly used in functional programming (it's also known as 'higher-order functions').
  • From Smalltalk we have the famous 'MVC' (model-view-controller) pattern, which is now very commonly used in other languages too (think of Ruby on Rails, for example).
  • In web programming we have the 'REST' pattern and the 'Ajax' pattern.
  • And in Javascript we use various patterns to create modules and classes.
This could go on and on and on... but I hope you see my point.
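
As one more illustration, here is a toy sketch of the MVC split mentioned above (the counter example and all class names are hypothetical):

```python
class CounterModel:
    """Model: holds the application state, knows nothing about display."""
    def __init__(self):
        self.count = 0

class CounterView:
    """View: renders the state, knows nothing about how it changes."""
    def render(self, model):
        return "Count: %d" % model.count

class CounterController:
    """Controller: translates user actions into model updates."""
    def __init__(self, model):
        self.model = model
    def on_click(self):
        self.model.count += 1

model = CounterModel()
view = CounterView()
controller = CounterController(model)
controller.on_click()
print(view.render(model))  # Count: 1
```

The value is again in the name: "this is MVC" communicates the whole separation of state, rendering and input handling in one word.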

So please stop misunderstanding what a rather important step in the history of software design this book was. By this I don't only mean certain authors of certain blog posts (who should at least read the introduction of the book before they start bashing it). I also mean programmers who think that the presented patterns are the pinnacle and sole tool for writing programs!

The authors of the book wrote in the conclusion: "be a critical consumer" and "look for patterns you use, and write them down". Patterns are everywhere, but they change with time, language and application. So don't limit yourself to the patterns in the book! And only use them where they are really useful (not because you think using them over and over again makes for good design). The intent of the book was to make people observant of the higher-level concepts they use. To show that there is more to programming than algorithms and data structures. To discover and name structure which was always there, right before our eyes - but without us really noticing it. Now we notice - and that was the main achievement of the book.

Wednesday, May 07, 2008

What makes programming so difficult - and can we make it easier?

I've often thought about the reasons why programming seems to be so difficult, and also so different from many other professions. And about the degree to which it's possible to simplify and quicken the process. Inspired by this blog post, I want to share my view of the topic here.

What's the 'process' behind programming? How do we do it? I think we can break down the process of programming into three steps:

  • Step 1: Analyse the problem you want to write a program for and create a model of the problem which can be implemented as a program.

    This step has to be done no matter which programming language you use for the implementation. It requires both domain-specific knowledge and implementation-specific knowledge. It also requires knowledge of how people interact with computers, of which things are solvable with computers and which are not, etc. This step is something no program can do (at least until we have real general AI with 'whole-world knowledge').

    And it's also the reason why there are good and bad programmers. Bad programmers aren't able to really understand the problem in all its details. Because of this they tend to 'emulate' or 'simulate' it step by step (for example by looking at how a human solves the problems the program should handle and then writing a program which does something similar). But simulation isn't the same as creating a solution based on thorough understanding. Simulation leads to more lengthy and at the same time less comprehensible programs. It's also much more probable that the program won't work correctly, because by just simulating it's easy to overlook inconsistencies or limitations. Better programmers who really understand the problem can create an abstraction which solves it and can detect inconsistencies or flaws in the problem description. I think this step is the main reason why it takes so long to get from a beginner to a good programmer, and why certain people never get it. It takes a lot of time and practice to understand real-world problems deeply enough to create an abstract model of the problem or process which can later be translated into code. In normal life, people don't have to understand exactly how they do what they do: they know the general rules, and if they come across an exception to those rules, they simply get creative. But to create a program which is able to do the same, you have to think about all those possibilities before even starting the implementation. This kind of analytical thinking, combined with the ability to acquire the necessary knowledge of a previously unknown domain, is the real key to creating a good solution.

  • Step 2: Recursively break down the model into sub-models until you reach a level where you can implement such a sub-model with a library or by writing code.

    This step starts to depend on the implementation language/framework: the more powerful it is, the less breaking down we have to do. Bad programmers may have problems in this step too, because they don't know their tools (or the market of alternative tools) well enough to lead this process in the right direction. In this step most of the algorithms, data structures and 'patterns' have to be chosen, which requires a lot of CS-specific knowledge of those topics. Here the quality of the programmer matters a lot, but it's mainly a question of education and training.

  • Step 3: Implementing the sub-models (the actual coding).

    This is the simplest part of programming. Beginners and non-programmers tend to overestimate this step tremendously, because this is where 'the real work' is being done. But in fact the success and amount of work in this step depend hugely on the quality of work in the previous steps. This step is also relatively easy to learn; in principle everybody can do it with a little training. On this level there's also not much distinction between good and bad programmers - it's nothing but grunt work. But this is also the step where languages matter most, because they can remove much of the work by providing better and faster ways to do it.

All these steps are done repeatedly in most projects. Even master programmers overlook things in step 1, so they have to go back after gaining more insight from working with the implementations of step 3. But the main points here are:
  • Step 1 is the most important part of programming, because the quality of work in this step determines how much work the following steps will take and what quality they will reach.

  • Step 1 is also the hardest and least learnable skill of a programmer. Companies try to use 'software architects' to get quality work in this step, because it's much easier to find grunt workers for step 3 than programmers who do good work in step 1.

  • Step 2 shouldn't be underestimated, because it has a big impact on the actual work to do in step 3. This step requires lots of 'high-level knowledge' of frameworks, patterns, data structures, libraries etc., but not much domain-specific knowledge. This is where the expertise of experienced 'senior programmers' counts most. It's also the step people learn to do if they study computer science.

  • Step 3 is the part where the hard but easy work is done. Companies often try to outsource it to low-wage countries. But the problem is that there's always strong coupling between all three steps, and the knowledge gained in step 3 is needed to improve the models of steps 1 and 2. That's why small teams of good programmers who do both the analysis and the implementation have a big advantage here: it's simply less likely that they run into dead ends, because they have a much better overview of the project.

  • Step 3 is also the most work-intensive part of the job; it often requires more than 90% of the total working time. This is also why better programming languages and better frameworks can bring huge productivity benefits. And it's why good programmers are more productive: by creating better abstractions and structures in steps 1 and 2, they eliminate lots of the work they would otherwise have to do in step 3. But productivity in this step doesn't depend that much on the quality of the programmer.

The above explains why it's so hard to teach 'good programming':

It is relatively easy to teach people how to do step 3. I think most people can learn it. At this step programming is like a craft.

Step 2 is much more difficult, because the required knowledge is more abstract and also requires more experience and training. To be successful here, a programmer needs good knowledge of data structures and algorithms, of frameworks and libraries. Much of this doesn't even depend on a certain language but is general, abstract knowledge about programming. But learning abstract things is more difficult for many people than simply learning the concrete syntax and semantics of a programming language. So it takes more time to become successful in this step, and there will be people who never 'get it'. This is also the step where programming is very similar to engineering.

Step 1 is the hardest of them all. I don't even know how to teach this step. I think the only way is to work as a programmer and acquire the necessary knowledge over time, through experience. Learning the way of thinking needed to do this step successfully is rather 'unnatural', because in normal life it would be considered pedantic, nitpicking and overly analytic. This may be the reason why good programmers are often looked at as a little bit strange by their surroundings - especially if they use their problem-analysing skills in real life too often. It's also the area where programming becomes kind of an 'art' and where most of the 'design part' of programming lies. But since this step can't be isolated from the other steps, good programmers can't be just designers; they still have to be engineers and craftsmen too.


This breakdown also explains why the language often doesn't matter as much as many people estimate: a good programmer with good step-1/2 skills will create models which are much easier and faster to implement in any language. So even if the implementation is done in language A with only 30% of the step-3 productivity of language B, the better models, frameworks and libraries can easily outweigh this, creating superior net productivity in the end.


Conclusion: in the end programming is craft, engineering and art at the same time; it only depends on which step you focus on. And while it's possible to split those steps up between different people, the tight coupling between them imposes hard limits on how successfully this split can be done. I think the better way is to improve the tools and languages to make step 3 as painless as possible. But all this won't change the fact that programming is much more than step 3 alone, and that it still depends on people who are good at steps 1 and 2 to create good programs.

Wednesday, March 12, 2008

Is 'power' really what we want most from a programming language?

Power seems to be a nice thing to have. But what does it really mean? Think of the president of the USA: he's considered quite powerful because (in principle) he is the one who can 'press the button'. But with all his power to destroy, he still can't create peace in the Middle East. That's because power isn't the same as productivity. Power enables one to exert force - but often this simply isn't enough to reach a certain goal.

Now back to programming. Let's take Common Lisp. It's a language which gives you lots of power: you can program in nearly every style you want, forcing your way through all paradigms and programming styles. You have the power to control nearly everything, from the reader to the compiler to the semantics of the language. This language really has power!

But what about productivity? For all this power, the number of applications written in Lisp is quite small. You may argue that this is a result of political reasons, or simply because "people don't get Lisp". But if something is really so much better, it will wipe away political or psychological inhibitions with ease, simply because people can't afford to give their competition an advantage. So if this is true, why hasn't all this impressive power created equally impressive results?

The answer may be that "power" is simply the wrong metric here. What we need to look at is the "fitness" of a programming language. As we all know, "fitness" is the measure which really counts if lots of things are competing against each other. Fitness is the driving force in evolution, it made the living world the way it is now. And evolution isn't a concept which is limited to biological systems alone: Every system which allows for the concepts of "selection" and "mutation" evolves. Societies, companies in a free market - so why not also programming languages?

But if we now take a look into the animal kingdom, it's easy to see that 'power' doesn't seem to play a role in deciding which species succeed and which become extinct. Where are the big and powerful dinosaurs now? What about the big and strong mammoth? Instead, it's the small and relatively fragile cockroach which could be considered a huge evolutionary success. So fitness seems to be anything but power.

Back to programming: what determines fitness in the realm of programming languages? A programming language "lives" if people use it. So what do people really want from a programming language? What makes them use it?

Maybe those three: Productivity, accessibility and usability.


  • "Productivity" seems to be a very important one. In the end most people use programming languages to create some kind of application - in other words: "to get the job done". Sure, there are people who use a language just for fun, and this may create an evolutionary niche for languages which have properties like "beauty". But living in a niche isn't the same as success. It is "productivity" which determines whether a language can be used with commercial success - whether it's more of a toy or a tool.

  • "Accessibility" determines how easy it is to get started with the language. A powerful but also very complex and thus rather inaccessible language has a much harder time convincing people to use it. The same goes for a language which is only available commercially and costs a lot of money: its accessibility goes down dramatically. Same for languages without good documentation and tools. "Accessibility" determines whether people start to use a new language or stay with the one they're familiar with.

  • And there is "usability", which determines whether using the language feels good or like a nightmare. This also has a lot to do with the language infrastructure: tools, libs, documentation etc. But also with the syntax and semantics of the language. All this also determines productivity; the three points overlap (but are still distinct). In the end "usability" decides whether people are happy to use a language and stay, or whether they look around for another to switch to as soon as possible.


What about "prevalence"? In fact this is only a secondary property, because for a language to become prevalent it first has to succeed without it. Once prevalence is reached, it contributes a lot to keeping a language alive. But don't make the mistake of overestimating this property. Every language starts small and has to compete against more prevalent languages first. PHP had to compete against Perl, Java against C++ and Cobol, Ruby against Java. Every new language which is able to step out of its minority niche had to prove itself first in the previous three areas, so those are the more important ones. But "prevalence" explains the inertia in the world of programming languages, because once reached it can increase the first three factors. Those are still the really important ones, though.

And where does "power" fit in now? Can it improve the three factors above? Yes, but it can also be a hindrance. So power is only a secondary means; what really counts are the results of power in the above-mentioned categories. Like in nature, power is always a question of balance: more power may make a creature more competitive against other creatures in the same ecological niche - but it also requires more energy throughput, which can be fatal in times of scarcity. In programming languages, more power can increase complexity or micromanagement, which would in turn decrease all three of the above factors. So instead of using "power" as the primary measure in discussing and comparing programming languages, we should concentrate more on the really important properties above.

With this in mind it's easy to explain why certain languages had big success in the past while others only live in a niche. As in the biological world, every species survives as long as there is some kind of space where it fits in better than its competition, so there's always room for lots of programming languages in the world. But only a few will be able to break out of their niche and have general success. Let's look at some languages in detail:

Let's start with Common Lisp: why is it a niche language now? Let's look at the three factors from above. "Accessibility" (what makes people start using a language): the big problem here is the syntax. All those parens... This is no problem after you've got used to it, but for the beginner it's simply hard. "Usability" (what makes people stay with the language): this depends. For me usability is bad because of the lack of static typing (look here for the reasons - and yes, I know there are ways to get static typing in Lisp, but I still have to use 3rd-party libs which are created without it, and Lisp simply doesn't work well if you use static typing. It's simply not "the Lisp way"). And productivity? Not that good either (for me, for the same reasons), but of course it depends on the developer. And of course on the money you're able to spend, because good Lisp systems (which would allow me to work productively) were quite expensive in the past, and some aren't really cheap even now.

Now a rather successful one: Java, of course. Java was very accessible from the beginning. Its familiar C-like syntax combined with a rather simple language definition made entry a piece of cake. And it was freely available from the beginning. Sure, usability and productivity weren't perfect right from the start, but things got better fast (especially after MS stopped messing up Java and Sun got full control over the language). And after Java 1.3, productivity was very good (mainly because of the huge available libraries and tools, and because Java had garbage collection). Usability was great too (mainly because of the powerful Java IDEs, the good documentation and again because of GC). So it's no wonder that Java became a big success. Things have changed a bit in the meantime because of changes in the technological landscape which are starting to hurt Java. But it will stay for some time, simply because of its prevalence.

Another big OOP language was Smalltalk. Why wasn't it more successful? Again: accessibility. This time it was primarily the availability of programming environments which prevented its success. Smalltalk systems were very expensive and they only worked in their closed 'world'. And usability? Quite good - but only if you were able to shell out lots of money and could afford to neglect OS-specific stuff. Smalltalk applications were hard to deploy and always foreign on the desktop. And at the time Smalltalk had its high, desktop apps were 'the thing' (this nearly broke Java's neck too, but because Java came a little bit later, it could concentrate on the then-upcoming market of server-side programming). So while Smalltalk could be quite usable and productive in principle, in practice this was true only in its small ecological niche, while outside other languages (like C and C++ at the time) ruled the market.

Now, at last, a look at a more current language: Haskell. It's still too early to make a final judgement, but at the moment it doesn't look good to me. The language has bad accessibility because of its complexity. Its usability is also not that good, because of bad documentation, a lack of good IDEs and certain hard-to-use language features (I wrote about this some time ago). And productivity doesn't seem to be that good either, for the same reasons. The language certainly has big appeal to the research community and to people who like brain-twisters, but this isn't enough to make it a general success.

I could go on and look at a lot more languages this way, but I think you can do this by yourself. Just try to be free of prejudice and don't let your personal preferences blur your view. Can you guess what the "next big thing" may be?

To come to an end: the criteria above may not be the best. Maybe you can come up with better ones. But one thing is for sure: "power" is as much overrated in programming languages as it is in nature.

Monday, March 10, 2008

Static vs 'dynamic typing', part 2: The personality-factor.

After talking about some of the basic nomenclature, I will now look into the pros & cons of static typing. But instead of going into pure technical detail, I want to look at it from a less common angle: what if it's not really a technical but more of a psychological matter?

Let's start by looking at my personal experience: I've used languages without static typing often enough to know that they don't fit my style of development. I've used Lisps, Ruby, Javascript and played with various other languages without static typing. But it never felt good. And I was never really productive. Sure, when I started a project, everything went fine and I enjoyed the freedom I had without thinking about static types. But as the project grew bigger and bigger, I started to miss static typing more and more, and I also needed more and more time for rewrites and for 'relearning' my own creation. With static typing I have no problem creating and maintaining programs tens of thousands of lines in size, but without it, it seems that above 1000 lines my productivity drops dramatically. Like I got stuck in a tar pit.

Of course this is only my personal experience. It may differ for other people, but I suspect I'm not alone with this kind of experience. So what if the question is not simply a purely technical one, but depends much on the personality of the programmer? What if there are personalities which can work well without static typing while others struggle?

Static typing creates a kind of 'safety net'. It protects you from yourself, forcing you to think thoroughly about your design. Without it, it's very tempting to use shortcuts instead of solid design (which always seems to have some overhead in line count and artificial complexity, at least in the beginning). I am a lazy person, and because of this I often take those shortcuts - and regret it later.

Now this is also true, to a certain degree, if I use a statically typed language. If I enter new territory and come across a problem I've never had before, I try to get a working version as fast as possible, to try out things and learn more about the problem. It's rather uncommon that I'm able to create something solid on the first try. I could try to work things out on paper, but in the end I've always overlooked crucial things which tend to pop up only in a real implementation. In most cases my first implementation kind of works, but is so full of bad hacks that it will be rewritten or at least heavily reworked later, until it is ready to stay.

But what does this have to do with static typing? Static types create global constraints which describe your program in a non-local way. This creates a skeleton you have to fit your program into. If I have to solve a sub-problem which is new territory for me, the rest of my program forms a solid framework in the background which can guide the process, while the IDE detects mismatches instantly. From my experience with languages without static typing, this typed framework combined with instant feedback is the thing I miss most and whose absence costs most of my productivity.

Static typing also helps a lot if you're refactoring your code. I'm not only talking about automatic refactorings, but also about the good old cut&paste refactorings: just cut some code out and paste it somewhere else, and the IDE instantly highlights the unresolved names, type errors etc., which shows where the code has to be changed to fit into the new place. I'm refactoring my code constantly. I rename, change signatures, cut code into pieces to assemble it anew elsewhere, extract methods etc. I always 'sculpt' my code, forming it in place. As I write new code I always refactor old code. Thanks to static typing this nearly always works without introducing new errors, and it can often even be done semi-automatically by the IDE. If I write code in languages without static typing, I refactor much less. There every refactoring is a risk; it may corrupt my code without me noticing it. So it takes more time, because I have to be more careful. And even if a mistake shows up in a failing test later, I still have to find the source of the error instead of simply checking off one error highlight after the other.
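
A tiny sketch of the kind of mistake meant here, using Python with optional type hints (all names are hypothetical): after a rename-refactoring, one call site was missed. A static checker such as mypy reports the bad call before the program ever runs; without static checks it only blows up at runtime, and only when this path actually executes.

```python
class Account:
    def __init__(self, balance: float) -> None:
        self.balance = balance

    # Renamed from 'withdraw' to 'debit' during a refactoring:
    def debit(self, amount: float) -> None:
        self.balance -= amount

def pay(account: Account) -> None:
    # Missed call site: a static checker flags this line immediately,
    # while at runtime it fails only when pay() is actually called.
    account.withdraw(10.0)

try:
    pay(Account(100.0))
except AttributeError as error:
    print("only found at runtime:", error)
```

With static checks the error is highlighted the moment the code is written; without them, you need a test (or a user) to hit this code path first.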

The funny thing is that people tend to think that static types hinder exploratory programming. But with my style of work I can explore things quite easily, thanks to the ability to restructure my code again and again, instead of first thinking about how to write, for example, a macro abstracting things out. So static typing makes exploratory programming even easier for me (but only thanks to the IDE - doing all those things by hand would be quite cumbersome), and the moment I have to work without it, I feel constrained and forced to think much further ahead instead of simply exploring things.

Many people praise the 'REPL' ('read-eval-print loop', a way to execute code immediately in many non-statically-typed languages) because of its way of instantly trying out things. But an IDE with instant error highlighting is even faster at pointing out mistakes - and it works with all of your code, even where you don't suspect a mistake.

REPLs are nice and fine, but they need to be used explicitly by the programmer. This requires entering and writing additional code, and it also only works if the code already kind of works. Instant checks based on static types happen automatically, even while the code is still incomplete. So instant checks based on static types can be much more immediate and faster, while 'dynamic checks' like tests (done in the REPL or with testing frameworks) only work later, once you've already finished a piece of code.

This leads to the disadvantages of TDD ('test-driven development'): tests must be written explicitly and thoroughly or they aren't of much use. This takes time. And if you change your design you also have to rewrite all your tests. And that's my problem with real TDD: it's again my laziness. Writing and maintaining all those tests is essential for this method to succeed - and I'm simply too lazy to do it.

I always let my tests lag behind instead of letting them lead my designs, testing only the most important things instead of doing the kind of fine-grained testing which is necessary to make this methodology work. I always think "hey, this code is simple, why test it, I can't have made any mistakes" - and later I find a typo or something similarly stupid. But with a static type system which lets the IDE quickly point out where I made mistakes, this happens far less. And it scales well with the size of the program. In small programs the overhead of searching for typos or simple stupidities isn't that big. But the bigger the code base grows, the longer it takes to find those little mistakes. So without static typing I've lost my most important tool for writing new working code, and it's no wonder I'm not very successful without it.

The same is true for documentation. Static types in combination with expressive names are often already enough documentation. Sure, for more complex parts of the code, writing additional documentation is mandatory. But again the problem is keeping it up to date with code changes. With static typing I use the types as part of the documentation and write only documentation which explains the non-trivial parts of a method or class (like algorithms, how to use it etc.). So static types are an excellent tool for compiler-checkable documentation which never gets out of sync with your code. Without static typing the documentation is optional, and lazy people like me tend to omit it too often for their own good.

I think here is also the reason why there are people who can work well without static typing: if you're very disciplined (or work under strict project management which enforces the rules without exceptions), TDD can probably be a feasible alternative to static typing. And if you don't refactor as much as I do (because I am often too lazy to create the right names and interfaces on the first try and have to change them later if the code stays), the advantages of safe refactorings are quite useless for you.

But for lazy people like me, discipline has to be enforced by the language. And this is what static typing does. It's not perfect because it can't find all kinds of errors. But it forces me to create clean designs and solid interfaces. It points out flaws in my designs and forces me to think about them. It gives me immediate feedback and it is also useful as documentation. It makes refactorings easy and lets me clean up my code more often. And the things it can check are not only checked, they are even proved, so I can rely on them (compared with TDD, which can check a wider range of things but with much weaker guarantees, because I never know if the tests themselves were correct. I remember spending hours on 'broken code' only to discover later that the code was correct but one of my tests was faulty).

So the prime reason why people can't stop arguing about the topic is probably that people are different and have different ways of working. If you're a disciplined person who writes as much code for tests as for the 'real' program, you will benefit from the freedom a language without static typing gives you, without getting stuck at a certain complexity as I do. If you tend to write working code on the first try, you won't refactor that much and will rely more on building different abstractions using the more dynamic nature of a language without static typing. But if you see comprehensive tests as a distraction from the 'real work', while at the same time having no problem with thinking more about the overall design, creating interfaces and specifications, then static typing is the way to go, because it fits much better with your way of working. Even if it requires additional code and sometimes stands in the way of creating a better abstraction.

And I think that's also the reason why most 'enterprise programming' is done in statically typed languages: with many different people creating and maintaining a project, often over decades, it is much too risky to depend on the personality of the individual programmer. And the situation is also asymmetric: people who can work well without static typing can also do productive work in a statically typed language. But a programmer who works better with static typing can totally mess up a program written in a non-statically-typed language. So the risks are much smaller if the language used has static typing. And I'm also quite sure that the number of programmers who can work well without static typing is much smaller than vice versa, because people in general, and programmers in particular, are more on the lazy than on the disciplined side.

So if you argue next time over the question 'which is better', please keep in mind that people have different ways of working and thinking and that there may be no 'one right way'. To make people most productive, it's important to acknowledge their way of working. Static typing is an additional tool which is very helpful for some people but may be too restrictive for others. Try out both and see what fits you best. As programmers we all rely on tools. Nobody would be productive today entering machine code with switches as in the early days of computing. So there is no disgrace in relying on a compiler or an IDE to do your job. Find the tools which help you most without feeling forced to adopt a certain style which is 'hip' at the moment. It's important to learn new things and try out alternatives constantly - but in the end, all that really matters is getting the job done.

Tuesday, February 26, 2008

Static vs "dynamic typing" (part 1)

Yes, I admit, it's an old, often reiterated discussion - but nonetheless it's still a quite frequent topic. So I decided to write a bit about it - in fact it became more than a bit, so I've split it into multiple parts. In this first part I want to talk about some of the basics; in the next one or two I will discuss the various pros and cons.

Let's start by talking a bit about nomenclature.

First: The opposite of static typing is not dynamic typing! It's also neither weak typing nor latent typing. Many languages with static typing also have runtime type checking (Java, for example). Static typing is an additional layer on top of an otherwise (strongly, weakly etc.) typed language. Because of this, the opposite of static typing is simply no static typing!

Second: The existence of static typing can have effects on the semantics of the language. While this is not true in typed lambda calculus (the common textbook stuff), certain existing languages blur the boundaries between typing and language semantics. A quite common example of this is static function overloading in languages like Java or C++, which can't be implemented with dynamic typing alone (because it requires a "static type"). But there are also languages where static typing is a totally isolated layer which can be removed without changing the semantics of the language.
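A minimal Java sketch of this second point: overload resolution happens at compile time from the static type of the argument, so the static type really does change which code runs (class and method names here are just for illustration):

```java
// Overload resolution in Java picks the method from the *static* type of
// the argument at compile time, not from its runtime type.
public class Overloads {
    static String describe(Object o) { return "Object"; }
    static String describe(String s) { return "String"; }

    public static void main(String[] args) {
        Object o = "hello";                    // runtime type String, static type Object
        System.out.println(describe(o));       // prints "Object"
        System.out.println(describe("hello")); // prints "String"
    }
}
```

With dynamic typing alone, both calls would have to dispatch on the runtime type and would behave identically - which is exactly why static overloading needs the static type.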

Third: Latent typing and static typing aren't mutually exclusive. "Latent" only means that we don't have to explicitly write out the types of variables etc. But this can also be achieved by doing type inference in a statically typed language. So "latent typing" is in fact only the opposite of explicitly annotated ("manifest") typing.
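Java itself later illustrates this third point: since version 10 it has local type inference with `var`, so the annotation is omitted while the typing stays fully static. A small sketch (the `demo` method is just for illustration):

```java
import java.util.List;

// Local type inference (Java 10+): no written annotations, but the types
// are still fixed statically - assigning, say, an int to 'greeting' later
// would be a compile-time error.
public class Inference {
    static int demo() {
        var greeting = "hello";          // inferred as String
        var numbers = List.of(1, 2, 3);  // inferred as List<Integer>
        return greeting.length() + numbers.size();  // 5 + 3
    }

    public static void main(String[] args) {
        System.out.println(demo());  // prints 8
    }
}
```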

The above may sound a bit nitpicky - but it's so often mixed up in articles and discussions about the topic! I think it's important to know the differences before we delve into details.

But what is "dynamic typing" now? It's simply "type checking at runtime". In other words: objects can have a "dynamic type" which allows the right operation to be chosen not only by the name of the operation but also by the runtime types of the operand(s).

For example, Java and Ruby are both dynamically typed languages. But Java also has static and nominal typing, while Ruby has no static typing and is latently typed. Both languages are also strongly typed.

Now certain languages are "dynamic beyond types". The notion "dynamic language" has been established for those languages in recent years. A dynamic language simply allows the program to change itself. This can be self-modifying assembler code, the dreaded "eval" function, or the addition or removal of fields and methods at runtime. Most languages are "dynamic" to a certain degree. So-called "dynamic languages" (like Ruby, for example) are simply much more dynamic compared to a less dynamic language (like Java, for example). But it's a continuum, not a have or have-not. "Dynamic languages" don't even need to omit static typing, if they have a compiler which does type checking after every code modification (which is a bit hard to do, so it's rather uncommon). But I don't want to talk about the pros and cons of "dynamic languages" here; the term is just often used in discussions and I think it's important not to mix it up with the typing topic. On a fundamental level the terms "dynamic language" and "dynamic typing" are even orthogonal concepts.

Now from the above it should be clear that non-static typing is not really a feature but in fact the omission of a feature. Dynamic typing in itself is a rather common feature: every OOP language requires a certain amount of dynamic typing (for the method dispatch).
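That built-in bit of dynamic typing is just ordinary virtual dispatch. A minimal Java sketch (the class names are invented for illustration):

```java
// Virtual dispatch: the method that runs is chosen by the *runtime* type
// of the receiver - the piece of dynamic typing every OOP language carries.
public class Dispatch {
    static class Animal { String speak() { return "..."; } }
    static class Dog extends Animal {
        @Override String speak() { return "Woof"; }
    }

    public static void main(String[] args) {
        Animal a = new Dog();           // static type Animal, runtime type Dog
        System.out.println(a.speak());  // prints "Woof" - resolved at runtime
    }
}
```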

So talking about "static vs dynamic typing" is in fact nonsense (that's the reason I used the quotes in the title). The whole discussion really centers on the following question:

Is it a good idea to have a static type-system in a language or not?

But that's the topic of part 2.

Friday, August 17, 2007

Better exception handling

Exception handling seems to be a solved problem: just use some kind of try/catch/finally construct and we're done. Are we? Exception handling creates several flaws and severe holes in the static checking of a program. While lots of people are researching type systems, exceptions open an ugly hole in the back of most of those systems. But if you try to do it right and type-check exceptions (like Java does), you get horrid and sometimes even hard-to-write constructs which blow up your code with lots of boilerplate. I tried to find a better approach to the problem.

What's wrong with checked exceptions? Let's look at a simple example in Java (I use Java here because most languages use unchecked 'dynamic' exceptions). We want to open a file and read some data from it. If we are naive, we could think it could be implemented as


String readData(File file) {
    BufferedReader in = new BufferedReader(new FileReader(file));
    String data = in.readLine();
    in.close();
    return data;
}

Of course it's not so simple. The 'new FileReader(file)' can throw a FileNotFoundException. And since FileNotFoundException is a 'checked exception', the above code wouldn't compile - which is kind of a good thing because it prevents us from overlooking the fact that the reading process can fail, and makes us think of a more sound solution.

So we have to handle this exception somehow. We may decide that this function should return a default value if there is an error opening the file. So we update our code accordingly:

String readData(File file, String default_value) {
    String data;
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader(file));
        data = in.readLine();
        in.close();
    }
    catch(FileNotFoundException e) {
        return default_value;
    }
    return data;
}


Looks fine. But it still won't compile. First, our 'readLine' could throw an IOException, so we replace our FileNotFoundException with an IOException to catch both kinds of exceptions. Now it would compile - but it's still not correct. Look at the 'in.close()'. If an exception occurs in 'in.readLine()', the program will skip 'in.close()' and we may have a resource leak (sure, it will be collected by the GC some time later, but we want to be really correct here).

So we change our code into

String readData(File file, String default_value) {
    String data;
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader(file));
        data = in.readLine();
    }
    catch(IOException e) {
        return default_value;
    }
    finally {
        if (in != null) in.close();
    }
    return data;
}

to make sure that in.close() will be called if an exception occurs. But again this won't compile, because in.close() can itself throw a checked exception.

String readData(File file, String default_value) {
    String data;
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader(file));
        data = in.readLine();
    }
    catch(IOException e) {
        return default_value;
    }
    finally {
        if (in != null) {
            try {
                in.close();
            }
            catch(IOException e) {
                return default_value;
            }
        }
    }
    return data;
}

Just look at the size of this code. Is it correct now? It will compile, but the behavior may still be wrong: if our 'in.readLine()' succeeded and then an error happens in 'in.close()', we get our default value even though we have some valid data. Of course this depends on the specification, but to make sure that we keep the data even in this case, we create a final version of the code:

String readData(File file, String default_value) {
    String data = default_value;
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader(file));
        data = in.readLine();
    }
    catch(IOException e) {}
    finally {
        if (in != null) {
            try {
                in.close();
            }
            catch(IOException e) {}
        }
    }
    return data;
}

That's finally it. This code seems to work - and funnily enough, both 'catch' blocks are now empty. It's all boilerplate. And it has required more thinking than one really wants to invest in such a simple piece of code.

So while checked exceptions give some safety, because they don't let you miss an exception in your code so easily, they create lots of hard-to-read and hard-to-write code. That's the reason the Java creators obviously started to question whether it's really a good idea, and added unchecked exceptions to the language too. The most famous one (also available in most other OOP languages) is the quite frequent 'NPE', the 'null pointer exception'. In principle every method call or field access in the language can create such an exception, so Java doesn't require checking those explicitly. But is this really a sensible solution?

First: Every time we write code which could throw an exception we should be aware of this. Simply doing a core dump if we call a method on a null pointer would be much worse.

And for less frequent situations like opening files, it may be a good idea to force programmers to think about handling the possible error cases. The main reason someone uses a 'statically typed' language is that the compiler can find runtime errors by using some typing rules. But if the compiler can only prevent 'no-method-found' exceptions while lots of other kinds of exceptions slip through, we may wonder if this is worth all the type annotations and harder-to-understand language semantics. Java tried to solve this problem at least partially - but in the end the result wasn't that impressive.

But could we do better? Or do we simply have to accept the fact that exceptions can happen all over the place, and go happily back to unchecked exceptions with try/catch, using lots of unit tests instead of relying on compiler checks? While some people seem to draw this conclusion, I think there is a better way: create a language which can't throw any exceptions anymore!

Let's for example look at one of the most common exceptions, the NPE. The only reason it's possible to have such an exception is that we allow references to be 'null'. If we require that all references be non-null, we don't need an NPE anymore. OK, you may object that this is a bit too much, at least for a normal OOP language like Java, so let's tackle the problem a bit more cautiously and require that references which can be 'null' have to be explicitly declared as 'nullable', and that accesses via those references are only possible if the compiler can prove that they will never be null at the points where it matters.

In other words: If we want to call a method on a nullable reference 'ref' we have to use some if(ref != null)-expression first and handle the other case explicitly.
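Today's Java can't check such a 'nullable' declaration in the type system, but the spirit of the idea can be approximated with Optional (Java 8+): the possibly-absent case becomes part of the type, and the compiler won't let you touch the value without deciding what happens when it's missing. The `findUser`/`greet` names below are invented for illustration:

```java
import java.util.Optional;

// Sketch: modelling 'nullable' explicitly. Optional<T> forces the caller
// to handle the empty case before using the value - close in spirit to a
// compiler-checked nullable reference.
public class Nullable {
    static Optional<String> findUser(int id) {
        return id == 1 ? Optional.of("alice") : Optional.empty();
    }

    static String greet(int id) {
        // There is no way to call toUpperCase() here without also saying
        // what to do when no user was found.
        return findUser(id).map(String::toUpperCase).orElse("UNKNOWN");
    }

    public static void main(String[] args) {
        System.out.println(greet(1)); // prints "ALICE"
        System.out.println(greet(2)); // prints "UNKNOWN"
    }
}
```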

This isn't a new idea and would be a great extension to many statically typed OOP languages (I know that it's kind of possible in Java now by using annotations, but it's far from comprehensive enough at the moment). It makes the NPE unnecessary and we would be able to remove it from the language completely. Why not think about concepts to do similar things with all kinds of exceptions?

This is of course hard, and I suppose it won't work for most current languages. But let's start to think about it. One of the next most frequent exceptions is the "array index out of range" exception. Can we eliminate this one, too? If we look at code which uses arrays or array-like collections, we notice that indexed access is in fact quite seldom really necessary. Most often data is appended, or the array is simply iterated over. Those operations are 'index safe' because they simply don't need an explicit index. And if we look at list-handling functions in most functional languages, we see that direct array access is seldom necessary there, so it seems possible to get around it in most cases.

But what if we really need direct access via some index in one of the remaining cases? We can now simply require the same thing as in the above concept for removing NPEs from a language: if we want to access an array by providing an index, we first have to make sure that the compiler can prove that the index is valid. Which again means that we may need to add an 'if (index < array.length) ...' check. This may look ugly, but we have to check the bounds somehow anyway, so why not make it provable by the compiler?

To make this a bit easier, we can also add special annotations to method/function declarations. For example, if we have a function with the signature

int addTwoElements(int[] array, int index1, int index2)

and we require that both indexes are in the range of the array length, we add annotations to let the compiler know that 'index1' and 'index2' are indexes into 'array'. This could look like

int addTwoElements(int[] array, @index(array) int index1, @index(array) int index2)

and would allow the compiler to make sure that addTwoElements is called with valid parameters, so we won't need to check them in the function. This is a Java-esque solution and could be improved. But remember: the main idea is to write code in a way that indexed access isn't even necessary. So those annotations are only needed for the remaining cases.
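The '@index' annotation above is hypothetical; in today's Java the guard has to be a plain runtime check. The sketch below just shows where the proof obligation would sit: the caller establishes the bounds, so the callee could rely on them without checking again:

```java
// Sketch of the division of labour a bounds-proving compiler would enforce:
// the caller checks, the callee assumes. In real Java nothing verifies this.
public class Bounds {
    static int addTwoElements(int[] array, int index1, int index2) {
        return array[index1] + array[index2]; // safe if the caller checked
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        int i = 0, j = 2;
        // The checks a compiler could use as a proof that the access is valid:
        if (i >= 0 && i < data.length && j >= 0 && j < data.length) {
            System.out.println(addTwoElements(data, i, j)); // prints 40
        }
    }
}
```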

This way we can start to write programs in a way that makes exceptions impossible. In most cases, a method or function that can throw an exception means that we haven't thought enough about the problem. If we pull the ways a function can go wrong out of the code and put them into the type of the function, we can make the compiler prove that our code has to succeed and that all necessary checking happens before a function is called.

This is slightly similar to the concept of dependent types, but it's simpler to do because the compiler only has to do the checks while the programmer creates the annotations himself. We don't require the compiler to prove everything. Where it fails to do so, we simply add a check or an annotation. Such a type system is much easier to create, to implement and to understand than a real, useful 'dependently typed' type system.

But what about the file-access problem from the example above? File access can go wrong in any part of the operation (even in a 'close') and we somehow have to react to it. And since the success of an operation depends on the external state of the program, it can't be proved or disproved by the compiler. So how to tackle this problem?

First idea: just give up and add checked exceptions only for those cases where access to some external resource can go wrong. By adding some language support (like C#'s 'using' construct or something similar) we can even avoid most of the boilerplate code. But aren't there better solutions?
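Java itself later gained exactly this kind of language support: try-with-resources (Java 7). It collapses the whole readData example from above into a few lines, because the reader is closed automatically and the catch clause also covers exceptions from the implicit close. (Note one small semantic difference to the final hand-written version: if readLine succeeds but the close fails, this version returns the default value.)

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// The readData example rewritten with try-with-resources: 'in' is closed
// automatically, whether the body succeeds or throws.
public class ReadData {
    static String readData(File file, String defaultValue) {
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String line = in.readLine();
            return line != null ? line : defaultValue;
        } catch (IOException e) {
            // Covers open, read *and* the implicit close.
            return defaultValue;
        }
    }
}
```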

There are, but they're hard or impossible to do in a language like Java, because they require writing side-effect-free code. For example, for file access we can pull the file access 'out' of the body of a function and use lazy evaluation to process the data. We would now have a model where all I/O reads all the data at once, and if it fails, it fails in the read operation before the operation which uses the data. This way we would handle the check once, here, instead of distributing it to each and every 'readData' call all over the code. Of course it's impossible to really implement it this way, first for performance reasons, and second because input and output may happen simultaneously and may even interact with each other. So we do the reading and writing lazily but provide a model where it looks like it happens all at once.

If now something goes wrong, the whole process is aborted and the operation is undone. The 'outer'-layer of the code detects the error and uses some ordinary if-then construct to do error handling. Since 'bubbling' of exceptions from a 'throw' to a 'catch' isn't necessary now, we can use simple and explicit ways to handle failure.

The whole code which does the operation (a complex parser, for example) won't even need to know about the abortion, because this is only of interest to the caller. Why is this impossible in a language like Java? Because Java can perform operations with unlimited side effects which aren't 'undoable' anymore. So at least the code which does the operation has to be purely functional, or there has to be some kind of transaction support.

This will work for many situations where we have an 'all or nothing' requirement. But what if we also want a partial result if an error occurs somewhere in between? Just add some 'error token' to the data stream and let the operation handle it. This way we have two different kinds of I/O streams: one which will undo the whole operation if a failure happens, and one which will simply provide an 'ERROR' token from the moment an error happens but which won't undo the operation. By adding this behavior to the type of the I/O facility we have compile-time safety and still lots of flexibility.
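A purely hypothetical sketch of that second stream flavour in Java (all names invented): instead of throwing, the source yields a distinguished ERROR token, and the consumer keeps whatever partial result it already has:

```java
import java.util.Iterator;
import java.util.List;

// Sketch: a token stream that signals failure in-band. The consumer sees
// the ERROR token, keeps its partial result, and no exception bubbles up.
public class TokenStream {
    static final String ERROR = "<ERROR>";

    static String consume(Iterator<String> tokens) {
        StringBuilder result = new StringBuilder();
        while (tokens.hasNext()) {
            String t = tokens.next();
            if (t.equals(ERROR)) break; // partial result, handled explicitly
            result.append(t);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Source fails after two chunks; we still get the partial data.
        System.out.println(consume(List.of("ab", "cd", ERROR, "ef").iterator())); // prints "abcd"
    }
}
```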

Of course those three examples cover only a small part of the huge number of exceptional situations a program can run into. To solve all the other ones we have to think them through and develop new idioms and patterns. But in the end it will always require that a programmer thinks a problem through instead of relying on tests.

Bugs in programs will of course still exist. But by writing programs in the way described above (and having programming languages which support this style) we could get rid of many of the most annoying bugs and write better and more stable programs. And along the way we could get rid of lots of ugly exception-handling boilerplate code.

Wednesday, March 07, 2007

Discussion: Why do most people seem to use inferior programming-languages?

My last article attracted a relatively high number of comments, which I will try to discuss in this post.

The article was intended as my 'answer' to a topic which is very common wherever discussions about programming languages take place: why do certain ones succeed and others not? I think this is an important question, because most people who create new languages want to have some kind of success with them.

"'the market has to be right by definition.' is a wrong assumption" or "the best things seldom win"

I understand why people think this way. They simply fail to see the 'big picture'. If you only look at certain aspects of a topic, you can easily overlook important parts of the reason which are outside your field of view. This is particularly true here: if you judge programming languages by looking only at the language itself, you can easily get a wrong impression. We always have to look at the whole system, not only at a particular part of it.

An often-used example is the win of VHS against Beta. Beta was technically better and still VHS succeeded. Isn't that proof that the best thing will not automatically win? Only if you fail to see the whole picture. Technological superiority is only one part of the whole. Others are prices, license fees, patents, available support etc. Beta failed because of Sony's licensing strategy. VHS was 'free' and could be produced by companies without having to wear the straitjacket Sony tried to put them in. While people like 'better things', they won't pay any price for them. High-end fans may hate this, but most people simply don't care so much about quality that they will pay a much higher price for it.

So if an 'inferior' technology succeeds over a seemingly 'superior' one, there are always good reasons for it. And believe it or not, this reason is never the stupidity of the users.

"Companies are eager to adopt new things, but the problem is the scarcity of competent workforce."

This is the reason why it always takes some time to adopt new tools. But it can't explain why certain tools never catch on. Remember that the job market is also a market. Unemployment is a risk in most countries, and people always want a well-paid job with good working conditions. So the 'workforce' competes for the best jobs just like the companies compete for the best workers. But it's always an optimization process: the average developer has to decide if it really pays to put lots of time into learning a certain language. People always have to weigh the options against each other. Of course this strongly depends on the area you're working in. It's a completely different picture if you're working in academia, if you're programming as a hobby, or if you're working in industry. If people only look at one of those areas, they can easily come to wrong conclusions.

If a company now tries to adopt a new language, it's like the mentioned car manufacturer which starts to invest in robots: it costs a lot of money and it will take some time for the investments to pay off. Switching languages because of higher productivity also means that there is an initially higher cost for qualified workers and training. But only in the beginning. Remember that people learn their job not in school or in university, they learn it by doing their jobs. Sure, companies would really like to hire only experts, and some may even afford to pay the price for it (like Google at the moment). But that's not a model for the whole industry, because most companies simply can't afford it or aren't 'attractive' enough. So companies have to do their share in training their workforce themselves or they will never get the personnel they need. Most companies really do this. And this is independent of the language you train people in. Some companies even use obscure in-house languages which are totally unknown outside the company, because in certain domains this can be the most productive way of working - but they can't expect to find programmers with experience in such a language, so they have to train them in-house.

And if you hire a 'fresh' Java programmer, for example, you can't expect that he or she is able to do good work from the start. Java may look simple to some people, but doing good work in it isn't that simple either. So the company has to do training, and this is the reason why those people start as junior programmers who are supervised by senior programmers who know more about the job. And after some years, the junior starts to do good work and can eventually become a senior himself. The reason why this seems to look different in other areas of programming (like Lisp, Haskell etc.) is that companies which try to adopt a new language are often startups, or companies in a phase of rapid growth or transition to a different market, which need good workers quickly. So they pay more to get the seniors directly without having to train them themselves. But this is only a phase; it will never work for the market as a whole. And this also happened with Java in the beginning.

And if the company succeeds because of using a different tool, its competitors will notice this and start to use this more productive tool also. This will in turn be noticed by the job market, creating higher demand for the new language, creating more workers able to do their job in this language with less training. And in the end, the industry has again made a transition, even if there are of course still companies which use the old tools for various reasons.

"Lisp/etc. is simple? You must be joking."

No, I'm totally serious here. I know there is the big fallacy that mainstream languages are simpler than less mainstream ones. This may really be true for some of them, but those are relatively rare exceptions.

Sure, to use a different language you need time to adapt. But even switching from C++ to Java is much more difficult than certain people assume (and I suspect those people don't know much about at least one of those languages). While the languages seem quite similar on the outside, to really master them you have to use quite different methods of programming. And if you switch from C to Java, the step is at least as big as switching from C to Lisp or to Smalltalk. The syntactical similarities between C and Java are quite deceptive. It's a marketing gag, designed to give Java a bit more momentum in the beginning, but once you're past it, you see that both languages really don't have much in common.

Now users of certain non-mainstream languages seem to like the idea that their languages are 'too sophisticated for the masses', giving them a false sense of superiority. This way of thinking may feel good, but it's nonetheless utterly wrong: people don't avoid Lisp because it's so difficult, they avoid it because they can't get their job done as well as in other languages - because of missing tools and (especially some years ago) because of missing performance. The best language isn't useful if you have to reinvent all the wheels you get for free if you use another language. This isn't of much interest to 'fans' of a language, but it's quite important for people who use a language as a tool to reach a certain goal.

If I compare Common Lisp and C++, Common Lisp is simpler in every area: creating useful macros in C++ using cpp and templates is much more difficult than using Lisp macros (which are really a piece of cake compared to C++). Closures are also quite simple; in fact they are very similar to objects (just imagine the captured environment as the fields of a class which is created on the fly). What's left? Maybe the syntax, but after you get accustomed to it (which may take a week or two) it's hardly a problem anymore (especially if you compare it to the mess heavy-duty C++ template usage can create). In Lisp you don't have to care about memory allocation, you don't have to worry about pointers, you have simple access to basic data structures like lists etc. So if we only count language simplicity I would always choose Common Lisp over C++. So please, Lisp users: if you really think that your language is so difficult, please tell me why. I simply don't see it.
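The closure/object correspondence mentioned above can be sketched even in Java (the counter is an invented example): the captured variable plays exactly the role of a field.

```java
import java.util.function.IntSupplier;

// A closure and an object doing the same job: the captured environment
// of the lambda corresponds to the field of the class.
public class ClosureVsObject {
    static IntSupplier closureCounter(int start) {
        int[] count = {start};       // the captured 'environment'
        return () -> ++count[0];     // closure over count
    }

    static class ObjectCounter {
        private int count;           // the same state, now as a field
        ObjectCounter(int start) { count = start; }
        int next() { return ++count; }
    }

    public static void main(String[] args) {
        IntSupplier c = closureCounter(0);
        ObjectCounter o = new ObjectCounter(0);
        System.out.println(c.getAsInt()); // prints 1
        System.out.println(o.next());     // prints 1
    }
}
```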

And Smalltalk? Yes, I've used it (some years ago). It's very simple to learn, but as with every language you need to learn all the patterns and idioms, which takes a lot more time than learning the language itself. The main reasons I didn't continue using it were: Smalltalk environments were proprietary and quite expensive at the time. There were only really ugly-looking Smalltalk GUI libs. It was hard to impossible to create stand-alone applications. And the resulting code was much too slow compared to C++ (which I primarily used back then). Some of those things have changed in the meantime, but if you compare it with Java, Smalltalk is still behind (and it's old; my theory is that languages make a breakthrough relatively fast or never - and Smalltalk's time has simply gone by).

"What about Ruby. It's slow but it's still a success"

Ruby is still far from being mainstream. The only reason for its success in a very small domain is 'Ruby on Rails'. And again, it's not the language which created the success, it's a tool (a framework in this case). But I think that this still isn't enough in the long term. While RoR can really give you some productivity benefit, that only works in its small domain. And because of Ruby's poor performance it also doesn't scale well. While using RoR for small-scale web apps can be quite successful, this is only a marginal market. And because the RoR crowd is quite 'loud', it may give a wrong impression of the real spread of RoR usage.

Something similar is true for Python. It's successful in certain domains, but it's still far from really 'big'. Python also has a big performance problem which limits its success. And while Python IDEs of course exist, they're still far behind the Java ones. But the main problem of Python is its poor performance.

And remember: the perception created by all those bloggers and web discussions can really lead one astray. Those people (me included) are not 'common programmers'. They do this stuff not only as a job, but also as a hobby. And because of this, they are simply 'louder' than the broad mass of normal users who do it primarily for work. Academic and hobby users simply have more time for and fun in writing about the topic, and because they are much more likely to try out new things instead of using the 'boring old stuff', this creates the impression that certain new things (which appeal more to a 'geeky' audience) are more successful than they are in reality.

"Use Erlang"

Some people mentioned Erlang. I think it's too early to talk about it. Erlang really shines the moment you need massive concurrency - but this isn't really important in most domains yet. That will change in the coming years, so maybe Erlang will become a 'Big Thing'. But we still need good IDEs and better performance until then: what's the use of scaling an app to 1000 processors if the result can hardly compete performance-wise with a C++ solution running on only 10 processors? Remember that Erlang is quite slow as long as the program isn't using massive numbers of threads.

"Java isn't slow"

No, it's not slow, but it is still slower than C++, even if it's in a relatively close range. From my tests and other benchmarks (like the shootout) I think that well-optimized Java code is in general about 1.5-2 times slower than well-optimized C++ code. In certain areas Java can be faster than C++, but I consider this an exception which seldom shows up in a bigger program.

In Java the runtime (the JVM) has to do more checking (array bounds, null refs, maintaining stack frames, etc.) and this simply shows. Also, because of the GC and the JIT compilation it needs more memory, which reduces performance too. The difference is not that big, but in combination with long startup times and higher memory consumption, it's still enough to give Java a clear disadvantage in certain domains compared to C++.


"Language XYZ is much better than Java if you want to do ABC"

I know that most existing languages have their niche. But I'm not talking about those niches, I'm talking about the mainstream, about the 'Big Thing'. But mainstream is 'bad by definition' for certain people, so I'm sure that every language which reaches the mainstream will always be bashed by those people - just because it's mainstream, and because bashing the mainstream lets many people feel superior. Also, if something becomes mainstream, it attracts lots of bad programmers, just as every successful 'underground' music style attracts bad musicians the moment it starts to become mainstream, who then produce more and more crap.


"IDEs and libraries followed Java, not the other way around"

Java 1 wasn't really used much; Java only really started to catch on from 1.2 upwards. The original idea of Sun was to use Java applets to revive the server market, which was crumbling because the desktop PC seemed to be winning. This didn't work out (the main reason was MS, which fought back), so they decided to target the server side instead (because of the now more and more successful Internet). Now Java could really start to gain market share, and this was around 1999 with Java 1.2. At this time good Java IDEs were already available (NetBeans and VS integration), and IDEA (which was later copied heavily by Eclipse and NetBeans) was on the market around 2000. Creating an IDE for a 'new language' was of course risky, but if you want to make money, finding niches is a good idea, even if you always run the risk that the language won't catch on as hoped.

But I think that Sun did a very good job in 'bootstrapping' Java. They provided comprehensive libs, tools, documentation etc. People or companies who want to make their language a success should take a good look at that.


"The real reason are network effects"

True, this can lead to the eventual success of a language, and it's of course necessary in the end. But to make a language a success we first need tools, libs and docs. All those can be created in several ways: by the language creator, by the community, or by a 3rd party which is interested in promoting the language. Independently developed languages mostly use the 'community method' because the creator of the language isn't able to create all the necessary infrastructure himself. But first you need a community, and for that you need something to attract one. And you need to attract the 'right' community. Having one which creates lots of 'tempests in a teacup' may work to make people notice your language, but in the end you need a community which actively creates libs, tools and docs, or your language will fail.

But in the end, network effects aren't that important for programming languages. They matter much more for libraries, interfaces, operating systems or hardware. A programming language is highly invisible: once you have created a product, nobody really cares which language it was written in as long as it does what it should do. Now you may argue that network effects are more important in the maintenance of a program. But I doubt this: maintaining a product often requires much more time for the maintenance programmer to get to know the program well enough to do the maintenance than it would take to learn a new programming language.

"Lisp and Smalltalk have IDEs too"

Sure. I've never implied that an IDE is the one and only thing required to make a language successful. But it can help a lot. How much depends on the language. For Java an IDE is probably more important than for Ruby - but even Ruby can really profit from a good IDE. That Lisp and Smalltalk haven't succeeded has other reasons.

"Java isn't that productive"

Those big frameworks like J2EE may look over-engineered today, but remember the time when those frameworks were created. There were reasons for them. Of course many mistakes were made, but that's because this stuff was quite new at the time. Now you may say 'look how much crap it all is', but that is always easy with the advantage of knowing the outcome. Also, many of the concepts worked out well - and those are now integrated into many newer systems, Java-based or based on other languages. Of course we've not reached the end of history. There will be new systems, new languages and new frameworks, and people in 10 years will laugh about the stuff we call 'high-level' today. That's natural.

Java was quite productive in the past and it still is. Sure, in certain domains you can be even more productive, for example by using RoR. But what if you expect the system to grow? Can you still recommend RoR then? What if you have to integrate legacy applications or huge databases which can't be easily refactored? Then RoR isn't as useful anymore. Similar things are true for many other systems. Java may be the least common denominator, but it works in a very broad range of domains. You may find a better solution in each of those domains, but what if your project touches more than one of them? Or if you're not sure how you will have to scale and extend it afterwards? So it's totally understandable if people choose solutions which may not be optimal in one domain but can work in all the directions you may have to travel later, without reimplementing the whole system (that's also the main reason why many people like platform independence so much).

There are lots of 'benchmarks' which try to measure productivity. But all those have one thing in common: They use simple toy examples instead of big real-world ones. And because of this they aren't useful if you need to decide what language to use for your big real-world project.

And people don't like to use many languages at once. It makes life harder and reduces productivity because of all the additional interfacing you have to do. So if you start a project, you choose the language which is the best fit overall, and not one which is optimal only for a certain part of the project.

"There are big sites which don't use Java"

Sure. Why not? But having a big site doesn't mean that the language used to implement it has to be mainstream. It's like saying "XYZ has won the Formula 1 world championship, so XYZ must be the best car manufacturer".


"[This article] explains exactly why people use inferior programming languages, even though the intent was to say that they don't."

The intent of the article was to show that choosing a programming language depends not only on the language, but also on the whole infrastructure. Looking only at the language is like looking only at the engine when choosing a car: you can buy a car with an inferior engine and still get the best overall product. So people may use an inferior language but still use the superior system.


"People only use Java because they are forced to"

While at the moment I have to use Java for certain things because of external constraints, I'm free to choose the language I want for lots of other things. But despite having tried out lots of different languages, I always fall back to Java. I don't even like the language much, because of all the well-known limitations, but the system as a whole is still hard to beat. This may of course change in the future. Maybe even tomorrow. But I'm talking about the here and now, not about promises which may or may not be fulfilled someday.

"It's all the fault of those evil managers"

This is a very subjective view. It's true in certain very big companies, but loses its validity in smaller ones. Of course the 'manager' is always the 'bad guy' to a programmer, like parents are often the 'bad guys' to their children because they have to order them around. But companies have hierarchical structures because they work best (at least until now; otherwise other structures would have succeeded in the past). And this means that nearly every person in a company is ordered around by some other person. Of course mistakes will be made in this process. But everyone who knows better has the freedom to build his or her own company. Just do it, and you will start to experience constraints you may never have thought of while you worked as a programmer. Managers have those constraints too, and they also have to live with them.

There are ways to overcome this, but they mean that programmers have to 'face reality' and can't concentrate only on programming anymore. That would also make them less productive as programmers, and in the end this may be the reason why we have this separation between management and development. It may be a bad solution, but other solutions seem to work out even worse.


"The article is badly written, the grammar is poor"

Sorry, but I'm not going to write my articles in German and hire a professional translator to translate them into good English. I do my best, including using a spell checker to correct mistakes, but English really isn't an easy language to master (like probably every natural language). I would really like to see Esperanto or Lojban as the 'world language' instead (to level the playing field), but at the moment it's English, and the native speakers simply have to bear with us non-natives.


To come to an end: it may well be that at this moment Java isn't the most productive language anymore. We will only know that in a few years; it's impossible to know today. And yes, in certain domains Java is for sure not the most productive language - and maybe never was. But that's not the point. The point is that most companies and programmers use Java because it is (or at least was in the recent past) the best overall solution for most programming tasks. If you really think that 'your' language is the 'next big thing', then ask yourself the following questions:


  • Can my language compete with Java productivity-wise, if Java is used by a good Java developer using all the available tools and libs?

  • Can my language compete with Java not only in a single domain, but in most domains important to general programming today?

  • Does my language have comprehensive, easily available documentation which is directed at everyday use and not only at beginners or 'esoteric' stuff? Documentation which has both concise parts for reference purposes and also explains concrete usage for learning purposes?

  • Does my language have a GUI lib which enables me to create good-looking GUIs which can compete with professional software, at least on the main platforms (Windows, Mac, Linux - ordered by importance)?

  • Can I write code which is fast enough to use my program on a bigger scale, or to use it on the desktop?


If one of the answers is 'no', 'your' language can still become a success, but with every additional 'no' it becomes highly improbable, even if the language is full of cute concepts, has a beautiful syntax and is a pleasure to use.

Monday, March 05, 2007

Why do most people seem to use inferior programming-languages?

If you read discussions about programming languages, one topic is quite common: "Why do people use an inferior language like Java/C++/what-the-hell and not a superior language like Lisp/Haskell/Python/you-name-it"?

The reason is: the language is only a small piece of the big picture. Today's languages aren't that different at all, productivity-wise. While some things have a certain impact (having a garbage collector, for example), most things in the language don't really matter much in practice and are largely overshadowed by the really important things: the availability of tools, libraries and documentation, and the performance of the resulting code.

Imagine two competing car companies A and B. A decides to invest in better automation, buying robots, while B shies away from it because of the initial investment and continues to use human workers. After 2 years the investment of A starts to pay off: productivity increases, the quality of the products gets better and the production costs are reduced. To stay competitive, B now simply has to catch up - or lose market share and eventually even disappear.

I don't see a reason why software production is different: if company A can be more productive with a certain language, then the competition has to catch up or lose. That's how capitalism works. As long as we have relatively free markets, all companies have to strive for the best tools to do a job. And this really shows: not so long ago, most programming was done in C and later C++. But today Java is the most used language. The industry is obviously able to change to a different language if it really gives an advantage. But why has it moved to Java and not to Lisp, Haskell or Smalltalk?

There are some people who seem to think that most people simply are too stupid to grasp the complexity of those languages and thus use the simpler Java. But this is a fallacy, because Lisp and Smalltalk simply are not more difficult to use. In fact Smalltalk is a much easier language to use than Java. You can learn Smalltalk in a day and be quite productive after 2 weeks, something which is impossible in Java.

And Lisp can be quite simple, too (at least in its 'Scheme flavor'). Especially if we take a look at C++, the most used language not long ago, it becomes obvious that simplicity has nothing to do with it: even Common Lisp is a piece of cake compared to C++. Not only is the language easier to use, the concepts used by the language are simpler too (people who can use templates won't have problems with Lisp macros after a short period of getting accustomed, for example).

So if simplicity was really the reason to choose a certain language, nobody would ever have used C++ and the world would probably program in Scheme or Smalltalk today.

The next fallacy you always hear is the stupid manager who chooses a language based on the 'market leader'. But again this doesn't seem to be correct: when Sun created Java, Sun was no big market leader. The leader was Microsoft, and MS used C++ and VB. But still Java made an impact and became the 'big thing'. MS had to actively fight against it, but still partly failed and had to jump on the bandwagon later by creating its own 'Java' to get market share back. So it's quite short-sighted to resort to conspiracy theories in which the 'market leaders' agree on what the 'next big thing' is and everybody else has to follow. If only a single company abandoned this 'agreement' and started to use a more productive tool, the others would have no choice but to catch up, like our car company B above.

The real reason why Java is still the number one language in the market is that it gives the highest overall productivity. Of course this isn't true in all areas, but it has to be true in the areas where the majority of developers work. But what are the reasons for this?

I look at many programming languages and also try out a lot of them. I've done this for years. And still I use Java as my primary language. You can call me a stupid Java-Joe or whatever you like, but then you will overlook the real reason why I and many others still prefer Java despite knowing most of those new, fancy, shiny languages.

Before I used Java, my primary language was C++. I switched to Java relatively late, when version 1.3 came out. But the main reason I switched wasn't the language. I had looked at Java directly after it became available - and found it horrible. But years later I tried out JetBrains IDEA, and it was unbelievable how much productivity this IDE gave me compared to the VC++ I used for C++ development. While VC++ had 'wizards' for code generation, those wizards only created code skeletons which were useful as a starting point but were no big help over the whole life cycle of a project. But in IDEA all those little helpers were usable in all phases of development. Many of the ugly parts of the Java language simply disappeared when I used this IDE, and I suddenly could concentrate on the problem instead of fighting with the language.

But maybe the biggest advantage was that the IDE removed the 'documentation problem' I have to face in almost every other language. If you write code, you constantly have to remember names, parameter order etc. Using IDEA with Java, this wasn't necessary anymore. The very intelligent code completion and the integrated documentation lookup removed the necessity to remember all those things. Finding the right method or class was most often only a key-press away. And because of the real-time error checking, compile errors were nearly eliminated, too. In the end IDEA (and later also Eclipse) made Java a completely different language. Yes, the IDE matters. A lot. Just imagine writing Lisp code without a REPL or Smalltalk code without the Smalltalk browser, if you're a user of one of those languages.

So the real reason why I switched over to Java was not the language itself; it was the combination of the language, the IDE and the available, well-documented and comprehensive libraries. In fact, the language was the least important of those reasons. This hasn't changed: Java without the tools is total crap. Most languages are better. But Java is so 'IDE-friendly' that it was possible to create an IDE like IDEA.

Not every language allows this. The dynamic ones like Python, Ruby, Lisp and Smalltalk always have the problem that it's mostly undecidable at compile time which methods are available on a certain variable. But hey, that's not a bug, that's the big feature. It's the reason those languages are called 'dynamic' - and a Java IDE is in principle nothing else than an interactive extension of the compiler.
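A tiny Python sketch of why this is undecidable (all names here are hypothetical): the set of methods an object responds to can depend on values that only exist at runtime, so no compile-time tool can enumerate the full completion list in advance.

```python
class Widget:
    pass

def configure(obj, enable_logging):
    # Which methods exist is decided at runtime, based on a value the
    # compiler cannot know. A static IDE can't predict 'log' here.
    if enable_logging:
        obj.log = lambda msg: f"LOG: {msg}"
    return obj

w = configure(Widget(), True)
print(w.log("hi"))                                 # LOG: hi
print(hasattr(configure(Widget(), False), "log"))  # False
```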

But there are other problems in certain languages. Take Haskell: it's a really nice language to create small, cute programs. The language is like a puzzle; it makes your brain hurt - but in a good way, like your muscles 'hurt good' after a good workout. But if I'm working to build a house, muscle ache from lifting too much heavy stuff isn't that funny anymore. And the same is true for doing 'real stuff' in Haskell: just look at the code of a big project implemented in Haskell, like the sources of GHC. I don't see 'beauty' anymore in this code. It's hard to read and it's BIG. The type checker of GHC alone is nearly as big as the whole code of Hugs - but Hugs is written in C while GHC is written in Haskell. Sure, GHC has some more extensions compared to Hugs, but this should still make you think. Also, Haskell has a big documentation problem: the Haskell docs are absolutely insufficient on many levels. This alone is enough to ensure that Haskell will never become mainstream unless it changes.

And for other languages there is the performance problem. Yes, performance does matter. People sometimes spend hundreds of bucks to get a 50% performance increase from their hardware. Do you really think they wouldn't care if a program is 100 times slower because of the language it's written in? Sure, if you write a script which is used once a week, it really doesn't matter if it takes 0.1 sec or 10 sec to finish. But if you look at web apps, a factor of 100 can easily cost millions, because you need to buy and maintain 100 times the number of servers.

For desktop applications, even Java is often still ruled out compared to C++ for performance reasons. And Java is on average only 1.5-2 times slower than C++! So how could a language like Python or Ruby compete here? The answer is: they simply can't. You may use them in parts of a project, for example for scripting, but the majority of the work is still done in one of those boring 'previous-generation' languages. And again, it's not because the developers are stupid; it's because nobody would buy a program which is as slow as if you had stepped back in time to a 486, even though you have invested in a modern dual-core CPU.

And then there are the libraries. Libs can make or break a language. And one of the most important is the dreaded 'GUI lib': if you want to create a desktop application and a language has no GUI libs which allow you to create good-looking(!!) and easy-to-use GUIs, the language is ruled out. And since most developers have to create GUI apps from time to time and don't like to switch languages over and over again, no or only poor GUI libs often means that the language won't be considered, even for projects which don't seem to need a GUI. This may be inconceivable for some people, but it's a fact of life: a language without good GUI libs will never become mainstream. This also nearly broke Java's neck, because AWT was horrible and Swing took its time to become useful, but with 1.3 Swing was decent and from 1.4 up it was good enough (if you use 3rd-party look&feel libraries) for many purposes. But even if Swing is relatively good, its deficiencies (especially the non-native looks) still hinder Java's breakthrough on the desktop.

So even the best language can become totally useless if it lacks one of those: competitive speed, good documentation, comprehensive libraries, necessary tools (like IDEs). If a language fails to address those points (and really all of them), it won't become the 'next big thing'.

That's the big secret. Nothing more. Create a language as beautiful as you want. But without the necessary libs and comprehensive documentation: no chance. Slow: no chance. Not competitive with other languages, including their tools like IDEs: again, no chance.

So if you really want your favorite language to become more used or even mainstream: help to solve those problems. Lamenting in blog posts or online discussions is as useless as spending lots of time writing tutorials explaining concepts from 'your' language to the world. The best tutorial and the most cunning style of discussion are useless the moment a potential user discovers that there is no library for a certain problem he has to solve (like creating a GUI), or when the novice faces incomprehensible and lacking documentation the moment he or she tries to do something real with the language.


[Update:] I've written a follow-up article which discusses some comments and objections to this article. You can read it here.

Friday, February 16, 2007

Understanding Monads. For real.

Yet another article about the "what are monads?" topic? Aren't there enough on the net already? Maybe. But since I've read a lot of them and still had problems really 'grokking' the concept, I suspect that others may have those problems too.

I'm not a mathematician. I studied physics, and while I learned a lot of mathematics in the process, it's a totally different way of learning mathematics than studying mathematics directly: in physics, mathematics is an important tool which always has to be backed by reality. If you simply calculate something strictly by using mathematical rules, you often get physically nonsensical results. So you always need to have some 'image of reality' in the back of your mind.

In mathematics, on the other hand, those 'images' are less important and sometimes even counter-productive. If an image is tied too closely to our image of the world, it can be quite limiting. Using such an image can even prevent finding more uses for a certain abstraction. So mathematicians have to learn to work with the abstractions themselves, without using an image, because every image could limit the use of an abstraction.

But if we want to apply mathematics in some domain, we need the image back. For a mathematician, listing the monadic rules may be enough to know about the structure which is created by those rules; for non-mathematicians who haven't been trained to think this way, it's not. And after all, there really is an image which shows what a monad (as used in functional programming) really is:

A monad is like a 'macro': All it does is a code-transformation.

I know, this notion has been used in some articles about the topic, but often only casually along the way. And if you're a mathematician, this really is nothing more than 'uninteresting applicative stuff', but if you're a programmer who has to use monads in some real application, you simply need something more tangible, something you can relate to. Just a set of mathematical rules isn't enough. So why not think of a monad as some kind of 'functional macro'? As a programmer, this should be a rather familiar picture. Let's elaborate a bit on this:

What is a macro? It's something which gets some data and dices and splices it. But before it can do its dicing and splicing, there needs to be some slicing - which is done by the macro processor.

Let's look at the C preprocessor as an example of a very primitive (but widely used) macro facility:


#define m1(x, y) x y y
m1(a, + 1)

Here m1 is read by the preprocessor, which recognizes it as a macro, slices the text between the parentheses at the ',', and feeds those text slices into the macro. This creates the new text ("a + 1 + 1"), which is then fed back into the preprocessor as new input (so that macro expansion can happen again on the resulting text).
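The slicing-and-splicing can be mimicked in a few lines of Python (purely illustrative - this is not how cpp works internally): the arguments are the text slices, and the macro body splices them back together.

```python
def m1(x, y):
    # Body of '#define m1(x, y) x y y': splice the slices together.
    return f"{x} {y} {y}"

# The 'slicing' the preprocessor does at the comma:
args = "a, + 1".split(", ")
print(m1(*args))  # a + 1 + 1
```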

In Lisp it's a bit more sophisticated, because the macro processor works on s-exprs and not on plain text (yes, I know about reader macros, but let's concentrate on standard macros here). If we have a piece of code like:

(some-function (m2 x (+ y 1)) z)

and 'm2' is a macro, then the macro processor does some slicing. The resulting pieces are s-exprs, namely "x" and "(+ y 1)", which are then fed into the macro 'm2'. The result is then put where the (m2 ...) was, and is evaluated again.

And in Haskell? There the 'slicer' is the 'do-statement'. If we write

do
v1 <- stmt1
stmt2
v2 <- stmt3
stmt4

then the compiler slices the content into separate statements and wraps them into closures. It then puts function calls around them (using a function called 'bind', written '>>=' in Haskell). The result of this transformation is then again used as new input.

The difference from the other macro processors is that the evaluation (the 'splicing and dicing') is now done by the 'bind' function at runtime. This has the advantage that the type checker has already run, and by using type information the bind function can be overloaded. So instead of providing an explicit name for the macro like in the C and Lisp examples above, the concrete macro is now chosen by the type of the first statement in the do-block.

And how can it be assured that the whole block is translated by the same macro? That's the task of the type checker: by giving the bind function the type

bind(M<a>, a -> M<b>) -> M<b>

the result of a bind function has to be in the same monad M as its input. The monad can be parametrized by a type parameter, and this parameter can be changed by the bind function, but the monad itself stays the same. This ensures that all statements are evaluated 'in the same monad' - or in other words: that all statements in a do-block are subject to the same kind of code transformation.

The type 'M' is often also called 'the monad'. And that's reasonable: in Lisp a macro is chosen only by a name, and so we would call the above the 'm2 macro'. But in Haskell the choosing is done by type, and thus the type of the monad gives the macro/monad its name. Thus List monad, Maybe monad, etc. But the monad isn't just the type; it's also the implementation of the bind function (and there also has to be a 'return' function), because without a bind function it simply wouldn't do anything. The combination of the type and the bind and return functions is needed to build a 'macro' - and so all those things together are called 'a monad' (just as a Lisp macro is not only the name of the macro but also the code which does the code transformation).

So that's it: A monad is a code-transformer. Like a macro. It's simple to remember: Both names even start with a 'M'.

While the fundamental entity of the C preprocessor is plain text, in Lisp it's an s-expr. And in Haskell this fundamental entity is a function. Everything in pure functional programming can be expressed by it. And so it's no wonder that in Haskell 'macros' get functions as input. That's all the magic. Of course we could also use the bind function directly and write code using '>>=' manually, but 'do' often makes the code much easier to read and write.
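To make the 'macro run at runtime' picture concrete, here is a sketch of a Maybe-style monad in Python rather than Haskell (the names `ret`, `bind` and `NOTHING` are my own, not from any library): the nested closures are exactly what a do-block desugars into.

```python
NOTHING = None  # stands in for Haskell's Nothing

def ret(x):
    # 'return' of the Maybe monad: wrap a plain value
    return ("Just", x)

def bind(m, f):
    # The 'macro body', executed at runtime: short-circuit on failure,
    # otherwise unwrap the value and feed it to the next closure.
    if m is NOTHING:
        return NOTHING
    return f(m[1])

# do { v1 <- return 1; v2 <- return 2; return (v1 + v2) } becomes:
result = bind(ret(1), lambda v1:
         bind(ret(2), lambda v2:
         ret(v1 + v2)))
print(result)  # ('Just', 3)
```

Swapping in a different `bind` (a different 'macro') would give the same do-block a completely different runtime meaning, which is the whole point of the analogy.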

I put the quotes around the word 'macro' because monads are not really macros by definition. So if you want to be picky, you can find reasons why this picture isn't totally correct. But does this really matter? Or isn't it more important to have a useful way of thinking when we deal with an abstraction?

What are those differences? The main one is that the input to 'real' macros doesn't have to be valid code. The C preprocessor accepts anything that is text, and Lisp macros accept all kinds of s-exprs. But monads only work on valid Haskell code; they can only change its runtime semantics. And the syntax itself isn't changeable either, because one always has to obey the syntax of the do-block. So a monad is not a macro in the sense that you can create really new syntax (like in Lisp). You can only create new semantics.

Also, all kinds of macros have their limitations: Lisp macros sometimes need a code-walker, the C preprocessor is quite limited, and monads have their limits, too.

And those limits are in fact quite severe, which led to the invention of more capable constructs like 'Arrows'. The core limitation is that a bind-function can't look 'into' its second argument. If we have a do-block like the following:

do
  v1 <- stmt1
  stmt2
  v2 <- stmt3
  stmt4

the compiler transforms it into the following code ("\v -> ..." is the syntax for a closure here):

bind(stmt1,
  \v1 -> bind(stmt2,
    \_ -> bind(stmt3,
      \v2 -> stmt4)))

(The '_' parameter is used for 'dummy assignments', which are created if we use no explicit assignment) [Edit: Corrected error]

If we now look at the first 'bind': it takes 'stmt1' and a closure. This bind can do lots of things depending on the result value of stmt1, but it has no clue what its second parameter (the closure) returns until it evaluates it. And thus it has no way to look into the later bind-functions. This is a severe limitation: It's impossible to create a transformation which analyzes the whole do-block before evaluating it.

So it's for example impossible to create an LALR-monad which transforms its statements into an LALR-parser. It may seem possible to simply define those statements as 'actions' which return a value instructing the bind-functions to build the right code, but this would be quite limited, because we can't add semantic actions this way: The result of evaluating the second bind-parameter would have to contain not only the monad itself, but also the result of the semantic action of the parser. And this is only possible if we evaluate both in one turn.

The next problem is binding values to 'variables'. Monads simply use the normal functional way of doing this: binding via lambdas. The 'var <- ...' syntax simply creates a closure with 'var' as parameter, which is then visible in all the levels 'below'. This works fine if the structure of the resulting code is similar to the structure of the input code, but it makes it impossible, for example, to transform a do-block into code which executes backwards from the last statement to the first one.

So while monads are quite powerful for creating certain kinds of abstractions (= code transformations), they can't do everything. But nonetheless, that's really it: Monads are code transformations. This is also the reason why monads seem to be a bit difficult to understand: Each monad creates a new language. The Maybe-monad creates a simple 'first-failure-fails-the-whole-computation'-language, the List-monad creates a simple 'try out all combinations'-language, the State-monad a 'Haskell-with-mutation'-language, etc.

The transformation from the ordinary 'one statement after another' form written in a do-block to the resulting code can be quite difficult to comprehend - simply because the transformed code can end up looking quite different from the input code.

And because we have to learn new language semantics for every new monad, it's no wonder that the concept seems to be a bit hard to grasp. We may know the semantics of basic Haskell, but for every monad we have to learn a new language. Again and again. And if we don't even know that a monad creates a new language, understanding this becomes much more difficult, too.

But at least this shouldn't be a problem anymore now.




To deepen our image, let's now look at some examples. I will use only basic Haskell syntax here, or even 'Algol-like' syntax, to make it easier to understand.

One of the simplest monads is the 'Maybe-monad'. Let's write something like

do
  x <- func1 ...
  y <- func2 ...
  func3 ...
  return (x + y)

Here all three functions 'func1, ..., func3' should return a 'Maybe' value, which can be 'Just x' or 'Nothing'. Because 'Maybe' is a monad and we've used the 'do' syntax, this code is transformed into something like this:

tmp1 = func1(...)
if isNothing(tmp1) then
  return Nothing
else
  x = fromJust(tmp1)
  tmp2 = func2(...)
  if isNothing(tmp2) then
    return Nothing
  else
    y = fromJust(tmp2)
    tmp3 = func3(...)
    if isNothing(tmp3) then
      return Nothing
    else
      return Just(x + y)
    end
  end
end

This looks a bit ugly, but it shows what's happening: Each statement in the monad is transformed into an if-then-else expression, in a way that the first statement which returns 'Nothing' aborts the whole block and lets it return 'Nothing', too.

We could also say 'The Maybe-monad is an abstraction to represent computations which can fail'. True, and that's what macros do: create new abstractions. Without remembering that a monad is just a kind of macro, this sentence would sound quite 'arcane'. But now that we know the secret name of monads, the esoteric flair is gone and we see that they are something we all know quite well. So the Maybe-monad is nothing more than a mechanism which translates the innocent-looking statements in the do-block above into a nested if-then-else chain like the one below it.
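To make the short-circuiting concrete, here is a small runnable sketch (the 'safeDiv' helper and the numbers are my own invention):

```haskell
-- 'safeDiv' is a made-up helper which fails on division by zero.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv a b = Just (a `div` b)

calc :: Int -> Maybe Int
calc n = do
  x <- safeDiv 100 n   -- a Nothing here aborts the whole block
  y <- safeDiv x 2
  return (x + y)
```

'calc 5' evaluates to 'Just 30', while 'calc 0' short-circuits at the first statement and the whole block becomes 'Nothing' - exactly the nested if-then-else chain from above, generated for us.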

This works for other monads, too. The List-monad transforms a linear list of statements into nested map-functions. Sure, a mathematician may say something like 'the List-monad represents non-deterministic computation', but in the end all it does is transform this:

do
  a <- [1, 2, 3]
  b <- [4, 5]
  return (a + b)

into this:

concatMap (\a -> concatMap (\b -> [a + b]) [4, 5]) [1, 2, 3]

concatMap maps the list elements over a function like the normal map, but concatenates the results into a single list. This allows each invocation to return any number of values. [Edit: fixed error here (mistakenly used fmap instead of concatMap)]

If you're not that familiar with Haskell, the above works like this imperative code:

result = []
foreach a in [1, 2, 3]
  foreach b in [4, 5]
    result = append(result, [a + b])
  end
end
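The two Haskell forms can be checked against each other; here is a runnable sketch (the names 'pairsDo' and 'pairsExplicit' are mine):

```haskell
-- The do-block version of the example...
pairsDo :: [Int]
pairsDo = do
  a <- [1, 2, 3]
  b <- [4, 5]
  return (a + b)

-- ...and the concatMap version it is transformed into.
pairsExplicit :: [Int]
pairsExplicit = concatMap (\a -> concatMap (\b -> [a + b]) [4, 5]) [1, 2, 3]
```

Both evaluate to [5,6,6,7,7,8]: every combination of one element from each list, summed.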


But we can do more complex ones, too. One of those 'a bit more complex' transformations is the State-monad. What we want to do is something like this:

do
  value <- get
  put (value + 1)
  value <- get
  return value

Here we have 'commands' which do something that isn't possible in plain functional programming: reading and mutating a 'state'. With 'get' we read the current state, and with 'put' we store a new value as the current state. The state in the above example is a simple numeric value, but since we can use arbitrary structures as values too, a state can also consist of multiple values.

But how does this work? We all know that pure functional programming doesn't allow mutation, but to store a new state we need exactly that. To make it possible, we need to chain our state through all relevant function calls. This can be done like this:

function get(state)
  (state, state)
end

function put(state, value)
  (value, value)
end

let (state', value) = get(state) in
let (state'', _) = put(state', value + 1) in
let (state''', value') = get(state'') in
value'

Wow, this looks ugly: We need a fresh name for each new version of 'state' and 'value', and we also have to chain the state manually through the calls of put and get. But this method allows the simulation of mutation without requiring real mutation.
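In Haskell this manual chaining looks like the following runnable sketch (the names 'getS', 'putS', 'chained' and the initial state are my own):

```haskell
-- Manual state threading without any monad; 'getS'/'putS' are
-- made-up names for the chained versions of get and put.
getS :: Int -> (Int, Int)
getS st = (st, st)

putS :: Int -> Int -> (Int, Int)
putS _ val = (val, val)

-- The do-block 'value <- get; put (value + 1); value <- get; return value',
-- written out by hand:
chained :: Int -> Int
chained state =
  let (state',  value)  = getS state
      (state'', _)      = putS state' (value + 1)
      (_,       value') = getS state''
  in value'
```

Starting with state 5, 'chained' stores 6 and reads it back, so 'chained 5' is 6. The noise of the primed names is exactly what the monad will hide for us.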

To make this more readable, we can now use a monad to transform the original code into the code above. This time we can't do the calculation directly, because the code first needs some initial 'state'-value which is then chained through the calls. So instead of calculating the result directly, we let the monad create a new function. This function can then take the initial state as a parameter and call the generated code, and this produces the result. So this is the first case of a monad doing real code transformation.

To create a new monad we first need a new type. We simply call the type 'State':

data State st t = State (st -> (st, t))

This is our monad. It takes two type parameters: the type 'st' of the state to carry and the return type 't'. The return type can vary for each statement, but the state type stays the same over the whole do-block, because it's part of the monad type.

The interesting part here is that the monad doesn't carry the state itself around, but a closure which takes the old state and returns a tuple of the new state and a result value. Such a closure is also called an 'action', because it encapsulates the action defined by the statement.

Now we create the bind and return functions for this type:

getCode (State code) = code

instance Monad (State st) where
  (>>=) prev next = State $ \st -> let (st_next, res) = (getCode prev) st
                                   in (getCode (next res)) st_next

  return v = State $ \st -> (st, v)

The 'getCode' function simply returns the actual 'code' which is stored in our monad. 'return' is simple: It creates an action which takes a state and returns the same state together with the return value. The bind-function (here named '>>=') takes the previous monad 'prev' and a function 'next' which will return the next monad. It creates an action which first evaluates the code in the prev-value with the current state. Then it uses the result to call the 'next'-function, which in turn creates the next monad in the chain. This next monad is then evaluated too, but this time with the new state the prev-monad returned.

This chains the state through the actions: first the initial state through the 'first' monad, creating a new state, and then this new state through the result of the 'next'-monad, creating the final state (which is then evaluated by the calling bind-function, etc.).

Now we can build our 'get' and 'set' functions (the 'put' from the example above is called 'set' here). This is quite straightforward. 'get' simply uses the current state-value as the return value:

get :: State st st
get = State $ \st -> (st, st)


And the 'set'-function ignores the previous state and creates a new one. It also returns the new state as its result, to allow assignments like 'x <- set (x + 1)'. This isn't necessary, but convenient.

set :: t -> State t t
set val = State $ \_ -> (val, val)

That's it: our state-monad. Now let's create a simple do-block to use it:

test1 = do
  x <- get
  set (x + 4)
  x <- get
  set (x * 3)
  x <- get
  return (x - 5)

Here we can see that the first statement is a 'get'. But where does the state come from which is returned by 'get'? Simple: If we call 'test1', we don't get the real return value, but a closure which we first have to evaluate with the initial state. Let's do this:

main = (getCode test1) 0

'test1' returns a State-monad. We first have to get the code out of the monad by calling 'getCode'. This code can now simply be evaluated by calling it with our initial state (a 0 in this case). As the result we get a tuple, with the value (12, 7) in this case. The first value is our last state, the second is the real result (as returned by 'return (x - 5)'). Both values make sense, so our monad seems to work correctly.
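For reference, here is the whole example assembled into one compilable sketch. Note that modern GHC additionally requires Functor and Applicative instances for every Monad; those two instances are my addition and weren't part of the original code:

```haskell
-- The state monad from the text, assembled and made compilable.
data State st t = State (st -> (st, t))

getCode :: State st t -> (st -> (st, t))
getCode (State code) = code

-- Functor/Applicative are required by modern GHC (my addition):
instance Functor (State st) where
  fmap f m = State $ \st -> let (st', v) = getCode m st in (st', f v)

instance Applicative (State st) where
  pure v = State $ \st -> (st, v)
  mf <*> mv = State $ \st -> let (st',  f) = getCode mf st
                                 (st'', v) = getCode mv st'
                             in (st'', f v)

instance Monad (State st) where
  prev >>= next = State $ \st -> let (st', res) = getCode prev st
                                 in getCode (next res) st'

get :: State st st
get = State $ \st -> (st, st)

set :: t -> State t t
set val = State $ \_ -> (val, val)

test1 :: State Int Int
test1 = do
  x <- get
  set (x + 4)
  x <- get
  set (x * 3)
  x <- get
  return (x - 5)
```

Evaluating 'getCode test1 0' yields (12, 7): the final state 12 and the result 7.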

Now lets take a look under the hood of the above:

The do-block above first creates the following code:

bind(get, \x ->
  bind(set (x + 4), \_ ->
    bind(get, \x ->
      bind(set (x * 3), \_ ->
        bind(get, \x ->
          return (x-5))))))

The bind-function now does the real code transformation. I've tried to write down the resulting closure if we only expand the calls to bind, return, get and set, but it was simply too long and cumbersome. Let's do it instead for a simplified version of the above:

do
  x <- get
  return (x*2)

this is rewritten into

bind(get, \x -> return (x*2))

which if we evaluate bind, get and return and pull out the code from the resulting monad, creates the following closure:

\st ->
  let (st', x) = (\st -> (st, st)) st   -- our 'get' statement, applied to the state
  in (\st -> (st, x*2)) st'             -- our 'return' statement, applied to the new state
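That expanded closure can be run directly; here is a runnable sketch (the name 'expanded' is mine, and I renamed the inner lambda parameters to avoid shadowing):

```haskell
-- The hand-expanded code of 'bind(get, \x -> return (x*2))':
expanded :: Int -> (Int, Int)
expanded =
  \st -> let (st', x) = (\s -> (s, s)) st   -- the code of 'get'
         in (\s -> (s, x * 2)) st'          -- the code of 'return (x*2)'
```

Applied to an initial state of 7, it returns the unchanged state and the doubled value: (7, 14).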

Again we see that, in the end, the monad simply does code transformation, so the image of looking at monads as code transformations holds. Even if it looks a bit weird, in the end the state-monad really does the transformation we started with.