
Friday, August 17, 2007

Better exception handling

Exception handling seems to be a solved problem. Just use some kind of try/catch/finally construct and we're done. Are we? Exception handling punches severe holes into the static checking of a program. While lots of people are researching type systems, exceptions open an ugly hole in the back of most of those systems. But if you try to do it right and type-check exceptions (like Java does), you get horrid and sometimes hard-to-write constructs which blow up your code with lots of boilerplate. I tried to find a better approach to the problem.

What's wrong with checked exceptions? Let's look at a simple example in Java (I use Java here because most other languages use unchecked 'dynamic' exceptions). We want to open a file and read some data from it. If we are naive, we might think it could be implemented as

String readData(File file) {
    BufferedReader in = new BufferedReader(new FileReader(file));
    String data = in.readLine();
    in.close();
    return data;
}

Of course it's not so simple. The 'new FileReader(file)' can throw a FileNotFoundException. And since FileNotFoundException is a 'checked exception', the above code wouldn't compile - which is kind of a good thing, because it prevents us from overlooking the fact that the reading process can fail and makes us think of a more sound solution.

So we have to handle this exception somehow. We may decide that this function should return a default value if there is an error opening the file. So we update our code accordingly:

String readData(File file, String default_value) {
    String data;
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader(file));
        data = in.readLine();
        in.close();
    }
    catch(FileNotFoundException e) {
        return default_value;
    }
    return data;
}


Looks fine. But it still won't compile. First, our 'readLine' could throw an IOException, so we replace FileNotFoundException with IOException to catch both kinds of exceptions. Now it would compile - but it's still not correct. Look at the 'in.close()'. If an exception occurs in 'in.readLine()' the program will skip 'in.close()' and we may have a resource leak (sure, it will be collected by the GC some time later, but we want to be really correct here).

So we change our code into

String readData(File file, String default_value) {
    String data;
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader(file));
        data = in.readLine();
    }
    catch(IOException e) {
        return default_value;
    }
    finally {
        if (in != null) in.close();
    }
    return data;
}

to make sure that in.close() will be called if an exception occurs. But again this won't compile, because in.close() can itself throw a checked exception.

String readData(File file, String default_value) {
    String data;
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader(file));
        data = in.readLine();
    }
    catch(IOException e) {
        return default_value;
    }
    finally {
        if (in != null) {
            try {
                in.close();
            }
            catch(IOException e) {
                return default_value;
            }
        }
    }
    return data;
}

Just look at the size of this code. Is it correct now? It will compile, but the behavior may still be wrong. If our 'in.readLine()' succeeded and then an error happens in 'in.close()', we get our default value even though we have some valid data. Of course this depends on the specification, but to make sure that we keep the data even in this case we create a final version of the code:

String readData(File file, String default_value) {
    String data = default_value;
    BufferedReader in = null;
    try {
        in = new BufferedReader(new FileReader(file));
        data = in.readLine();
    }
    catch(IOException e) {}
    finally {
        if (in != null) {
            try {
                in.close();
            }
            catch(IOException e) {}
        }
    }
    return data;
}

That's finally it. This code seems to work - and funnily enough, both 'catch' blocks are empty now. It's all boilerplate. And it has required more thinking than one really wants to invest in such a simple piece of code.

So while checked exceptions give some safety because they don't let you miss an exception in your code so easily, they do create lots of hard-to-read and hard-to-write code. That's the reason the Java creators obviously started to question whether this is really a good idea and added unchecked exceptions to the language too. The most famous one (also available in most other OOP languages) is the quite frequent 'NPE', the 'null pointer exception'. In principle every method call or field access in the language can create such an exception, so Java doesn't require you to check those explicitly. But is this really a sensible solution?

First: Every time we write code which could throw an exception we should be aware of this. Simply doing a core dump if we call a method on a null pointer would be much worse.

And for less frequent situations like opening files it may be a good idea to force programmers to think about handling the possible error cases. The main reason someone uses a statically typed language is that the compiler can find runtime errors by using some typing rules. But if the compiler can only prevent 'no-method-found' exceptions while lots of other kinds of exceptions slip through, we have to wonder if this is worth all the type annotations and the harder-to-understand language semantics. Java tried to solve this problem at least partially - but in the end the result wasn't that impressive.

But could we do better? Or do we simply have to accept the fact that exceptions can happen all over the place, go happily back to unchecked exceptions with try/catch, and use lots of unit tests instead of relying on compiler checks? While some people seem to draw this conclusion, I think there is a better way: create a language which can't throw any exceptions anymore!

Let's for example look at one of the most common exceptions, the NPE. The only reason it's possible to have such an exception is that we allow references to be 'null'. If we require that all references have to be non-null, we don't need an NPE anymore. OK, you may object that this is a bit too much, at least for a normal OOP language like Java, so let's tackle the problem a bit more cautiously and require that references which can be 'null' have to be explicitly declared as 'nullable', and that accesses via those references are only possible if the compiler can prove that they will never be null at the points where it matters.

In other words: If we want to call a method on a nullable reference 'ref' we have to use some if(ref != null)-expression first and handle the other case explicitly.

This isn't a new idea and would be a great extension to many statically typed OOP languages (I know that it's kind of possible in Java now by using annotations, but it's far from comprehensive enough at the moment). It makes the NPE unnecessary and we would be able to remove it from the language completely. Why not think about concepts to do similar things with all kinds of exceptions?
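
To make this a bit more concrete, here is a minimal Java-style sketch. The '@Nullable' annotation and the rule that everything else is non-null by default are assumptions of such a proposed type system, not standard Java:

// Hypothetical: '@Nullable' marks the only references that may be null;
// all other references are non-null by default and checked by the compiler.
String greet(@Nullable String name) {
    if (name != null) {
        // in this branch the compiler can prove 'name' is not null,
        // so the call can never throw an NPE
        return "Hello, " + name.trim();
    }
    // the null case has to be handled explicitly
    return "Hello, stranger";
}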

This is of course hard and I suppose it won't work for most current languages. But let's start to think about it. Another of the most frequent exceptions is the "Array index out of range" exception. Can we eliminate this one, too? If we look at code which uses arrays or array-like collections, we notice that indexed access is in fact quite seldom really necessary. Most often data is appended or the array is simply iterated over. Those operations are 'index safe' because they simply don't need an explicit index. And if we look at the list-handling functions in most functional languages, we see that direct array access is seldom necessary there, so it seems possible to get around it in most cases.

But what if we really need direct access via some index in one of the remaining cases? We can now simply require the same thing as in the NPE concept above: if we want to access an array by providing an index, we first have to make sure that the compiler can prove that the index is valid. Which again means that we may need to add an 'if (index < array.length) ...' check. This may look ugly, but we have to check the bounds somehow anyway, so why not make it provable by the compiler?
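
As a small sketch of this discipline (plain Java syntax; the 'provable index' rule is the assumption here), the indexed access would only be accepted inside a branch that establishes the bounds:

int elementAt(int[] array, int index, int fallback) {
    if (index >= 0 && index < array.length) {
        return array[index];   // provably in range here, no exception possible
    }
    return fallback;           // the out-of-range case is handled explicitly
}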

To make this a bit easier we can also add special annotations to method/function declarations. For example, if we have a function with the signature

int addTwoElements(int[] array, int index1, int index2)

and we require that both indexes are in the range of the array length, we add annotations to let the compiler know that 'index1' and 'index2' are indexes into 'array'. This could look like

int addTwoElements(int[] array, @index(array) int index1, @index(array) int index2)

and would allow the compiler to make sure that addTwoElements is called with valid parameters, so we won't need to check them inside the function. This is a Java-esque solution and could be improved. But remember: the main idea is to write code in a way that indexed access isn't even necessary. So those annotations are only needed for the remaining cases.
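
A hypothetical call site could then look like the following; the compiler would accept the call only where both indexes are provably valid (readIndex() is just a made-up stand-in for some value unknown at compile time):

int[] values = {3, 1, 4, 1, 5};
int i = readIndex();   // hypothetical helper, value unknown at compile time
int j = readIndex();
if (i >= 0 && i < values.length && j >= 0 && j < values.length) {
    int sum = addTwoElements(values, i, j);   // accepted: both indexes proven valid
}
// outside the check the same call would be rejected at compile time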

This way we can start to write programs in a way that exceptions aren't possible. In most cases, the fact that a method or function can throw an exception means that we haven't thought enough about the problem. If we pull the ways a function can go wrong out of the code and put them into the type of the function, we can make the compiler prove that our code has to succeed and that all necessary checking occurs before a function is called.

This is slightly similar to the concept of dependent types, but it's simpler to do because the compiler only has to do the checks while the programmer creates the annotations himself. We don't require the compiler to prove everything. Where it fails to do so, we simply add a check or an annotation. Such a type system is much easier to create, to implement and to understand than a real, useful 'dependently typed' type system.

But what about a file-access problem like the one in the example above? File access can go wrong in any part of the operation (even in a 'close') and we somehow have to react to it. And since the success of an operation depends on state external to the program, it can't be proved or disproved by the compiler. So how to tackle this problem?

First idea: just give up and add checked exceptions only for those cases where access to some external resource can go wrong. By adding some language support (like C#'s 'using' construct or something similar) we can even avoid most of the boilerplate code. But aren't there better solutions?
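
In Java terms such support can at least be approximated with a library helper using the execute-around idiom. This is only a sketch; 'withReader' and 'ReaderAction' are made-up names, not a standard API:

import java.io.*;

// One-method callback that receives the already opened reader.
interface ReaderAction<T> {
    T run(BufferedReader in) throws IOException;
}

class IOUtil {
    // Encapsulates open, close and the error default in one place, so
    // callers never write the try/catch/finally boilerplate themselves.
    static <T> T withReader(File file, T defaultValue, ReaderAction<T> action) {
        BufferedReader in = null;
        try {
            in = new BufferedReader(new FileReader(file));
            return action.run(in);
        } catch (IOException e) {
            return defaultValue;
        } finally {
            if (in != null) {
                try { in.close(); } catch (IOException e) { /* ignore close failure */ }
            }
        }
    }
}

The readData example from above would then shrink to a single withReader call with 'return in.readLine();' as the action body.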

There are, but it's hard or impossible to do in a language like Java, because it requires writing side-effect-free code. For the file access, for example, we can pull the file access 'out' of the body of a function and use lazy evaluation to process the data. We would now have a model where all I/O reads all the data at once, and if it fails, it fails in the read operation before the operation which uses the data. This way we would simply handle the check once, here, instead of distributing it to each and every 'readData' call all over the code. But it's impossible to really implement it this way, first for performance reasons and second because input and output may happen simultaneously and may even interact with each other. So we do the reading and writing lazily but provide a model where it looks like it happens at once.

If now something goes wrong, the whole process is aborted and the operation is undone. The 'outer'-layer of the code detects the error and uses some ordinary if-then construct to do error handling. Since 'bubbling' of exceptions from a 'throw' to a 'catch' isn't necessary now, we can use simple and explicit ways to handle failure.

The whole code which does the operation (a complex parser for example) won't even need to know about the abort, because this is only of interest to the caller. Why is this impossible in a language like Java? Because Java code can do operations with unlimited side effects which aren't 'undoable' anymore. So at least the code which does the operation has to be purely functional, or there has to be some kind of transaction support.

This will work for many situations where we have an 'all or nothing' requirement. But what if we also want a partial result if an error occurs somewhere in between? Just add some 'error token' to the data stream and let the operation handle it. This way we have two different kinds of IO streams: one which will undo the whole operation if a failure happens, and one which will simply provide an 'ERROR' token from the moment where some error happens but which won't undo the operation. By adding this behavior to the type of the IO facility we have compile-time safety and still lots of flexibility.
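
A rough sketch of the second kind of stream in Java (the names 'Token' and 'ErrorTolerantLines' are invented for illustration): instead of throwing, it ends with a single ERROR token and leaves it to the consumer what to do with the partial result.

import java.io.*;
import java.util.*;

class Token {
    static final Token ERROR = new Token(null);
    final String data;                        // null marks the ERROR token
    Token(String data) { this.data = data; }
    boolean isError() { return this == ERROR; }
}

class ErrorTolerantLines implements Iterator<Token> {
    private final BufferedReader in;
    private Token next;

    ErrorTolerantLines(BufferedReader in) { this.in = in; next = read(); }

    private Token read() {
        try {
            String line = in.readLine();
            return (line == null) ? null : new Token(line);
        } catch (IOException e) {
            return Token.ERROR;               // failure is reported in-band
        }
    }

    public boolean hasNext() { return next != null; }

    public Token next() {
        Token current = next;
        next = current.isError() ? null : read();   // after ERROR the stream ends
        return current;
    }

    public void remove() { throw new UnsupportedOperationException(); }
}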

Of course those three examples are only a small part of the huge number of exceptional situations a program can run into. To solve all the other ones we have to think them through and develop new idioms and patterns. But in the end it will always require that a programmer thinks a problem through instead of relying on tests.

Bugs in programs will of course still exist. But by writing programs in the way described above (and having programming languages which support this style) we could get rid of many of the most annoying bugs and write better and more stable programs. And along the way we could get rid of lots of ugly exception handling boilerplate code.

Wednesday, March 07, 2007

Discussion: Why do most people seem to use inferior programming-languages?

My last article drew a relatively high number of comments, which I will try to discuss in this post.

The article was intended as my 'answer' to a topic which is very common wherever discussions about programming languages take place: why do certain ones succeed and others don't? I think this is an important question, because most people who create new languages want to have some kind of success with them.

"'the market has to be right by definition.' is a wrong assumption" or "the best things seldom win"

I understand why people think this way. They simply fail to see the 'big picture'. If you only look at certain aspects of a topic, you can easily overlook important parts of the reason which are outside your field of view. This is particularly true here: if you judge programming languages by looking only at the language itself, you can easily get a wrong impression. We always have to look at the whole system, not only at a particular part of it.

An often used example is the victory of VHS over Beta. Beta was technically better and still VHS succeeded. Isn't that proof that the best thing will not automatically win? Only if you fail to see the whole picture. Technological superiority is only one part of the whole. Others are price, license fees, patents, available support etc. Beta failed because of Sony's licensing strategy. VHS was 'free' and could be produced by companies without having to wear the straitjacket Sony tried to put them in. While people like 'better things', they don't pay every price for them. High-end fans may hate this, but most people simply don't care so much for quality that they will pay a much higher price for it.

So if an 'inferior' technology succeeds over a seemingly 'superior' one, there are always good reasons for it. And believe it or not, this reason is never the stupidity of the users.

"Companies are eager to adopt new things, but the problem is the scarcity of competent workforce."

This is the reason why it always takes some time to adopt new tools. But it can't explain why certain tools never catch on. Remember that the job market is also a market. Unemployment is a risk in most countries and people always want to get a well paid job with good working conditions. So the 'workforce' competes for the best jobs just like the companies compete for the best workers. But it's always an optimization process: the average developer has to decide if it really pays to spend lots of time learning a certain language. People always have to weigh the options against each other. Of course this strongly depends on the area you're working in. It's a completely different picture if you're working in academia, if you're doing programming as a hobby, or if you're working in the industry. If people only look at one of those areas, they can easily come to wrong conclusions.

If a company now tries to adopt a new language, it's like the car manufacturer mentioned before which starts to invest in robots: it costs a lot of money and it will take some time for the investment to pay off. Switching languages because of higher productivity also means that there is an initially higher cost for qualified workers and training. But only in the beginning. Remember that people learn their job not in school or in university, they learn it while doing their jobs. Sure, companies would really like to hire experts only, and some may even afford to pay the price for it (like Google at the moment). But that's not a model for the whole industry, because most companies simply can't afford it or aren't 'attractive' enough. So companies have to do their share in training their workforce themselves or they will never get the personnel they need. Most companies really do this. And this is independent of the language you train people in. Some companies even use obscure in-house languages which are totally unknown outside the company, because in certain domains this can be the most productive way of doing work - but you can't expect them to find programmers with experience in such a language, so they have to train them in-house.

And if you hire a 'fresh' Java programmer, for example, you can't expect that he/she is able to do good work from the start. Java may look simple to some people, but doing good work in it isn't that simple either. So the company has to do training, and this is the reason why those people start out as junior programmers who are supervised by senior programmers who know more about the job. And after some years the junior starts to do good work and can eventually become a senior himself. The reason why this seems to look different in other areas of programming (like Lisp, Haskell etc.) is that companies which try to adopt a new language are often startups, or companies in a phase of rapid growth or transition to a different market, which need good workers quickly. So they pay more to get the seniors directly, without having to train them themselves. But this is only a phase, it will never work for the market as a whole. And the same happened with Java in the beginning.

And if the company succeeds because of using a different tool, its competitors will notice this and start to use this more productive tool too. This will in turn be noticed by the job market, creating higher demand for the new language, creating more workers able to do their job in this language with less training. And in the end the industry has again made a transition, even if there are of course still companies which use the old tools for various reasons.

"Lisp/etc. is simple? You must be joking."

No, I'm totally serious here. I know that there is the big fallacy that mainstream languages are simpler than less mainstream ones. This may really be true for some of them, but those are relatively rare exceptions.

Sure, to use a different language you need time to adapt. But even switching from C++ to Java is much more difficult than certain people assume (and I suspect that those people don't know much about at least one of those languages). While the languages seem quite similar on the outside, to really master them you have to use quite different methods of programming. And if you switch from C to Java, the step is at least as big as switching from C to Lisp or to Smalltalk. The syntactical similarities between C and Java are quite deceptive. They're a marketing gimmick, designed to give Java a bit more momentum in the beginning, but once you're past it you see that both languages really don't have much in common.

Now users of certain non-mainstream languages seem to like the idea that their languages are 'too sophisticated for the masses', giving them a false sense of superiority. This way of thinking may give a good feeling, but it's nonetheless utterly wrong: people don't avoid Lisp because it's so difficult, they avoid it because they don't get their job done as well as in other languages. Because of missing tools and (especially some years ago) because of missing performance. The best language isn't useful if you have to reinvent all the wheels you can get for nothing if you use another language. This isn't of much interest to 'fans' of a language, but it's quite important for people who use a language as a tool to reach a certain goal.

If I compare Common Lisp and C++, Common Lisp is simpler in every area: creating useful macros in C++ with cpp and templates is much more difficult than using Lisp macros (which are really a piece of cake compared to C++). Closures are also quite simple, in fact they are very similar to objects (just imagine the captured environment as the fields of a class which is created on the fly). What's left? Maybe the syntax, but after you get accustomed to it (which may take a week or two) it's hardly a problem anymore (especially if you compare it to the mess heavy-duty C++ template usage can create). In Lisp you don't have to care about memory allocation, you don't have to worry about pointers, you have simple access to basic data structures like lists etc. So if we only count language simplicity I would always choose Common Lisp over C++. So please, Lisp users: if you really think that your language is so difficult, please tell me why. I simply don't see it.

And Smalltalk? Yes, I've used it (some years ago). It's very simple to learn, but as with every language you need to learn all the patterns and idioms, which takes a lot more time than learning the language itself. The main reasons I didn't continue using it were: Smalltalk environments were proprietary and quite expensive at that time. There were only really ugly looking Smalltalky GUI libs. It was hard to impossible to create stand-alone applications. And the resulting code was much too slow compared to C++ (which I primarily used at that time). Some of those things have changed in the meantime, but if you compare it with Java, Smalltalk is still behind (and it's old - I have the theory that languages make a breakthrough relatively fast or never, and Smalltalk's time has simply passed).

"What about Ruby. It's slow but it's still a success"

Ruby is still far from being mainstream. The only reason for its success in a very small domain is 'Ruby on Rails'. And again it's not the language which created the success, it's a tool (a framework in this case). But I think that this still isn't enough in the long term. While RoR can really give you some productivity benefit, that only works in its small domain. And because of Ruby's poor performance it also doesn't scale well. While using RoR for small-scale web apps can be quite successful, this is only a marginal market. And because the RoR crowd is quite 'loud', it may give a wrong impression of the real spread of RoR usage.

Something similar is true for Python. It's successful in certain domains, but it's still far from really 'big'. Python also has a big performance problem which limits its success. And while Python IDEs of course exist, they're still far away from the Java ones. But the main problem of Python remains its poor performance.

And remember: the perception created by all those bloggers and web discussions can really lead one astray. Those people (me included) are not 'common programmers'. They do this stuff not only as a job, but also as a hobby. And because of this, they are simply 'louder' than the broad mass of normal users who only read about it and do it primarily for work. So academic and hobby users simply have more time and fun to write about the topic, and because those people are much more likely to try out new things instead of using the 'boring old stuff', it creates the impression that certain new things (which appeal more to a 'geeky' audience) are more successful than they are in reality.

"Use Erlang"

Some people mentioned Erlang. I think it's too early to talk about it. Erlang really shines the moment you need massive concurrency - but this isn't really important in most domains yet. That will change in the coming years, and so maybe Erlang will become a 'Big Thing'. But we still need good IDEs and better performance until then: what's the use of scaling an app to 1000 processors if the result can hardly compete performance-wise with a C++ solution running on only 10 processors? Remember that Erlang is quite slow as long as the program isn't using massive numbers of threads.

"Java isn't slow"

No, it's not slow, but it is still slower than C++, even if it's in a relatively close range. From my tests and other benchmarks (like the shootout) I think that well optimized Java code is in general about 1.5-2 times slower than well optimized C++ code. In certain areas Java can be faster than C++, but I consider this an exception which seldom shows in a bigger program.

In Java the compiler (and the JVM) has to do more runtime checking (array bounds, null references, maintaining stack frames, etc.) and this simply shows. Also, because of the GC and the JIT compilation it needs more memory, which reduces performance too. The difference is not that big, but in combination with long startup times and higher memory consumption it's still enough to give Java a clear disadvantage in certain domains compared to C++.


"Language XYZ is much better than Java if you want to do ABC"

I know that most existing languages have their niche. But I'm not talking about those niches, I'm talking about the mainstream, about the 'Big Thing'. But mainstream is something which is 'bad by definition' for certain people, so I'm sure that every language which reaches the mainstream will always be bashed by those people. Just because it's mainstream, and because bashing the mainstream lets many people feel superior. Also, if something becomes mainstream it attracts lots of bad programmers. Just like every kind of successful 'underground' music style attracts bad musicians the moment it starts to become mainstream, who then start to produce more and more crap.


"IDEs and libraries followed Java, not the other way around"

Java 1 wasn't really used much; Java really started to catch on maybe from 1.2 upwards. The original idea of Sun was to use Java applets to revive its server market, which was crumbling because the desktop PC seemed to win. This didn't work out (the main reason was MS, which fought back) and so they decided to target the server side instead (because of the now more and more successful Internet). Now Java could really start to gain market share, and this was around 1999 with Java 1.2. At this time good Java IDEs were already available (NetBeans and VS integration), and IDEA (which was later copied heavily by Eclipse and NetBeans) was on the market around 2000. Creating an IDE for a 'new language' was of course risky, but if you want to make money, finding niches is a good idea, even if you always have the risk that the language won't catch on as hoped.

But I think that Sun did a very good job in "bootstrapping Java". They provided comprehensive libs, tools, documentation etc. People or companies who want to make their language a success should take a good look at it.


"The real reason are network effects"

True, this can lead to the eventual success of a language, and it's of course necessary in the end. But to make a language a success we first need tools, libs and docs. All those can be created by several methods: by the language creator, by the community, or by a third party which is interested in promoting the language. Independently developed languages mostly use the 'community method', because the creator of the language isn't able to create all the necessary infrastructure himself. But first you need to have a community, and for this you need something to attract one. And you need to attract the 'right' community. Having one which creates lots of tempests in a teacup may work to make people notice your language, but in the end you need a community which actively creates libs, tools and docs, or your language will fail.

But in the end network effects aren't so important for programming languages. They are much more important for libraries, interfaces, operating systems or hardware. A programming language is largely invisible: if you have created a product, nobody really cares which language it was written in as long as it does what it should do. Now you may argue that network effects are more important for the maintenance of a program. But I doubt this: maintaining a product often requires much more time for the maintenance programmer to get to know the program well enough to do the maintenance work than it would take to learn a new programming language.

"Lisp and Smalltalk have IDEs too"

Sure. I've never implied that an IDE is the one and only thing required to make a language successful. But it can help a lot. How much depends on the language. For Java an IDE is probably more important than for Ruby - but even Ruby can really profit from a good IDE. That Lisp and Smalltalk haven't succeeded has other reasons.

"Java isn't that productive"

Those big frameworks like J2EE may look over-engineered today, but remember the time those frameworks were created. There were reasons for it. Of course many mistakes have been made, but that's because this stuff was quite new at the time. Now you may say 'look how much crap it all is', but that's always easy if you have the advantage of knowing the outcome. Also, many of the concepts worked out well - and those are now integrated into many newer systems, Java based or based on other languages. Of course we've not reached the end of history. There will be new systems, new languages and new frameworks, and people in 10 years will laugh about the stuff we call 'high-level' today. That's natural.

Java was quite productive in the past and it still is. Sure, in certain domains you can be even more productive, for example by using RoR. But what if you expect the system to grow? Can you still recommend RoR then? What if you have to integrate legacy applications or huge databases which can't easily be refactored? Then RoR isn't as useful anymore. Similar things are true for many other systems. Java may be the least common denominator, but it works in a very broad range of domains. You may find a better solution in each of those domains, but what if your project touches more than one of those? Or if you're not sure how you will have to scale and extend it afterwards? So it's totally understandable if people choose solutions which may not be optimal in one domain but can work in all directions you may have to travel later, without the need to reimplement the whole system (that's also the main reason why many people like platform independence so much).

There are lots of 'benchmarks' which try to measure productivity. But all those have one thing in common: They use simple toy examples instead of big real-world ones. And because of this they aren't useful if you need to decide what language to use for your big real-world project.

And people don't like to use many languages at once. It makes life harder and reduces productivity because of all the additional interfacing you have to do. So if you start a project you choose the language which is overall the best fit, and not one which is optimal only for a certain part of the project.

"There are big sites which don't use Java"

Sure. Why not? But having a big site doesn't mean that the language used to implement it has to be mainstream. It's like saying "XYZ has won the Formula 1 world championship, so XYZ must be the best car manufacturer".


"[This article] explains exactly why people use inferior programming languages, even though the intent was to say that they don't."

The intent of the article was to show that choosing a programming language not only depends on the language, but also on the whole infrastructure. Only looking at the language is like only looking at the engine if you want to choose the right car. So you can buy a car with an inferior engine and still get the best overall product. So people may use an inferior language but still use the superior system.


"People only use Java because they are forced to"

While I have to use Java for certain things because of external constraints at the moment, I'm free to choose the language I want for lots of other things. But despite having tried out lots of different languages, I always fall back to Java. I don't even like the language much because of all the well-known limitations, but the system as a whole is still hard to beat. This may of course change in the future. Maybe even tomorrow. But I'm talking about the here and now, not about promises which may or may not be fulfilled someday.

"It's all the fault of those evil managers"

This is a very subjective view. It's true in certain very big companies, but loses its validity in smaller ones. Of course the 'manager' is always the 'bad guy' to a programmer. Like parents are often the 'bad guys' to their children because they have to order them around. But companies have hierarchical structures because they work best (at least until now, otherwise other structures would have succeeded in the past). And this means that nearly every person in a company is ordered around by some other person. And of course mistakes will be made in this process. But everyone who knows better has the freedom to build his/her own company. Just do it and you will start to experience constraints you may never have thought of when you worked as a programmer. Managers have those constraints too and they also have to live with them.

There are ways to overcome this, but it means that programmers have to 'face reality' and can't concentrate only on programming anymore. But this would also make them less productive as programmers, and in the end this may be the reason why we have this separation between management and development. It may be a bad solution, but other solutions seem to work out even worse.


"The article is badly written, the grammar is poor"

Sorry, but I'm not going to write my articles in German and hire a professional translator to translate them into good English. I do my best, including using a spell checker to correct mistakes, but English really isn't a language which is easy to master (like probably every natural language). I would really like to see Esperanto or Lojban as the 'world language' instead (to level the playing field), but at the moment it's English, and the native speakers simply have to bear with us non-natives.


To come to an end: it may be possible that at this moment Java isn't the most productive language anymore. We will only know that in a few years; it's impossible to know today. And yes, in certain domains Java is for sure not the most productive language - and maybe never was. But that's not the point. The point is that most companies and programmers use Java because it is (or at least was in the recent past) the best overall solution for most programming tasks. If you really think that 'your' language is the 'next big thing', then ask yourself the following questions:

  • Can my language compete with Java productivity-wise if Java is used by a good Java developer using all the available tools and libs?

  • Can my language compete with Java not only in a single domain, but in most domains important to general programming today?

  • Does my language have comprehensive, easily available documentation which is directed at everyday use and not only at beginners or 'esoteric' stuff? Documentation which has both concise parts for reference purposes and also explains concrete usage for learning purposes?

  • Does my language have a GUI lib which enables me to create good-looking GUIs that can compete with professional software, at least on the main platforms (Windows, Mac, Linux - ordered by importance)?

  • Can I write code which is fast enough to use my program on a bigger scale or on the desktop?


If one of the answers is 'no', 'your' language can still become a success, but if there are more no's, it's highly improbable, even if the language is full of cute concepts, has a beautiful syntax and is a pleasure to use.

Monday, March 05, 2007

Why do most people seem to use inferior programming-languages?

If you read discussions about programming languages, one topic is quite common: "Why do people use an inferior language like Java/C++/what-the-hell and not a superior language like Lisp/Haskell/Python/you-name-it"?

The reason is: the language is only a small piece of the big picture. Today's languages aren't that different at all, productivity-wise. While some things have a certain impact (having a garbage collector for example), most things in a language don't really matter much in practice and are largely overshadowed by the really important things: the availability of tools, libraries, documentation, and the performance of the resulting code.

Imagine two competing car companies A and B. A decides to invest in better automation and buys robots, while B shies away from it because of the initial investment and continues to use human workers. After 2 years the investment of A starts to pay off: productivity increases, the quality of the products gets better and the production costs are reduced. To stay competitive, B now simply has to catch up or lose market share and eventually even disappear.

I don't see a reason why software production is different: if company A can be more productive with a certain language, then the competition has to catch up or lose. That's how capitalism works. As long as we have relatively free markets, all companies have to strive for the best tools to do a job. And this really shows: not so long ago most programming was done in C and later C++. But today Java is the most used language. The industry is obviously able to change to a different language if it really gives an advantage. But why has it moved to Java and not to Lisp, Haskell or Smalltalk?

There are some people who seem to think that most people are simply too stupid to grasp the complexity of those languages and thus use the simpler Java. But this is a fallacy, because Lisp and Smalltalk simply are not more difficult to use. In fact Smalltalk is a much easier to use language than Java. You can learn Smalltalk in a day and be quite productive after 2 weeks, something which is impossible in Java.

And Lisp can be quite simple, too (at least in the 'Scheme flavor'). Especially if we take a look at C++, the most used language not long ago, it becomes obvious that simplicity has nothing to do with it: even Common Lisp is a piece of cake compared to C++. Not only is the language easier to use, the concepts used by the language are also simpler (people who can use templates won't have problems with Lisp macros after a short time of getting accustomed to them, for example).

So if simplicity was really the reason to choose a certain language, nobody would ever have used C++ and the world would probably program in Scheme or Smalltalk today.

The next fallacy you always hear about is the stupid manager who chooses a language based on the 'market leader'. But again this doesn't seem to be correct: when Sun created Java, it was no big market leader. The leader was Microsoft, and MS used C++ and VB. But still Java made an impact and became the 'big thing'. MS had to actively fight against it, but still partly failed and had to jump on the bandwagon later by creating its own 'Java' to get market share back. So it's quite short-sighted to use conspiracy theories in which the 'market leaders' agree on what the 'next big thing' is and everybody else has to follow. If only a single company abandoned this 'agreement' and started to use a more productive tool, the others would have no choice but to catch up, like our car company B above.

The real reason why Java is still the number one language in the market is that it gives the highest overall productivity. Of course this isn't true in all areas, but it has to be true in the areas where the majority of developers work. But what are the reasons for this?

I look at many programming languages and also try out a lot of them. I've done this for years. And still I use Java as my primary language. You can call me a stupid Java Joe or whatever you like, but then you will overlook the real reason why I and many others still prefer Java despite the fact that we know most of those new, fancy, shiny languages.

Before I used Java, my primary language was C++. I switched to Java relatively late, when version 1.3 came out. But the main reason I switched wasn't the language. I had looked at Java directly after it became available - and found it horrible. But years later I tried out JetBrains IDEA and it was unbelievable how much productivity this IDE gave me compared to the VC++ I used for C++ development. While VC++ had 'wizards' for code generation, those wizards only created 'code skeletons' which were useful as a starting point but are no big help over the whole life cycle of a project. In IDEA, however, all those little helpers were usable in all phases of development. Many of the uglinesses of the Java language simply disappeared when I used this IDE, and I could suddenly concentrate on the problem instead of fighting with the language.

But maybe the biggest advantage was that the IDE removed the 'documentation problem' I have to face in almost every other language. If you write code you constantly have to remember names, parameter order etc. Using IDEA with Java, this wasn't necessary anymore. The very intelligent code completion and the integrated documentation lookup removed the necessity to remember all those things. Finding the right method or class was most often only a key press away. And because of the real-time error checking, compile errors were nearly eliminated, too. In the end IDEA (and later also Eclipse) made Java a completely different language. Yes, the IDE matters. A lot. Just consider writing Lisp code without a REPL or Smalltalk code without the Smalltalk browser if you're a user of one of those languages.

So the real reason why I switched over to Java was not the language itself, it was the combination of the language, the IDE and the available, well documented and comprehensive libraries. In fact the language was the least important one of those reasons. This hasn't changed: Java without the tools is total crap. Most languages are better. But Java is so 'IDE-friendly' that it was possible to create an IDE like IDEA.

Not every language allows this. The dynamic ones like Python, Ruby, Lisp and Smalltalk always have the problem that it's mostly undecidable at compile time which methods are available for a certain variable. But hey, that's not a bug, that's the big feature. It's the reason those languages are called 'dynamic', and a Java IDE is in principle nothing more than an interactive extension of the compiler.

But there are other problems in certain languages. For example Haskell: it's a really nice language for creating small, cute programs. The language is like a puzzle, it makes your brain hurt - but in a good way, like your muscles 'hurt good' after a good workout. But if I'm working to build a house, muscle ache from lifting too much heavy stuff isn't that funny anymore. And the same is true for doing 'real stuff' in Haskell: just look at the code of a big project implemented in Haskell, like the sources of GHC. I don't see 'beauty' in this code anymore. It's hard to read and it's BIG. The type checker of GHC alone is nearly as big as the whole code of Hugs - but Hugs is written in C while GHC is written in Haskell. Sure, GHC has some more extensions compared to Hugs, but this should still make you think. Haskell also has a big documentation problem: the Haskell docs are absolutely insufficient on many levels. This alone is enough to ensure that Haskell will never become mainstream unless it changes.

And for other languages there is the performance problem. Yes, performance does matter. People sometimes spend hundreds of bucks to get a 50% performance increase from their hardware. Do you really think they wouldn't care if a program is 100 times slower because of using a certain language? Sure, if you write a script which is used once a week, it really doesn't matter whether it takes 0.1 sec or 10 sec to finish. But if you look at web apps, a factor of 100 can easily cost millions, because you need to buy and maintain 100 times the number of servers.

For desktop applications even Java is often still ruled out compared to C++ for performance reasons. And Java is on average only 1.5-2 times slower than C++! So how could a language like Python or Ruby compete here? The answer is: they simply can't. You may use them in parts of a project, for example for scripting, but the majority of the work is still done in one of those boring 'previous-generation' languages. And again it's not because the developers are stupid, it's because nobody would buy a program which is as slow as if you had made a step back in time to a 486, even if you have invested in a modern dual-core CPU.

And there are the libraries. Libs can make or break a language. And one of the most important ones is the dreaded 'GUI lib': if you want to create a desktop application and a language has no GUI libs which allow creating good-looking(!!) and easy-to-use GUIs, the language is ruled out. And since most developers have to create GUI apps from time to time and don't like to switch languages over and over again, no or only poor GUI libs often mean that the language won't be considered, even for projects which don't seem to need a GUI. This may be inconceivable for some people, but it's a fact of life: a language without good GUI libs will never become mainstream. This also nearly broke Java's neck, because AWT was horrible and Swing took its time to become useful, but with 1.3 Swing was decent and from 1.4 up it was good enough (if you use 3rd-party look&feel libraries) for many purposes. But even if Swing is relatively good, its deficiencies (especially the non-native look) still hinder Java's breakthrough on the desktop.

So even the best language can become totally useless if it lacks one of those: competitive speed, good documentation, comprehensive libraries, the necessary tools (like IDEs). If a language fails to address those points (and really all of them) it won't become the 'next big thing'.

That's the big secret. Nothing more. Create a language as beautiful as you want. But without the necessary libs and comprehensive documentation: No chance. Slow: No chance. Not competitive if compared to other languages including tools like IDEs: Again, no chance.

So if you really want your favorite language to become more used or even mainstream: help to solve those problems. Lamenting in blog posts or online discussions is as useless as spending lots of time writing tutorials explaining concepts from 'your' language to the world. The best tutorial and the most cunning style of discussion are useless the moment a potential user discovers that there is no library for a certain problem he has to solve (like creating a GUI), or the moment the novice has to face incomprehensible and lacking documentation when he/she tries to do something real with the language.


[Update:] I've written a follow-up article which discusses some comments and objections to this article. You can read it here.

Friday, February 16, 2007

Understanding Monads. For real.

Again an article about the "what are monads?" topic? Aren't there enough of those on the net already? Maybe. But since I've read a lot of them and still had problems really 'grokking' the concept, I suspect that others may have those problems too.

I'm not a mathematician. I've studied physics, and while I learned a lot of mathematics in the process, it's a totally different way of learning mathematics than studying mathematics directly: in physics, mathematics is an important tool which always has to be backed by reality. If you simply calculate something strictly by using mathematical rules, you often get physically nonsensical results. So you always need to have some 'image of reality' in the back of your mind.

In mathematics, on the other hand, those 'images' are less important and sometimes even counter-productive. If an image is tied too closely to our image of the world, it can be quite limiting. Using such an image can even prevent finding more uses for a certain abstraction. So mathematicians have to learn to work with the abstractions themselves, without using an image, because every image could limit the use of an abstraction.

But if we want to apply mathematics in some domain, we need the image back. For a mathematician, listing the monadic rules may be enough to know about the structure which is created by those rules, but for non-mathematicians who haven't been trained to think this way, it's not. And after all there really is an image which shows what a monad (as used in functional programming) really is:

A monad is like a 'macro': All it does is a code-transformation.

I know, this notion has been used in some articles about the topic, but often only casually along the way. And if you're a mathematician this is really nothing more than 'uninteresting applicative stuff', but if you're a programmer who has to use monads in some real application, you simply need something more concrete, something you can relate to. Just a set of mathematical rules isn't enough. So why not think of a monad as some kind of 'functional macro'? As a programmer this should be a rather familiar picture. Let's elaborate a bit on this:

What is a macro? It's something which gets some data and dices and splices it. But before it can do its dicing and splicing, there needs to be some slicing - which is done by the macro processor.

Let's look at the C preprocessor as an example of a very primitive (but widely used) macro facility:

#define m1(x, y) x y y
m1(a, + 1)

Here m1 is read by the preprocessor, which recognizes it as a macro, slices the following text between the parentheses at the ',' and feeds those text slices into the macro. This creates the new text ("a + 1 + 1"), which is then fed back into the preprocessor as new input (so that macro expansion can happen again on the resulting text).

In Lisp it's a bit more sophisticated because the macro processor works on s-exprs and not on plain text (yes, I know about reader macros, but let's concentrate on standard macros here). If we have a piece of code like:

(some-function (m2 x (+ y 1)) z)

and 'm2' is a macro, then the macro processor does some slicing. The resulting pieces are s-exprs, namely "x" and "(+ y 1)", which are then fed into the macro 'm2'. The result is then put where the (m2 ...) was and is evaluated again.

And in Haskell? There the 'slicer' is the 'do-statement'. If we write

do
  v1 <- stmt1
  stmt2
  v2 <- stmt3
  stmt4

then the compiler slices the content into separate statements and wraps them into closures. It then puts function calls around them (using a function called 'bind', written '>>=' in Haskell). The result of this transformation is then again used as new input.

The difference to the other macro processors is that the evaluation (the 'splicing and dicing') is now done by the 'bind' function at runtime. This has the advantage that the type checker has already run, and by using type information the bind function can be overloaded. So instead of providing an explicit name for the macro like in the C and Lisp examples above, the concrete macro is now chosen by the type of the first statement in the do-block.

And how can it be assured that the whole block is translated by the same macro? That's the task of the type-checker: By giving the bind-function the type

bind(M<a>, a -> M<b>) -> M<b>

the result of a bind function has to be in the same monad M as its input. The monad can be parametrized by a type parameter, and this parameter can be changed by the bind function, but the monad itself stays the same. This ensures that all statements are evaluated 'in the same monad' - or in other words: that all statements in a do-block are subject to the same kind of code transformation.
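
Rendered in Java-esque generics (purely illustrative names, nothing from a real library), the same signature looks like this; the type parameter may change from A to B between statements, but the outer type M is fixed:

// The outer monad type M stays the same; only its parameter may change.
interface Fn<A, B> { B apply(A a); }

interface M<A> {
    <B> M<B> bind(Fn<A, M<B>> f);   // corresponds to: M<a> -> (a -> M<b>) -> M<b>
}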

The type 'M' is often also called 'the monad'. And that's reasonable: in Lisp the macro is chosen only by a name, and so we would call the above the 'm2 macro'. But in Haskell the choosing is done by type, and thus the type of the monad gives the macro/monad its name. Hence List monad, Maybe monad, etc. But the monad isn't just the type, it's also the implementation of the bind function (and there also has to be a 'return' function), because without a bind function it simply wouldn't do anything. The combination of the type and the bind and return functions is needed to build a macro - and so all those things together are called 'a monad' (just like a Lisp macro is not only the name of the macro but also the code which does the code transformation).

So that's it: a monad is a code transformer. Like a macro. It's simple to remember: both names even start with an 'M'.

While the fundamental entity of the C preprocessor is plain text, in Lisp it's an s-expr. And in Haskell this fundamental entity is a function. Everything in pure functional programming can be expressed by it. And so it's no wonder that in Haskell 'macros' get functions as input. That's all the magic. Of course we could also use the bind function directly and write code using '>>=' manually, but 'do' often makes the code much easier to read and write.

I put the quotes around the word 'macro' because monads are not really macros by definition. So if you want to be picky you can find reasons why this picture isn't totally correct. But does this really matter? Or isn't it more important to have a useful way of thinking when we deal with an abstraction?

What are those differences? The main one is that the input to 'real' macros doesn't have to be valid code. The C preprocessor accepts anything which is text, and Lisp macros accept all kinds of s-exprs. But monads only work on valid Haskell code. They can only change its runtime semantics. And the syntax itself isn't changeable either, because one always has to obey the syntax of the do-block. So a monad is not a macro in the sense that you can create real new syntax (like in Lisp). You can only create new semantics.

Also, all kinds of macros have their limitations. Lisp macros sometimes need a code walker, the C preprocessor is quite limited too, and monads have their limits as well.

And those limits are in fact quite severe, which led to the invention of more capable constructs like 'Arrows'. The core limitation is that a bind function can't look 'into' its second argument. If we have a do-statement like the one above:

do
  v1 <- stmt1
  stmt2
  v2 <- stmt3
  stmt4

the compiler transforms it into the following code ("\v -> ..." is the syntax for a closure here):

bind(stmt1,
  \v1 -> bind(stmt2,
    \_ -> bind(stmt3,
      \v2 -> stmt4)))

(The '_' parameter is used in 'dummy assignments' which are created if we use no explicit assignment.) [Edit: Corrected error]

If we now look at the first 'bind', it takes 'stmt1' and a closure. This bind can do lots of things depending on the result value of stmt1, but it has no clue what its second parameter (the closure) returns until it evaluates it. Thus it has no way to look into the later bind functions. This is a severe limitation: it's impossible to create a transformation which analyzes the whole do-block before evaluating it.

So it's for example impossible to create an LALR monad which transforms its statements into an LALR parser. It seems possible to simply define those statements as 'actions' which return a value to instruct the bind functions to build the right code, but this would be quite limited because we can't add semantic actions this way: the result of the evaluation of the second bind parameter has to contain not only the monad itself, but also the result of the semantic action of the parser. And this is only possible if we evaluate both in one go.

The next problem is binding values to 'variables'. Monads simply use the normal functional way of doing binding via lambdas. The 'var <- ...' syntax simply creates a closure with 'var' as parameter which is then visible in all the levels 'below'. This works fine if the structure of the resulting code is similar to the structure of the input code, but it makes it for example impossible to transform a do-block into code which executes backwards from the last statement to the first one.

So while monads are quite powerful for creating certain kinds of abstractions (= code transformations), they can't do everything. But nonetheless, that's really it: monads are code transformations. This is also the reason why monads seem to be a bit difficult to understand: each monad creates a new language. The Maybe-monad creates a simple 'first-fail-fails-the-whole-computation'-language, the List-monad a simple 'try-out all combinations'-language, the State-monad a 'Haskell-with-mutation'-language, etc.

The transformation from the ordinary 'one statement after another'-form written in a do-block to the resulting code can be quite difficult to comprehend, simply because such a code transformation can end up looking quite different from the input code.

And because we have to learn new language semantics for every new monad, it's no wonder that the concept seems to be a bit hard to grasp. We may know the semantics of basic Haskell, but for every monad we have to learn a new language. Again and again. And if we don't even know that a monad creates a new language, understanding this becomes much more difficult, too.

But at least this shouldn't be a problem anymore now.




To deepen our picture let's now look at some examples. I will use only basic Haskell syntax here, or even 'Algol-like' syntax, to make things easier to understand.

One of the simplest monads is the 'Maybe monad'. Let's write something like

do
  x <- func1 ...
  y <- func2 ...
  func3 ...
  return (x + y)

Here all three functions 'func1, ..., func3' should return a 'Maybe' value, which can be 'Just x' or 'Nothing'. Because 'Maybe' is a monad and we've used the 'do' syntax, this code is transformed into something like this:

tmp1 = func1(...)
if isNothing(tmp1) then
  return Nothing
else
  x = fromJust(tmp1)
  tmp2 = func2(...)
  if isNothing(tmp2) then
    return Nothing
  else
    y = fromJust(tmp2)
    tmp3 = func3(...)
    if isNothing(tmp3) then
      return Nothing
    else
      return Just(x + y)
    end
  end
end

This looks a bit ugly, but it shows what's happening: each statement in the monad is transformed into an if-then-else expression in such a way that the first statement which returns 'Nothing' aborts the whole block and makes it return Nothing too.

We could also say 'the Maybe monad is an abstraction to represent computations which can fail'. True, that's what macros do: create new abstractions. Without remembering that a monad is just a kind of macro this sentence would sound quite 'arcane'. But now that we know the secret name of monads, the esoteric flair is gone and we see that they are something we all know quite well. So the Maybe-monad is nothing more than a mechanism which translates those innocent looking do-block statements into a nested if-then-else chain like the one we've just seen.
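To see how that transformation is driven, here is a small sketch of the bind function behind it, written as a standalone function (the Prelude's own Maybe instance is defined essentially the same way):

bindMaybe :: Maybe a -> (a -> Maybe b) -> Maybe b
bindMaybe Nothing  _ = Nothing   -- a 'Nothing' aborts the whole chain
bindMaybe (Just x) f = f x       -- a 'Just' feeds its value into the rest

Each statement of the do-block ends up as the second argument of such a call, which is exactly the nested if-then-else structure from above.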

This works for other monads too. The list monad transforms a linear list of statements into nested map-functions. Sure, a mathematician may say something like 'the list monad represents non-deterministic computation'. But in the end all it does is transform this:

do
  a <- [1, 2, 3]
  b <- [4, 5]
  return (a + b)

into this:

concatMap (\a -> concatMap (\b -> [a + b]) [4, 5]) [1, 2, 3]

concatMap maps list elements over a function like the normal map, but concatenates the results into a single list. This allows each invocation to return any number of values. [Edit: fixed error here (mistakenly used fmap instead of concatMap)].
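In fact the bind function of the list monad is just concatMap with its arguments flipped. A small sketch (the names bindList and example are mine, only for illustration):

bindList :: [a] -> (a -> [b]) -> [b]
bindList xs f = concatMap f xs

-- the do-block from above, written with bindList directly:
example :: [Int]
example = bindList [1, 2, 3] (\a -> bindList [4, 5] (\b -> [a + b]))
-- example == [5,6,6,7,7,8]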

If you're not that familiar with Haskell, the above works like this imperative code:
     
result = []
foreach a in [1, 2, 3]
  foreach b in [4, 5]
    result = append(result, [a + b])
  end
end



But we can do more complex ones, too. One of these 'a bit more complex' transformations is the state monad. What we want to do is something like this:

do
  value <- get
  put (value + 1)
  value <- get
  return value

Here we have 'commands' which do something that isn't possible in plain functional programming: reading and mutating a 'state'. With 'get' we read the current state and with 'put' we store a new value as the current state. The state in the above example is a simple numeric value, but since we can use arbitrary structures as values too, a state can also consist of multiple values.

But how does this work? We all know that pure functional programming doesn't allow anything like mutation. But to store a new state we need exactly that. To make it possible we need to chain our state thru all relevant function calls. This can be done like this:

function get(state)
  (state, state)
end

function put(state, value)
  (value, value)
end

let (state', value) = get(state) in
let (state'', _) = put(state', value + 1) in
let (state''', value') = get(state'') in
value'

Wow, this looks ugly. We need a fresh name for each new version of 'state' and 'value' and we also have to chain them manually thru the calls of put and get. But this method allows the simulation of mutation without requiring real mutation.

To make this more readable we can now use a monad to transform the original code into the code above. We can't do the calculation directly this time, because the code first needs some initial 'state'-value which is then chained thru the calls. So instead of calculating the result directly, we let the monad create a new function. This function can then take the initial state as a parameter and will call the generated code. And this then creates the result. So this is the first case of a monad doing real code transformation.

To create a new monad we first need a new type. We simply call the type 'State':

data State st t = State (st -> (st, t))

This is our monad. It takes two type parameters: the type 'st' of the state to carry, and the return type 't'. This return type can vary for each statement, while the state type 'st' is fixed by the monad type.

The interesting part here is that the monad doesn't carry the state itself around, but a closure which takes the old state and returns a tuple of the new state and a result value. This closure is also called an 'action', because it encapsulates the action defined by the statement.

Now we create the bind and return functions for this type:

getCode (State code) = code

instance Monad (State st) where
  (>>=) prev next = State $ \st ->
    let (st_next, res) = (getCode prev) st
    in (getCode (next res)) st_next

  return v = State $ \st -> (st, v)

The 'getCode' function simply returns the actual 'code' which is stored in our monad. 'return' is simple: it creates an action which takes a state and returns the same state together with the return value. The bind function (here named '>>=') takes the previous monad 'prev' and a function 'next' which will return the next monad. It creates an action which first evaluates the code in the prev-value with the current state. Then it uses the result to call the 'next'-function, which in turn creates the next monad in the chain. This next monad is then evaluated as well, but this time with the new state the prev-monad returned.

This chains the state thru the actions: first the initial state thru the 'first' monad, creating a new state, and then this new state thru the result of the 'next'-monad, creating the final state (which is then evaluated by the calling bind-function, etc.).
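(A note if you want to compile this code today: this article predates the change which made Applicative a superclass of Monad. On GHC 7.10 or newer the instance above additionally needs Functor and Applicative instances, roughly like this sketch:

instance Functor (State st) where
  fmap f m = State $ \st ->
    let (st', v) = getCode m st
    in (st', f v)

instance Applicative (State st) where
  pure v = State $ \st -> (st, v)
  mf <*> mv = State $ \st ->
    let (st', f) = getCode mf st
        (st'', v) = getCode mv st'
    in (st'', f v)
)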

Now we can build our 'set' and 'get' functions. This is quite straightforward. 'get' simply uses the current state-value as return value:

get :: State st st
get = State $ \st -> (st, st)


And the 'set'-function ignores the previous state and creates a new one. It also returns the new state, to allow assignments like 'x <- set (x + 1)'. This isn't necessary but convenient.

set :: t -> State t t
set val = State $ \_ -> (val, val)

That's it: our state monad. Now let's create a simple do-block to use it:

test1 = do
  x <- get
  set (x + 4)
  x <- get
  set (x * 3)
  x <- get
  return (x-5)

Here we can see that the first statement is a get. But where does the state come from which is returned by 'get'? Simple: if we call 'test1', we don't get the real return value, but a closure which we first have to evaluate with the initial state. Let's do this:

main = print ((getCode test1) 0)

'test1' returns a State-monad. We first have to get the code out of the monad by calling 'getCode'. This code can now simply be evaluated by calling it with our initial state (a 0 in this case). As the result we get a tuple with the value (12,7): the first component is our last state, the second is the real result (as returned by 'return (x - 5)'). Both values make sense, so our monad seems to work correctly.

Now let's take a look under the hood of the above:

The do-block above first creates the following code:

bind(get, \x ->
  bind(set (x + 4), \_ ->
    bind(get, \x ->
      bind(set (x * 3), \_ ->
        bind(get, \x ->
          return (x-5))))))

The bind function now does the real code transformation. I've tried to write down the resulting closure we get if we only expand the calls to bind, return, get and set, but it was simply too long and cumbersome. Let's do it instead for a simplified version of the above:

do
  x <- get
  return (x * 2)

this is rewritten into

bind(get, \x -> return (x * 2))

which if we evaluate bind, get and return and pull out the code from the resulting monad, creates the following closure:

\st ->
  let (st', x) = (\st -> (st, st)) st   -- our 'get' statement
  in (\st -> (st, x * 2)) st'           -- our 'return' statement

Again we see that in the end the monad simply does code transformation, so the image of looking at monads as code transformations holds. Even if it looks a bit weird, in the end the state monad really does the transformation we started with.

Friday, February 02, 2007

Why monads are 'evil'

This is a heavily updated version of a previous article with a somewhat similar name. From the comments to the original article I learned where it was misleading and partly even wrong. So I try to correct that with this updated version.


What is functional programming? Many people tend to think that having closures in a language makes that language a functional one. But by this definition almost every modern language would qualify as 'functional'. Even Java (because of Java's anonymous inner classes, which are closures too). So if this can't be the qualifying property of a language, what else is it?

Of course it's "referential transparency": a 'function' is a special kind of relation which assigns values of a domain-set to values of a result-('codomain')-set. To qualify as a function this mapping has to be unambiguous: for every element of the domain-set the function always gives the single, same result.

If a 'function' isn't referentially transparent, this property isn't fulfilled and it's simply not a function anymore. It's something which is often called a 'procedure'. We can argue whether this property really has to be fulfilled rigorously by a language to qualify as 'functional', but I would say 'Yes, it has!'. The reason is that we can use procedures to create functions in every language (just by making sure that there are no side effects), but to really call a language 'functional' the compiler has to ensure that every function really is a function. BTW: we can make every procedure a function by simply including the whole 'environment' as input to the procedure. With this little trick every procedure would now be a function and every programming language would be functional. Yes, this is nonsense - remember that for later.

But with this rigorous definition of the notion 'functional', there aren't many functional languages left. OCaml for example is clearly non-functional now. But even Haskell isn't. The reason is the IO-monad.

To do I/O, Haskell uses something called the 'I/O-monad'. If we write

main = do
  x <- getStr
  putStr ("you entered " ++ x)

the following happens: First the 'do' statement transforms the code by using a function called '>>=' (pronounced as 'bind').

getStr >>= (\x -> putStr ("you entered " ++ x))

(If we use the name 'bind' instead of '>>=' and the prefix form of function calls this would look like this:

bind(getStr, (\x -> putStr ("you entered " ++ x)))
)
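For reference, the same chain written against the real Prelude actions type-checks like this (the Prelude's console-reading action is called getLine; 'getStr' is just the name used in this article):

main :: IO ()
main = getLine >>= (\x -> putStr ("you entered " ++ x))

Here the bind function is used at the type IO String -> (String -> IO ()) -> IO ().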

The getStr function (which is in fact a constant and not even a function, because it doesn't take any parameters) is just a parameter for the 'bind'-function. It returns an 'action', a value of type 'IO String'. 'IO' is a special type here which simply encapsulates something and is parametrized with the type 'String' (in Java/C++ we would write this as IO<String>). But if 'getStr' always gives the same value, how can it be used to input strings from the console?

The answer is that 'getStr' doesn't do this. It's only a command to do it. And this command goes as first input into the 'bind'-function, which executes it. The second parameter of the call is the 'to-do'-function: the code which is associated with the action and has to be called with the result of the action. The bind-function itself also returns a value which is again an action. This allows us to use the result of a bind-function as the first parameter of another bind-function. So those functions can be chained arbitrarily - and this is what the 'do'-syntax does (just in a more convenient way).

Back to our example: the bind-operator received the 'getStr'-action as input. This action instructs it to fetch a string from the console and to call the to-do-function with it. Now this function again returns an action, this time a 'putStr'-action. This 'putStr'-action is again a constant, but it was created 'on the fly' from the putStr function, which takes one parameter: the string to write out. The next bind operation is invisible, because it happens outside the main function in the REPL or compiler. But it's executed like the first bind and it uses the 'putStr'-action to write the data out.

So it's the bind-function which isn't really referentially transparent: if you apply it to the same action twice, it can call its 'to-do'-function with a different value. Now Haskell cleverly hides that, because it doesn't allow anybody to examine the content of the actions: since 'bind' always returns such an opaque action, it's (theoretically) impossible for a program to see that two results are in fact different. And because Haskell allows no mutation, the to-do-function can't write this result somewhere outside the I/O-monad. But isn't that enough to ensure referential transparency? I would say no.

The reason is the same as why I don't consider ordinary procedures referentially transparent: it's just a trick. The operational semantics of the whole mechanism are simply not referentially transparent, whether Haskell hides it well or not. We can in principle write the whole program in the I/O-monad and then there is no difference to an ordinary imperative language anymore. So we should go with 'duck-paradigmization': if it mutates like the imperative paradigm, is non-referentially-transparent like the imperative paradigm and has a fixed execution order like the imperative paradigm, I would call it the imperative paradigm.

Let's look at an alternative approach to the functional I/O problem: we create a 'world'-value which contains the relevant data from the 'outside' (like files, console input, etc.) and feed this value into a function. This function can now create some output by processing information from this 'world'. By evaluating the 'world'-value lazily we can even create interactive programs, because the function simply stops the evaluation the moment there are no new input values and continues evaluation the moment we provide them (for example by typing something on the keyboard). With this concept a main function would look like this:

main :: world -> output

In a simple console application both 'world' and 'output' would simply be lists of characters. But for more complex applications the 'world' could contain all kinds of mouse/keyboard/etc-events while the 'output' would contain for example drawing-commands.
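As a small aside: for the simple console case Haskell's Prelude already contains a miniature version of this idea. The interact function takes a pure String -> String function, feeds it the lazily read standard input and prints whatever it returns:

-- echoes every input line with a prefix; thanks to lazy evaluation it
-- answers after each line instead of waiting for the whole input
main :: IO ()
main = interact (unlines . map ("you entered: " ++) . lines)

(Of course interact itself lives in the IO type, so this is only an illustration of the 'character stream in, character stream out' idea, not of the full 'world' approach.)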

What's the difference to the monadic I/O concept of Haskell? Couldn't we simply use this approach to implement our I/O-monad? The interpreter of the I/O-actions would simply use such 'world'- and 'output'-values and drive them by the actions the program provides. Aren't both concepts then simply identical, just easier to use in Haskell because the difficult stuff is well hidden inside the I/O-monad?

While this is true 'in principle', it's only true in the same sense that all Turing-complete languages are identical 'in principle':

What the I/O-monad does is create a new sub-language with imperative semantics. Every piece of code which runs 'in' the I/O-monad is in fact evaluated imperatively. This transformation is done by the 'bind' functions and hidden from view by the 'do'-construct. The 'do'-construct slices all the statements in the body into small pieces and feeds them into those bind-functions. Now their execution order isn't given by the statement order anymore; it's the bind-functions which choose how to evaluate them (in any order, multiple times, or not at all). And the values those statements give and take are also controlled by those bind-functions, because they provide them (as long as they are assigned with the '<-' operation).

So every piece of code we have inside such a do-block can have nearly arbitrary semantics. It's a bit similar to Lisp macros, but the transformation happens at runtime. And because of the chaining of bind-functions, the semantics of such a block depend only on the type of the first statement in the do-block.

Think about this: by writing the right 'bind' functions we can in principle create every semantics we want. For example we could build all the semantics of the Java language right into Haskell - we 'only' have to create the right monad. Sure, the syntax would be different from Java because the code is still parsed by the Haskell parser and needs to follow certain rules, but the semantics of this code could be identical to Java. With this 'Java-monad' we would now be able to program in Haskell as if we were using Java. But Java is an imperative, object-oriented language, so nobody would say that we're still writing code in a functional programming language.

Using the I/O-monad is similar: it provides a new language with new semantics by doing runtime code-rewriting. It's not a functional language anymore, even if it's implemented inside a functional language. We simply have left 'functional land' if we use the I/O-monad - and we can never return from it, because the I/O-monad is grounded in the compiler, the outermost layer of every program. We can only call functional code from this imperative layer, but this functional code can't do any I/O anymore.

But what's the difference to the explicit way of doing I/O? It's that we still have full control over what we're doing: we're working on the level of functions instead of creating actions which are somehow evaluated by an invisible interpreter. We have to supply the input values manually and we call real functions instead of building action-values which are evaluated somehow. If we want, we can slice the 'world' into pieces, supply only parts of it to functions, and the result of those functions can be anything we choose. And we can use all the normal functions instead of creating new 'monadized' ones.

Sure, we have to think more about certain things - but this is part of working in a certain paradigm. If we want to have the advantages of the paradigm, we can't simply create a new sub-language in a different paradigm and expect to still have the advantages which were the reason why we used this paradigm in the first place. If we want to do I/O with the I/O-monad we have to switch programming languages: we stop programming in a shiny new functional way and are back in the boring old land of the imperative. Even worse: because Haskell doesn't provide a different way of doing I/O, it's like a confession that functional programming can't do I/O. And all this only because of the 'evil' I/O-monad.

And there are other problems which apply to monads in general:

  • The performance seems to suffer because of the runtime code-translation: the translation costs time and memory and can sometimes even kill important properties like tail recursion (because the program we thought we wrote is not the program which is executed, due to the monadic translation). If you compare Haskell with the Clean language (which uses the direct state-chaining approach instead of monad-based code-translation to do I/O), Clean wins hands down in many benchmarks.

  • Code reuse gets more difficult: this is a common problem with domain specific languages: because we create parts of code with different language semantics, it's hard to fit those parts of code together later. We not only have to worry about different interfaces - we even have to consider different semantics! In Haskell we can put monads into other monads (and create some kind of translation-chaining), but this won't always work and so sometimes code reuse becomes impossible. And after we have left 'plain functional land' and entered the land of some arbitrary new semantics, our well-known functional tools don't fit anymore and we need specialized versions of them.

  • The real semantics are much more difficult to understand: the sheer number of 'how do monads work'-articles speaks volumes. And many are still missing the real point: monads are code transformers. Because of this they can do nearly 'everything' - and to understand them you have to understand every concrete monad on its own, because each of them creates a new language! This is what makes monads so hard to grasp.

  • It can hurt readability: a concrete monad is chosen by the return type of a function. For example a simple 'x <- get' can switch the rest of the do-block into 'state-monad'-land. This is quite easy to overlook because the type of a function isn't always obvious. In Lisp, macros at least often have lengthy, descriptive names; in Haskell it's far less obvious. Explicit type annotations are a must here to see what's really happening.


The more I understand the concept of monads, the more skeptical I become about them. Like Lisp's macros they simply are too powerful. Instead of creating tools to build new languages inside a language, why not directly create a powerful language?

I know that many people will see this differently because they like to use languages as 'construction kits' for new languages. Yes, this is a valid reason, but only in a very limited domain. In most areas we don't need a language to create another language but to solve a certain, concrete problem: create a web application, a word processor, a computer game or something else. I prefer to have a language with fixed semantics, which I only need to learn once and which doesn't change (at least until the next language revision). This makes code much easier to understand and to reuse, and this enhances productivity.

Haskell originally being some kind of 'research language', monads are surely helpful in that domain. But for a language directed at building applications we need different properties.

Tuesday, January 30, 2007

Real functional programming or "Why the IO-monad is evil"

Edit: This article contains some errors and wasn't able to transport my intention correctly. So I've created a new version of this article which hopefully contains less errors and is better to read.




What is functional programming? Many people tend to think that having closures in a language makes that language a functional one. But by this definition almost every modern language would qualify as 'functional'. Even Java (because of Java's anonymous inner classes, which are closures too). So if this can't really be the qualifying property of a language, what else is it?

Of course it's the "referential transparency": a 'function' is a special kind of relation which assigns values of a domain-set to values of a result-('codomain')-set. To qualify as a function this mapping has to be unambiguous: for every element of the domain-set the function always gives the single, same result.

If a 'function' isn't referentially transparent, this property isn't fulfilled, so it's not a function anymore. It's something which is often called a 'procedure': a block of code which creates a result by applying some algorithm. We can argue whether this property really has to be fulfilled rigorously by a language to qualify as 'functional', but I would say 'Yes, it has!'. The reason is that in every language we can use procedures to create functions (just by making sure that there are no side effects), but those languages still aren't called 'functional' just because the programmer can force himself to use them in a functional way.

If this were enough, we could also call assembler a functional language, because we can use it to write functional code too. The same would be true for every other paradigm; assembler would for example be an object-oriented language too (and this is of course true for every Turing-complete language, not only for assembler).

But with this rigorous definition of the notion 'functional', there aren't many functional languages anymore. OCaml for example is clearly non-functional now - and even Haskell isn't. So should we maybe lift this requirement a bit? I don't think so.

Why isn't Haskell a functional language? The reason is the IO-monad. To do I/O in Haskell there are functions which aren't referentially transparent, because they return a different value at each invocation. If we write

let v = getStr in ...

v is bound to a different value at each invocation. Even if this value is contained in the IO monad, it's still a different value, because we can write code which runs differently on each invocation. This creates some kind of avalanche effect which can turn huge parts of the code into simple imperative code - and because of this, we can't talk of Haskell as a functional language. Even if we want to create pure functional code, it's not possible in Haskell the moment we need I/O (and which program doesn't?).

That's the reason why I consider the IO-monad as 'evil': Haskell relies on it and thus isn't functional anymore.

But is it possible to do I/O in a really functional way? If it's not possible then the idea of functional programming would be deeply flawed and should maybe even be abandoned for practical programming. What use is a concept which works only in theory and fails the moment you want to do something as common as I/O?

But fortunately it is possible. The idea is to put all the external input of a program into a 'world'-value and then call our program with it:

result = app(world)

If 'app' is a command-line application for example, then 'world' would represent all the input the program receives and 'result' would be all the output the program generates. This is quite an easy concept, but how would it work if we need interaction? The idea is to use lazy evaluation: instead of reading the whole 'world' before calling 'app', 'world' is only evaluated on demand. The same is true for 'result', which is written out the moment data becomes available and not at the end of the program. So our program could give back a partial result (for example an input prompt) before even requiring any real input.

This would work for GUI-programs too. We can for example put all the mouse/keyboard/etc-events into the 'world' object, and 'result' contains all the drawing-commands the application issues. This approach would create a very distinct structure of the program, because input and output are now clearly separated.

This works well for some problems, but often there is some 'natural' interaction between both. Your simple command-line application may for example open, read and write some data files, based on commands issued by the user. We cannot create a simple function like 'readFile(file_name)', because this 'function' could give different data at each invocation and is thus not a function. So how do we solve this problem?

The answer is to put all 'external data' into the 'world' object: all files, all system resources etc. Of course this also works with lazy evaluation, so this data is not really read and put into a value up front - the 'world'-value just behaves as if this were the case. And if we now start our application with all this data in the argument, it will again give the same result as long as the input value is the same. If we change only a single character in a file which is contained in the 'world', the value isn't the same anymore and so we could get a different result.

The advantage of this approach is that we can limit our 'world'-value to the parts of the system which may be accessed by our 'app'. If we don't want 'app' to be able to read a file, for example, we simply don't put the 'file-system' into the value. This allows very strict security constraints on data access in a very natural way. And our 'app' is of course not one big monolithic function but calls other functions which in turn call other functions etc. So we can give those other functions limited subsets of the world-value to limit their access, too.
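A rough sketch of what such a 'world'-value could look like for a small console application (all names here are made up for illustration, this is not an existing library):

-- everything the program is allowed to see
data World = World
  { consoleInput :: String                -- lazily read keyboard input
  , fileSystem   :: [(FilePath, String)]  -- the files the app may read
  }

-- the whole application is one pure function from 'world' to output
app :: World -> String
app w = case lookup "config.txt" (fileSystem w) of
  Just cfg -> "config found: " ++ take 20 cfg
  Nothing  -> "no config, you typed: " ++ head (lines (consoleInput w))

To restrict a helper function we would simply pass it only the fields it needs, for example just the fileSystem list.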

This approach is truly functional and also very natural: An application simply transforms one 'world-state' into another.

And there is an additional advantage of this approach compared to monads: Higher performance.

To use the plain IO-monad, the language isn't referentially transparent anymore. So many of the nice and clever optimization techniques for functional programs can't be used anymore. Using the IO-monad forces a certain order of evaluation and also disallows caching of already calculated values.

But even if you use other monads to simulate state, the result isn't really as fast. The reason is that monads are constructors and can only 'give' some value. But how can we create and modify state, if we can only return values and have no 'input'?

With state-monads it works this way: instead of transforming input data (which contains the old state) into output data (which contains the new state), a state monad returns a new program. It doesn't do the transformation on the data but on the program itself! In the end this transformation does nothing else than rewrite the code into 'transformer-style' code which gets the state as input and creates a new state as output. It does this by putting chunks of code into closures and chaining them together using the 'bind'-function. So if you use a state monad in Haskell, the state monad does nothing more than a code transformation and then evaluates this code, which in turn chains a state-value thru the invocations.

But this transformation and evaluation is done at runtime! Sure, sometimes it can be cached or even unrolled at compile time, but only in rare cases. The reason is that the code to be transformed depends on the input values of a function, which can (for example with an if-then-else construct) produce a different kind of code every time. The Haskell compiler can't resolve this in most non-trivial cases and thus the whole code transformation and evaluation has to be redone over and over again. This costs time and also often even kills tail recursion. So by using a state monad we often have much less efficient code than by using the transformer-based approach directly.

A language which does I/O by the transformer approach (but in a different way than proposed here) is Clean, and if you look at the benchmarks at the shootout, it seems clear that there really is a performance advantage here.

And what are the disadvantages? Of course there are some, or else nobody would ever have considered using monads instead.

The main advantage of monads is that they are in principle code transformers. With monads we can create embedded DSLs, similar to Lisp macros (but since monads do this code transformation at runtime, it costs time and memory, while Lisp's macros are evaluated by the compiler before creating the real code). Having the ability to create new 'embedded languages' has some appeal to many people. With the transformer approach this isn't possible - but of course there are still other ways to do this, if it's really wanted.

The second disadvantage is that the transformer approach requires an additional parameter in each invocation. Instead of writing

do
  a <- getStr
  b <- getStr
  putStr $ a ++ "," ++ b

we have to write

let
  (a, state')  = getStr state
  (b, state'') = getStr state'
  state'''     = putStr state'' (a ++ ", " ++ b)
in
  state'''

instead. This looks pretty obfuscated compared to the Haskell approach. But the reason is that Haskell has syntactic support for monads but none for the other approach. If we had to write the first example without the syntactic support the 'do'-construct provides, it would look even uglier than the second one (just try it, it's also good practice if you are new to monads).
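To make the second example a bit more concrete, here is a sketch of the types such state-threading functions could have. The ConsoleState type and these definitions are made up purely for illustration; they are not the Prelude's getLine/putStr (the prime on putStr' only avoids a name clash):

-- the state that gets threaded thru all calls
data ConsoleState = ConsoleState
  { pendingInput  :: String   -- what the user typed but we haven't consumed yet
  , writtenOutput :: String   -- what the program has printed so far
  }

getStr :: ConsoleState -> (String, ConsoleState)
getStr st =
  let (line, rest) = break (== '\n') (pendingInput st)
  in (line, st { pendingInput = drop 1 rest })

putStr' :: ConsoleState -> String -> ConsoleState
putStr' st s = st { writtenOutput = writtenOutput st ++ s }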

So why not simply add syntactic support for the transform-approach too? What about this:

with state let
  a <- getStr
  b <- getStr
  putStr a ++ ", " ++ b

This is quite short too, and the 'with' syntax can be used by the compiler to look at the function types and chain 'state' thru all calls which have a parameter with a matching type and also return a value of the same type.

The last disadvantage is that the transformation approach needs additional semantic support. Why? Because lazy evaluation alone isn't enough. We often need a certain order of evaluation to read and write data in a certain required order. Often this order is 'automatically correct', but sometimes it isn't.

For example (without the above proposed syntax to make the problem more clear):

let
  state'   = putStr state "Please enter your name: "
  (name, state'')  = getStr state'
  state''' = putStr state'' ("hello '" ++ name ++ "', please enter your age: ")
  (age, state'''') = getStr state'''

If we evaluate these statements, the program will write

Please enter your name: hello '

and wait for input. Why? Because the 'getStr' line isn't evaluated before 'name' is actually required. That's the funny thing with general lazy evaluation: sometimes the evaluation order can be quite unexpected.

How to solve this problem? There are multiple solutions. First we can replace the '++' after "hello '" by a strict variant which requires evaluation of both parameters before it continues. This would force the program into the wanted behavior, but it would also require additional thinking by the programmer.
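A small sketch of that 'strict variant' idea, using seq (forcing the length of the name makes sure the whole line has been read before the next prompt is built):

-- forces 'name' completely before the prompt string is produced, so the
-- preceding getStr has to run before the greeting can be printed
strictPrompt :: String -> String
strictPrompt name = length name `seq` ("hello '" ++ name ++ "', please enter your age: ")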

A better way would be to create an internal dependency between data. For example, 'putStr' and 'getStr' would create and check a dependency on 'state' which would force evaluation of all previous 'putStr' calls before a 'getStr' can occur (and vice versa). This would only force the evaluation order into a certain form, while each of the functions would remain referentially transparent. However, there has to be some compiler support for this feature.

So I/O is possible, in a totally functional way, and with some support from the language even in a relatively simple way. Without monads we lose the possibility to create embedded languages, but I think that this isn't really a big disadvantage. For I/O, monads aren't necessary and are even harmful.

Saturday, January 06, 2007

What makes a programming language 'more productive'

First: What does 'productive' mean? I would loosely define it in the context of programming as "Given two equally skilled programmers or teams of programmers, the less time they need to create and maintain a program for a given problem, the more productive is the implementation language".

This definition may have some holes in it, but I first want to make clear that productivity for me is not an abstract thing like "code beauty" or "conciseness"; it's simply outcome oriented. If you're able to create a program in a consistently shorter time by using a certain language or framework compared to another, I think it's a good idea to consider it 'more productive'.

What's the secret of productivity? In fact it's utterly simple: code reuse.

That's it? Code reuse? What about powerful abstractions, concise code or clean and powerful language design? Doesn't matter (unless it leads to better code reuse)! Let me explain:

If you have to solve a problem you have to think about it first. This takes a lot of time. You also have to think about it while you're implementing a solution and later while debugging or maintaining your solution. Thinking about the problem generally takes the biggest part of the time in software development. There's only one way a programming language can solve this: by providing an existing solution. Sure, there are simple cases where you only have to type a simple solution in without thinking about it much. But even then: if you don't have to write the code yourself and can reuse existing code, it's simply the fastest way to a working result. Existing code is a solution you don't have to think up yourself.

It doesn't even matter how powerful a language is: if you can build your solution on existing code with only small and easy modifications, you will be faster than with the most powerful language. Also, existing code is already debugged and tested. And if it's from a 3rd party developer you don't even have to maintain it.

But code reuse isn't always identical to 'using a framework' or 'importing a lib'. Sometimes it's built right into the language: when I switched from C++ to Java I experienced a huge productivity gain. A big contributor to this was garbage collection. Instead of thinking about memory allocation, when to release an object, how to write a fast custom allocator, whether to allocate data on the stack or on the heap etc., I could now allocate objects without thinking much about it. I could simply consider the problem as 'solved' by the language.

Something like this happens on many more occasions: I can create classes in C, but by having the language do it, as in C++ or Java, I can consider the problem 'solved'. In assembler I have to create function calls and local variables myself - in a higher-level language this problem is again solved. All this is code reuse: somebody has looked at common programming situations ('patterns') and created a universal solution for them. Sometimes this solution is directly incorporated into the language, more often it's put into a library. But nearly always both ways are possible, so it doesn't make sense to call code reuse which is part of a language an 'abstraction' and code reuse from a library simply 'code reuse'.

So the reason why certain languages are more 'productive' than others is code reuse. C is more productive than assembler because we don't have to think about allocating variables or building stack frames. By having the problem solved we don't need to think about it, we don't need to implement it again and again, and we don't need to search for bugs which result from making mistakes by doing it ourselves.

Now we can try to identify the reasons why certain languages are more productive than others, and why sometimes even more powerful looking languages don't deliver.

Let's first look at Java. I mentioned that my productivity increased a lot after switching from C++ to Java. Besides the already mentioned garbage collection, the huge number of available libraries is another part of the reason. An interesting question is: why are there so many libs for Java? There are other languages which had huge commercial support but never had as much reusable code as Java. Why?

If I want to reuse code, the code has to fit into my existing code. But more importantly: it has to fit into the code of other vendors. If I create a program which uses 5 different libs and some 1000 lines of my own code, writing my own code in a way that it fits the libs is possible - but this doesn't solve the problem of fitting those 5 libs together if they are all written independently. To make code 'fit', it has to use the same conventions, similar interfaces and standards. This works much better if the language enforces it.

One example is garbage collection: when I used C++ there were multiple methods of doing memory allocation. Many used reference counting, but since there was no standard implementation, each lib used its own. But how can you combine two libs which both have their own ref-counting scheme? In Java this was a non-problem because the language had solved the GC problem.

But there are other examples, like having standard libs for most of the basic stuff. If two libs use their own array implementations you can't simply use an array from one lib in the other. But if the standard libs provide something, it's unlikely that every vendor creates its own solution.

But it continues on a higher level: If two libraries use different methodologies to solve a similar problem you will get an impedance mismatch if you try to use them together. Maybe one is written in a more procedural and the other in a more object oriented way: Now you need lots of boilerplate code to make such code work together which in turn makes reuse harder.

Java tackled this problem by removing abilities from the language to enforce a certain way of programming. If there is only one sensible way to do something (because other ways are artificially made much more difficult), the programmer may curse the language for it in the moment, but at a later time he may be happy about it because it allowed him to reuse the code in a different application. And that's possible because the language enforced a certain way of solving things.

And if independent programmers create libraries without even knowing who will use them later, it can really help if they are forced to use a certain methodology, even if it may hurt at the moment.

So while Java really has its downsides, in the regard of code reuse it made a lot of progress compared to many earlier languages. So if we want to create better languages we always have to consider this lesson we learned from Java: having a powerful language alone isn't enough to gain productivity if the language neglects code reuse. Even a less powerful language can fly ahead of a more powerful one if it encourages and eases the creation of reusable code.

But time has moved on and many ask whether it's possible to reach the goal of better code reuse AND have a more powerful and more concise language than Java. I'm sure it is, but only as long as we keep the idea of code reuse in mind. Creating a language with only 'clarity', 'conciseness' or 'power' in mind isn't enough; we always have to think about how it's possible to enforce the creation of reusable code in this language. Yes, we need to enforce it. Not because programmers are stupid or deliberately write un-reusable code, but because only by enforcing it can we be sure that two teams of developers who don't know about each other create code which will fit together later when reused by a third team of developers. We simply need rules to make their work fit together.

But this leads immediately to a conclusion: multi-paradigm languages won't work. While it looks like a good idea to give programmers more freedom to express their ideas, this in turn leads to code which is too different to fit together.

(I suspect that this is the prime reason why Lisp never made a real breakthrough - but also why there are some success stories with Lisp. If you don't need to reuse code (for example if you work in a new field where simply no code exists), Lisp can give you a big productivity gain and create a nice success story. But if a language like Java can play out its code-reuse card, the gains are irrelevant, because the Java programmer simply puts some libs together while the Lisp developer is still thinking about which libs there are on the market and whether it's possible to integrate them into one design or do a new implementation instead.)

But multi-paradigm isn't the only 'no-no'. Making languages too customizable is another one: if a language can be easily customized by macros, reflection, template meta-programming etc., this can also reduce code reuse: if we want to integrate two libraries which both use and rely on lots of those customizations, it's quite probable that those customizations won't fit together. It can work, but often it won't.

This is not only true for 'real' meta-programming like macros or reflection, it can also happen with 'too flexible' abstractions. Let's take a short look at Haskell's monads: they are very powerful - but this leads to problems. If you have code which uses an A-monad and want to fit it together with code which runs in a B-monad, you get problems. And if some 'monad-free' code needs to run inside some monad later, you may have to rewrite it completely, even if only a very small portion needs access to the monad. This can be quite annoying if you have to rewrite your own code - but if you have to reuse 3rd party code it can even render reuse impossible.

The problem here is not the monad concept itself, it's the choice whether to use it or not. The choice creates the possibility to go different ways - and going the wrong way can lead to lots of rewrites or reimplementations you have to do instead of simply reusing existing code.

So the secret of successful code reuse is 'removing choice'. But that's something which seems unswallowable for many programmers, especially those who consider themselves 'hackers'.

If you are one of those, let me ask you a question: would you like a game like chess more if there were no fixed rules and you could do everything? I doubt it; the fixed rules are exactly what makes chess interesting: you have to solve problems in the context of a fixed and rather limited set of rules. If you could simply win by hitting your opponent over the head with a club, chess would lose lots of its appeal, wouldn't it? So just look at the 'choice problem' in a different way: if there is only a limited set of ways to solve a problem, doesn't that even make it more interesting to solve?

And to the language designers: isn't it an interesting problem to create features which are expressive AND lead the programmer in a certain direction? Simply putting everything into a language is not difficult (think of Homer in the Simpsons episode where he designed a car). But it's also simple to just remove things: creating a language based on a single easy concept is simple too. And a dynamically typed language is much simpler to design than a language with a good static type system. If you really like a challenge, then design something new, something different, something difficult.

A language which allows for good code reuse doesn't have to be simple, it has to force the user to solve problems in a certain way without limiting him too much. This sounds like a contradiction, and yes, it is, but that's the difference between theory and practice: we always have to make compromises, or we create things which are good in theory but unusable in practice.