Monday, November 20, 2006

Rich languages vs. minimal languages

A common pattern in the design of programming languages seems to be to design languages as 'slim' as possible. The idea is to create as few abstractions as possible directly in the language and then to implement everything useful the programmer needs in libraries. While there are obviously differences in how strictly this principle is applied, I think it deserves scrutiny. Let's look at some examples:

In Java the language itself only implements a rather simple collection type: arrays with a fixed length. All usable collections have to be built in the language itself, which leads to lots of problems, inconsistencies and performance penalties. On the other hand, Java implements some things directly in the language that other languages usually implement in libraries. A good example is the 'synchronized' keyword, which makes a common concurrency pattern much easier to apply.
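For contrast, here is a minimal sketch of the same pattern in Ruby, where mutual exclusion comes from a standard library class (Mutex) rather than a keyword; the Counter class is a made-up example:

```ruby
# A counter whose increments are protected by a mutex. In Ruby this
# concurrency pattern lives in a library class (Mutex), while Java
# bakes the equivalent 'synchronized' pattern into the language.
class Counter
  attr_reader :count

  def initialize
    @mutex = Mutex.new
    @count = 0
  end

  def increment
    @mutex.synchronize { @count += 1 }
  end
end

counter = Counter.new
threads = 10.times.map { Thread.new { 1_000.times { counter.increment } } }
threads.each(&:join)
counter.count # => 10000
```

Neither approach is clearly better on its own; the point is only that the language designer chose differently about where the pattern lives.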

In Common Lisp nearly all features of the language are implemented in the language itself, but many of those features are part of the standard, so they are available as a base to every programmer. The problem remains that those features are designed so that they can be implemented in the language itself (via macros), which leads to the well-known s-expression syntax that is hard to read for most programmers and is one of the main reasons Lisp is so often criticized.

Languages like Ruby and Python, on the other hand, have a very rich feature set built directly into the language. I suspect that this richness is the main reason those languages are considered more expressive by many, rather than other features like object orientation or dynamic typing. In those languages you have good syntactic support for many common things like collection handling, regular expressions, big integers etc.
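A quick sketch of what that built-in syntactic support looks like in Ruby (the values are arbitrary examples):

```ruby
# Collection literals, ranges and blocks are part of the language
squares = (1..5).map { |n| n * n }   # => [1, 4, 9, 16, 25]

# Regular expression literals work directly as string indices
log_line = "ERROR 42: disk full"
code = log_line[/\d+/]               # => "42"

# Integers grow into big integers transparently
big = 2**100                         # => 1267650600228229401496703205376
```

None of this needs an import or a library call; the syntax is simply there.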

Then there is Haskell, which is based on a very restricted central idea (a pure, lazy functional language) but extends it quite cleverly with lots of syntactic sugar and freely definable operators. In Haskell you can not only overload existing operators, you can also create arbitrary new operators with custom binding priorities. But things like the imperative-looking do-notation for monads or the compact list comprehension syntax, which both greatly enhance the usability of the language, still needed to be implemented as special syntax, because even with free operator definition it's not possible to express them as clearly in the base language directly.

I think we can see a pattern here: restricting the base language too strongly can lead to limited expressiveness and to the need for lots of boilerplate code. Just compare a bit of Ruby:

a = [1, 2, 3, 4, 5, 6]
b = a[2..-2]

with the similar Java code

List a = new ArrayList();
... // some adds omitted
List b = a.subList(2, a.size() - 2);

The compactness and readability of the Ruby example result primarily from the language's rich syntax for array use, not from dynamic typing. Even in a statically typed language you could create something similar, for example by using type inference. But even without type inference, someone could invent something like this:

a:[int] = [1, 2, 3, 4, 5, 6]
b:[int] = a[2..-2]

which is much more compact than the Java example, even with explicit type annotations.

There are lots of similar examples, so what could be the reason why language designers choose not to implement language support for something? Let's look at some possible reasons.

  • Learnability. If the language has a small core, someone can become proficient in it much faster.

  • Clear and simple, maybe even 'provably correct' design. The language can be designed much faster and more easily. The language model may also allow abstract theoretical reasoning about important properties.

  • Better extensibility. Because all higher-level features are implemented in the language itself, they could also be implemented by a third party, which allows the creation of new features the designer hasn't thought of, or allows extending existing features with user-defined ones.

  • Better compilation. The compiler for a simple language is simpler, and could thus be more sophisticated.

There are of course more reasons, but I hope I've covered the most commonly mentioned ones here. So let's take a further look at them:

  • Learnability

    Learning a language also means learning its libraries. If you know Java without knowing at least the base libraries, you know nothing. So whether you put a feature in a lib or in the language itself doesn't really matter.

    You may suspect that libs have the advantage of being better searchable with some documentation facility, which can even be done automatically to some extent (think of code completion in today's IDEs). But you could also create a similar documentation facility which includes language features. For example, just type 'array' and the documentation would give a short overview of the array syntax supported by the language.

    In fact there is no single reason why learning a language to the degree necessary to do real-world work would differ if you put more features directly into the language. It may even be the opposite: is it really easier to learn everything necessary about Java collections than about Ruby collections? I doubt it.

  • Clear and simple, maybe even 'provably correct' design.

    While that's true, I ask: are those properties of a language really necessary? To a certain degree and in certain domains: for sure. But in general? I doubt it. An 'industrial strength' language (i.e. a language created to do lots of real-world work) has to ease the life of the programmer. It simply doesn't matter whether the language has a nice, clear and simple design unless it really helps the programmer accomplish his or her tasks.

    Interesting theoretical properties of a language are certainly a valid goal for research languages, but that's only a very limited domain - even if it looks much bigger to language designers. For real-world programming other properties are much more important, so we really have to differentiate here.

  • Better extensibility.

    This seems to be the 'killer argument' for putting stuff in the libs instead of the language. But is it really true? Can you, for example, really extend Java's collections so easily? The answer is of course: no. You can write your own list class, but you can't use it as freely as you want, because you simply can't substitute your new list class into existing code without modifying the source. Sure, you can exchange the system classes with your own (as long as your own classes use the same interfaces), but that's not really sensible in most cases.

    You may suspect that that's not a fault of the idea itself but only of a certain language (like Java). In Ruby, for example, you can substitute classes very easily at runtime. In Haskell it all depends on which module you import. And in Lisp you can always write a clever macro. So maybe it's possible if you just use the right language? Yes, to a certain degree it is, but it comes at a price: you can introduce hard-to-find bugs in already tested code, you have to design the language in a certain way to allow it (which makes it harder to use), and you need dynamic typing or rather sophisticated type systems (if you want static typing) to do it.
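The Ruby case is the easiest to sketch: any code can reopen even a core class at runtime (second_largest is a hypothetical helper added purely for illustration):

```ruby
# Reopening the built-in Array class at runtime -- no existing source
# needs modification; every Array gains the method immediately,
# including arrays created by already-tested code.
class Array
  # Hypothetical helper: the second-largest element.
  def second_largest
    sort[-2]
  end
end

[3, 1, 4, 1, 5].second_largest # => 4
```

This is exactly the power - and the danger - described above: the extension is global and silently affects all code that uses Array.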

    In the end it makes the language semantics more difficult to understand - not for usability reasons, but only to allow somebody to recreate things in the language which are already part of the language (even if implemented in the libs). And do we really need this kind of extensibility? You may think that having something is better than not having it, but all things come at a price, and why should we pay this price for something we don't really need?

    All the above extensibility schemes have their limitations. Even with Lisp macros you can't create truly transparent continuation support. Even with Haskell's clever type and module system, they still had to create special syntax to support monads in the way they have it now. And in Ruby you pay with slow execution and the loss of compile-time validity checks, which can lead to hard-to-find errors.

    In fact most programmers never ever have to change the system libraries. Extend them? Maybe, to a certain degree. But it's always possible to add functionality to a language by writing libraries. Maybe you can't use those extensions in the same way you can use the built-in features, but is that such a problem? It even has the advantage of making code more maintainable, because it's always easy to distinguish between 'safe' base-language stuff and 'maybe faulty' self-written stuff.

    In the early days of computer science, the design of programming languages was new, new algorithms and data structures were invented 'every day', and in those times it really seemed risky and too prohibitive to put too many things directly into the language. So after experiencing the difficulties of integrating new concepts into rigid languages like Fortran, it's understandable that the trend went in the direction of creating more extensible languages. But times have changed. When did you last invent a new basic algorithm or data structure? All those basic use cases have been well known for some time now, and in most domains they simply don't change that often.

    So we simply don't need this much extensibility today if the language provides enough features of its own.

  • Better compilation.

    In fact it's the opposite: a program in a rich language can be compiled to an efficient native program much more easily, simply because the compiler has more knowledge about the program. Sure, the compiler gets bigger, but it doesn't have to guess what the programmer really wanted, and more optimizations become possible.

    Fortran is still in use because its array and matrix features, which are built into the language, allow easy vectorization and thus high-performance programs on otherwise hard-to-build-for computer architectures.

    Also, lots of novel optimization techniques become possible. What about a collection type which isn't fixed to a certain implementation, but where the compiler decides which implementation to use depending on usage patterns, maybe even with the possibility to change this implementation after profiling at runtime? This would be impossible if the collection type were implemented in a lib, because lib code looks like any other code to the compiler, and it simply doesn't know about lots of optimization opportunities. But if the collection is part of the language, the compiler is able to choose how to implement it, depending on how the collection is used. Many indexed accesses? Use an array-based implementation. Many insertions and deletions? Use a linked-list one.
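As a library-level sketch of the idea (a compiler could gather these counters itself and switch the backing store transparently; all names here are hypothetical):

```ruby
# A collection that records its own usage pattern. A compiler with
# language-level knowledge of collections could track this invisibly
# and choose the implementation automatically.
class AdaptiveList
  def initialize
    @items = []
    @indexed_reads = 0
    @inserts = 0
  end

  def [](index)
    @indexed_reads += 1
    @items[index]
  end

  def insert(index, value)
    @inserts += 1
    @items.insert(index, value)
    self
  end

  # The decision a profiling compiler might make from the counters.
  def suggested_backing
    @indexed_reads >= @inserts ? :array : :linked_list
  end
end

list = AdaptiveList.new
list.insert(0, 1).insert(1, 2).insert(2, 3)
list[0]; list[1]; list[2]; list[0]
list.suggested_backing # => :array
```

Done in a library, the bookkeeping costs an increment on every access; done in the compiler, it could be free.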

These ideas could and should be extended to more higher-level structures. Collections are very basic but also very often used, and thus crucial for efficient implementation and performance.

But things like concurrency, component-based architectures, database access/persistence etc. can also really benefit from being part of the language. Instead of writing something like

result = db.query("select name, value from data");
name = result["name"];
value = result["value"];

which is very error prone and inefficient to compile, we could have something like this:

select name, value from data;
// name,value are stored directly into variables above

This would be easier to write, and the compiler could check for syntax errors at compilation time, automatically cache queries, do optimizations etc.

I believe that the minimality principle applied in most of today's languages is in fact obsolete and should be questioned by everyone who starts to design a new programming language. Many languages did this to a certain degree, but why stop there? Why not try to find all the common patterns programmers are reimplementing each and every day and put them directly into the language? Instead of researching 'domain specific' languages, why not simply invent a language suited to the domain of everyday programming?


Ivan Tihonov said...

Take a look at the Forth language. Extremely minimal - the core syntax contains only three elements (words, numbers and spaces). But the syntax is fully extensible.

Karsten Wagner said...

I've used Forth on old 8-bit systems years ago. I found it very difficult to read because of the RPN syntax (reverse Polish notation) and because you always have to track the stack depth in your head to understand the structure of the program. It was also quite error-prone, because a single missing element somewhere could have a big impact elsewhere.

Also, it's not about extensibility (read the previous article in this blog about why I don't think extensible languages are a good idea). It's about having a language for getting things done. And this means having all the basic stuff you need for your day-to-day work as easy to use as possible.

Extensibility doesn't really help here, because extending a language and creating good abstractions is a hard thing to do. So why not let the language designer do his job and create a full-fledged language with all the common abstractions we need every day, instead of giving us a language in which we have to create all those abstractions ourselves?

If I look, for example, at the C++ world, I can't even count how many libs I've seen which all solve the same basic problems, like basic linear algebra (2/3-dimensional vectors etc.). All those libs are quite similar, but none is quite the same as the others. All have their small advantages and disadvantages. If such libs were created once and then put into the language standard, life would be much easier when you need this stuff - like for example the C++ STL, which has replaced nearly all home-grown collection libraries for exactly this reason.

But let's take a step further: if a lib is part of the standard, why use a lib when you can have much better abstractions by creating new syntax? Imagine what the STL could be if it had language support and syntactic sugar. I suppose the better usability (think of array comprehensions, type inference etc.), better error messages and better performance (just compare the array performance of the STL with the array performance of Fortran on a vector processor) should be a good reason to do it.

Anonymous said...

Calling Common Lisp a 'minimal language' is a bit of a stretch, don't you think?

Mike Harris said...

I think you are discussing two distinct ideas

1. Syntactic Sugar
2. Language X being written mainly in X / having a small set of orthogonal features, etc.

They are very different.

Karsten Wagner said...

If you count the language as 'everything which isn't part of a lib or implementable as a macro', Common Lisp is a minimal language.

But the point of the article is not to pigeonhole languages. In fact most languages are somewhere on a long range between minimal and rich, but I think there is a clear prevalence on the 'minimal' side of the range, and I also think that too many language designers try to be 'as minimal as possible' instead of going for 'as productive as possible' - which in my opinion means going a bit further toward the 'rich' side of the range.

@Mike Harris:
I think the term 'syntactic sugar' is a problem in itself: it's a bit derogatory, because it sounds like 'not really necessary', and this can easily lead to false conclusions.

I think 'sugar' can make programs much clearer and more concise. An example is the for-each statement in Java 1.5. While it's really just a simple rewrite for the compiler, it makes the code much more readable and eliminates certain kinds of errors. If a language has a lot of such 'sugar', it can make a whole new language out of it usage-wise.

But of course there are things well beyond 'sugar'. For example with collections: you simply declare something as a collection, and the compiler decides which kind of implementation to choose, depending on static or runtime checks. Sugar is always local, while those kinds of optimizations require context analysis. If you have a whole 'collection language' built in, you can hardly speak of 'just sugar'.

Also, building things into the language can simplify the design. If the language supports generic collections as part of the language, it may not need a complex type system that lets the user create those generic collections in the language himself (sure, inside the compiler we again have some kind of type system for those collections, but because of the restricted use it can be much simpler).

Even if the language is built upon a very simple core and the compiler uses some rewriting passes to transform the program from the source into the core code, which is then traditionally compiled, there are advantages compared to the idea of exposing the core language directly and allowing the user to design his own high-level language on top. The main reason is that there is only one language built upon the core language, not multiple ones. This eliminates lots of difficulties which normally arise in such a system (module system, dependencies, error handling). Also, the core language doesn't need much comfort if it is only needed as an intermediate target in the compilation process.

Anonymous said...

I think some types of languages, mainly functional ones, seem more minimal because they (like Lisp) generally work on lists, and are thus actually powerful in settings where you need to do a lot of list manipulation.

In the C-family direction of languages, I think that higher productivity can be reached with fairly feature-rich languages (more so than C and Java; C++ might be there, but makes it too difficult to use the more advanced features). I say this based on my own empirical experience with the D Programming Language, which is designed using C++ experience to alleviate what is difficult there, plus features like dynamic arrays, simple associative arrays, contracts, etc.

It is feature-rich, and the man writing the reference compiler says that several of the features will make it easier to build a good optimizer. See also the Lisp vs D article on that site.