A common pattern in programming language design seems to be to make languages as 'slim' as possible. The idea is to build as few abstractions as possible directly into the language and then to implement all the useful things the programmer needs in libraries. While there are obviously differences in how strictly this design pattern is applied, I think it deserves scrutiny. Let's look at some examples:
In Java the language itself only provides a rather simple collection type: arrays with a fixed length. All other usable collections have to be built in the language itself, which leads to lots of problems, inconsistencies, and performance penalties. On the other hand, Java implements some things directly in the language that other languages usually implement in libraries. A good example here is the 'synchronized' keyword, which makes a certain concurrency pattern much easier to apply.
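To make the 'synchronized' point concrete, here is a minimal sketch (class and method names are my own, chosen for illustration): the keyword gives direct syntactic support for monitor-based locking that would otherwise require explicit lock objects.

```java
// A counter protected by Java's built-in monitor locking.
// 'synchronized' acquires the object's intrinsic lock, so concurrent
// increments cannot interleave and lose updates.
class Counter {
    private int count = 0;
    synchronized void increment() { count++; }
    synchronized int get() { return count; }
}

public class SynchronizedDemo {
    public static void main(String[] args) throws InterruptedException {
        Counter counter = new Counter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) counter.increment();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter.get()); // 4000
    }
}
```

Without the language-level keyword, the same pattern needs an explicit lock object and try/finally boilerplate around every critical section.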
In Common Lisp nearly all features of the language are implemented in the language itself, but many of those features are part of the standard, so they are available as a base to every programmer. The problem remains that those features have to be designed in a way that allows them to be implemented in the language itself (via macros), which leads to the well-known s-expression syntax that is hard to read for most programmers and is one of the main reasons Lisp is so often criticized.
Languages like Ruby and Python, on the other hand, have a very rich feature set built directly into the language. I suspect that this richness is the main reason why those languages are considered more expressive by many, and not other features like object orientation or dynamic typing. In those languages you have good syntactic support for many common things like collection handling, regular expressions, big integers etc.
Then there is Haskell, which is based on a very restricted central idea (a pure, lazy functional language) but extends it quite cleverly with lots of syntactic sugar and freely definable operators. In Haskell you can not only overload existing operators, you can also create arbitrary new operators with custom binding priorities. But things like the imperative-looking do-notation for monads or the compact list comprehension syntax, which both greatly enhance the usability of the language, still had to be implemented as special syntax, because even with free operator creation it's not possible to express them as clearly in the base language directly.
I think we can see a pattern here: restricting the base language too strongly can lead to limited expressiveness and to the need for lots of boilerplate code. Just compare a bit of Ruby:
a = [1, 2, 3, 4, 5, 6]
b = a[2..-2]
with the equivalent Java code:
... // some adds omitted
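One possible expansion of the omitted Java code (a sketch of the classic idiom, since Java has no collection literals or slice syntax) might look like this:

```java
import java.util.ArrayList;
import java.util.List;

public class SliceDemo {
    public static void main(String[] args) {
        // No array/list literal: every element needs an explicit add().
        List<Integer> a = new ArrayList<>();
        a.add(1); a.add(2); a.add(3);
        a.add(4); a.add(5); a.add(6);
        // Ruby's a[2..-2] (inclusive range, negative index counted from
        // the end) becomes a subList call with manually computed bounds:
        List<Integer> b = a.subList(2, a.size() - 1);
        System.out.println(b); // [3, 4, 5]
    }
}
```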
The compactness and readability of the Ruby example result primarily from the language's rich syntax for array handling, not from dynamic typing. Even in a statically typed language you could create something similar, for example by using type inference. But even without it, someone could invent something like this:
a:[int] = [1, 2, 3, 4, 5, 6]
b:int = a[2..-2]
which is much more compact than the Java example, even with explicit type annotations.
There are lots of similar examples, so what could be the reason why language designers choose not to implement language support for something? Let's look at some possible reasons:
- Learnability. If the language has a small core, one can become proficient in it much faster.
- Clear, simple, and maybe even 'provably correct' design. The language can be designed much faster and more easily. Its model may also allow abstract theoretical reasoning about important properties.
- Better extensibility. Because all higher-level features are implemented in the language itself, they can also be implemented by a third party, which allows the creation of new features the designer hasn't thought of, or allows existing features to be extended with user-defined ones.
- Better compilation. The compiler for a simple language is itself simpler and can thus be made more sophisticated.
There are of course more reasons, but I hope I have covered the most commonly mentioned ones. So let's look at each of them in turn:
- Learnability.
Learning a language also means learning its libraries. If you know Java without knowing at least the base libraries, you know nothing. So whether you put a feature in a library or directly in the language doesn't really matter.
You may suspect that libraries have the advantage of being searchable via some documentation facility, which can even be provided automatically to some extent (think of code completion in today's IDEs). But you could also create a similar documentation facility which includes language features. For example, just type 'array' and the documentation would give a short overview of the array syntax the language supports.
In fact there is no reason why learning a language to the degree necessary for real-world work would take longer if more features were put directly into the language. It may even be the opposite: is it really easier to learn everything necessary about Java collections than about Ruby collections? I doubt it.
- Clear, simple, and maybe even 'provably correct' design.
While these are real benefits, I have to ask: are those properties of a language really necessary? To a certain degree and in certain domains: for sure. But in general? I doubt it. An 'industrial-strength' language (i.e. a language created to do lots of real-world work) has to ease the life of the programmer. It simply doesn't matter whether the language has a nice, clear, and simple design unless that design really helps the programmer accomplish his or her tasks.
Interesting theoretical properties are certainly valuable for research languages, but that is only a very limited domain - even if it looks much bigger to language designers. For real-world programming, other properties are much more important, so we really have to differentiate here.
- Better extensibility.
This seems to be the killer argument for putting stuff into libraries instead of the language. But is it really true? Can you, for example, really extend Java's collections that easily? The answer is of course: no. You can write your own list class, but you can't use it as freely as you would want, because you simply can't substitute your new list class into existing code without modifying its source. Sure, you can exchange the system classes for your own (as long as they use the same interfaces), but that's not really sensible in most cases.
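A small sketch of the substitution problem (the class names here are invented for illustration): even a fully conforming List implementation cannot be picked up by existing code that instantiates the concrete class directly.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// A hypothetical custom list (imagine it adds logging, pooling, ...).
// It satisfies the List interface completely ...
class MyList<E> extends AbstractList<E> {
    private final List<E> backing = new ArrayList<>();
    @Override public E get(int i) { return backing.get(i); }
    @Override public int size() { return backing.size(); }
    @Override public void add(int i, E e) { backing.add(i, e); }
}

public class SubstitutionDemo {
    // ... but third-party code like this is hard-wired to ArrayList,
    // so MyList can never be substituted without editing the source:
    static List<String> loadNames() {
        List<String> names = new ArrayList<>();
        names.add("alice");
        names.add("bob");
        return names;
    }

    public static void main(String[] args) {
        List<String> names = loadNames();
        System.out.println(names.getClass().getSimpleName()); // ArrayList, never MyList
    }
}
```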
You may suspect that this is not a fault of the idea itself but only of a certain language (like Java). In Ruby, for example, you can substitute classes very easily at runtime. In Haskell it all depends on which module you import. And in Lisp you can always write a clever macro. So maybe it's possible if you just use the right language? Yes, to a certain degree it is, but it comes at a price: you can introduce difficult-to-find bugs in already tested code, you have to design the language in a certain way to allow it (which makes it harder to use), and you need dynamic typing or a rather sophisticated type system (if you want static typing) to do it.
In the end it makes the language semantics more difficult to understand - not for the sake of usability, but only to allow somebody to recreate things which are already part of the language (even if implemented in the libraries). And do we really need this kind of extensibility? You may think that having something is better than not having it, but everything comes at a price - why should we pay this price for something we don't really need?
All the above extensibility schemes have their limitations. Even with Lisp macros you can't create truly transparent continuation support. Even with Haskell's clever type and module system, dedicated syntax was still needed to support monads in the way they work now. And in Ruby you pay with slow execution and the loss of compile-time validity checks, which can lead to hard-to-find errors.
In fact, most programmers never have to change the system libraries. Extend them? Maybe, to a certain degree. But it's always possible to add functionality to a language by writing libraries. Maybe you can't use those extensions in the same way you can use the built-in features, but is that such a problem? It even has the advantage of making code more maintainable, because it's always easy to distinguish between 'safe' language base features and 'maybe faulty' self-written code.
In the early days of computer science, the design of programming languages was new, and new algorithms and data structures were invented 'every day'; in those times it really seemed risky and too prohibitive to put too many things directly into the language. So after experiencing the difficulties of integrating new concepts into rigid languages like Fortran, it's understandable that the trend went in the direction of more extensible languages. But times have changed. When did you last invent a new basic algorithm or data structure? All the basic use cases have been well known for some time now, and in most domains they simply don't change that often.
So we simply don't need that much extensibility today, provided the language offers enough features of its own.
- Better compilation.
In fact it's the opposite: a program in a rich language can be compiled to an efficient native program much more easily, simply because the compiler has more knowledge about the program. Sure, the compiler gets bigger, but it doesn't have to guess what the programmer really wanted, and more optimizations become possible.
Fortran is still in use because its array and matrix features, built directly into the language, allow easy vectorization and thus high-performance programs on otherwise hard-to-target computer architectures.
Lots of novel optimization techniques also become possible. What about a collection type which isn't fixed to a certain implementation, but where the compiler decides which implementation to use depending on usage patterns, maybe even with the possibility of changing the implementation after profiling at runtime? This is impossible if the collection type is implemented in a library, because library code looks like any other code to the compiler, which therefore misses lots of optimization opportunities. But if the collection is part of the language, the compiler can choose how to implement it depending on how it is used. Many indexed accesses? Use an array-based implementation. Many insertions and deletions? Use a linked-list one.
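As a crude runtime approximation of this idea (all names here are hypothetical; a compiler that knew the usage pattern statically could make the same choice with zero overhead), a library could profile operations and pick a backing structure accordingly:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch: record the usage profile of a collection and
// choose a backing implementation to match it. A language-level
// collection would let the compiler do this from static analysis.
class AdaptiveListChooser {
    private long indexedReads = 0;
    private long headInserts = 0;

    void recordIndexedRead() { indexedReads++; }
    void recordHeadInsert()  { headInserts++; }

    // Pick the structure that fits the observed profile.
    List<Integer> choose() {
        return headInserts > indexedReads
                ? new LinkedList<>()   // cheap insertions/deletions
                : new ArrayList<>();   // cheap random access
    }
}

public class AdaptiveDemo {
    public static void main(String[] args) {
        AdaptiveListChooser profile = new AdaptiveListChooser();
        for (int i = 0; i < 100; i++) profile.recordIndexedRead();
        profile.recordHeadInsert();
        System.out.println(profile.choose().getClass().getSimpleName()); // ArrayList
    }
}
```

The library version pays the profiling cost at runtime and can only switch at coarse boundaries; a compiler could specialize per call site for free.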
These ideas could and should be extended to higher-level structures. Collections are very basic, but they are also used very often and are thus crucial for efficiency and performance.
But things like concurrency, component-based architectures, database access/persistence etc. can also really benefit from being part of the language. Instead of writing something like
result = db.query("select name, value from data");
name = result["name"];
value = result["value"];
which is error-prone and hard for the compiler to optimize, we could have something like this:
select name, value from db.data;
// name,value are stored directly into variables above
This would be easier to write, and the compiler could catch syntax errors at compile time, automatically cache queries, apply optimizations, and so on.
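To make the 'error-prone' point concrete, here is a small simulation (the query API is invented for illustration; real string-based database APIs behave the same way): a misspelled column name survives compilation and only fails at runtime, which language-level query syntax could reject at compile time.

```java
import java.util.HashMap;
import java.util.Map;

public class QueryDemo {
    // Hypothetical stand-in for a string-based query API that returns
    // a result row keyed by column name.
    static Map<String, Object> query(String sql) {
        Map<String, Object> row = new HashMap<>();
        row.put("name", "widget");
        row.put("value", 42);
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> result = query("select name, value from data");
        Object name  = result.get("name");
        Object value = result.get("value");
        // A typo inside the string key compiles without complaint and
        // silently yields null at runtime:
        Object broken = result.get("vlaue");
        System.out.println(name + " " + value + " " + (broken == null)); // widget 42 true
    }
}
```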
I believe that the minimality principle applied in most of today's languages is in fact obsolete and should be questioned by everyone who starts to design a new programming language. Many languages have abandoned it to a certain degree, but why stop there? Why not try to find all the common patterns programmers reimplement each and every day and put them directly into the language? Instead of researching 'domain-specific' languages, why not simply invent a language suited to the domain of everyday programming?