Wednesday, August 23, 2006

Java enhancements I would like to see

After some critique of the new closure proposal (because of its redundancy) I want to talk about some Java extensions I would like to see. These extensions complement Java in useful ways and make some frequently used idioms a bit easier to use, without promoting a totally different programming style.

Counter for the for-each-loop:


I often write something like this:

int i = 0;
for(Something s: data) {
    if (i++ > 0) out.print(", ");
    out.print(s);
}

While this works, it would be easier if a for-each loop created its own counter on demand:

loop: for(Something s: data) {
    if (loop.count > 0) out.print(", ");
    out.print(s);
}

We can also extend this idea to supplying 'isLast' and 'isFirst' variables to make the above code more readable:

loop: for(Something s: data) {
    if (!loop.isFirst) out.print(", ");
    out.print(s);
}

An 'isLast' is needed much less often, but it has its benefits sometimes. While it's not so important, it's a bit more difficult to implement, because the evaluation of the loop body has to be delayed by one element to know whether the current element is really the last one. But where it's needed it's quite useful and could make loops much easier to read.
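Until the language offers something like this, the idea can be approximated in a library. The following is only a sketch under assumptions: the 'Loop' class, its 'Item' type and the 'join' helper are hypothetical names invented for illustration, not an existing API. It shows how 'count', 'isFirst' and 'isLast' can be derived from a plain Iterator, with 'isLast' obtained by asking hasNext() right after fetching an element.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical helper: wraps an Iterable so every element carries loop metadata. */
final class Loop {
    /** One iteration step: the element plus its position information. */
    static final class Item<T> {
        final T value;
        final int count;        // zero-based, like the proposed loop.count
        final boolean isFirst;
        final boolean isLast;

        Item(T value, int count, boolean isLast) {
            this.value = value;
            this.count = count;
            this.isFirst = count == 0;
            this.isLast = isLast;
        }
    }

    static <T> Iterable<Item<T>> of(final Iterable<T> source) {
        return new Iterable<Item<T>>() {
            public Iterator<Item<T>> iterator() {
                final Iterator<T> it = source.iterator();
                return new Iterator<Item<T>>() {
                    private int count = 0;
                    public boolean hasNext() { return it.hasNext(); }
                    public Item<T> next() {
                        T value = it.next();
                        // Look one element ahead: nothing left means this is the last one.
                        return new Item<T>(value, count++, !it.hasNext());
                    }
                    public void remove() { throw new UnsupportedOperationException(); }
                };
            }
        };
    }

    /** Joins the elements with ", " and ends the list with a full stop. */
    static <T> String join(Iterable<T> data) {
        StringBuilder out = new StringBuilder();
        for (Item<T> loop : of(data)) {
            if (!loop.isFirst) out.append(", ");
            out.append(loop.value);
            if (loop.isLast) out.append(".");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("a", "b", "c"))); // prints "a, b, c."
    }
}
```

The wrapper allocates one small object per element, which a compiler-generated counter would avoid.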

Parallel iteration


Sometimes you have to compare two collections, add them element-wise, and so on. In those cases the new for-each loop is useless and you have to fall back on the old explicit iterator loop. This could easily be changed with one more modification to the for-each loop:

for(Something s1: data1; Something s2: data2) {
    ...
}

This loop runs until one of the iterators has no elements left. It's a really straightforward extension and should be quite easy to implement.
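In current Java the same behaviour can be imitated with a small pairing helper. This is a minimal sketch, assuming hypothetical names ('Zip', 'Pair', 'dot') that are not part of any standard library; it stops as soon as either input runs out, exactly like the proposed loop.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper that walks two Iterables in lock-step. */
final class Zip {
    static final class Pair<A, B> {
        final A first;
        final B second;
        Pair(A first, B second) { this.first = first; this.second = second; }
    }

    static <A, B> Iterable<Pair<A, B>> of(final Iterable<A> left, final Iterable<B> right) {
        return new Iterable<Pair<A, B>>() {
            public Iterator<Pair<A, B>> iterator() {
                final Iterator<A> a = left.iterator();
                final Iterator<B> b = right.iterator();
                return new Iterator<Pair<A, B>>() {
                    // The loop ends when either side is exhausted.
                    public boolean hasNext() { return a.hasNext() && b.hasNext(); }
                    public Pair<A, B> next() { return new Pair<A, B>(a.next(), b.next()); }
                    public void remove() { throw new UnsupportedOperationException(); }
                };
            }
        };
    }

    /** Element-wise product sum over the common prefix of both lists. */
    static int dot(List<Integer> xs, List<Integer> ys) {
        int sum = 0;
        for (Pair<Integer, Integer> p : of(xs, ys)) sum += p.first * p.second;
        return sum;
    }

    public static void main(String[] args) {
        // The fourth element of the second list is ignored.
        System.out.println(dot(Arrays.asList(1, 2, 3), Arrays.asList(4, 5, 6, 7))); // prints 32
    }
}
```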

'on-break' extension for loops


This one solves a constant annoyance: figuring out whether a loop terminated normally or by a break.

Imagine you have something like

boolean has_error = false;
for(Something v: data) {
    if (v.containsError()) {
        has_error = true;
        break;
    }
}
if (has_error) {
    ...
}

This is a very common idiom for handling breaks: set a flag so that you know afterwards whether the loop was left by a break. But you have to do it by hand, so it's error-prone (imagine setting has_error = false by accident). Sometimes it's possible to do the work of if (has_error) { ... } inside the loop body, but there are lots of cases where this won't work. It also won't work if the break is generated by the compiler, as in the parallel iteration above.

So I propose the following syntax:

for(Something v: data) {
    if (v.containsError()) break;
}
catch() {
    ... // executed on break only
}

This is really straightforward and doesn't need any new keyword.
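One plausible way to read the proposal is as sugar for the flag idiom, with the compiler managing the flag. The sketch below is an assumption about such a translation, not a specification; 'OnBreak', 'scan' and the 'broke' flag are hypothetical names, and negative numbers stand in for "elements containing an error".

```java
import java.util.Arrays;
import java.util.List;

final class OnBreak {
    /** Logs the values seen before the first negative one and whether a break occurred. */
    static String scan(List<Integer> data) {
        StringBuilder log = new StringBuilder();
        boolean broke = false;          // hidden flag a compiler could generate
        for (int v : data) {
            if (v < 0) {
                broke = true;           // set immediately before every break
                break;
            }
            log.append(v).append(' ');
        }
        if (broke) {
            log.append("on-break");     // body of the proposed catch() block
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(scan(Arrays.asList(1, 2, -3, 4))); // prints "1 2 on-break"
    }
}
```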

Allow for-each to iterate when supplied with an Iterator, not only an Iterable


Sometimes it is useful to provide several different kinds of iterators for one collection. Maybe:

Iterator forwardIterator();
Iterator backwardIterator();
Iterator depthFirstIterator();
Iterator filterIterator(Filter filter);

At the moment we have to make all those iterators Iterable themselves and add a

Iterator iterator() { return this; }

method to the iterator. It would be better if you could use iterators directly in a for-each loop and write for example:

for(Element e: tree.depthFirstIterator()) {
    ...
}

even if depthFirstIterator() only gives a normal Iterator.
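Without a language change, the usual workaround is a one-line adapter instead of making every iterator Iterable. This is a small sketch; 'IteratorAdapter', 'in' and 'sum' are hypothetical names chosen for illustration. Note that the resulting Iterable can only be traversed once, because it hands out the same Iterator every time.

```java
import java.util.Arrays;
import java.util.Iterator;

final class IteratorAdapter {
    /** Wraps an existing Iterator so that for-each accepts it. Single use only. */
    static <T> Iterable<T> in(final Iterator<T> iterator) {
        return new Iterable<T>() {
            public Iterator<T> iterator() { return iterator; }
        };
    }

    static int sum(Iterator<Integer> numbers) {
        int total = 0;
        for (int n : in(numbers)) total += n;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3).iterator())); // prints 6
    }
}
```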

Make iterators 'Closeable' and close them automatically in the for-each loop.


This is very important if you use iterators to iterate over, for example, the result set of an SQL query. At the moment this often prevents the use of the for-each loop, because you have no access to the iterator object there and so cannot close it. The problem: you have to create a new version of the iterator interface to avoid breaking existing code. So we have

interface Iterator2 extends Iterator, Closeable {
    void close();
}

and a matching Iterable2 definition. The for-each loop also has to be aware of this and generate a call to close() for Iterator2 iterators. The Iterator2 interface could be avoided if the 'default implementations for interfaces' extension described below were implemented.
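What the for-each loop would have to generate can be written by hand today. The sketch below assumes a generic variant of the 'Iterator2' interface from above; the 'Rows' class and the 'firstMatch' method are hypothetical and only simulate a result set with an array. The point is the finally block, which closes the iterator even when the loop is left early.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.Iterator;

/** Generic form of the Iterator2 idea: an Iterator that can be closed. */
interface Iterator2<T> extends Iterator<T>, Closeable {
    void close();   // narrowed: no checked exception
}

final class ClosingLoop {
    /** Toy stand-in for a result set; it only records whether close() was called. */
    static final class Rows implements Iterator2<String> {
        private final Iterator<String> rows;
        boolean closed = false;

        Rows(String... rows) { this.rows = Arrays.asList(rows).iterator(); }
        public boolean hasNext() { return rows.hasNext(); }
        public String next() { return rows.next(); }
        public void remove() { throw new UnsupportedOperationException(); }
        public void close() { closed = true; }
    }

    /** Hand-written expansion of a for-each loop over an Iterator2. */
    static String firstMatch(Iterator2<String> it, String prefix) {
        try {
            while (it.hasNext()) {
                String row = it.next();
                if (row.startsWith(prefix)) return row;   // early exit
            }
            return null;
        } finally {
            it.close();   // the call the compiler would add
        }
    }

    public static void main(String[] args) {
        Rows rows = new Rows("alpha", "beta", "gamma");
        System.out.println(firstMatch(rows, "b") + " closed=" + rows.closed); // prints "beta closed=true"
    }
}
```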

A 'using' statement like in C#


We all know how annoying correct exception handling and proper closing is in some cases. To make this easier I propose using 'try' for a 'using-like' construct:

try [type] [var] = [expr] { [body] }

catch blocks can optionally be added to dispatch on exceptions as usual, but the 'try' block will close [var] properly without requiring all that nested try-catch-finally mumbo-jumbo.
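For comparison, this is roughly the nesting the construct would replace. It is a sketch with hypothetical names ('UsingSketch', 'Resource', 'run'); the log only exists to make the order of events visible. Java 7 later added a very similar statement, try-with-resources, written as try (Type var = expr) { body }.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class UsingSketch {
    /** Toy resource that records what happens to it. */
    static final class Resource implements Closeable {
        private final List<String> log;
        Resource(List<String> log) { this.log = log; log.add("open"); }
        void work(boolean fail) throws IOException {
            log.add("work");
            if (fail) throw new IOException("boom");
        }
        public void close() { log.add("close"); }
    }

    /** Hand-written form of: try Resource r = new Resource(log) { r.work(fail); } catch (IOException e) { ... } */
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<String>();
        try {
            Resource r = new Resource(log);
            try {
                r.work(fail);
            } finally {
                r.close();      // always runs, and before the outer catch
            }
        } catch (IOException e) {
            log.add("caught " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // prints [open, work, close, caught boom]
    }
}
```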

An 'instanceof' with autocasting


While it's often bad style to write something like

if (x instanceof SomeClass) {
    SomeClass x1 = (SomeClass)x;
    ...
}

it's often unavoidable. This idiom has two problems: first, you have to invent a new name for 'x', and second, the compiler won't check that the cast matches the instanceof test. It's also bloat. So I propose this syntax:

for(x instanceof SomeClass) {
    ... // x is properly cast to 'SomeClass' here
}

This is very easy to implement, doesn't require new keywords and solves the problems above.
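Part of the problem, the unchecked pairing of test and cast, can already be handled with a small helper. The following is only a sketch; 'Casts', 'as' and 'lengthIfString' are hypothetical names, and the helper relies on the standard Class.isInstance and Class.cast methods so that test and cast can never disagree. It does not remove the need for a second variable name, which is what the syntax proposal is about.

```java
import java.util.Optional;

final class Casts {
    /** Returns x as a T when it is one, otherwise an empty Optional. */
    static <T> Optional<T> as(Object x, Class<T> type) {
        return type.isInstance(x) ? Optional.of(type.cast(x)) : Optional.<T>empty();
    }

    /** Example use: the length of x if it is a String, otherwise -1. */
    static int lengthIfString(Object x) {
        Optional<String> s = as(x, String.class);
        return s.isPresent() ? s.get().length() : -1;
    }

    public static void main(String[] args) {
        System.out.println(lengthIfString("hello")); // prints 5
        System.out.println(lengthIfString(42));      // prints -1
    }
}
```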


After all that (nice and easy) 'syntactic sugar', now on to the more interesting, but also less easy ones:

Default implementations for interfaces


This is a bit controversial, but I think the benefits are bigger than the risks. Interfaces remain field-less, but their methods can contain default implementations. If you create a class C which implements an interface I, all default method implementations in I are copied into the body of C unless C implements those methods itself.

With this feature interfaces could be richer without requiring every implementing class to implement each and every method. This would lead to better code reuse, because interfaces could be made more customizable. You also get fewer legacy issues: extending an interface is no longer a problem, because the classes which implement it don't all need to be updated.
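Java 8 later added 'default' methods to interfaces, which come close to this idea, although there the implementation is inherited rather than copied into the class. A minimal sketch in that syntax, with hypothetical names ('Named', 'Plain', 'Custom'):

```java
/** Interface with one abstract method and one default implementation. */
interface Named {
    String name();

    // Used by every implementing class that does not provide its own version.
    default String greeting() { return "Hello, " + name() + "!"; }
}

/** Relies on the default implementation. */
final class Plain implements Named {
    public String name() { return "World"; }
}

/** Overrides the default implementation. */
final class Custom implements Named {
    public String name() { return "Duke"; }
    @Override public String greeting() { return "Hi " + name(); }
}

final class DefaultMethodDemo {
    public static void main(String[] args) {
        System.out.println(new Plain().greeting());  // prints "Hello, World!"
        System.out.println(new Custom().greeting()); // prints "Hi Duke"
    }
}
```

Adding 'greeting()' to 'Named' after the fact would not break 'Plain', which is the legacy benefit described above.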

Consumer/provider reversal ('yielding')


If you have to write a more complex iterator which iterates over recursive data structures, you know the problem: you have to do many things by hand that the compiler normally does for you. You need to provide your own stack, store the state of the iteration, and so on. This is very cumbersome and error-prone.
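To make the cost concrete, this is roughly what such a hand-written iterator looks like for a binary tree. It is only a sketch: 'StackTree' is a made-up class, and the traversal order (node first, then left, then right) is an assumption chosen for the example. The explicit stack replaces the call stack that a recursive method would get for free.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class StackTree<E> implements Iterable<E> {
    final E value;
    final StackTree<E> left, right;

    StackTree(E value, StackTree<E> left, StackTree<E> right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    public Iterator<E> iterator() {
        // Nodes that still have to be visited; this is the hand-maintained state.
        final Deque<StackTree<E>> pending = new ArrayDeque<StackTree<E>>();
        pending.push(this);
        return new Iterator<E>() {
            public boolean hasNext() { return !pending.isEmpty(); }
            public E next() {
                if (pending.isEmpty()) throw new NoSuchElementException();
                StackTree<E> node = pending.pop();
                // Push right before left so that left is popped (visited) first.
                if (node.right != null) pending.push(node.right);
                if (node.left != null) pending.push(node.left);
                return node.value;
            }
            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    static <E> List<E> toList(Iterable<E> elements) {
        List<E> result = new ArrayList<E>();
        for (E e : elements) result.add(e);
        return result;
    }

    public static void main(String[] args) {
        StackTree<Integer> leaf4 = new StackTree<Integer>(4, null, null);
        StackTree<Integer> leaf5 = new StackTree<Integer>(5, null, null);
        StackTree<Integer> leaf3 = new StackTree<Integer>(3, null, null);
        StackTree<Integer> root =
            new StackTree<Integer>(1, new StackTree<Integer>(2, leaf4, leaf5), leaf3);
        System.out.println(toList(root)); // prints [1, 2, 4, 5, 3]
    }
}
```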

This problem is most often avoided by using closures for iteration. That way the state of the iteration is handled as usual and the code becomes much clearer. Let's look at a simple example using closures:

import java.io.PrintWriter;

class Tree<E> {
    E value;
    Tree<E> left, right;

    interface ForEach<E> {
        void eval(E value);
    }

    // Visits this node first, then both subtrees.
    void forEach(ForEach<E> func) {
        func.eval(value);
        if (left != null) left.forEach(func);
        if (right != null) right.forEach(func);
    }

    // Visits both subtrees first, then this node.
    void forEachDepthFirst(ForEach<E> func) {
        if (left != null) left.forEachDepthFirst(func);
        if (right != null) right.forEachDepthFirst(func);
        func.eval(value);
    }

    void print(final PrintWriter out) {
        final boolean[] is_first = new boolean[] { true };

        forEach(new ForEach<E>() { public void eval(E value) {
            if (is_first[0]) is_first[0] = false;
            else out.print(", ");
            out.print(value);
        }});
    }
}


With this implementation we can easily iterate over the tree in two different orders (parents first with 'forEach', children first with 'forEachDepthFirst'). But it shows two problems:
- First, we have to maintain an 'is_first' state to place the ", " correctly.
- Much worse: it won't work with parallel iteration.

If we had an iterator implementation for 'forEach' and 'forEachDepthFirst', we could for example write

boolean isEqual(Tree<E> t1, Tree<E> t2) {
    for(E v1: t1; E v2: t2) {
        if (!v1.equals(v2)) break;
    }
    catch() {
        return false;
    }
    return true;
}

The advantage of this approach is that it would work for every Iterable. But how would you solve this problem using closures and 'forEach'? You would have to provide additional implementations for two trees, three trees and so on. And what if you want to compare the contents of a List and a Tree element-wise? Here iterators are much more powerful. And because they have been a part of Java for some time now, I propose to make them easier to implement instead of switching to a different and even less powerful abstraction.

The idea is well known as 'yielding'. It simply allows a method to suspend its execution while saving its state so it can be resumed later. While this can be implemented with threads, that's a very expensive way to do it. So it's better to do it by code rewriting directly in the compiler.

How would an iterator with yielding look? Consider the method below as part of the class above:

Iterator<E> iterator() {
    yield value;
    if (left != null) for(E v: left) yield v;
    if (right != null) for(E v: right) yield v;
}

That's it. As easy as it gets. 'yield' suspends the execution and hands the result to the consumer via the 'next()' method of the Iterator. The problem here is the need for a new keyword. So I propose using 'throw return' instead. While this isn't totally nice, it somehow captures the idea of yielding and is short enough. To mark a method as yielding, the compiler could infer this itself, or we could use the 'throw' keyword to mark those methods explicitly:

public throw Iterator<E> depthFirst() {
    if (left != null) for(E v: left.depthFirst()) throw return v;
    if (right != null) for(E v: right.depthFirst()) throw return v;
    throw return value;
}

With this extension iterators are quite easy to write, though I know the extension itself is a bit difficult to implement. But it could provide a huge gain in productivity, and more people would provide useful iterator implementations.
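As a rough illustration of the code rewriting mentioned above, this is one way a compiler might translate the 'yield'-based 'iterator()' method into a small state machine. It is a sketch under assumptions, not a description of any real compiler: the 'YieldTree' class, the state numbering and the 'iteratorOf' helper are all hypothetical. Each 'yield' becomes a point at which 'next()' returns, and the 'state' and 'child' fields remember where to resume.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class YieldTree<E> implements Iterable<E> {
    final E value;
    final YieldTree<E> left, right;

    YieldTree(E value, YieldTree<E> left, YieldTree<E> right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    private static <T> Iterator<T> iteratorOf(YieldTree<T> tree) {
        return tree == null ? null : tree.iterator();
    }

    public Iterator<E> iterator() {
        return new Iterator<E>() {
            // 0: before "yield value", 1: inside the loop over left,
            // 2: inside the loop over right, 3: method body finished.
            private int state = 0;
            private Iterator<E> child;   // the suspended inner for-each, if any

            public boolean hasNext() {
                // Skip missing or exhausted subtrees until an element is available.
                while (state == 1 || state == 2) {
                    if (child != null && child.hasNext()) return true;
                    state++;
                    child = (state == 2) ? iteratorOf(right) : null;
                }
                return state == 0;
            }

            public E next() {
                if (!hasNext()) throw new NoSuchElementException();
                if (state == 0) {            // resume at "yield value"
                    state = 1;
                    child = iteratorOf(left);
                    return value;
                }
                return child.next();         // resume inside one of the loops
            }

            public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    static <E> List<E> toList(Iterable<E> elements) {
        List<E> result = new ArrayList<E>();
        for (E e : elements) result.add(e);
        return result;
    }

    public static void main(String[] args) {
        YieldTree<Integer> leaf4 = new YieldTree<Integer>(4, null, null);
        YieldTree<Integer> leaf5 = new YieldTree<Integer>(5, null, null);
        YieldTree<Integer> leaf3 = new YieldTree<Integer>(3, null, null);
        YieldTree<Integer> root =
            new YieldTree<Integer>(1, new YieldTree<Integer>(2, leaf4, leaf5), leaf3);
        System.out.println(toList(root)); // prints [1, 2, 4, 5, 3]
    }
}
```

Every element passes through one iterator per tree level, so this translation is simple rather than fast.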

Including inheritance


This is a bit more complex but could bring lots of benefits in code reuse and maintenance. It's a bit similar to traits, and has quite simple semantics.

Let's look at an example:

class MyListModel implements ListModel {
    public int getSize() { ... }
    public Object getElementAt(int index) { ... }

    public void addListDataListener(ListDataListener l) { ... }
    public void removeListDataListener(ListDataListener l) { ... }
}

The problem is those two methods at the bottom. Often you want to use a standard implementation for them and only implement the getSize and getElementAt methods yourself. To avoid this reimplementation there is an AbstractListModel class in the library which has a standard implementation of addListDataListener and removeListDataListener. While this often works, it's problematic if MyListModel also has to extend another class, or if you want a different standard implementation.

So I propose to simply include standard implementations for interfaces:

class DataListenerImpl implements ListModel {
    public void addListDataListener(ListDataListener l) {
        listenerList.add(ListDataListener.class, l);
    }
    public void removeListDataListener(ListDataListener l) {
        listenerList.remove(ListDataListener.class, l);
    }

    private EventListenerList listenerList = new EventListenerList();
}

This class won't work on its own, but with the new extension you could include it in another one:

class MyListModel implements ListModel {
    import DataListenerImpl;

    public int getSize() { ... }
    public Object getElementAt(int index) { ... }
}

That's it: MyListModel now has the methods and fields from DataListenerImpl, which satisfy the rest of the ListModel interface. The fun part of this extension is that it's possible to split up implementations by topic and then include them wherever you want them. So maybe you write a class

class MapModelImpl implements ListModel {
    public int getSize() { return map.size(); }
    public Object getElementAt(int index) { ... }

    private Map map;
}

you can then simply synthesize a new ListModel by writing

class MyListModel implements ListModel {
    import DataListenerImpl;
    import MapModelImpl;
}

This would work for much more complex cases than the ones described above, and the example is far from complete, but the idea should be clear. I won't elaborate on the details here (constructor handling, collisions); maybe in a later post.
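For comparison, the closest thing available in current Java is delegation: the shared part lives in a helper object and the class forwards to it by hand. The sketch below is an illustration under assumptions; 'ListenerSupport' and 'DelegatingListModel' are hypothetical names, while ListModel, ListDataListener and EventListenerList are the real Swing types. The two forwarding methods are exactly the boilerplate that 'import DataListenerImpl;' would remove.

```java
import java.util.Arrays;
import java.util.List;
import javax.swing.ListModel;
import javax.swing.event.EventListenerList;
import javax.swing.event.ListDataListener;

/** Reusable listener bookkeeping, playing the role of DataListenerImpl. */
final class ListenerSupport {
    private final EventListenerList listenerList = new EventListenerList();

    void add(ListDataListener l) { listenerList.add(ListDataListener.class, l); }
    void remove(ListDataListener l) { listenerList.remove(ListDataListener.class, l); }
    int count() { return listenerList.getListenerCount(ListDataListener.class); }
}

/** A ListModel that implements the data part itself and delegates the listener part. */
final class DelegatingListModel implements ListModel<String> {
    private final List<String> items;
    private final ListenerSupport listeners = new ListenerSupport();

    DelegatingListModel(List<String> items) { this.items = items; }

    public int getSize() { return items.size(); }
    public String getElementAt(int index) { return items.get(index); }

    // Hand-written forwarding that the proposed include mechanism would generate.
    public void addListDataListener(ListDataListener l) { listeners.add(l); }
    public void removeListDataListener(ListDataListener l) { listeners.remove(l); }

    int listenerCount() { return listeners.count(); }

    public static void main(String[] args) {
        DelegatingListModel model = new DelegatingListModel(Arrays.asList("a", "b"));
        System.out.println(model.getSize() + " " + model.getElementAt(1)); // prints "2 b"
    }
}
```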

1 comment:

Anonymous said...

A bit off topic, but a cleaner way of doing the is_first logic in the ForEach callback is to add a member variable to the anonymous class, rather than using the hack of creating a boolean array, i.e.:
void print(final PrintWriter out) {
     forEach(new ForEach() {
         private boolean is_first = true;
         public void eval(E value) {
             if (is_first) is_first = false;
             else out.print(", ");
             out.print(value);
         }
      });
}