Tuesday, November 28, 2006

References violate the principle of encapsulation

Every language that allows the use of direct references to objects also makes encapsulation, and thus code reuse, much less likely. And because every OOP language has such references, that's why encapsulation in OOP doesn't work the way its promoters advertise.

Let's elaborate on this: Any reference is a direct connection to a certain object. That's quite convenient: If we have a deep hierarchy of objects, we can simply hand a reference to one of those 'deep' objects (let's call it 'B') to an outside object 'O', and now 'O' is able to manipulate 'B' directly without having to walk through the hierarchy for every operation.

But what if our object 'O' changes 'B' directly, but some other object 'C' needs to notice it, because it has to reflect changes to 'B' somehow? Think of a GUI, for example: 'B' may be a button and 'C' some container. If we change the text of 'B', 'C' has to be notified so it can lay out its children again to give 'B' enough space to display its text.

But because 'O' can modify 'B' directly, 'C' won't notice it and our GUI won't work as expected. Sure, as seasoned programmers we already know a solution: the Observer pattern. Let 'B' send a notification for every interesting change to other objects, which can then do what they have to do.
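A minimal sketch of that workaround (the names Button, Container and TextObserver are hypothetical, not taken from any real GUI toolkit):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical observer interface: 'C' implements this to hear about 'B'.
interface TextObserver {
    void textChanged(Button source);
}

class Button {
    private String text = "";
    private final List<TextObserver> observers = new ArrayList<>();

    void addObserver(TextObserver o) { observers.add(o); }

    // Every mutator must remember to fire the notification by hand.
    void setText(String text) {
        this.text = text;
        for (TextObserver o : observers) o.textChanged(this);
    }

    String getText() { return text; }
}

class Container implements TextObserver {
    boolean layoutDone = false;

    public void textChanged(Button source) {
        // Re-layout so the button has room for its new text.
        layoutDone = true;
    }
}
```

Note that 'O' still holds a direct reference to the Button; if the registration step is ever forgotten, the Container silently misses the change.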

But is this really a good solution? It works, but everyone who has done it at a certain scale knows how tricky it can get: We need notification events, receivers, registration/unregistration, cycle prevention, and lots of additional code for every method that has to send a notification. It's a real mess. Aspect-oriented programming was even invented because of this problem, but AOP is like trying to extinguish a fire with gasoline: Instead of attacking the root of the problem, it puts another layer of complexity on top of it.

So what's the root of the problem? I've outlined it above: It happens only because we have the ability to modify objects directly without their containers noticing it. And that's because of the existence of direct references.

Sure, we gain some convenience in the beginning because references allow us to talk to an object directly - but as a result we need huge amounts of additional code for notifications, observers, doc/view patterns etc. It also opens up lots of opportunities to make mistakes. Lots of them. Code reuse also gets much more difficult, because we have many more dependencies on internal structure than really necessary. And this is all because the use of references violates the principle of encapsulation.

Is there a solution to this problem? Sure: getting rid of references altogether! They are simply too tempting.

It's like every low-level mechanism: They give you more power than needed, and because of this they can and will be abused. Without references, one always has to ask the container for a certain contained object via some key. And you will never get a reference to the object, only its value - which isn't mutable and thus can't be misused. Changing an object would always involve its container, so there would be no need for observers, notifications etc. It may be a little more difficult in the beginning, but in the long run the payoff would be huge. And with certain concepts built right into the programming language, most unpleasantness can be avoided. But to solve a problem, it's first necessary to notice that it even exists.
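One way to read that proposal (a hypothetical sketch - the post names no concrete API) is a container that hands out only immutable value objects and routes every change through itself, so it always notices:

```java
import java.util.HashMap;
import java.util.Map;

// Immutable value object: callers can read it but never mutate the GUI through it.
final class Label {
    final String text;
    Label(String text) { this.text = text; }
}

class Panel {
    private final Map<String, Label> children = new HashMap<>();
    int layoutCount = 0;

    // Look up by key; you get the value, never a mutable reference.
    Label get(String key) { return children.get(key); }

    // The only way to change a child is through the container,
    // so the container can re-layout itself without any observers.
    void put(String key, Label value) {
        children.put(key, value);
        layoutCount++;
    }
}
```

In this style the notification machinery disappears, because there is no path to a child that bypasses the container.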

Some people will now say: As the programmer, I want all the freedom, because I am able to use it in the right way. Even if this is a correct way of thinking, having references in the language will in fact limit your freedom nonetheless. Why? Because if there are references in the language, the compiler has to follow certain rules and can't do certain optimizations - the very optimizations that are necessary to make reference-free code performant! In other words: By giving you references, a language also forces you to use them, because the alternatives are too expensive. So your freedom is in fact limited either way. And I for myself would rather be limited to a solid and more error-proof way of programming instead of a risky and error-prone one.

(I wrote here about other problems with references and how to solve certain problems without them.)


Anonymous said...

First of all: Very good work! I have been reading your blog, and you pinpoint very accurately the limitations of various programming concepts; I agree with your views to an extent. But I think with references and immutability you've gone a step too far.

In the example you introduced, think of the case where you have a value that has to be displayed in two different locations in the GUI, and each component has to be informed about any change to that value. The solution where the component encapsulates the value doesn't really work in this case. In a pure functional environment you have to reconstruct the whole component with the new value in place of the old one, which is far more complex than the solution with the Observer pattern. Even if you allow some mutation in the shared-value case, you will end up with your own error-prone implementation of the Observer pattern. I think it is always better to use a million-times-tested solution (e.g. the Observer pattern implementation in the Java library) for your problem than to implement one yourself.
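The library solution the comment presumably alludes to is java.util.Observable/Observer (part of the Java library since 1.0, though deprecated in later Java versions). A minimal sketch of the shared-value scenario, with hypothetical SharedValue and View classes:

```java
import java.util.Observable;
import java.util.Observer;

// A shared value wrapped in Observable, so several views can track it.
class SharedValue extends Observable {
    private String value;

    void set(String v) {
        value = v;
        setChanged();        // mark the state as dirty...
        notifyObservers(v);  // ...and push it to all registered observers
    }

    String get() { return value; }
}

class View implements Observer {
    String shown = "";

    public void update(Observable o, Object arg) {
        shown = (String) arg;  // redisplay the new value
    }
}
```

Registering two View instances via addObserver keeps both displays in sync with a single mutation.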

The problem, I think, is not the existence of references and side effects, which are welcome when dealing with GUIs and MVC, nor the use of design patterns. The problem, as you said in another post [Rich languages vs. minimal languages], is the lack of support for the Observer pattern in the syntax of the programming language itself. The programmer would have nothing to worry about if the language used a built-in implementation of the pattern and provided him with some special syntax for it.

BTW: I completely agree with you on AOP!

Thank you for your time.
Lefteris kritikos [el01049 at mail dot ntua dot gr]
Please hide my email.

Samuel A. Falvo II said...

Lefteris, I agree completely with Karsten. OO has liberated programmers insofar as modeling is concerned, but relational approaches have been, are, and always will be the most flexible and safe approach to realizing software.

Integrating relational constructs into programming languages would be the ideal situation, but even without them, I'm able to realize sophisticated software using Forth and plain vanilla C in times comparable (not quite as fast, but close) to coders working in OO languages of similar abstraction level.

You talk about using the Observer pattern. What does the Observer pattern consist of? A table -- a table of observers with a single column, each entry a reference to whoever is listening. But can we implement this better, without relying on those references? The answer is yes, if you don't mind the performance hit.
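That single-column table can indeed be kept free of object references by storing keys into a registry instead (a hypothetical sketch; ListenerRegistry is invented for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The observer "table" stores only keys; a registry maps keys to behavior.
class ListenerRegistry {
    private final Map<String, Runnable> byKey = new HashMap<>();
    private final List<String> observerTable = new ArrayList<>();

    void register(String key, Runnable listener) { byKey.put(key, listener); }

    void subscribe(String key) { observerTable.add(key); }

    // Each notification pays an extra lookup -- the performance hit
    // mentioned above -- but the table itself holds no references.
    void fire() {
        for (String key : observerTable) byKey.get(key).run();
    }
}
```

The table now reads like a relational join over keys rather than a column of pointers.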

Which brings us to the real purpose of references: they serve the role of indices, and as such, objects that contain references to other objects essentially employ what Oracle calls an "index-organized table", or IOT. In other words, you're prematurely optimizing. Are those references really needed?

In the case of the Observer pattern, sure. In other cases? I'm not so sure.

So it turns out that both you and Karsten are correct when viewed at the global scope -- you simply fail to delineate the preconditions under which references should, and should not, be employed.

I will be blogging about this on my own blog shortly.