Wednesday, May 07, 2008

What makes programming so difficult - and can we make it easier?

I often thought about the reasons why programming seems to be so difficult and also so different to many other professions. And to what degree it's possible to simplify and quicken the process. Inspired by this blog post, I want to share my view of the topic here.

What's the 'process' behind programming? How do we do it? I think that we can break down the process of programming info three steps:

  • Step 1: Analyse the problem you want to write a program for and create a model of the problem which can be implemented as a program.

    This step has do be done no matter which programming language you use for implementation. It requires both domain specific knowledge as implementation specific knowledge. It also requires knowledge how people interact with computers, about which things are solvable with computer and which are not, etc. This step is something no program can do (at least unless we have real general AI with 'whole-world-knowledge').

    And it's also the reason why there are good and bad programmers. Bad programmers aren't able to really understand the problem in all it's details. Because of this they tend to 'emulate' or 'simulate' it step by step (for example by looking how a human solves the problems the program should do and than writing a program which does it similar). But simulation isn't the same as creating a solution based on thorough understanding. Simulation leads to more lengthy and at the same time less comprehensive programs. Also it's much more probable that the program isn't working correctly because by just simulating it's easy to overlook inconsistencies or limitations. Better programmers who really understand the problem can create an abstraction which solves the problem and detect inconsistencies or flaws of the problem description. I think this step is the main reason why it takes so long to get from a beginner programmer to a good one and why certain people never get it. It takes a lot of time and practice to really understand real world problems in a way deep enough to create an abstract model of the problem or process which can later translated into code. In normal life, people don't have to understand exactly how they do what they do: They know the general rules and if they come across an exceptions from this rules, they simple get creative. But to create a program which is able to do the same, you have to think about all those possibilities before even starting the implementation. This kind of analytical thinking, combined with the ability to acquire the necessary knowledge from a previously unknown domain is the real key to create a good solution.

  • Step 2: Recursively break down the model into sub-models until you reach a level where you can implement such a sub-model with a library or by writing code.

    This step starts to depend on the implementation-language/framework. The more powerful it is, the less breaking down we have to do. Bad programmers may have problems in this step too, because they don't know their tools (or the market with alternative tools) good enough to lead this process in the right direction. In this step also most of the algorithms, data-structures and 'patterns' have to be chosen. This also requires a lot CS-specific knowledge on those topics. In this step the quality of the programmers matters a lot, but it's mainly a question of education and training.

  • Step 3: Implementing the sub-models (the actual coding).

    This is the simplest part of programming. Beginners and non-programmer tend to overestimate this step tremendously, because this is where 'the real work' is being done. But in fact the success and amount of work in this step depends hugely on the quality of work in the previous steps. This step is also relatively easy to learn. In principle everybody can do it with a little training. On this level, there's also not much distinction between good and bad programmers. It's nothing but grunt work. But this is also the step where languages matter most, because they can remove much of the work in this step by providing better and faster ways to do it.

All this steps above are done repetitively in most projects. Even master programmers overlook things in step 1 so they have to go back after getting more insight after working with implementations from step 3. But the main points here are:
  • Step 1 is the most important part of programming, because the quality of work in this step determines how much work and what quality the following steps will bring.

  • Step 1 is also the hardest and less learn able skill of every programmer. Companies try to use 'software architects' to do quality work in this step because it's much easier do find grunt workers for step 3 than programmers which do good work in step 1

  • Step 2 shouldn't be underestimated because it has a big impact on the actual work to do in step 3. This step requires lots of 'high-level-knowledge' of frameworks, patterns, data-structures, libraries etc. but not much domain-specific knowledge. This step is where the expertise of experienced 'senior-programmers' can count much. It's also the step people learn to to if they study computer sciences.

  • Step 3 is the part where the hard but easy work is to be done. Companies often try to source it out into low-wage countries. But the problem is that there's always a strong coupling between all three steps and that the knowledge gained in step 3 is needed to improve the models in step 1 and step 2. That's why small teams with good programmers which do both analyzing and implementing have a big benefit here: It's simply less likely that they run into dead-ends because they have a much better overview over the project.

  • Step 3 is also the most work intensive part of the job. It often requires more than 90% of the total working time. But this is also the reason why better programming languages and better frameworks can bring huge productivity benefits. It's also the reason why good programmers are more productive: By creating better abstractions and structures in step 1 and step 2, they eliminate lots of the work they otherwise had to do in step 3. But productivity in this step don't depends that much on the quality of a programmer.

The above explains why it's so hard to teach 'good programming':

It is relatively easy to teach people how to to step 3. I think most people can learn this. At this step programming is like a craft.

Step 2 is much more difficult because the required knowledge is more abstract and also requires more experience and training. To be successful here, a programmer needs good knowledge of data-structures and algorithms, of frameworks and libraries. Much of this don't even depends on a certain language but is general abstract knowledge about programming. But learning abstract things is more difficult for many people than simply learn the concrete syntax and semantics of a programming language. So it takes more time to become successful in this step and there will be people who will never 'get it'. This is also the step where programming is very similar to engineering.

Step 1 is the hardest of them all. I don't even know how to teach this step. I think the only way is to work as programmer and acquire the necessary knowledge by time with experience. Learning the way to think to be able to do this step successfully is rather 'unnatural' because in normal life it would be considered as pedantic, nitpicking and overly-analytic. This may be the reason why good programmers are often looked at as a little bit strange - if they use their problem analysing skills in real life to often by their surroundings. It's also the area where programming becomes kind of an 'art' and where most of the 'design part' of programming lies. But since this step can't be isolated from the other steps, good programmes can't be just designers, they still have to be also engineers and craftsman.


This breakdown also explains why the language often doesn't matter as much as many people estimate: A good programmer with good step-1/2-skills will create models which are much easier and faster to implement in any language. So even if the implementation is done in language A with only 30% of step-3-productivity as language B, the better models, frameworks and libraries can easily outweigh this, creating a superior net productivity in the end.


Conclusion: In the end programming is at the same time craft, engineering and art. It only depends on which step you focus. And while it's possible to split up those steps between different people, the tight coupling between those steps imposes hard limits on how successful this split-up can be done. I think it would be the better way to improve the tools and languages to make step 3 as least cumbersome as possible. But all this won't change the fact that programming is much more than step 3 alone and that it still depends on people who are good at steps 1 and 2 to create good programs.

6 comments:

Jinal Jhaveri said...

Amazing Article!! Couldn't agree more with what you wrote!!

Peter said...

Most programs are not written in the "think very hard, then write down the correct answer" style you describe. Instead, they are hacked together bit by bit, and the process of building is itself a process of understanding the problem. The "think very hard" approach is only useful when the problem is specified to a degree that I have only encountered in homework assignments and never in the real world.

saprasad said...

Its well said. I have experienced bit of Step 1 whenever I try solving a problem which is very abstract and not obvious.
Once I get a knack and design the generic solution, I just go about thinking of implementation basically what programming constructs to be used to implement the same. Lot of times because of this approach the problem gets solved at the generic level and gets rid of most of the issues which mostly arises at the specific cases.

Karsten Wagner said...

@peter: While I kind of agree (I pointed out in the article, that those three steps are executed over and over because by doing steps 2 and 3 you learn more about the problem and have to redo step 1 also again and again) I also don't think that a chaotic "just let's start to write code and look where it leads" process you describe works well. In fact this way is a sure sign of "beginner at work". A "hacking pieces together" approach seems to work in certain cases, but the reason for this is that those people are either (1) good enough to do step 1 without really knowing it, (2) encounter a problem they have encountered and solved before or (3) simply don't have a clue and the resulting programm will turn out as crap.

The pure "hacking" approach seldom works because you have to understand the problem you want to solve first (seems obvious, doesn't it?). And this can be rather hard if the problem is complex or not well specified (which it nearly always is). I've experienced enough programmers who wrote programs without doing a solid step 1 analysis first - the results where mostly horrible. And it's not that you have to sit down for a week and think about the problem before writing the first line of code. Often (if you're good in it) it can be a rather quick process: You sit together with the customer for some hours and "simply" ask the right questions - and in the process you understand the problem better than the customer himself. Or if you don't have a customer, you try to ask yourself or the people you're working with those right questions. The problem here is that bad programmes aren't able to find those "right questions", while it looks like a piece of cake for good programmers.

To many programmers are much to facinated with solving small problems (writing "beautiful code") and underestimate the big picture at the same time. That's the reason so many programs out there are crap: Overly complicated and at the same time unnecessary hard to use.

Anonymous said...

I think it's hard to say that there is any one type of great programmer or development process. I know some hackers that spend two weeks thinking through a problem before touching code and hackers that start writing code right away and will revise 100 times.

Both ways seem to work equally well and quickly, and there's a lot of room in between. Perhaps these are the standards and process that work for you as a good programmer, but to qualify that as *the* definition may be overstating it.

Anonymous said...

This is called AGILE