The Art of Programming: October 2006

Saturday, October 28, 2006

Reasons to refactor your code

Whenever you read your code and stumble upon one of the following cases, it is probably better to stop and refactor that piece of code.

Duplicate code - There is no reason to have duplicate code. Try to respect the DRY principle (Don't Repeat Yourself). As Parnas said best, "Copy and Paste is a design error". Besides, writing the same code again and again makes coding absolutely boring.
A routine is too long - In OOP, you rarely need a routine longer than one screen. Consider breaking it into multiple routines.
A loop is too long or too deeply nested - Consider refactoring part of the code as routines, or changing the algorithm. Deeply nested loops are also a frequent source of performance problems.
A class has poor cohesion - if a class has unrelated responsibilities, consider changing it.
A parameter list has too many parameters - If you need to pass too many parameters, consider merging them in a cohesive class or rethinking the problem.
Changes in a class tend to be compartmentalized - if you keep changing one part of a class independently of the rest, this may be a sign that the class should be broken into smaller ones.
Changes require parallel modifications in different classes - this is a sign that they are tied together. Try to cut most of the dependencies. This kind of refactoring can be a real challenge, but it is worth it.
Inheritance hierarchies need parallel changes - This is a special case of the problem above.
Case statements need parallel changes - consider using inheritance with polymorphism instead of case.
Related data items that are used together are not tied into classes - the first time you code/design, you may overlook some classes. Take your time and create them.
A routine uses more features of another class than of its own - probably it should be moved into the other class.
A primitive data type is overloaded - For example, you may use int to represent both money and temperature. It is better to create a Money and a Temperature class. By doing so, you will be able to impose custom conditions on the types. Also, the compiler will not allow you to mix money with temperatures.
A class doesn't do much - Maybe it should be merged with another.
A chain of routines passes tramp data - if a routine takes some data only to pass it to another, you should probably eliminate it.
A middle man object does nothing - same as above.
One class is overly intimate with another - this works against one of your most powerful complexity management tools: encapsulation.
A routine has a poor name - In the best case, you can rename it. In the worst, the problem is the design (see a previous post called "About routines"). The name is just a sign. Anyway, take your time to solve this one.
Data members are public - This is plain wrong. Today, you can use properties in many programming languages, so hiding data behind them is very easy.
A subclass uses a small percentage of the parent class - usually, this denotes wrong inheritance design.
Comments are used to explain difficult code - Comments are very good, but creating difficult-to-understand code and commenting it is plain wrong.
Global variables are used - There are few cases when global variables are the only logical option.
You need setup/cleanup code before/after calling a routine - try to merge this code into the routine.
A program contains code that might be needed someday - The only way to write code that takes future releases into account is to write it as clearly and obviously as possible, enabling quick understanding and modification.
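
To make the "primitive data type is overloaded" case more concrete, here is a minimal sketch in Python. The Money and Temperature classes follow the names used above; their fields (cents, kelvin) and the rules they enforce are assumptions made up for this illustration. Python has no compile step, so mixing the two types is rejected at run time rather than by a compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    cents: int  # hypothetical representation: whole cents

    def __add__(self, other):
        # Only money can be added to money; anything else is refused.
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)


@dataclass(frozen=True)
class Temperature:
    kelvin: float  # hypothetical representation: kelvin

    def __post_init__(self):
        # A custom condition that a bare int could never enforce.
        if self.kelvin < 0:
            raise ValueError("temperature cannot be below absolute zero")
```

With these types, Money(150) + Money(50) gives Money(200), while Money(1) + Temperature(300.0) raises a TypeError instead of silently producing a meaningless number.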

Friday, October 27, 2006

We are humans, so we make errors. There is no perfect software, meaning that there is no perfect programmer. Yet, there are immense differences between us. At one end there are the ones that never release software because it is too buggy to be of real use, at the other end the ones that release almost-perfect software.
If you want to get closer to perfection, the first step is to learn how to learn from your mistakes. Only then will you be able to make progress. Every error is an opportunity to:

Learn about the framework you're using - a bug may appear because you do not correctly understand the underlying technology. So it is an opportunity to improve your knowledge of it.
Learn about the kind of mistakes you make - there are patterns in your errors, such as repeatedly forgetting to initialize an array, making typos etc. Learn about them, so you can pay more attention to those points when developing.
Learn about the quality of your code from a reader's perspective - while debugging, you must read the code. You will observe its readability and the points that need improvement.
Learn about how you solve problems - debugging is problem solving. It requires a systematic approach, deep thinking etc. You can learn more about the effectiveness of your method, gradually improving it.

So, you should be very happy when encountering an error! Not really, but try to treat errors as opportunities rather than problems, because in the long run it makes all the difference in the world.

Thursday, October 26, 2006

Undo closed tab in Firefox 2

Firefox 2 has a very handy shortcut that undoes the closing of a tab: CTRL+SHIFT+T. Very neat.

Sunday, October 22, 2006

Formal inspections

Formal inspections are an efficient and easy way to find errors in software. A formal inspection is a meeting where the code is reviewed. It is planned, moderated and must have a concrete follow-up. When doing an inspection, there are some principles to keep in mind:

the purpose of an inspection is finding errors, not correcting them
an inspection is not a personnel evaluation, so it is better to keep management out of this

People that participate in the inspection are assigned roles. Here they are:
Moderator: keeps the inspection running at the required pace, neither too slow nor too fast to catch errors. Must be technically competent: not necessarily an expert, but able to understand the important details. He organizes the meeting by providing checklists for the others, setting the date, preparing the work environment etc. He must also make sure that there is action following the inspection. He is not directly involved in the inspection; instead, he makes sure that it runs as planned.
Author: The author of the software. In case the reviewers are not familiar with the project, he holds an introductory session, providing general knowledge of it. Besides that, when asked, he has the duty to explain parts of the code that are difficult to understand and to explain why things that were flagged as errors are actually acceptable.
Reviewer: The one that finds the errors. Must prepare beforehand by reading the materials that are supplied by the moderator. Must keep the focus on error finding, not error repairing.
Scribe: The person that records all the errors that are found.

It is recommended never to have fewer than 3 people in an inspection (at least the roles of reviewer, author and moderator should be played by different people). More than 6 people is a bad idea, because the group becomes hard to manage. Also, I repeat that management should stay out of this. The presence of a manager will make it seem like an evaluation with consequences for the author. Also, since this is a technical activity, the presence of a manager will add little if any value to the inspection.

The inspection has some predefined steps.

Planning - the author gives the code to the moderator. The moderator must select the reviewers and provide them with information that keeps them focused on the important aspects.
Overview - when the reviewers are not familiar with the project, the author can give an overview of the system. Yet it is better to leave this step out whenever possible, because the overview creates a mindset and this mindset might hide errors. The code and design should speak for themselves and need no introduction.
Preparation - The reviewers take some time alone to read the materials handed over by the moderator. They also have a look at the code, to get an initial idea about what comes next. They should be assigned perspectives, e.g. one should keep an eye on security, another on data validation etc.
Meeting - The actual meeting. Its rhythm is crucial. A slow meeting will decrease concentration; a fast one will let errors slip away. Even if there is no widely accepted inspection rhythm, 150-200 non-blank, non-comment lines of code per hour are probably a good start. Yet it greatly depends on complexity, design goals, experience and many other factors.
Report - In less than 24 hours, the moderator creates a report with all the discovered errors and their importance, making it public. That ensures that there will be no forgotten errors.
Rework - The initial author or another programmer is assigned to fix the discovered errors.
Follow-up - The moderator must decide whether another inspection is needed to re-check the program. He should also look at the previous inspections and try to find patterns in the errors and the causes behind them.
Third-hour meeting - Even if solutions are not to be discussed in the inspection, some might feel the need to talk about them. In this case, the moderator should organize an informal meeting for this purpose.

The main objective of an inspection is to find errors. The author should not feel threatened. So, the reviewers must be trained in making good comments ("I have never seen something so stupid" is not a good comment). Their job is to find errors, not to evaluate the programmer. They should also not suggest solutions, respecting the author's right to do this.
Mixing evaluation with formal inspection is a very bad idea. The author will try to hide the errors and he will probably succeed, defeating the inspection's primary objective.
Skipping steps or mixing roles will greatly affect quality. If you cannot measure the effect of a change, you should not make it. The formal inspection is a qualitative process, so decreasing its quality will render it useless.

Saturday, October 21, 2006

Measuring routine complexity McCabe's way

A well-known technique for measuring a routine's complexity was proposed by Tom McCabe: counting the "decision points". The technique is described below:

Start with 1 for the straight path through the routine.
Add 1 for each of the following keywords, or their equivalents: if, while, repeat, for, and, or.
Add 1 for each case in a case statement.

If the score is:

0-5: the routine is probably fine
6-10: start to think how to simplify the routine
>10: break the routine into smaller ones

That being said, I personally think this method is far from perfect. Yet it is easy to understand and apply. Also, studies show an improvement in code quality when using it. Wouldn't it be nice to have it integrated in your IDE?
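
As a rough idea of what such a tool could look like, here is a minimal sketch in Python that applies the counting rules above to Python source, using the standard ast module. The function name decision_points and the sample routine are made up for this illustration. It treats for loops as the equivalent of repeat, and it leaves out case statements, which older Python versions do not have. An elif counts as its own if.

```python
import ast


def decision_points(source: str) -> int:
    """Score a piece of Python source with the counting rules above."""
    score = 1  # the straight path through the routine
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While, ast.For)):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" is a single node holding two operators
            score += len(node.values) - 1
    return score


SAMPLE = '''
def classify(n, verbose):
    if n < 0 and verbose:
        return "negative"
    while n > 10:
        n -= 10
    return "small"
'''
```

For the sample routine, decision_points(SAMPLE) returns 4: one for the straight path, one for the if, one for the and, one for the while.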

Wednesday, October 18, 2006

http://www.gotapi.com/

http://www.gotapi.com/ is a very nice AJAX-based API reference for HTML, CSS, JS, PHP and many others. Very, very handy.

Tuesday, October 17, 2006

Common sense programming

In this post I will talk a little about some common sense in organizing your code. There is nothing brilliant in this post, yet common sense seems to be the first thing we lose whenever coding something a little complicated. First, any code structure falls in one of the following:

sequential structures
logical structures (if, case)
repetitive structures (for, while, do)
others (goto, recursion, return)

I'll talk a little about each.

Sequential structures

Most of your code is sequential: statements that are executed once, one after another. Yet even this can be confusing.

First, make sure that whenever the order of the statements counts, this is obvious. Let's take an example:

calculateMarketingIncomes();
calculateSalesIncomes();
getCustomerData();

Now, suppose the sales incomes can be calculated only after the marketing ones (probably, the first procedure initializes some variables that are needed for the second). Moreover, the customer data cannot be accessed until the sales incomes are calculated. It is obvious that the code sequence does not give a clue about this. There are a few options you can consider in cases like this:

give better names: you can rename calculateMarketingIncomes to initializeDataAndCalculateMarketingIncomes. Clumsy name? Maybe you should break your code into 2 procedures, initializeMarketingData and calculateMarketingIncomes
have some parameters that make the sequence logical. Have a value returned and use it in the next procedure. That should make clear that the order is important
if and only if you cannot change the structure of the code, have comments that explain the importance of ordering the code that way
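
The second option can be sketched as follows, in Python. The routine names mirror the example above, but their parameters and bodies are invented for this illustration. What matters is that each routine needs the result of the previous one, so the order can no longer be changed by accident.

```python
def calculate_marketing_incomes(raw_figures):
    # Hypothetical body: the real calculation is not shown in this post.
    return sum(raw_figures)


def calculate_sales_incomes(marketing_incomes, sales_rate):
    # Needs the marketing result, so it cannot be called first.
    return marketing_incomes * sales_rate


def get_customer_data(sales_incomes):
    # Needs the sales result, so it has to come last.
    return {"sales_incomes": sales_incomes}


marketing = calculate_marketing_incomes([100, 250])
sales = calculate_sales_incomes(marketing, 2)
customers = get_customer_data(sales)
```

Swapping any two of the last three lines now fails immediately with a NameError, instead of silently working on uninitialized data.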

In other cases, the order does not matter for a certain step. Yet, the statements should be grouped logically. Consider the following bad example:

object a1=new object();
object a2=new object();
object a3=new object();
a1.doStuff();
a2.doStuff();
a3.doStuff();
a1.clean();
a2.clean();
a3.clean();

The bad thing is that the code for each object is interlaced with the others. When debugging a1, you have to look at 3 times the number of lines that you normally should. A better version is:

object a1=new object();
a1.doStuff();
a1.clean();
object a2=new object();
a2.doStuff();
a2.clean();
object a3=new object();
a3.doStuff();
a3.clean();

This way, you will have to deal with fewer lines when you are interested in one object, and it will also be easy to separate the code into routines if needed.

Logical structures

Logical structures are needed whenever one needs to execute a certain piece of code depending on certain factors. First, consider using routines for checking. Instead of

if ((ch>="a" && ch<="z") ||(ch>="A" && ch<="Z"))

consider

if(isChar(ch))

where isChar is a routine defined elsewhere.
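
For illustration, such a routine could look like this in Python. It is only a sketch: the name is_char follows the isChar call above, and treating anything that is not exactly one ASCII letter as "not a char" is an assumption.

```python
def is_char(ch: str) -> bool:
    """Return True when ch is a single ASCII letter (a-z or A-Z)."""
    return len(ch) == 1 and ("a" <= ch <= "z" or "A" <= ch <= "Z")
```

The condition is the same as before, but the caller now reads one meaningful name instead of four comparisons.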

Another thing to watch for is how the logical condition is phrased. Instead of testing for "not not false", test for true.

Whenever you have nested if conditions, make sure the normal path is easy to follow. Try to put the usual case in the if clause and the unusual one in the else. This improves readability and can also help performance.

Repetitive structures

First of all, make sure you use the logically right structure. It is known that in languages such as C++ you can achieve with a for everything you can achieve with a while or do...while. Yet there is no reason not to use while or do...while if that improves readability.

Try not to nest more than 3 repetitive structures. Besides the big performance penalty, it is known that our brains have trouble understanding them.

Beware of cross talk. It is common to use variables such as i, j, k to walk through repetitive statements. Yet it is just as common to manipulate j by mistake instead of i. Try to give the variables more meaningful names whenever you have nested control structures.
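
A small sketch of the idea, in Python. The routine row_totals and its data are made up for this illustration, and explicit indexes are used on purpose to show the naming. With names like row_index and column_index, mixing the two up is much easier to spot than with i and j.

```python
def row_totals(table):
    """Return the sum of each row of a list of lists."""
    totals = []
    for row_index in range(len(table)):
        total = 0
        for column_index in range(len(table[row_index])):
            total += table[row_index][column_index]
        totals.append(total)
    return totals
```

Writing table[column_index][row_index] here would look wrong at first glance, while table[j][i] would not.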

Others

Beware of gotos. gotos are not evil, but they really decrease your code's readability. Use them only if no other options are available. And think it through before deciding that you have no more options. Cases when you really need them are very, very rare.

Beware of recursion. It is an elegant solution, yet very resource intensive. First, make sure you have an exit condition. Then, make sure that there will be no stack overflow. And, most important, consider not using it. The usual examples, like computing a factorial using recursion, are plain wrong. You can do this operation with a loop and in so doing you will save lots of resources (by eliminating the overhead of the function calls).
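
For reference, here is a minimal sketch of the loop version of factorial, in Python. Rejecting negative input with a ValueError is an assumption, not something this post prescribes.

```python
def factorial(n: int) -> int:
    """Compute n! with a loop, so no call stack is built up."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1
    for factor in range(2, n + 1):
        result *= factor
    return result
```

There is a single function call no matter how large n is, so there is no exit condition to forget and no stack to overflow.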

If your language supports foreach or any other kind of looping through array structures, use it! It increases readability.

Note that everything in this post may be obvious. Yet remember this advice when you are deep into your source code.

About Routines

*By routine, I understand a function, method or a procedure.

Programming always comes down to writing routines. So doing it better might help :). Here are some steps that may improve their quality:

Start from a good design. Know in advance what routines your program needs and be sure that every routine does one thing. A routine that initialises the form, calculates default values and displays a welcome message is really bad.
Give the routine a good name. A good name describes specifically what a routine does. For a function, the name should represent the returned value (e.g. getFont, sin). The name of a procedure should be a verb followed by an object that clearly defines what the procedure does. PrintReport, LogError, SaveCustomerData are good routine names. HandleStuff, SaveData, InitializeData are bad names. If you cannot find a good name for your routine, there is a problem with your design. Either the routine does more than one logical action, or less than one, or your understanding of the problem is wrong. Bad names = Bad Code.
Start by writing pseudo-code. Pseudo-code is written in plain English. Pseudo-code should be clearly understood by someone unfamiliar with the specific programming language you use. After you are finished, revise it and write it in more detail. Repeat until it is harder to revise the pseudo-code than to write the code itself.
Turn your pseudo-code into comments, and after each line write the corresponding line of program code. Programming should be an almost mechanical action by now. The technique of writing pseudo-code helps by freeing you from the programming language details and by providing the code comments even before writing the code.
Compile only at the end. Challenge yourself to write code that compiles the first time. Compiling too often distracts your attention and tempts you into using hacks just to see your code compiled.
No hacking! Using hacks is bad practice. Think of the right solution. A hack has a good chance of remaining there even if you say you will change it ASAP. Hacks pile up and soon the project gets out of control. It is better to go slower than to hack your way through.
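
The pseudo-code steps above can be sketched like this, in Python. The routine average_of_positive is a made-up example: each comment is a line of the original pseudo-code, and the code under it was written afterwards.

```python
def average_of_positive(values):
    # collect the values that are greater than zero
    positives = [value for value in values if value > 0]

    # if there are none, report that there is no average
    if not positives:
        return None

    # otherwise divide their sum by their count
    return sum(positives) / len(positives)
```

Reading only the comments still tells the whole story of the routine, which is the point of the technique.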

Design techniques

Design, unlike coding itself, is a heuristic process. There is no "perfect" way. You have to know what to use, how and when, and you will still be wrong sometimes. It is a wicked problem: you cannot clearly define it until you have solved it, at least in part. You return to revising your design several times in a project (hopefully, fewer and fewer times as your experience increases). No design advice should be taken for granted. Everything should be weighed against your project and you should always remember that your primary goal is to Manage Complexity (for more about this, read Code Complete 2nd edition, it is the source for many of my posts). That being said, here are some guidelines for problem solving.

1) Seeing the problem from multiple points of view

Very often I see programmers writing lots of heavy code just because they overlooked a much simpler solution. Never stop at your first idea. Probably the next one will be better. Learn to think from multiple points of view. Be creative. Design is heuristic, so it benefits from creativity.

2) Divide and Conquer

Learn to break the problem into smaller subproblems, then think about each one separately. This way you reduce complexity.

3) Top down

Start with a very abstract model of the system, then work your way down. It is easier to understand the puzzle when you have the big picture. Yet there are times when the big picture is not obvious...

4) Bottom up

Start with concrete cases, then try to work your way to the big picture.

5) Prototyping

Very often, you need to test your theory. For example, if the client asks for a database that supports 1000 transactions per second, be sure you create a quick and dirty model that tests just that on the database engine of your choice. Lots of design errors come from untested theories. Think of your prototype code as throwaway code. This helps ensure that you do not go into unnecessary detail. Prototype only what is needed.

6) Team work

Often, someone else can see something you are overlooking. Be sure to ask for opinions as often as you can. It works wonders.

It is clear that the techniques I described should be used in combination.

Kinds of names to avoid

Misleading names or abbreviations. Some time ago, I thought it was cool to name a variable that represents a Fuel Action Low Sequence Enzyme as FALSE. It is so easy to remember! Yet so confusing for the others or even for you after a while. The bottom line is: Never use this kind of abbreviation.
Names with similar meanings. To have fileIndex and fileNumber in the same place is a bad idea. There is no logical way to tell which is which.
Different meanings, similar names. clientReps and clientRecs are hard to distinguish. They will probably create lots of confusion. Try to abbreviate differently or use synonyms.
Similar-sounding names. Goal_Donor and Gold_Owner are tricky. It is hard to distinguish them in speech, so consider changing them.
Numerals in names. There are few times when file1 and file2 are the most appropriate names. In almost every case, you can do better than that.
Names that are easily misspelled. I find myself typing cleint instead of client very often. This may not be a problem in a strongly typed language used with an IDE that has auto-complete features, but it is a big problem in a weakly typed one. Because you never know what programming language you will be using in 10 years, it is better to build healthy habits. Most English handbooks contain lists of commonly misspelled words.
Do not differentiate only by letter capitalization. Even if most programming languages support this, it is probably a bad idea to do so. The "psychological" difference is too small.
Avoid multiple languages. Decide on the natural language your program is written in. It is common to find names in many languages in multinational programs. And this is a big problem.
Avoid using keyword names. Even if it sounds obvious, sometimes it happens. Several times I have had problems with SQL statements that used field names which were reserved keywords.
Unrelated names. It is fun to see variable names like little_johnny, pookie, lex_luther. But this guarantees that nobody will understand what you mean. So express your humor in other ways.

Managing Complexity

Software is complex. It is normal for you not to be able to know all the ins and outs of your program. Yet, you have to if you plan to be a successful programmer. The primary imperative in any kind of design is Managing Complexity. So, whenever the program is close to getting out of control, you probably need more design. You need no more design once you can understand the program. At any moment of your development process, remember that your main goal is to manage complexity. Everything is secondary to this.
