@ Loup's Impossible? Like that would stop me.

June 2009. Rewritten in July 2010 (first version).

Assignment Statement Considered Harmful

In his essay “Go To Statement Considered Harmful” Edsger W. Dijkstra demonstrated how the use of goto made programming harder. Now, goto is considered harmful, and has been replaced by more reasonable constructs. I attempt here to demonstrate the same about the assignment statement.

This is old news. Any programmer who has been exposed to functional languages and practices knows about that. I just haven’t found it formulated in this way, as a direct attack on this seemingly fundamental feature.

What makes a good program

A good program solves your problem, has no errors, and is easy to understand and modify.

We humans have certain limitations. The greatest here is our short-term memory. We can’t work with a whole program. We can only deal with it piece by piece.

Therefore, to be easy to understand, a program must be divided into pieces small enough to fit in our short-term memory. Moreover, each of those pieces must stand alone, or require as little external knowledge as possible (they must be loosely coupled).

Assignment, functions and procedures

Most programming languages are built around two features: the assignment statement and functions.

Functions are very simple: given some parameters, they produce a result, which depends only on the parameters. Same parameters, same result. Note that a function exposes a well defined, and typically small interface: its parameters and its result.

The assignment statement is even simpler. It puts a value in a variable. Note that it introduces the notion of time: before the assignment, the variable holds one value. After, it holds another. This is the most basic form of side effect. (I prefer to say just “effect” because most of the time, we want it.)

With both assignment and functions, we can build procedures. Procedures are like functions, but more capable. Like functions, they take parameters and may return a result. Unlike functions, they can directly interact with the outside world, and have effects beyond their result.

This comes with a price, however: a bigger and less explicit interface. Procedures expose more than just their arguments and result. They may depend on things that can change over time (mutable state), and may mutate state themselves. These additional dependencies are often implicit. For instance, a procedure can take no argument, return no result, yet have loads of implicit dependencies and effects.
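To make that last case concrete, here is a minimal OCaml sketch. The names double, counter and tick are made up for illustration. The function’s whole interface is visible in its type, while the procedure’s type says nothing about the state it touches.

```ocaml
(* A function: same parameter, same result. Its interface is its type. *)
let double (n : int) : int = 2 * n

(* Mutable state living outside the procedure (hypothetical name). *)
let counter = ref 0

(* A procedure: no argument, no result, yet it reads and mutates [counter].
   Its type, unit -> unit, mentions none of that. *)
let tick () : unit = counter := !counter + 1

let () =
  tick ();
  tick ();
  (* double 21 is always 42; !counter depends on how often tick ran. *)
  Printf.printf "%d %d\n" (double 21) !counter
```

Calling double twice with the same argument can never give two different answers. What you read from counter depends on when you look.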

The conclusion is obvious: with their smaller and more explicit interface, functions are easier to deal with than procedures. Therefore, procedures should be avoided whenever possible, and insulated otherwise. And so should the assignment statement (for it makes procedures possible).

Concrete drawbacks

Pervasive use of the assignment statement also has concrete, readily visible drawbacks: it encourages the confusion between values and variables, makes program analysis and refactoring harder, and can even hurt performance.

Confusing values and variables

The assignment statement is not directly at fault here. Its pervasive use, however, influenced many programming languages and programming courses. This resulted in a confusion akin to the classic confusion of the map and the territory.

Compare these two programs:

(* OCaml *)        │    # most imperative languages
let x = ref 1      │    int x = 1
and y = ref 42     │    int y = 42
in x := !y;        │    x := y
   print_int !x    │    print(x)

In OCaml, the assignment statement is discouraged. We can only use it on “references” (variables). By calling the “ref” function, the OCaml program makes explicit that x is a variable, which holds an integer. Likewise, the “!” operator explicitly accesses the value of a variable. The indirection is explicit.

Imperative languages don’t discourage the use of the assignment statement. For the sake of brevity, they don’t explicitly distinguish values and variables. Disambiguation is made from context: on the left-hand side of an assignment statement, “x” refers to the variable itself. Elsewhere, it refers to its value. The indirection is implicit.

Having this indirection implicit leads to many abuses of language. Here, we might say “x is equal to 1, then changed to be equal to y”. Taken literally, this sentence makes three mistakes:

  1. x is a variable. It can’t be equal to 1, which is a value (an integer, here). A variable is not the value it contains.

  2. x and y are not equal, and will never be. They are distinct variables. They can hold the same value, though.

  3. x itself doesn’t change. Ever. The value it holds is just replaced by another.

The gap between abuse of language and actual misconception is small. Experts can easily tell a variable from a value, but non-specialists often can’t. That’s probably why C pointers are so hard. They introduce an extra level of indirection. An int * in C is roughly equivalent to an int ref ref in OCaml (plus pointer arithmetic). If variables themselves aren’t understood, no wonder pointers look like pure magic.
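The three points above, and the extra level of indirection, can be checked directly in OCaml. This is only a sketch: the name p is mine, and the comparison with a C pointer is the rough analogy made above, nothing more. Note that “=” compares values while “==” compares identity.

```ocaml
let x = ref 1
let y = ref 42

let () =
  x := !y;
  (* The two variables now hold the same value... *)
  assert (!x = !y);
  (* ...but they are still two distinct variables. *)
  assert (not (x == y));
  (* Replacing the value held by y leaves x alone. *)
  y := 0;
  assert (!x = 42)

(* One more level of indirection: a variable that holds a variable. *)
let p : int ref ref = ref x

let () =
  !p := 7;   (* write through p: x now holds 7 *)
  p := y;    (* make p hold y instead; x is untouched *)
  assert (!x = 7);
  assert (!(!p) = 0)
```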

Program analysis and refactoring

In high school, a definition like “let a = x + 1” meant that any occurrence of “a” or “(x + 1)” could be replaced by the other without changing the meaning of what was written. They are equivalent, and therefore substitutable. Imperative programs are more complicated:

int x = 42
...
x := 7
...
print(x)

Q: What does that print?

A: So that prints (looking for the definition of x) 42! Right? Oh, crap, I forgot that x has been modified. Err, I mean, the value it initially held has been replaced by another. So, that should be 7. But I don’t know this code very well; x may be referenced from elsewhere and modified behind my back… Ahhrrg!! (tearing my hair out)

Same problem with refactoring. If x were immutable, any of its occurrences could be replaced by 42, as you would naturally do in pen-and-paper mathematics. Unfortunately, x is not immutable. Tread carefully, and mind your hair.
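A small OCaml sketch of the difference, with function names invented for the occasion. In the first function, every occurrence of x can be replaced by 42 without a second thought. In the second, the same expression means different things depending on where it sits.

```ocaml
(* Immutable binding: x stands for 42 everywhere in its scope. *)
let with_binding () =
  let x = 42 in
  let before = x * 2 in
  let after = x * 2 in
  (before, after)

(* Mutable reference: !x * 2 is written twice, but means two things. *)
let with_reference () =
  let x = ref 42 in
  let before = !x * 2 in
  x := 7;
  let after = !x * 2 in
  (before, after)

let () =
  let (a, b) = with_binding () in
  let (c, d) = with_reference () in
  Printf.printf "%d %d / %d %d\n" a b c d
```

Replacing “!x” by 42 in the second function would silently change its result. Doing the same in the first is always safe.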

Remember: using the assignment statement has a cost. Once it is allowed, many assumptions about the program have to be dropped. Algebraic properties are lost. Some transformations don’t preserve meaning any more. Think twice before you use it.

Performance

Another thing you lose when you allow assignment is sharing. It can matter when you manipulate big containers or other complex data structures. Basically, there are three ways to manipulate a data structure:

  1. Directly modify it (assignment allows that).

  2. Copy the whole thing, then work on the copy.

  3. Share the parts of the old structure that you didn’t want to change in the first place.

Each way has its problems. Way 1 is effectively an assignment. This is often most efficient, but, as I said, that’s also Bad™. Way 2 wastes unspeakable amounts of time and memory. Way 3 is reasonably efficient, but is wildly unsafe if you ever allow yourself to use way 1 (modifying shared parts is rarely a good idea).
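Here is a rough sketch of the three ways in OCaml, on the smallest example I could think of: setting the first element to 0. The function names are hypothetical. Arrays stand in for mutable structures, lists for immutable ones.

```ocaml
(* Way 1: modify the structure in place. *)
let way1 (a : int array) : unit = a.(0) <- 0

(* Way 2: copy the whole thing, then work on the copy. *)
let way2 (a : int array) : int array =
  let b = Array.copy a in
  b.(0) <- 0;
  b

(* Way 3: build a new list that reuses the old tail as is. *)
let way3 (l : int list) : int list =
  match l with
  | [] -> []
  | _ :: rest -> 0 :: rest

let () =
  let l = [1; 2; 3] in
  let l' = way3 l in
  (* The old list is intact, and both lists share the very same tail. *)
  assert (l = [1; 2; 3] && l' = [0; 2; 3]);
  assert (List.tl l' == List.tl l)
```

Way 3 only allocates one new cell here, however long the list is. It is safe precisely because nobody can modify the shared tail.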

You often need to take snapshots of the state of a big data structure. Maybe you run an algorithm with backtracking. Maybe you produce intermediate results. Maybe you use the same data structure for different purposes. If assignment is allowed, the only safe way to take your snapshot is to perform a deep copy. That can kill performance. If assignment is not allowed, then taking a snapshot is instantaneous, way 3 becomes safe, and the program is overall more efficient.
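As an illustration, here is what a snapshot looks like with the immutable maps of the OCaml standard library. The keys and values are made up, and this is a sketch of the idea rather than a benchmark. Taking the snapshot amounts to keeping the old name around.

```ocaml
module StringMap = Map.Make (String)

(* Some state, built without any assignment. *)
let state = StringMap.(empty |> add "a" 1 |> add "b" 2)

(* The snapshot: no copy is made, this is the same value under another name. *)
let snapshot = state

(* "Changing" the state yields a new map. The old one is left intact. *)
let state' = StringMap.add "b" 99 state

let () =
  Printf.printf "%d %d\n"
    (StringMap.find "b" snapshot)
    (StringMap.find "b" state')
```

With a mutable table instead, the snapshot would need a full copy made before the update to stay trustworthy.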

If you don’t believe in efficient immutable data structures, you may want to read Okasaki’s thesis, “Purely Functional Data Structures”.

Now please