Modern Static Typing: Less Code, Better Code

or, "How Java/C++/C# Ruin Static Typing for the Rest of Us"
16 JUN 2003

In a recent piece called Strong Typing vs. Strong Testing, noted programmer and author Bruce Eckel makes an argument that dynamically typed languages such as Python are superior to statically typed languages such as Java and C++. I've done quite a bit of Python and Java programming, and even a little C++, so I can appreciate his position, but I think the conclusion goes too far. Whether Python is more productive than C++ or Java is one thing; whether static typing in general should be abandoned is quite another.

Mr. Eckel asks this provocative question:

... if strong static type checking is so important, why are people able to build big, complex Python programs (with much shorter time and effort than the strong static counterparts) without the disaster that I was so sure would ensue?
The "surprise" answer: Because they use unit testing! He makes the (correct) point that you can't know your program is correct without testing it, and those tests are in effect the specification for the program. What's unfortunate here is that there's an implication that people in favor of static typing don't use unit testing because their compilers automatically check their types. I don't think any static typing advocate would suggest that a compiler (any compiler) ensures program correctness. Testing is a fundamental requirement, with or without static typing. It is true that static typing as most programmers know it (from C-family languages like C++ and Java) is very restrictive, yet often inappropriate or (paradoxically) too weak. However, there are alternatives, and a good static type system is something no programmer should want to do without.

Mr. Eckel is primarily concerned with programmer productivity, a topic I'm tremendously interested in as well. He appreciates the productivity of Python's terse yet eminently readable syntax. He also is impressed with its powerful built-in data structures. Unfortunately, the article conflates the favorable traits of excellent syntax and useful data types with the unrelated issue of static typing. By comparing snippets of Java with the equivalent Python, the article shows Java to be verbose and unwieldy. I'm certainly not going to argue here in favor of Java's syntax, and indeed the syntactic overhead of static typing in Java, C++, or C# is fearsome. However, these languages are hardly examples of good static typing. Consider Haskell, or perhaps Nice. The idea that Python, an especially productive dynamic language, is more productive than C++, an especially unproductive static language, is certainly not an argument against a good system of static typing.

And because I can get a Python program up and running in far less time than it takes you to write the equivalent C++/Java/C# program, I can start running the real tests sooner: unit tests, tests of my hypothesis, tests of alternate approaches, etc. And if a Python program has adequate unit tests, it can be as robust as a C++, Java or C# program with adequate unit tests (although the tests in Python will be faster to write).
Again, there's no question that Python is more concise than Java, C++, or C#. When one programs in Python after programming in C-family languages for a while, there's a tremendous sense of relief, a lifting of a great burden. I think some of that comes from the excellent syntax, and some from not having to write type declarations anymore. It's important to note, though, that neither of these things requires giving up static typing. Firstly, much of the pleasantness of programming in Python comes from its expressiveness, which has nothing to do with static typing. For example, Python's someString[1:-1] is so much nicer than the equivalent Java, someString.substring(1, someString.length() - 1). Secondly, the Java, C#, and C++ compilers need type declarations any time a type is used, but languages which support type inference, such as Haskell and ML, do not impose this burden. Just as avoiding type declarations boosts productivity in Python, many modern statically typed languages reap the same benefit.
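
To make the slicing comparison concrete, here is a minimal sketch in present-day Python 3 (not the Python 2 of the snippets quoted in this article). The function names strip_ends_slice and strip_ends_explicit are invented for illustration. The second one spells out the index arithmetic that the Java version requires. The annotations are optional hints for an external checker and do not change what the code does at runtime.

```python
def strip_ends_slice(s: str) -> str:
    """Drop the first and last characters using a slice."""
    return s[1:-1]


def strip_ends_explicit(s: str) -> str:
    """Same result, spelled with the index arithmetic Java's substring() needs."""
    return s[1:len(s) - 1]


if __name__ == "__main__":
    print(strip_ends_slice('"quoted"'))     # both calls print: quoted
    print(strip_ends_explicit('"quoted"'))
```

The conciseness here comes from the slice notation alone; the type annotations could be removed or kept without affecting it.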

This shook my unquestioning acceptance of strong type checking (acquired when moving from C to C++, where the improvement was dramatic) enough that the next time I examined the issue of checked exceptions in Java, I asked "why"? which produced a big discussion wherein I was told that if I kept advocating unchecked exceptions, cities would fall and civilization as we know it would cease to exist.
This quote begins a section discussing checked exceptions in Java, as an example of what's wrong with static typing. Let me say that I find checked exceptions tedious and nearly useless. I agree they are indeed a misfeature of Java's typing system. They are not, however, an argument that static typing is inherently a waste of time. Many statically typed languages do not use checked exceptions. For instance, Nice is a language which runs on the Java virtual machine. It is statically typed, and even though its type system is much stronger and more flexible than Java's, it doesn't include checked exceptions. So just because you don't like checked exceptions, you don't have to give up on static typing altogether.

Argument types and return types? Let the language sort it out!
Indeed, I think we should. You can write a Haskell or Objective Caml program without specifying types for one single argument, return value, or local variable. So there's no keypunching burden on the programmer. Some folks may not even recognize this is static typing, because there are no type declarations, but the compiler really will check your types. The difference between letting the compiler figure out your types for you and figuring them out yourself is that the compiler will actually get it right.

Even if you have a language that requires some type declarations, it needn't be comparable to the verbosity of Java or C++. For instance, the Nice language requires type declarations the first time a method is declared, but it doesn't require them when you override the method, nor does it require type declarations for local variables in most cases.

Static type systems actually prove the correctness of some important aspects of your program. You still need to test, but the benefit of static typing is, you don't have to write the tests you would have written just for checking types. This is an important point: A good type system (and remember, Java/C++/C# are not what I'd call good type systems) imposes a little bit of discipline, not undue hardship, and in return the compiler saves you from writing pages and pages of tests (which also need to be debugged, and maintained) just to ensure you're not trying to do something silly like take the square root of a hash table. I suspect that the sheer amount of test code that would be required to duplicate the safety of a type checker is so great that people using dynamically typed languages don't write many of these tests at all. Presumably though, they do write at least some of them. This means that someone using a static type system has fewer tests to write, and is therefore free to spend more time writing the important tests, because the compiler has checked the fundamentals already. Plus, the fact that an entire category of errors is guaranteed not to occur when your clients run your software should be a great relief to any programmer.
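
As a rough illustration of the tests in question, here is a sketch in present-day Python 3. The function scaled_root and both test functions are hypothetical, invented for this example. The first test checks behavior and is needed in any language. The second exists only to pin down what happens when the wrong kind of object is passed in, which is the kind of test a static type checker makes unnecessary.

```python
import math


def scaled_root(x, scale):
    """Return scale times the square root of x."""
    return scale * math.sqrt(x)


def test_known_value():
    # A test of behavior: needed with or without static typing.
    assert scaled_root(9, 2) == 6.0


def test_rejects_hash_table():
    # A test of types only: we learn that a dict is not a number
    # at runtime, and only because this test bothers to try it.
    try:
        scaled_root({}, 2)
    except TypeError:
        return
    raise AssertionError("expected a TypeError for a dict argument")


if __name__ == "__main__":
    test_known_value()
    test_rejects_hash_table()
    print("all tests passed")
```

Multiply the second kind of test by every function and every parameter, and the volume of test code the paragraph above describes becomes easy to imagine.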

There's an interesting passage in the article suggesting that static typing is too restrictive because it doesn't allow one to write code which works with unrelated classes that happen to have methods with the same signature. Mr. Eckel begins with an example in Java showing how two classes, Cat and Dog, can be related by a Pet interface, and then shows code that exercises both Dogs and Cats via methods on their common interface, Pet. He argues that the requirement that Cats and Dogs have a common interface in order to write code that works with both of them is too restrictive. He offers this example of how a dynamic language will allow us to write code which works with instances of any class, so long as they support the right methods:

But the interesting part is this: because the command(pet) method doesn't care about the type it's getting, I don't have to upcast. So I can rewrite the Python program without using base classes:
# Speaking pets in Python, but without base classes:
class Cat:
    def speak(self):
        print "meow!"

class Dog:
    def speak(self):
        print "woof!"

class Bob:
    def bow(self):
        print "thank you, thank you!"
    def speak(self):
        print "hello, welcome to the neighborhood!"
    def drive(self):
        print "beep, beep!"

def command(pet):
    pet.speak()

pets = [ Cat(), Dog(), Bob() ]

for pet in pets:
    command(pet)
Since command(pet) only cares that it can send the speak() message to its argument, I've removed the base class Pet, and even added a totally non-pet class called Bob which happens to have a speak() method, so it also works in the command(pet) function.
For a long time after I learned Python I was very impressed with so-called duck typing, this notion that "If it walks like a duck, and talks like a duck, it must be a duck". It's very simple, and it's immediately intuitive - if the object has a method with the right signature (which only means, "same name and right number of parameters" in Python), you can call it, just like the speak() methods above. The class pedigree of the object doesn't matter. The problem is, just because a method has the right signature doesn't mean it should be used. Consider:
class Artist:
    def draw(self):
        print "Sketch!"

class Gunslinger:
    def draw(self):
        print "Bang!"

for obj in [Artist(), Gunslinger()]:
    obj.draw()
Artist.draw and Gunslinger.draw are clearly not even conceptually related, and our Artist has wandered unwittingly into a gunfight. I considered this problem to be fairly unimportant from a practical standpoint, because it doesn't happen very often. Eventually though, I realized that this problem could be solved in the context of a statically typed language - not Java or C++ of course, but a language with a real type system, and if you can solve a problem without much effort, why not do so?

One of the reasons duck typing is so attractive is that it's an escape from the pain you may have experienced working with C++ or Java: the pain of not being able to explain to the compiler that even though Cat and Dog have no common superclass or interface, they both have speak() methods, and they mean the same thing. You'd like to be able to write something like

class Dog {
  public void speak() { System.out.println("Woof!"); }
}

class Cat {
  public void speak() { System.out.println("Meow!"); }
}

interface Speaker {
  public void speak();
}

public void command(Speaker s) {
  s.speak();
}

public void makeThemSpeak() {
  command(new Dog());
  command(new Cat());
}
but you can't, because even though Cat and Dog plainly could implement the Speaker interface, they weren't written that way. For instance, they may be from a binary-only third-party library, and therefore can't be changed. "Curses," one might say, and "this static typing just gets in my way!" Again, we've fallen into the trap of equating C++ and Java with static typing. In Nice, you can do just what you wanted to do:

//Declarations for Dog and Cat
class Dog {
  public void speak() { println("Woof!"); }
}

class Cat {
  public void speak() { println("Meow!"); }
}

//A type annotation for things which can speak
abstract interface Speaker {
  public void speak();
}

//Here we make these classes implement Speaker, after the fact.
class test.Dog implements Speaker; 

speak(obj@Dog) { obj.speak(); }

class test.Cat implements Speaker;

speak(obj@Cat) { obj.speak(); }

//And now we can implement our command function:
<Speaker S>command(S thing) {
  thing.speak();
}

void makeThemSpeak() {
  command(new Dog());
  command(new Cat());
}
We got the flexibility we wanted, without losing the safety of static type checking. We also got additional power in the bargain, because now we can selectively bless any class we like into the Speaker interface, without including classes we don't want - we can ensure that our Artist doesn't end up in a gunfight by mistake! This is one of the most important points about static typing - solving the problem within the system, or by moving to a new system, is usually preferable to throwing out static typing altogether. Note that a type system like Java's won't support a method like command() unless Dog and Cat have a common supertype. You'll have to give up and use reflection. Don't judge static typing by Java's example.
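
For readers more at home in Python than in Nice, the idea of opting a class into an interface after the fact can be sketched in present-day Python 3 with the standard abc module, whose register() method declares an existing class a "virtual subclass" of an abstract base class. This is only an analogy under stated assumptions, not a translation of the Nice code: the Drawer marker and the sketch() function are invented for illustration, and the check happens at runtime through isinstance() rather than at compile time, so it shows the explicit opt-in but not the static safety.

```python
from abc import ABC


class Dog:
    def speak(self):
        return "Woof!"


class Cat:
    def speak(self):
        return "Meow!"


class Artist:
    def draw(self):
        return "Sketch!"


class Gunslinger:
    def draw(self):
        return "Bang!"


class Speaker(ABC):
    """Marker for classes whose speak() means 'make your sound'."""


class Drawer(ABC):
    """Marker for classes whose draw() means 'make a picture'."""


# Opt classes in after the fact, without editing their definitions.
Speaker.register(Dog)
Speaker.register(Cat)
Drawer.register(Artist)  # Gunslinger is deliberately left out.


def command(thing):
    """Ask a registered Speaker to speak; reject everything else."""
    if not isinstance(thing, Speaker):
        raise TypeError(f"{type(thing).__name__} is not a Speaker")
    return thing.speak()


def sketch(thing):
    """Ask a registered Drawer to draw; a matching method name is not enough."""
    if not isinstance(thing, Drawer):
        raise TypeError(f"{type(thing).__name__} is not a Drawer")
    return thing.draw()
```

Here sketch(Artist()) succeeds while sketch(Gunslinger()) raises a TypeError, even though both classes have a draw() method. The membership decision is explicit, as in the Nice version, but a mistake is still only reported when the offending call actually runs.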

If you consider the example with the Artist and the Gunslinger, you'll see that there are two scenarios that need to be handled when you abandon static type checking. In the first, an exception is thrown at runtime. In the second (as in the gunfight), there is no exception at all, and it will take a human tester to notice something's gone wrong - and there may well be no way to write a unit test for this error! Note that static typing protects against both problems. Even in the case of a runtime exception, the source location of the exception may have very little to do with the source location of the problem. Consider:

# in file A.py
run_queue = []

# in file B.py
A.run_queue.append(RunnableThing())

# in file C.py
A.run_queue.append(eval(raw_input("Enter an expression")))

# in file D.py
A.run_queue.append(0)

# in file E.py
while len(A.run_queue):
  runnable_thing = A.run_queue[0]
  del A.run_queue[0]
  runnable_thing.run()  
So when we run E.py, we'll get an AttributeError at runtime, something like 'int' object has no attribute 'run', and a stack trace that will point to the line in E.py where we tried to call run() on an int. This location is not the location we care about however. We want to know, how did something other than a RunnableThing get into our list (in this example, it could be either of two places)? Short of rooting through all your source code (not practical or fun) or periodically testing the contents of run_queue to try to deduce when the bad item gets added (tedious and dubious), what can you do? Well, one reasonable way to handle it would be to restrict what can be added to the list. After all, that's where the error is really occurring, regardless of how long it takes before an exception is thrown. You might write something like:
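#in file A.py
import UserList  # Python 2 standard library module
# RunnableThing is assumed to be defined or imported here
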
class MySafeList(UserList.UserList):
  def append(self, thing):
    if not isinstance(thing, RunnableThing):
      raise TypeError, "Only RunnableThings can go in the queue!"
    UserList.UserList.append(self, thing)

  #override other methods which change contents of the list ...

run_queue = MySafeList()
Now you have a run_queue that will report the error where the mistake was made, which is a very desirable property. Of course, there are several other methods which can change the contents of a sequence object in Python, and you'll have to override those too to ensure safety. Incidentally, you'd have to write the same kind of code in Java, despite the fact it's a statically typed language, because its standard collections currently know only that they hold Objects. This is what similarly safe code looks like in Nice:
var List<RunnableThing> runQueue = new ArrayList();
In this case it's even shorter in Haskell (this is type inference at work):
-- Just like Python, but safe!
run_queue = []
Even now that we've written these extra lines of boilerplate code in our dynamically typed language (our goal was to raise programmer productivity, recall), it's still not as safe as with a type checker, because you still only find out about violations at runtime, and only if you actually run the code that contains the error. There's no unit test you can write to prevent this error, because the problem isn't with run_queue, it's with the clients that use run_queue! A good type system can test all this for you, and for the code which uses your code, virtually for free.
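
The "only if you actually run the code" problem can be sketched as follows, in present-day Python 3 (where UserList lives in the collections module). The names RunQueue and enqueue_work, and the rare_condition flag, are hypothetical stand-ins for client code with a bug in a seldom-used branch. Only append() is guarded, to keep the sketch short.

```python
from collections import UserList


class RunnableThing:
    def run(self):
        return "ran"


class RunQueue(UserList):
    """A list that rejects anything but a RunnableThing on append().

    Only append() is guarded here; insert(), extend(), item assignment
    and the other mutating methods would need the same treatment.
    """

    def append(self, item):
        if not isinstance(item, RunnableThing):
            raise TypeError("Only RunnableThings can go in the queue!")
        super().append(item)


run_queue = RunQueue()


def enqueue_work(rare_condition=False):
    """Client code with a bug hidden in a rarely taken branch."""
    if rare_condition:
        run_queue.append(0)  # the mistake: an int, not a RunnableThing
    else:
        run_queue.append(RunnableThing())
```

A test suite that only ever calls enqueue_work() passes cleanly. The TypeError appears only when something calls enqueue_work(rare_condition=True), which may first happen in production. A static checker would flag the bad append() without running either branch.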

When people who are accustomed to dynamic languages encounter strong static typing, often they will make a remark like, "What? I can't make a list that will hold arbitrary objects? How oppressive! I'll never get anything done!" I know I thought things like that after working with Python for a while. It doesn't hold up when you encounter a modern static type system, however. The simple fact is, the number of times I require a list of arbitrary objects is exactly zero. Any time I make a list, it's because I want to do something with the contents, which means I'm going to be storing things that are related in some way, and if I'm going to be working with objects that are conceptually related, why not relate them explicitly with a type, to let the type checker help me write less code and fewer trivial tests?

Of course, it should be mentioned that even the strongest of static typing systems offers an escape hatch for those situations where you really, really need to do something the type checker can't handle. In Java, C++, or C#, it happens all the time. In Haskell, or Nice, or Ocaml it's rarely needed at all. The facility is there if you really need it, though. For instance, Nice offers both the cast() function, and regular Java reflection. Haskell has unsafePerformIO and dynamic types. Objective Caml has Obj.magic and dynamic types. So if you really need to do something unusual and difficult, a static type checker won't stop you. The other 95% of your program that doesn't need to do strange and magical things gets the benefit of type checking, which means you can concentrate your attention on the interesting part.

At this point, a strong, statically-typed language would be sputtering with rage, insisting that this kind of sloppiness will cause disaster and mayhem. Clearly, at some point the "wrong" type will be used with command() or will otherwise slip through the system. The benefit of simpler, clearer expression of concepts is simply not worth the danger. Even if that benefit is a productivity increase of 5 to 10 times over that of Java or C++.
But you don't have to forfeit static typing to get that kind of productivity boost. You just need to stop using C++ or Java.

I'm not contesting the fact that static typing imposes a terrible burden on the programmer in languages like Java and C++, but please, let's say something like "Python is a more productive language than C++", and not "In fact, what we need is strong testing, not strong typing." We need both strong testing and strong typing.

It's interesting that it takes an earth-shaking experience -- like becoming test-infected or learning a different kind of language -- to cause a re-evaluation of beliefs.
I couldn't agree more. I hope more people will take the step of learning a language with a powerful static type system, like Nice, Haskell, or Ocaml before deciding against the very idea of static typing.