The C++ Resource Management Model (a.k.a. Why I Don’t Want Your Garbage Collector)

My Language of Choice

“What’s your favorite programming language, Dan?”

“Oh, definitely C++”

Am I a masochist? Well, if I am, it’s irrelevant here. Am I just unfamiliar with all those fancy newer “high-level” languages? Nope. In fact, I don’t use C++ professionally. On jobs I’m writing Swift, Java, Kotlin, C#, or even Ruby and JavaScript. C++ is what I write my own apps in.

Am I just insane? Again, if I am, it’s not the reason for my opinion on this matter (at least from my possibly insane perspective).

C++ is an incredibly powerful language. To be fair, it has problems (what Bjarne Stroustrup calls “barnacles”). I consider 3 of them to be major. C++20 fixed 2 of them (the headers problem that makes gratuitous use of templates murder your compile time and forces you to distribute source code, fixed with modules, and the duck typing of templates that makes template error messages unintelligible, fixed with concepts). The remaining one is reflection, which we were supposed to get in C++20, but now it’s been punted to at least C++26.

But overall, I prefer C++ because it is so powerful. Of all the languages I’ve used, I find myself saying the least often in C++ “hmm, I just can’t do what I want to do in this language”. It’s not that I’ve never said that. I just say it less often than I do in other languages.

When this conversation comes up, someone almost always asks me about memory management. It’s not uncommon for people, especially Java/C# guys, to say, “when is C++ going to get a garbage collector?”

C++ had a garbage collector… or, rather, an interface for adding one. It was removed in C++23. Not deprecated, removed. Ripped out in one clean yank.

In my list of problems/limitations of C++, resource management (not memory management, I’ll explain that shortly) is nowhere on the list. C++ absolutely kicks every other language’s ass in this area. There’s another language, D, that follows the design of C++ but frees itself from the shackles of backward compatibility, and is in almost every way far more powerful. Why do I have absolutely no interest in it? Because it has garbage collection. With that one single decision, they ruined what could easily be the best programming language in existence.

I think the problem is a lot of developers who aren’t familiar with C++ assume it’s C with the added ability to stick methods on structs and encapsulate their members. Hence, they think memory management in C++ is the same as in C, and you get stuff like this:

Programmers working in languages without garbage collection (like C and C++) must implement manual memory management in their code.

Even the Wikipedia article for garbage collectors says:

Other languages were designed for use with manual memory management… for example, C and C++

I have a huge C++ codebase, including several generic frameworks, for my own projects. I can count the number of deletes I’ve written on two hands, maybe one.

The Dark Side of Garbage Collection

Before I explain the C++ resource management system, I’m going to explain what’s wrong with garbage collection. Now, “garbage collection” has a few definitions, but I’m talking about the most narrow definition: the “tracer”. It’s the thing Java, C# and D have. Objective-C and Swift don’t have this kind of “garbage collector”, they do reference counting.

I can sum up the problem with garbage collector languages by mentioning a single interface in each of the languages: IDisposable for C#, and Closeable (or AutoCloseable) for Java.

The promise garbage collectors give me is that I don’t have to worry about cleaning stuff up anymore. The fact these interfaces exist, and work the way they do, reveals that garbage collectors are dirty liars. We might as well have named the interfaces Deletable, and the method delete.

Then, remember that I told you I can count the number of deletes I’ve written in tens of thousands of lines of C++ on one or two hands. How many of these effective deletes are in a C#/Java codebase?

Even if you don’t use these interfaces, any semantically equivalent “cleanup” call, whether you call it finish, discard, terminate, release, or whatever, counts as a delete. Now tell me, who has fewer of these calls? Java/C# or C++?

C++ wins massively, unless you’re writing C++ code that belongs in the late 90s.

Interestingly, I’ve found most developers assume, when I say I don’t like garbage collectors, that I’m going to start talking about performance (i.e. tracing is too slow/resource intensive), and it surprises them that I say nothing about that and jump straight to these pseudo-delete interfaces. They don’t even know how much better things are in my world.

If you doubt that dispose/close patterns are the worst possible way to deal with resources, allow me to explain how they suffer from all the problems that manual pointers in C suffer from, plus more:

  • You have to clean them up, and it’s invisible and innocuous if you don’t
  • If you forget to clean them up, the explosion is far away (in space and time) from where the mistake was made
  • You have no idea if it’s your responsibility. In C, if you get a pointer from a function, what do you do? Call free when you’re done, or not? Naming conventions? Read docs? What if the answer is different each time you call a function!?
  • Static analysis is impossible, because a pointer that needs to be freed is syntactically indistinguishable from one that shouldn’t be freed
  • You can’t share pointers. Someone has to free it, and therefore be designated the sole true owner.
  • Already freed pointers are still around, land mines ready to be stepped on and blow your leg off.

Replace “pointer” and free with IDisposable/Closeable and Dispose/close respectively, and everything carries over.

The inability to share these types is a real pain. When the need arises, you have to reinvent a special solution. ADO.NET does this with database connections. When you obtain a connection, which is an IDisposable, internally the framework maintains a count of how many connections are open. Since you can’t properly share an IDisposable, you instead “open” a new connection every time, but behind the scenes it keeps track of the fact that an identical connection is already open, and it just hands you a handle to this open connection.

Connection pooling is purported to solve a different problem of opening and closing identical connections in rapid succession, but the need to do this to begin with is born out of the inability to create a single connection and share it. The cost of this is that the system has to guess when you’re really done with the connection:

If MinPoolSize is either not specified in the connection string or is specified as zero, the connections in the pool will be closed after a period of inactivity. However, if the specified MinPoolSize is greater than zero, the connection pool is not destroyed until the AppDomain is unloaded and the process ends.

This is ironic, because the whole point of IDisposable is to recover the deterministic release of scarce resources that is lost by using a GC. By this point, you might as well just hand the database connection to GC, and do the closing in the finalizer… except that’s dangerous (more on this later), and it also loses you any control over release (i.e. you can’t define a “period of inactivity” to be the criterion).

This is just reinvented reference counting, but worse: instead of expressing directly what you’re doing (sharing an expensive object, so that the last user releasing it causes it to be destroyed), you have to hack around the limitation of no sharing and write code that looks like it’s needlessly recreating expensive objects. Each time you need something like this, you have to rebuild it. You can’t write a generic shared resource that implements the reference counting once. People have tried, and it never works (we’ll see why later).

Okay, well hopefully we can restrict our use of these interfaces to just where they’re absolutely needed, right?

IDisposable/Closeable are zombie viruses. When you add one as a member to a class A, it’s not uncommon that the (or at least a) “proper” time to clean up that member is when the A instance is no longer used. So you need to make A an IDisposable/Closeable too. Anything holding an A as a member then likely needs to become an IDisposable/Closeable itself, and on and on. Then you have to write boilerplate, which can usually be generated by your IDE (that’s always a sign of a language defect, that a tool can autogenerate code you need but the compiler can’t), to have your Dispose/close just call Dispose/close on all IDisposable/Closeable members. Except that’s not always correct. Maybe some of those members are just being borrowed. Back to the docs!

Now you’re doing what C++ devs had to do in the 90s: write destructors that do nothing but call delete on all pointer members… except when they shouldn’t.

In fact, IDisposable/Closeable aren’t enough for the common case of hierarchies and member cleanup. A class might also hold handles to “native” objects that need to be cleaned up whenever the instance is destroyed. As I’ll explain in a moment, you can’t safely Dispose/close your member objects in a finalizer, but you can safely clean up native resources (sort of…). So you need two cleanup paths: one that cleans up everything, which is what a call to Dispose/close will do, and one that only does native cleanup, which is what the finalizer will trigger. But then, since the finalizer could get called after someone calls Dispose, you need to make sure you don’t do any of this twice, so you also need to keep track of whether you’ve already done the cleanup.

The result is this monstrosity:

protected virtual void Dispose(bool disposing)
{
    if (_disposed)
    {
        return;
    }

    if (disposing)
    {
        // TODO: dispose managed state (managed objects).
    }

    // TODO: free unmanaged resources (unmanaged objects) and override a finalizer below.
    // TODO: set large fields to null.

    _disposed = true;
}

I mean, come on! The word “Dispose” shows up as an imperative verb, a present participle, and a past participle. It’s a method whose parameter basically means “but sort of not really” (I call these “LOLJK” parameters). Where did I find this demonry? On Microsoft’s docs, as an example of a pattern you should follow, which means you won’t just see this once, but over and over.

Raw C pointers never necessitated anything that ridiculous.

For the love of God keep this out of C++. Keep it as far away as possible.

Now, the real question here isn’t why do we have to go through all this trouble when using IDisposable/Closeable. Those are just interfaces marking a uniform API for utterly manual resource management. We already know manual resource management sucks. The real question is: why can’t the garbage collector handle this? Why don’t we just do our cleanup in finalizers?

Because finalizers are horrible.

They’ve been completely deprecated in Java, and Microsoft is warning people to never write them. The consensus is now that you can’t even safely release native resources there. It’s so easy to get it wrong. Storing multiple native resources in a managed collection? The collection is managed, so you can’t touch it. Did you know finalizers can get called on an object while the scope it’s declared in is still running, which means they can get called mid-execution of one of the object’s methods? And there’s more. Allowing arbitrary code to run during garbage collection can cause all sorts of performance problems or even deadlocks. Take a look at this and this thread.

Is this really “easier” than finding and removing reference cycles?

The prospect of doing cascading cleanup in finalizers fails because of how garbage collectors work. When I’m in a finalizer, I can’t safely assume anything in my instance is still valid except for native/external objects that I know aren’t being touched by the garbage collector. In particular, the basic assumption about a valid object is violated: that its members are valid. They might not be. Finalizers are intrinsically messages sent to half-dead objects.

Why can’t the garbage collector guarantee order? This is, I think, the biggest irony in all of this. The answer is reference cycles. It turns out neglecting to define an ordered topology of your objects causes some real headaches. Garbage collectors just hide this, encourage you to neglect the work of repairing cyclical references, and force you to always deal with the possibility of cycles even when you can prove they don’t exist. If those cyclical references are nothing but bags of bits taken from the memory pool, maybe it will work out okay. Maybe. As soon as you want any kind of well-ordered cleanup logic, you’re hosed.

It doesn’t even make sense to try to apply garbage collectors to non-memory resources like files, sockets, database connections, and so on, especially when you remember some of those resources are owned by entire machines, or even networks, rather than single processes. It turns out that “trigger a sequence to build up a structure, then trigger the exact opposite sequence in reverse order to tear it down” is a highly generic, widely useful paradigm, which we C++ guys call Resource Acquisition Is Initialization (RAII).

Anything from opening and closing a file, to acquiring and releasing a mutex, to describing and committing an animation, can fall under this paradigm. Any situation you can imagine where there is a balanced “start” and “finish” logic, which is inherently hierarchical: if “starting” X really means to start A, B then C in that order, then “finishing” X will at least include “finishing” C, B then A in that order.

By giving up deterministic “cleanup” of your objects in a language, you’re depriving yourself of this powerful strategy, which goes way beyond the simple case of “deleting X, who was constructed out of A, B and C, means deleting C, B and A”. Deterministically run pairs of hierarchical setup/teardown logic are ubiquitous in software. Memory allocation and freeing is just a narrow example of it.

For this reason, garbage collection definitely is not what you want baked into a language, attempting to be the one-size-fits-all resource management strategy. It simply can’t be that, and since the language-baked resource management is limited in what it can handle, you’re left out to dry, reverting to totally manual management of all other resources. At best, garbage collection is something you can opt into for specific resources. That requires a language capability to tag variables with a specific resource management strategy. Ideally that strategy can be written in the language itself, using its available features, and shipped as a library.

I don’t know any language that could do this, but I know one that comes really close, and does allow “resource management as libraries” for every other management technique beside tracing.

What was my favorite language, again?

The Underlying Problem

I place garbage collection into the same category as ORMs: tools that attempt to hide a problem instead of abstracting it.

We all agree manual resource management is bad. Why? Because managing resources manually forces us to tell a system how to solve a problem instead of telling it what problem to solve. There are generally two ways to deal with the tedium of spelling out how. The first is to abstract: understand what exactly you’re telling the system to do, and write a higher level interface that directly expresses this information and encapsulates the implementation details. The other is to hide: try to completely take over the problem and “automagically” solve it without any guidance at all.

ORMs, especially of the Active Record variety, are an example of the second approach applied to interacting with a database. Instead of relieving you from wrestling with the how of mapping database queries to objects, it promises you can forget that you’re even working with a database. It hides database stuff entirely within classes that “look” and “act” like regular objects. The database interaction is under-the-hood automagic you can’t see, and therefore can’t control.

Garbage collection is the same idea applied to memory management: the memory releases are done totally automagically, and you are promised you can forget a memory management problem even exists.

Of course, not really. Besides the fact that, as I’ve explained, it totally can’t manage non-memory resources, it also really doesn’t let you forget memory management exists. In my experience with reference counted languages like Swift, the most common source of “leaks” isn’t reference cycles, but simply holding onto references to unneeded stuff for too long. This is especially easy to do if you’re sticking references in a collection, and nothing is ever pruning the collection. That’s not a leak in the strict sense (an object unreachable to the program that can’t be deleted), but it’s a semantic leak with identical consequences. Tracers won’t help you with that.

All of these approaches suffer from the same problems: some percentage, usually fairly high (let’s say 85-90%) of problems are perfectly solved by these automagic engines. The remaining 10-15% are not, and the very nature of automagic systems that hide the problem is that they can’t be controlled or extended (doing so re-exposes the problem they’re attempting to hide). Therefore, nothing can be done to cover that 10-15%, and those problems become exponentially worse than they would have been without a fancy generic engine. You have to hack around the engine to deal with that 10-15%, and the result is more headaches than contending directly with the 85% ever would have caused.

Automagic solutions that hide the problem intrinsically run afoul of the open-closed principle. Any library or tool that violates the open-closed principle will make 85-90% of your problems super easy, and the remaining 10-15% total nightmares.

The absolute worst thing to do with automagic engines is bake them into a language. In one sense, doing so is consistent with the underlying idea: that the automagic solution really is so generic and such a panacea that it deserves to be an ever-present and unavoidable all-purpose approach. It also significantly exacerbates the underlying problem: that such “silver bullets” are never actually silver bullets.

I’ve been dunking on garbage collectors, but baking reference counting into a language is the same sort of faulty reasoning: that reference counting is the true silver bullet of resource management. Reference counting at least gives us deterministic release. We don’t have to wrestle with the abominations of IDisposable/Closeable. But the fact you literally can’t create a variable without having to manage an atomic integer is a real problem inside tight loops. As I’ll get into shortly, reference counting is the way to handle shared ownership, but the vast majority of variables in a program aren’t shared (and the ones that are usually don’t need to be). Making everything shared causes a proliferation of unnecessary cyclic references and memory leaks.

What is the what, and the how, of resource management? Figuring out exactly what needs to get released, and where, is the how. The what is object lifetimes. In most cases, objects need to stay alive exactly as long as they are accessible to the program. The case of daemons that keep themselves alive can be treated as a separate exception (speaking of which, those are obnoxious in garbage collected languages, you have to stick them into a global variable). For something to be accessible to the program, it needs to be stored in a variable. In object-oriented languages, variables live inside other objects, or inside blocks of code, which are all called recursively by the entry function of a thread.

We can see that the lifetime problem is precisely the problem of defining a directed, non-cyclical graph of ownership. Why can there not be cycles? Not for the narrow reason garbage collectors are designed to address, which is that determining in a very “dumb” manner what is reachable and what is not fails on cycles. Cycles make the order of release undefined. Since teardown logic must occur in the reverse order of setup (at least in general), this makes it impossible to determine what the correct teardown logic is.

The missing abstraction in a language like C is the one that lets us express in our software what this ownership graph is, instead of just imagining it and writing out the implications of it (that this pointer gets freed here, and that one gets freed there).

The Typology of Ownership

We can easily list out the types of ownership relationships that will occur in a program. The simplest one is scope ownership: an object lives, and will only ever live, in a single variable, and therefore its lifetime is equal to the scope of that one variable. The scope of a variable is either the block of code it’s declared in (for “local” variables), or the object it’s an instance member of. The ownership is unique and static: there is one owner, and it doesn’t change.

Both code blocks and objects have cascading ownership, and therefore trigger cascading release. When a block of code ends, that block dies, which causes all objects owned by it to die, which causes all objects owned by those objects to die, and so on. The cascading nature is a consequence of the unique and static nature of the ownership, with the parent-child relationship (i.e. the direction of the graph) clearly defined at the outset.

Slightly more complex than this is when an object’s ownership remains unique at all times (we can guarantee there is only ever one variable that holds an object), but the owner can change, and thereby transfer from one scope to another. Function return values are a basic example. We call this unique ownership. The basic requirement of unique ownership is that only transfers can occur, wherein the original variable must release and no longer be a reference to the object when the transfer occurs.

The next level of complexity is to relax the requirement of uniqueness, by allowing multiple variables to be assigned to the same object. This gives us shared ownership. The basic requirement of shared ownership is that the object becomes unowned, and therefore cleaned up, when the last owner releases it.

That’s it! There’s no more to ownership. The owner either changes or it doesn’t. If it does change, either the number of owners can change or it can’t (all objects start with a single owner, so if it doesn’t change, it stays at 1). There’s no more to say.

However, we have to contend with the requirement that the ownership graph be directed and acyclic. The graph of variable references in general satisfies no such requirement. This is why we can’t just make everything shared, and every variable an owner of its assigned object. We get a proliferation of cycles, and that destroys the well-ordered cleanup logic, whether we can trace the graph to find disconnected islands or not.

We need to be able to send messages in both directions. Parents will send messages to their children, but children need to send messages to parents. To do this, a child simply needs a non-owning reference back to its parent. Now, the introduction of non-owning references is what creates this risk of dangling references… a problem guaranteed to not exist if every reference is owning. How can we be sure non-owning references are still valid?

Well, the reason we have to introduce non-owning references is to send messages up the ownership hierarchy, in the reverse direction of the graph. When does a child have to worry if its parent is still alive? Well, definitely not in the case of unique ownership. In that case, the fact the child is still alive and able to send messages is already proof the (one, unique) parent is still around. The same applies for more distant ancestors. If an ownership graph is all unique, then a child can safely send a message to a great-grandparent, knowing that there’s no way it could still exist to send messages if any of its unique ancestors were gone.

This is no longer true when objects are shared. A shared object only knows that one of its owners is still alive, so it cannot safely send a message to any particular parent. And thus we have the partner to shared ownership, which is the weak reference: a reference that is non-owning and also can be safely checked before access to see if the object still exists.

This is an important point that does not appear to be well-appreciated: weak references are only necessary in the context of shared ownership. Weak references force the user to contend with the possibility of the object being gone. What should happen then? The most common tactic may be to do nothing, but that’s likely just a case of stuffing problems under the rug (i.e. avoiding crashing when crashing is better than undefined behavior). You have to understand what the correct behavior is in both cases (the object is still present, and the object is already gone) when you use weak references.

In summary, we have for ownership:

  1. Scope Ownership
  2. Unique Ownership
  3. Shared Ownership

And for non-owning references:

  1. “Unsafe” references
  2. Weak references

What we want is a language where we can tell it what it needs to know about ownership, and let it figure out from that when to release stuff.

Additionally, we want to be able to control what both “creating” and “releasing” a certain object entail. The cascading release of scope-owned members is a given; we shouldn’t have to, nor should we be able to, modify that (doing so would break the definition of scope ownership). But we should be able to add extra custom logic on top of it.

Once our language lets us express who’s an owner of what, everything else should be taken care of. We should not have to tell the program when to clean stuff up. That should happen purely as a consequence of an object becoming unowned.

The Proper Solution

Let’s think through how we might try to solve this problem in C. A raw C pointer does not provide any information on ownership. An owned C pointer and a borrowed C pointer are exactly the same. There are two possibilities about ownership: the owner is either known at compile-time (really authorship time, which applies to interpreted languages too), or it’s known only at run-time. A basic example is a function that mallocs a pointer and returns it. The returned pointer is clearly an owning pointer. The caller is responsible for freeing it.

Whenever something is known at authorship time, we express it with the type system. If a function returns an int*, it should instead return a type that indicates it’s an owning pointer. Let’s call it owned_int_ptr:

struct owned_int_ptr
{
    int* ptr;
};

When a function returns an owned_int_ptr, that adds the information that the caller must free it. We can also define an unowned_int_ptr:

struct unowned_int_ptr
{
    int* ptr;
};

This indicates a pointer should not be freed.

For the case where it’s only known at runtime if a pointer is owned, we can define a dynamic_int_ptr:

struct dynamic_int_ptr
{
    int* ptr;
    char owning;
};

(The owning member is really a bool, but classic C doesn’t have a built-in bool type, so we use a char where 0 means false and everything else means true.)

If we have one of these, we need to check owning to determine if we need to call free or not.

Now, let’s think about the problems with this approach:

  • We’d have to declare these pointer types for every variable type.
  • We have to tediously add a .ptr to every access to the underlying pointer
  • While this tells us whether we need to call free before tossing a variable, we still have to actually do it, and we can easily forget

For the first problem, a C developer would use macros. Macros are black magic, so we’d really like to find a better solution. Ignoring macros, none of these three problems can really be solved in C. We need to add some stuff to the language to make them properly solvable:

  • Templates
  • User-overridable * and -> operators
  • User-defined cleanup code that gets automatically inserted by the compiler whenever a variable goes out of scope

You see where I’m going, don’t you? (Unless you’re that unfamiliar with C++)

With these additions, the C++ solution is:

template<typename T> class auto_ptr
{

public:

  auto_ptr(T* ptr) : _ptr(ptr) 
  {
 
  }

  ~auto_ptr()
  {
      delete _ptr;
  }

  T* operator->() const
  {
      return _ptr;
  }

  T& operator*() const
  {
      return *_ptr;
  }

private:

  T* _ptr;
};

Welcome to C++03 (that’s the C++ released in 2003)!

By returning an auto_ptr, which is an owning pointer, you’ll get a variable that behaves identically to a raw pointer when you dereference or access members via the arrow operator, and that automatically deletes the pointer when the auto_ptr is discarded by the program (when it goes out of scope).
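
For example, here’s a minimal usage sketch (with a hypothetical Widget type) of the auto_ptr defined above:

#include <iostream>

struct Widget
{
    void doWork() { std::cout << "working\n"; }
    ~Widget()     { std::cout << "Widget deleted\n"; }
};

void useWidget()
{
    auto_ptr<Widget> widget(new Widget());

    widget->doWork();    // operator-> forwards to the raw pointer
    (*widget).doWork();  // so does operator*

}                        // widget goes out of scope here; ~auto_ptr runs and deletes the Widget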

The last part is very significant. There’s something unique to C++ that makes this possible:

C++ has auto memory management!

This is what all those articles that say “C++ only has manual memory management” fail to recognize. C++ does have manual memory management (new and delete), but it also has a type of variable with automatic storage. These are local variables, declared as values (not as pointers), and instance members, also declared as values. This is usually considered equivalent to being stack-allocated, but that’s neither important nor always correct (a heap-allocated object’s members are on the heap, but are automatically deleted when the object is deleted).

The important part is auto variables in C++ are automatically destroyed at the end of the scope in which they are declared. This behavior is inherited from C, which “automatically” cleans up variables at the end of their scope.

But C++ makes a crucial enhancement to this: destructors.

Destructors are user defined code, whatever your heart desires, added to a class A that gets called any time an instance of A is deleted. That includes when an A instance with automatic storage goes out of scope. This means the compiler automatically inserts code when variables go out of scope, and we can control what that code is, as long as we control what types the variables are.
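
Here’s a tiny sketch of that in action (a hypothetical Logger class): the compiler inserts the destructor calls at the closing brace, in reverse order of construction:

#include <iostream>

struct Logger
{
    explicit Logger(const char* name) : _name(name) { std::cout << "construct " << _name << "\n"; }
    ~Logger()                                        { std::cout << "destroy " << _name << "\n"; }

    const char* _name;
};

void demo()
{
    Logger first("first");
    Logger second("second");
}   // prints "destroy second", then "destroy first": deterministic, reverse declaration order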

That’s the real garbage collection, and it’s the only garbage collection we actually need. It’s completely deterministic and doesn’t cost a single CPU cycle more than what it takes to do the actual releasing, because the instructions are inserted (and can be inlined) at compile-time.

You can’t have destructors in a garbage collected language. Finalizers aren’t destructors, and the pervasive myth that they are (encouraged at least in C# by notating them identically to C++ destructors) has caused endless pain. You can have them in reference counted languages. So far, reference counted languages are on par with C++ (except for performance; bumping those atomic reference counts is expensive). But let’s keep going.

Custom Value Semantics

Why can’t we build our own “shared disposable” as a userland class in C#? Something like this:

using System;
using System.Threading;

class SharedDisposable<T> : IDisposable where T : IDisposable
{
  private class ControlBlock
  {
    public T Source;
    public int Count;
  }

  private readonly ControlBlock _controlBlock;

  public SharedDisposable(T source)
  {
    _controlBlock = new ControlBlock { Source = source, Count = 1 };
  }

  public SharedDisposable(SharedDisposable<T> other)
  {
    _controlBlock = other._controlBlock;
    Interlocked.Increment(ref _controlBlock.Count);
  }

  public T Get()
  {
    return _controlBlock.Source;
  }

  public void Dispose()
  {
    if (Interlocked.Decrement(ref _controlBlock.Count) == 0)
    {
      _controlBlock.Source.Dispose();
    }
  }
}

One problem, of course, is that if the source IDisposable is accessible directly to anyone, they can Dispose it themselves. Sure, but really that problem exists for any “resource manager” class, including smart pointers in C++. The bigger problem is that if I do this:

void StoreSharedDisposable(SharedDisposable<MyClass> incoming)
{
  this._theSharedThing = incoming;
}

The thing that’s supposed to happen, namely incrementing the reference count, doesn’t happen. This is just a reference assignment. None of my code gets executed at the =. What I have to do is write this:

void StoreSharedDisposable(SharedDisposable<MyClass> incoming)
{
  this._theSharedThing = new SharedDisposable<MyClass>(incoming);
}

Like calling Dispose, this is another thing you’ll easily forget to do. We need to be able to require that assigning one SharedDisposable to another invokes the second constructor.

This is where C++ pulls ahead even of reference counted languages, and where it becomes, AFAIK, truly unique (except direct derivatives like D). A C++ dev will look at that second constructor for SharedDisposable and recognize it as a copy constructor. But it doesn’t have the same effect. Like most “modern” languages, C# has reference semantics, so assigning a variable involves no copying whatsoever. C++ has primarily value semantics, unless you specifically opt out with * or &, and unlike the limited value semantics (structs) in C# and Swift, you have total control over what happens on copy.

(If C#/Swift allowed custom copy constructors for structs, it would render the copy-on-write optimization impossible, and since structs are your only way to get value semantics, unless you tediously wrap a struct in a class, losing this optimization would mean a whole lot of unnecessary copying.)

Speaking of this, there’s a big, big problem with auto_ptr. You can easily copy it. Then what? You have two auto_ptrs to the same pointer. Well, auto_ptrs are owning. You have two owners, but no sharing logic. The result is double delete. This is so severe a problem it screws up simply forwarding an auto_ptr through a function:

auto_ptr<int> forwardPointerThrough()
{
    auto_ptr<int> result = getThePointer();
    ...
    return result; // Here, the auto_ptr gets copied.  Then result goes out of scope, and its destructor is called, which deletes the underlying pointer.  You've now returned an auto_ptr to an already deleted pointer!
}

Luckily, C++ lets us take full control over what happens when a value is copied. We can even forbid copying:

template<typename T> class auto_ptr
{

public:

  ...

  auto_ptr(const auto_ptr& other) = delete;
  auto_ptr& operator=(const auto_ptr& other) = delete;

  ...
};

We also suppressed copy-assignment, which would be just as bad.

C++ again lets us define for ourselves exactly what happens when we do this:

SomeType t;
SomeType t2 = t; // We can make the compiler insert any code we want here, or forbid us from writing it.

This is the interplay of value semantics and user-defined types that lets us take total control of how those semantics are implemented.
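
To make that concrete, here’s a minimal sketch (a hypothetical Buffer class) where the copy constructor performs a deep copy, so Buffer b2 = b1; duplicates the underlying allocation instead of aliasing it:

#include <algorithm>
#include <cstddef>

class Buffer
{

public:

  explicit Buffer(std::size_t size) : _size(size), _data(new char[size]) {}

  Buffer(const Buffer& other)   // runs on: Buffer b2 = b1;
    : _size(other._size), _data(new char[other._size])
  {
    std::copy(other._data, other._data + other._size, _data);
  }

  Buffer& operator=(const Buffer& other) = delete;   // omitted here to keep the sketch short

  ~Buffer() { delete[] _data; }

private:

  std::size_t _size;
  char* _data;
};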

That helps us avoid the landmine of creating an auto_ptr from another auto_ptr, which means the underlying ptr now has two conflicting owners. Our attempt to pass an auto_ptr up a level in a return value will now cause a compile error. Okay, that’s good, but… I still want to pass the return value through. How can I do this?

I need some way for an auto_ptr to release its control of its _ptr. Well, let’s back up a bit. There’s a problem with auto_ptr already. What if I create an auto_ptr by assigning it to nullptr?

auto_ptr<int> ohCrap = nullptr;

When this goes out of scope, it calls delete on a nullptr (which, strictly speaking, is a harmless no-op, but let’s make the intent explicit and check for that case anyway):

~auto_ptr()
{
    if(_ptr)
        delete _ptr;
}

With that fixed, it’s fairly obvious what I need to do to get an auto_ptr to not delete its _ptr when it goes out of scope: set _ptr to nullptr:

T* release()
{
    T* ptr = _ptr;
    _ptr = nullptr;
    return ptr;
}

Then, to transfer ownership from one auto_ptr to another, I can do this:

auto_ptr<int> forwardPointerThrough()
{
    auto_ptr<int> result = getThePointer();
    ...
    return result.release();
}

Okay, I’m able to forward auto_ptrs, because I’m able to transfer ownership from one auto_ptr to another. But it sucks I have to add .release(). Why can’t this be done automatically? If I’m at the end of a function, and I assign one variable to another, why do I need to copy the variable? I don’t want to copy it, I want to move it.

The same problem exists if I call a function to get a return value, then immediately pass it by value to another function, like this:

doSomethingWithAutoPtr(getTheAutoPtr());

What the compiler does (or did) here is assign the result of getTheAutoPtr() to a temporary unnamed variable, then copy it to the incoming parameter into doSomethingWithAutoPtr. Since a copy happens, and we have forbidden copying an auto_ptr, this will not compile. We have to do this:

doSomethingWithAutoPtr(getTheAutoPtr().release());

But why is this necessary? The reason to call release is to make sure that we don’t end up with two usable auto_ptrs to the same object, both claiming to be owners. But the second auto_ptr here is a temporary variable, which is never assigned to a named variable, and is therefore unusable to the program except to be passed into doSomethingWithAutoPtr. Shouldn’t the compiler be able to tell that there are never really two accessible variables? There’s only one, it’s just being transferred around.

This is really a specific example of a much bigger problem. Imagine instead of an auto_ptr, we’re doing this (passing the result of one function to another function) with some gigantic std::vector, which could be megabytes of memory. We’ll end up creating the std::vector in the function, copying it when we return it (maybe the compiler optimizes this with copy elision), and then copying it again into the other function. If the function it was passed to wants to store it, it needs to copy it again. That’s as many as three copies of this giant object when really there only needs to be one. Just as with the auto_ptr, the std::vector shouldn’t be copied, it should be moved.

This was solved with C++11 (released in 2011) with the introduction of move semantics. With the language now able to distinguish copying from moving, the unique_ptr was born:

template<typename T> class unique_ptr
{

public:

  unique_ptr(T* ptr) : _ptr(ptr) 
  {
 
  }

  unique_ptr(const unique_ptr& other) = delete; // Forbid copy construction
  
  unique_ptr& operator=(const unique_ptr& other) = delete; // Forbid copy assignment

  unique_ptr(unique_ptr&& other) : _ptr(other._ptr) // Move construction
  {
    other._ptr = nullptr;
  }

  unique_ptr& operator=(unique_ptr&& other) // Move assignment
  {
    if (this != &other)
    {
      delete _ptr;           // Release whatever we currently own
      _ptr = other._ptr;
      other._ptr = nullptr;
    }
    return *this;
  }

  ~unique_ptr()
  {
    if(_ptr)
      delete _ptr;
  }

  T* operator->() const
  {
    return _ptr;
  }

  T& operator*() const
  {
    return *_ptr;
  }

  T* release()
  {
    T* ptr = _ptr;
    _ptr = nullptr;
    return ptr;
  }

private:

  T* _ptr;
};

Using unique_ptr, we no longer need to call release when simply passing it around. We can forward a return value, or pass a returned unique_ptr by value (or rvalue reference) from one function to another, and ownership is transferred automatically via our move constructor.

(We still define release in case we need to manually take over the underlying pointer).
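
For what it’s worth, the standard library’s std::unique_ptr (of which the class above is a simplified sketch) behaves exactly this way. A minimal usage sketch, with a hypothetical storeSomewhere function standing in for any sink that takes ownership:

#include <memory>
#include <utility>

std::unique_ptr<int> getThePointer()
{
    return std::unique_ptr<int>(new int(42));    // ownership handed out via a move
}

std::unique_ptr<int> forwardPointerThrough()
{
    std::unique_ptr<int> result = getThePointer();
    // ...
    return result;                               // moved automatically, no .release() needed
}

void storeSomewhere(std::unique_ptr<int> ptr)    // takes ownership; deletes the int when it's done
{
}

void caller()
{
    std::unique_ptr<int> p = forwardPointerThrough();
    storeSomewhere(std::move(p));                // a named variable has to be moved explicitly
}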

We had to exercise all the capabilities of C++ related to value types, including ones that even reference counted languages don’t have, to build unique_ptr. There’s no way I could build a UniqueReference in Swift, because I can’t control, much less suppress, what happens when one variable is assigned to another. Since I can’t define unique ownership, everything is shared in a reference counted language, and I have to be way more careful about using unsafe references. What most devs do, of course, is make every unsafe reference a weak reference, which forces it to be optional, and makes you contend with situations that may never arise and for which no proper action is defined.

C++ comes with scope ownership and unsafe references out of the box, and with unique_ptr we’ve added unique ownership as a library class. To complete the typology, we just add a shared_ptr and the corresponding weak_ptr, and we’re done. Building a correct shared_ptr similarly exercises the capability of custom copy constructors: we don’t suppress copying like we do on a unique_ptr, we define it to increment the reference count. Unlike the C# example, that changes the meaning of thisSharedPtr = thatSharedPtr, instead of requiring us to call something extra.
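
Here’s a minimal sketch of that shared/weak pairing using the standard library types (hypothetical Parent and Child classes): the child holds a weak_ptr back to its parent and checks it before use:

#include <iostream>
#include <memory>

struct Parent;   // forward declaration

struct Child
{
    std::weak_ptr<Parent> parent;     // non-owning back-reference: no cycle, no leak

    void pingParent();
};

struct Parent
{
    std::shared_ptr<Child> child;     // owning reference down the hierarchy
};

void Child::pingParent()
{
    if (std::shared_ptr<Parent> p = parent.lock())   // is the parent still alive?
        std::cout << "parent is alive\n";
    else
        std::cout << "parent is gone\n";
}

int main()
{
    auto parent = std::make_shared<Parent>();
    parent->child = std::make_shared<Child>();
    parent->child->parent = parent;

    parent->child->pingParent();                       // prints "parent is alive"

    std::shared_ptr<Child> keepChild = parent->child;  // the Child is now shared
    parent.reset();                                    // last owner of the Parent releases it
    keepChild->pingParent();                           // prints "parent is gone"
}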

And with that, the typology is complete. We are able to express every type of ownership by selecting the right type for variables. With that, we have told the system what it needs to know to trigger teardown logic properly.

The vast majority of cleanup logic is cascading. For this reason, not only do we essentially never have to write delete (the only deletes are inside the smart pointer destructors), we also very rarely have to write destructors. We don’t, for example, have to write a destructor that simply deletes the members of a class. We just make those members values, or smart pointers, and the compiler ensures (and won’t let us stop) this cascading cleanup happens.

The only time we need to write a destructor is to tell the compiler how to do the cleanup of some non-memory resource. For example, we can define a database connection class that adapts a C database library to C++:

class DatabaseConnection
{

public:

  DatabaseConnection(std::string connectionString) :
    _handle(createDbConnection(connectionString.c_str()))
  {

  }

  DatabaseConnection(const DatabaseConnection& other) = delete;

  ~DatabaseConnection()
  {
    closeDbConnection(_handle);
  }

private:

  DbConnectionHandle* _handle;
};

Then, in any class A that holds a database connection, we simply make the DatabaseConnection a member variable. Its destructor will get called automatically when the A gets destroyed.
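
For instance (with a hypothetical ReportGenerator class):

class ReportGenerator
{

public:

  ReportGenerator(std::string connectionString) :
    _connection(connectionString)
  {

  }

  // No destructor needed: when a ReportGenerator is destroyed, _connection is destroyed
  // automatically, and its destructor closes the underlying database handle.

private:

  DatabaseConnection _connection;
};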

We can use RAII to do things like declare a critical section locked by a mutex. First we write a class that represents a mutex acquisition as a C++ class:

class AcquireMutex
{

public:

  AcquireMutex(Mutex& mutex) :
    _mutex(mutex)
  {
    _mutex.lock();
  }

  ~AcquireMutex()
  {
    _mutex.unlock();
  }

private:
  
  Mutex& _mutex;
};

Then to use it:

void doTheStuff()
{
  doSomeStuffThatIsntCritical();

  // Critical section
  {
    AcquireMutex acquire(_theMutex);

    doTheCriticalStuff();
  }

  doSomeMoreStuffThatIsntCritical();
}

The mutex is locked at the beginning of the scope by the constructor of AcquireMutex, and automatically unlocked at the end by the destructor of AcquireMutex. This is really useful, because it’s exception safe. If doTheCriticalStuff() throws an exception, the mutex still needs to be unlocked. Manually writing unlock after doTheCriticalStuff() will result in it never getting unlocked if doTheCriticalStuff() throws. But since C++ guarantees that when an exception is thrown and caught, all scopes between the throw and catch are properly unwound, with all local variables being properly cleaned up (including their destructors getting called… this is why throwing exceptions in destructors is a big no-no), doing the unlock in a destructor behaves correctly even in this case.
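
Incidentally, this helper is so fundamental that the standard library ships it ready-made as std::lock_guard (paired with std::mutex). A minimal sketch, assuming the doSomething… functions are defined elsewhere:

#include <mutex>

std::mutex theMutex;

void doSomeStuffThatIsntCritical();
void doTheCriticalStuff();
void doSomeMoreStuffThatIsntCritical();

void doTheStuff()
{
  doSomeStuffThatIsntCritical();

  // Critical section
  {
    std::lock_guard<std::mutex> acquire(theMutex);   // locks in the constructor

    doTheCriticalStuff();                            // may throw; the unlock still happens
  }                                                  // unlocks in the destructor, even during stack unwinding

  doSomeMoreStuffThatIsntCritical();
}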

This whole paradigm is totally unavailable in garbage collected languages, because they don’t have destructors. You can do this in reference counted languages, but at the cost of making everything shared, which is much harder to reason correctly about than unique ownership, and the vast majority of objects are really uniquely owned. In C# this code would have to be written like this:

void DoTheStuff()
{
  DoSomeStuffThatIsntCritical();

  _theMutex.Lock();

  try
  {
    DoTheCriticalStuff();
  }
  finally
  {
    _theMutex.Unlock();
  }

  DoSomeMoreStuffThatIsntCritical();
}

Microsoft’s docs on try-catch-finally show a finally block being used for precisely the purpose of ensuring a resource is properly cleaned up.

In fact, this isn’t fully safe, because finally might not get called. To be absolutely sure the mutex is unlocked we’d have to do this:

void DoTheStuff()
{
  DoSomeStuffThatIsntCritical();

  _theMutex.Lock();

  Exception? error = null;

  try
  {
    DoTheCriticalStuff();
  }
  catch(Exception e)
  {
    error = e;
  }
  finally
  {
    _theMutex.Unlock();
  }

  if (error != null)
    throw error;

  DoSomeMoreStuffThatIsntCritical();
}

Gross.

C# and Java created using/try-with-resources to mitigate this problem:

void DoTheStuff()
{
  DoSomeStuffThatIsntCritical();

  using(var acquire = new AcquireMutex(_theMutex))
  {
    DoTheCriticalStuff();
  }

  DoSomeMoreStuffThatIsntCritical();
}

That solves the problem for relatively simple cases like this where a resource doesn’t cross scopes. But if you want to do something like open a file, call some methods that might throw, then pass the file stream to a method that will hold onto it for some amount of time (maybe kicking off an async task), using won’t help you because that assumes the file needs to get closed locally.

Adding using/try-with-resources was a great decision, but it isn’t the garbage collector at work, and it receives no assistance from the garbage collector at all. These are special language features with new keywords. They never could have been added as library features. And they only simulate scope ownership, not unique or shared ownership. Adding them is an admission that the garbage collector isn’t the panacea it promised to be.

Tracing?

The basic idea here is not to bake a specific resource management strategy into the language, but to allow the coder to opt each variable into a specific resource management strategy. We’ve seen that C++ gives us the tools necessary to add reference counting, a strategy sometimes baked directly into languages, as an opt-in library feature. That raises the question: could we do this for tracing as well? Could we build some sort of traced_ptr<T> that will be deleted by a separate process running in a separate thread, that works by tracing the graph of references and determining what is still accessible?

C++ is still missing the crucial capability we need to implement this. The tracer needs to be able to tell what things inside an object are pointers that need to be traced. In order to do that, it needs to be able to collect information about a type, namely what its members are, and their layout, so it can figure out which members are pointers. Well, that’s reflection. Once we get it, it will be part of the template system, and we could actually write a tracer where much of the logic that normally would happen at runtime would be worked out at compile time. The trace_references<T>(traced_ptr<T>& ptr) function would be largely generated at compile time for any T for which a traced_ptr<T> is used somewhere in our program. The runtime logic would not have to work out where the pointers to trace are, it would just have to actually trace them.

Once we get reflection, we can write a traced_ptr<T> class that knows whether or not T has any traced_ptr type members. The destructor of traced_ptr itself will do nothing. The tracer will periodically follow any such members, repeat the step for each of those, and voila: you get opt-in tracing. This is interesting because it greatly mitigates the problem that baked in tracing has, which is the total uncertainty about the state of an object during destruction. What can you do in the destructor for your class if you have traced_ptrs to it? Well, you can be sure everything except the traced_ptr members are still valid. You just can’t touch the traced_ptr members.

Since it is now your responsibility, and decision, to work out which members of a class will be owned by the tracer, you can draw whatever dividing line you want between deterministic and nondeterministic release. A class that holds both a file handle and other complex objects might decide that the complex objects will be traced_ptrs, but the file handle will be a unique_ptr. That way we don’t have to write a destructor at all, and the destructor the compiler writes for us will delete the file handle, and not touch the complex objects.

There may be problems with delegating only part of your allocations to a tracer. The other key part of a tracer is it keeps track of available memory. To make this work you’d probably also need to provide overrides of operator new and operator delete. But you may also be okay with relaxing the promises of traced references: instead of the tracer doing nothing until it “needs to” (when memory reaches a critical usage threshold), it just runs continuously in the background, giving you assurance you can build some temporary webs of objects that you know aren’t anywhere close to your full memory allotment, and be sure they’ll all be swept away soon after you’re done with them.

While this is a neat idea, I would consider it an even lazier approach to the ownership problem than a proliferation of shared and weak references. This is also more or less avoiding the ownership problem altogether. It may be a neat tool to have in our toolbelts, but I’d probably want a linter to warn on every usage just like with weak_ptrs, to make us think carefully about whether we can be bothered to work out actual ownership.

Conclusion

I have gone over all the options in C++ for deciding how a variable is owned. They are:

  • Scope ownership: auto variables with value semantics
  • Unique ownership: unique_ptr
  • Shared ownership: shared_ptr
  • Manual ownership: new and delete

Then there are two options for how to borrow a variable:

  • Raw (unowned) pointers/references
  • Weak references: weak_ptr

This list is in deliberate order. You should prefer the first option, and only go to the next option if that can’t work, and so on. These options really cover all the possibilities of ownership. Neither garbage collected nor reference counted languages give you all the options. Really, the first two are the most important. Resource management is far simpler when you can make the vast majority of cases scope or unique ownership. Unique ownership (that can cross scope boundaries) is, no pun intended, unique to C++.
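
As a concrete illustration (hypothetical types throughout), here’s how those options read when a single class mixes them:

#include <memory>

struct Header {};
struct Body {};
struct Style {};
struct Printer {};
struct Cache {};

class Document
{

private:

  Header                 _header;              // scope ownership: lives and dies with the Document
  std::unique_ptr<Body>  _body;                // unique ownership: owned here, but transferable
  std::shared_ptr<Style> _style;               // shared ownership: other objects may co-own it
  Printer*               _printer = nullptr;   // raw borrow: someone else owns it, we just use it
  std::weak_ptr<Cache>   _cache;               // weak reference: shared elsewhere, may already be gone
};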

For this reason, I have far fewer resource leaks in C++ than in other languages, far less boilerplate code to write, and the vast majority of dangling references I’ve encountered were caused by inappropriate use of shared ownership, due to me coming from reference counted languages and thinking in terms of sharing everything. Almost all my desires to use weak references were me being lazy and not fixing cyclical references (it’s not some subtle realization after profiling that they exist, it’s obvious when I write the code they are cyclical).

I wouldn’t add a garbage collector to that code if you paid me.

Do You Really Need Dynamic Dispatch?

Introduction

Polymorphism through dynamic dispatch (a base class, plus multiple subclasses that each override a base class method) is the standard way to handle variation in object-oriented code.  But it’s not the only choice, and in many cases it offers too much dynamism.  The point of dynamic dispatch is to have the code select a branch based on runtime conditions.  It is designed to allow the same code to trigger a different pathway each time it is run.  If you don’t actually need this type of late (runtime) binding, then you have other options for dispatching earlier than runtime.  The advantage of using early binding when it is appropriate is that the program correctly expresses constness: if something is the same on every execution, this should be reflected in the code.  Doing a runtime check in such a situation is needlessly pushing a potential bug out from compile/static analysis time to runtime.

A good example is situations where you have multiple implementations of some library available.  Maybe one implementation is available on one platform and another is available on another platform.  Or, one is a different manufacturer’s implementation and you’re experimenting with adopting it but not yet sure it’s ready for adoption.  In these situations, you have variation: your code should either call one implementation or another.  But is this variation really happening at runtime?  Or, is it happening at build time?  Are you choosing, when you build your program, which implementation is going to be used, for every execution?

The Dynamic Dispatch Solution

The obvious, or I might say naive, way to handle this is with typical polymorphism.  First I’ll create a package LibraryInterface that just contains the interface:

public interface LibraryFacade
{
	void doSomething();
	void doSomethingElse(); 
}

Then I’ll make two packages.  OldLibrary implements the interface with the implementation provided by the manufacturer CoolLibrariesInc.  NewLibrary implements it with one provided by the manufacturer SweetLibrariesInc.  Since these both reference the interface, both packages need to import LibraryInterface: 

import LibraryInterface;

class OldLibraryImplementation : LibraryFacade
{
	override void doSomething()
	{
		// Call CoolLibrariesInc stuff
	}

	override void doSomethingElse()
	{
		// Call CoolLibrariesInc stuff
	}

	private CoolLibrariesClass UnderlyingObject;
}
import LibraryInterface;

class NewLibraryImplementation : LibraryFacade
{
	override void doSomething()
	{
		// Call SweetLibrariesInc library stuff
	}

	override void doSomethingElse()
	{
		// Call SweetLibrariesInc library stuff
	}

	SweetLibrariesClass UnderlyingObject;
}

Then, somewhere, we have a factory method that gives us the right implementation class.  Naively we would put that in LibraryInterface:

public static class LibraryFactory
{
	public static LibraryFacade library();
}

But we would have to make it abstract.  We can’t do that because it’s static (it wouldn’t make sense anyway).  If we try to compile LibraryInterface with this, it will rightly complain that the factory method is missing its body.  The most common way to deal with this I’ve seen (and done myself many times) is to move the factory method into the application that uses this library.  Then I implement it according to which library I want to use:

import OldLibrary;
import NewLibrary;

public static class LibraryFactory
{
	// We’re using the new library right now
	public static LibraryFacade library()
	{
		return new NewLibraryImplementation();
		// return new OldLibraryImplementation();
	}
}

Then in my application I use the library by calling the factory to get the implementation and calling the methods via the interface:

LibraryFacade library = LibraryFactory.library();
library.doSomething();

If I want to switch back to the old library, I don’t need to change any of the application code except for the factory method.  Cool!

This is already a bit awkward.  First of all, in order for the factory file to compile, we have to link in both the old and new libraries because of the import statements.  Importing both doesn’t break anything.  In fact, we have all the mechanisms needed to use both implementations in different parts of the app, or switch from one to the other mid-execution.  Do we really need this capability?  If the intention is that we pick one or the other at build time and stick with it, then wouldn’t it be ideal for the choice to be made simply by linking one library in, and better yet signal a build error if we try to link in both?

When It Starts to Smell

Let’s say our library interface is a little more sophisticated.  Instead of just one class, we have two, and they interact with each other:

public interface LibraryStore
{
	LibraryEntity FetchEntity(string Id);
	…
	void ProcessEntity(LibraryEntity Entity);
}

public interface LibraryEntity
{
	string Id { get; }
}

Now, we have the old library’s implementation of these entities, and presumably each one is wrapping some CoolLibrariesInc class:

internal class OldLibraryStore: LibraryStore
{
	LibraryEntity FetchEntity(string Id)
	{
		CoolLibraryEntity UnderlyingEntity = UnderlyingStore.RetrieveEntityForId(Id);
		return new OldLibraryEntity(UnderlyingEntity);
	}
	
	void ProcessEntity(LibraryEntity Entity)
	{
		if(Entity is not OldLibraryEntity OldEntity)
			throw new Exception("What are you doing!?  You can’t mix library implementations!");

		CoolLibraryEntity UnderlyingEntity = OldEntity.UnderlyingEntity;

		UnderlyingStore.DoWorkOnEntity(UnderlyingEntity);
	}

	…

	private CoolLibraryStore UnderlyingStore;
}

internal class OldLibraryEntity: LibraryEntity
{
	string Id => UnderlyingEntity.GetId();

	…

	internal CoolLibraryEntity UnderlyingEntity;
}

Notice what we have to do in ProcessEntity.  We have to take the abstract LibraryEntity passed in and downcast it to the one with the “matching” implementation.  What do we do if the downcast fails?  Well, the only sensible thing is to throw an exception.  We certainly can’t proceed with the method otherwise, and it certainly indicates a programming error.

Now, if you know me, you know the first thing I think when I see an exception (other than “thank God it’s actually failing instead of sweeping a problem under the rug”) is “does this need to be a runtime failure, or can I make it a compile time failure?”  The error is mixing implementations.  Is this ever okay?  No, it isn’t.  That’s a programming error every time it happens, period.  That means it’s an error as soon as I write the code that takes one implementation’s entity and passes it to another implementation’s store.  I need to strengthen my type system to prevent me from doing this.  Let’s make the Store interface parameterized by the Entity type it can work with:

public interface LibraryStore<Entity> where Entity : LibraryEntity
{
	Entity FetchEntity(string id);
	…
	void ProcessEntity(Entity entity);
}

Then the old implementation is:

public class OldLibraryStore: LibraryStore<OldLibraryEntity>
{
	OldLibraryEntity FetchEntity(string Id)
	{
		CoolLibraryEntity UnderlyingEntity = UnderlyingStore.RetrieveEntityForId(Id);
		return new OldLibraryEntity(UnderlyingEntity);
	}
	
	void ProcessEntity(OldLibraryEntity Entity)
	{
		CoolLibraryEntity UnderlyingEntity = Entity.UnderlyingEntity;

		UnderlyingStore.DoWorkOnEntity(UnderlyingEntity);
	}

	…

	private CoolLibraryStore UnderlyingStore;
}

I need to make the implementing classes public now as well.

Awesome, we beefed that exception up into a type error.  If I made that mistake somewhere, I’ll now get yelled at for trying to send a mere LibraryEntity (or even worse an OldLibraryEntity) to the NewLibraryStore.  I’ll probably need to adjust my client code to keep track of the fact it’s now getting a NewLibraryEntity back from FetchEntity instead of just a LibraryEntity.  Well… actually, I’ll first get yelled at for declaring a LibraryStore without specifying the entity type.  First I have to change LibraryStore to LibraryStore<NewLibraryEntity>.  And if I do that, I might as well just change them to NewLibraryStore.

Okay, but my original intention was to decouple my client code from which library implementation I have chosen.  I just destroyed that.  I have now forced the users of LibraryStore to care about which LibraryEntity they’re talking to, and by extension which LibraryStore.

Remember, we have designed a system that allows me to choose different implementations throughout my application code.  I can’t mix them, but I can switch between them.  If that’s what I need, then it is normal and expected that I force the application code to explicitly decide, at each use of the library, which one it’s going to use.  Well, that sucks, at least if my goal was to make selecting one (for the entire application) a one-line change.  This added “complexity” of the design correctly expresses the complexity of the problem.

The problem with this problem is it isn’t actually my problem.

(I’m really proud of that sentence)

I don’t need to be able to pick one implementation sometimes and another other times.  I don’t need all of this complexity!  What I need is a build-time selection of an implementation for my entire application.  That precludes the possibility of accidentally mixing implementations.  That simply can’t happen if the decision is made client-wide at build time.  My design needs to reflect that.

An Alternative Approach

The fundamental source of frustration is that I chose the wrong tool for this job.  Dynamic dispatch?  I don’t need or want the dispatch to be dynamic.  That means doing the implementation “selection” as implementations of an interface is the wrong choice. I don’t want the decision of where to bind a library call to be made at the moment that call is executed. I want it to be bound at the moment my application is built.

Let’s try something else.  I’m going to get rid of the LibraryInterface package altogether.  Then OldLibrary will contain these classes:

public class LibraryStore
{
	LibraryEntity FetchEntity(string Id)
	{
		CoolLibraryEntity UnderlyingEntity = UnderlyingStore.RetrieveEntityForId(Id);
		return new LibraryEntity(UnderlyingEntity);
	}
	
	void ProcessEntity(LibraryEntity Entity)
	{
		CoolLibraryEntity UnderlyingEntity = Entity.UnderlyingEntity;

		UnderlyingStore.DoWorkOnEntity(UnderlyingEntity);
	}

	…

	private CoolLibraryStore UnderlyingStore;
}

public class LibraryEntity
{
	string Id => UnderlyingEntity.GetId();

	…

	internal CoolLibraryEntity UnderlyingEntity;
}

NewLibrary will contain these classes:

public class LibraryStore
{
	LibraryEntity FetchEntity(string Id)
	{
		SweetLibraryEntity UnderlyingEntity = UnderlyingStore.ReadEntity(Id);
		return new LibraryEntity(UnderlyingEntity);
	}
	
	void ProcessEntity(LibraryEntity Entity)
	{
		SweetLibraryEntity UnderlyingEntity = Entity.UnderlyingEntity;

		UnderlyingStore.ApplyProcessingToEntity(UnderlyingEntity);
	}

	…

	private SweetLibraryStore UnderlyingStore;
}

public class LibraryEntity
{
	string Id => UnderlyingEntity.UniqueId;

	…

	internal SweetLibraryEntity UnderlyingEntity;
}

I’ve totally gotten rid of the class hierarchies.  There are simply two versions of each class, with each equivalent one named exactly the same.  That means they’ll clash if they are both included in a program.  That’s perfect, because if we accidentally link both libraries, we’ll get duplicate definition errors.

It took me a long time to realize this is even an option.  This is a “weird” way to solve the problem.  For one, the actual definition of the “interface” doesn’t exist anywhere.  It is, in a way, duplicated and stuffed into each package.  And, to be sure, there is an interface.  An “interface” just means any standard for how code is called.  In OO code, it means the public methods and members of classes.  If we want to be able to switch implementations of a library without changing client code, a well-defined “interface” for the libraries is precisely what we need.  So it’s natural we might think, “well if I need an interface, then I’m going to create what the language calls an interface!”  But “interface” in languages like C# and Java (and the equivalent “Protocol” in ObjC/Swift) doesn’t just mean “a definition for how to talk to something”.  It also signs you up for a specific way of binding calls.  It signs you up for late binding, or dynamic dispatch.

Defining a Build-Time Interface

The fact there is no explicit definition of the library interface isn’t just inconvenient.  We lose any kind of validation that we’ve correctly implemented the interface.  We could make a typo in one of the methods and it will compile just fine, and we won’t find out until we switch to that implementation and try to compile our app.  And the error we get won’t really tell us what happened.  It will tell us we’re trying to call a method that doesn’t exist, but it should really tell us the library class is supposed to have this method but it doesn’t.

This is the general problem with “duck typing”. If I pass a wrench to a method expecting a duck, or duck-like entity, instead of getting a sensible error like “you can’t pass this object here because it isn’t duck-like”, you get a wrench complaining that it doesn’t know how to quack. You have to reason backward to determine why a wrench was asked to quack in the first place. Also, this error is fundamentally undetectable until the quack message is sent, which means runtime, even though the error exists from the moment I passed a wrench to a method that declares it needs something duck-like. The point of duck typing is flexibility. But flexibility is equivalent to late error detection. That’s why too much flexibility isn’t helpful.

One way to solve this is to “define” the interface with tests.  What we’re testing is the type system, which means we don’t need to actually run anything.  We just need to compile something.  So we can write a test that simply tries to call each expected method with the expected parameters:

// There’s no [Test] annotation here!
void testLibraryEntity()
{
    var Entity = (LibraryEntity)null!;

    string TestId = Entity.Id;
}

// There’s no [Test] annotation here!
void testLibraryStore()
{
    var Store = (LibraryStore)null!;

    LibraryEntity TestFetchEntity = Store.FetchEntity((string)null!);
    Store.ProcessEntity((LibraryEntity)null!);
}

You might think I’m crazy for typing that out, but really it isn’t that ridiculous, and it does exactly what we want (it also indicates I’m truly a C++ dev at heart).  I’ve highlighted that there are no [Test] annotations, which means this code won’t run when we hit the Test button.  That’s good, because it will crash.  We don’t want it to run.  We just want it to compile.  If that compiles, it proves our classes fulfill the intended interfaces.  If it doesn’t compile, then we’re missing something.

(If you aren’t already interpreting compilation errors as failed tests, it’s never too late to start)

What would be nice is if languages could give us a way to define an interface that isn’t dynamically dispatched.  What if I could do this:

interface ILibraryStore<Entity: ILibraryEntity>
{
	Entity FetchEntity(string Id);
	void ProcessEntity(Entity Entity);
}

interface ILibraryEntity
{
	string Id { get; }
}

I could put these in a separate LibraryInterface package.  Then in one library package:

public class LibraryStore: static ILibraryStore<LibraryEntity>
{
	LibraryEntity FetchEntity(string Id)
	{
		SweetLibraryEntity UnderlyingEntity = UnderlyingStore.ReadEntity(Id);
		return new LibraryEntity(UnderlyingEntity);
	}
	
	void ProcessEntity(LibraryEntity Entity)
	{
		SweetLibraryEntity UnderlyingEntity = Entity.UnderlyingEntity;

		UnderlyingStore.ApplyProcessingToEntity(UnderlyingEntity);
	}

	…

	private SweetLibraryStore UnderlyingStore;
}

public class LibraryEntity: static ILibraryEntity
{
	string Id => UnderlyingEntity.UniqueId;

	…

	internal SweetLibraryEntity UnderlyingEntity;
}

By statically implementing the interface, I’m only creating a compile-time “contract” that I need to fulfill.  If I forget one of the methods, or misname one, I’ll get a compiler error saying I didn’t implement the interface.  That’s it.  But then if I have a method like this:

void DoStuff(ILibraryEntity Entity)
{
	string Id = Entity.Id; // What does the compiler do here?
}

I can’t pass a LibraryEntity in.  I can only bind a variable of type ILibraryEntity to an instance of a class if that class non-statically implements ILibraryEntity, because such binding (up-casting) is a form of type erasure: I’m telling the compiler to forget exactly which subtype the variable is, to allow code to work with any subtype.  For that to work, the methods on that type have to be dispatched dynamically.  The decision by language designers to equate “interfaces” with dynamic dispatch was quite reasonable!

That means in my client code I still have to declare things as LibraryStore and LibraryEntity.  In order to get the “build-time selection” I want, I still have to name the classes identically.  That is a signal to the compiler both that they cannot coexist in a linked product, and that they get automatically selected by choosing one to link in.  Then, there’s the problem with importing.  Since the packages are named differently, I’d have to change the import on every file that uses the library (until C# 10!).  Same with Java.  In fact, it’s a bit worse in Java.  Java conflates namespaces with packages, so if the packages are named differently, the classes are effectively named differently too (the package is part of the fully qualified class name), and they’ll coexist just fine in the application.  In either case, you can name the packages identically.  Then your build system will really throw a fit if you try to bring both of them in.

Is the notion of a “static” interface a pipe dream?  Not at all.  C++20 introduced essentially this very thing and called them concepts.  Creating these compile-time only contracts is a much bigger deal for C++ than for other languages, because of templates.  In a language like C#, if I want to define a generic function that prints an array, I need an interface for stringifying something:

interface StringDescribable
{
	string Description { get; }
}

string DescribeArray<T>(T[] Array) where T : StringDescribable
{
	var Descriptions = Array
		.Select(Element => Element.Description);

	return $"[{string.Join(", ", Descriptions)}]";
}

This requires the Description method to be dynamically dispatched because of how generics work.  This code is compiled once, so the same code is executed each time I call this function, even with different generic types.  It therefore needs to dynamically dispatch the Description call to ensure it calls T’s implementation of it.  Fundamentally, a different Description implementation can get called each time this function is executed.  It’s a runtime variation, so it has to be late-bound.

The equivalent code with templates in C++ looks like this:

template<typename T> std::string describeArray(const T* array, size_t count)
{
	std::vector<std::string> descriptions(count);

	std::transform(
		array,
		array + count,
		descriptions.begin(),
		[] (const T& element) { return element.description(); }
	);

	return StringUtils::joined(", ", descriptions); // Some helper method we wrote to join strings
}

Notice the absence of any “constraint” on the template parameter T.  No such notion existed in C++ until C++20.  You don’t need to require that T implement some interface, because for each type T with which we instantiate this function, we get a totally separate compiled function.  If our code does this:

SomeType someTypeArray[10];
SomeOtherType someOtherTypeArray[20];
…
describeArray(someTypeArray, 10);
describeArray(someOtherTypeArray, 20);

Then the compiler creates and compiles the following two functions:

std::string describeArray(const SomeType* array, size_t count)
{
	std::vector<std::string> descriptions(count);

	std::transform(
		array,
		array + count,
		descriptions.begin(),
		[] (const SomeType& element) { return element.description(); }
	);

	return StringUtils::joined(", ", descriptions);
}

std::string describeArray(const SomeOtherType* array, size_t count)
{
	std::vector<std::string> descriptions(count);

	std::transform(
		array,
		array + count,
		descriptions.begin(),
		[] (const SomeOtherType& element) { return element.description(); }
	);

	return StringUtils::joined(", ", descriptions);
}

These are totally different.  They are completely unrelated code paths as far as the compiler/linker is concerned.  We could even specialize the template and write completely different code for one of the cases.

Both of these will compile fine as long as both SomeType and SomeOtherType have a const method called description that returns a std::string (it doesn’t even need to return std::string, it can return anything that is implicitly convertible to a std::string).  This is literal duck typing in action. Furthermore, declaring an “interface”, which in C++ is a class with only pure virtual methods, forces description to be dynamically dispatched, which forces every class implementing it to add a vtable entry for it.  If any such class has no virtual methods except this one, we suddenly have to create a vtable for those classes and add the vpointer to the instances, which changes their memory layout.  We probably don’t want that.

If I accidentally invoke describeArray with some class that doesn’t provide such a function, I get a very similar problem as our example with libraries.  It (correctly) fails to compile, but the error message we get is, uh… less than ideal.  It just tells us that it tried to call a method that doesn’t exist.  Any seasoned C++ dev knows that when you have template code several layers deep (and most STL stuff is many, many layers deep), mistakenly instantiating a template with a type that doesn’t fulfill whatever “contracts” the template needs results in some crazy error deep in the implementation guts.  You literally have to debug your code’s compilation (walk up a veritable stack trace of template instantiations) to find out what went wrong and why.  It sucks.  It’s the metaprogramming version of having terrible exception diagnostics and being reduced to looking at stack traces.

This even suffers a form of late error detection. Let’s say I write one template method where the template parameter T has no requirement to be “describable”, but in it I call describeArray with an array of T. This is an error that should be detected as soon as I compile after writing this method. But it won’t be detected until later, when I actually instantiate this template method with a non-describable parameter. It’s a compile-time error, but it’s still too late, and still in a sense detected when something is executed instead of when it is written (C++ templates are a metaprogramming stage: code that writes code, so templates are executed at build time to produce the non-templated C++ code that then gets compiled as usual).

And just like some sort of “static interface” like I proposed would help the compiler tell us the real error, so too does a compile-time contract fix this problem.  We need to introduce “constraints” like C#, but since templates are all compile-time, it’s not an ordinary interface/class.  It’s something totally different: a concept:

template<typename T> concept Describable = requires(const T& value)
{
	{ value.description() } -> std::convertible_to<std::string>;
};

Then we can use a concept as a constraint:

template<typename T> requires Describable<T> std::string describeArray(const T* array, size_t count)
…
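
Once the concept exists, C++20 also lets us state the constraint more tersely.  Either of the following is equivalent to the requires-clause above (just a sketch of the alternative spellings):

template<Describable T> std::string describeArray(const T* array, size_t count);

std::string describeArray(const Describable auto* array, size_t count); // Abbreviated function template: same constraint

Either way, calling it with a non-describable type now fails at the call site with a message naming the unsatisfied constraint, instead of failing somewhere deep inside the template’s guts.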

But as long as we don’t have such, uh, concepts (heh) in other languages, we can use other techniques to emulate them, like “compilation tests”.  The solutions I gave are all a little awkward.  Naming two packages identically just to make them clash? Tests that just compile but don’t run?  Well, it’s either that, or all the force downcasting (because you know the downcast won’t fail, right?  Right?) you’ve had to do whenever you wrote an adapter layer.  I know which one I’d rather deal with.

Testing

Now, let’s talk about testing our application.  In our E2E tests, we’ll want to link in whatever Library the app is actually using.  But in more isolated unit tests, we’ll likely want to mock the Library.  If you use some mocking framework like Moq or Mockito, you know that the prerequisite to something being “mockable” is that it’s dynamically dispatched (an interface, abstract or at least virtual method).  You can try PowerMock or something similar, but that kind of runtime hacking isn’t always available, depending on the language/environment.  The problem is essentially the same.  When running for real, we want to call the real library.  When running in tests, we want to call a mock library.  That’s a variation.  Variations are typically solved with polymorphism, and that’s exactly what mocking frameworks do.  They take advantage of a polymorphic boundary to redirect your application’s calls to the mocking framework.

How do we handle mocking (or any kind of test double, for that matter) if we’re doing this kind of static compile/link time binding trick?

Interestingly enough, if you think about it, runtime binding is unnecessarily late in most cases here as well.  How often do you find yourself sometimes mocking a class and sometimes not?  If that class is part of your application, you’ll do that.  But an external class?  We’ll always mock that.  Every time our unit test rig runs, we’ll be mocking Library.  So when is the variation selected?  At build time!  We select it when we decide to build the test rig instead of building the application.

The solution looks the same.  We just create another version of the package that has mock classes instead of real ones:

public class LibraryStore
{
	LibraryEntity FetchEntity(string Id)
	{
		// Mocking stuff
	}
	
	void ProcessEntity(LibraryEntity Entity)
	{
		// Mocking stuff
	}
}

public class LibraryEntity
{
	string Id 
	{ 
		get
		{
			// Mocking stuff	
		}
	}
}

Unfortunately this would mean we have to reinvent mocking inside our classes.  A mocking framework exercises late binding in a more powerful way to allow it to record invocations and inject behavior.  We don’t need dynamic dispatch to do that, but we might still want it if we’re not going to rewrite or code-generate the mocking logic.  What if we made the “test” version of the library flexible enough to be molded by whatever test framework we want to use?

public class LibraryStore
{
	virtual LibraryEntity FetchEntity(string Id)
	{
		throw new Exception("Empty stub");
	}

	virtual void ProcessEntity(LibraryEntity Entity)
	{
		throw new Exception("Empty stub");
	}
}

public class LibraryEntity
{
	virtual string Id => throw new Exception("Empty stub");
}

The “cleaner” choice would be to make the classes abstract, or even just make them interfaces:

public interface LibraryStore
{
	LibraryEntity FetchEntity(string Id);
	void ProcessEntity(LibraryEntity Entity);
}

public interface LibraryEntity
{
	string Id { get; }
}

This will work fine unless we’re calling constructors somewhere, most likely for LibraryStore. In that case, the app won’t compile with this library linked in because it will be trying to construct interface instances. But what if we make it part of the contract that these classes can’t be directly constructed? Instead, they provide factory methods, and their constructors are private? That will grant us the flexibility to swap in abstract versions when we need them.

To add this to our interface definition “tests”, we would need to somehow test that the classes are not directly constructible. Testing negatives for compilation is tricky. You’d have to create a separate compilation unit for each negative case, and have your build system try to compile them and fail if compilation succeeds. Since you have to script that part, you might as well script the whole thing. You can write a “testFailsCompilation” script that takes a source file as an input, runs the compiler from the command line and checks whether it failed. In your project, the test suite would be composed of scripts that call the “testFailsCompilation” script with different source files.

That’s fine, but it probably doesn’t integrate with your IDE as well. There won’t be a convenient list of test cases with play buttons next to each one and a check mark or X box indicating which ones passed/failed in the last test run. Some boilerplate can probably fix that. If you can invoke the scripts from code, then you can write test cases to invoke the script and assert on its output. Where that might cause trouble is embedded programming (mobile development is an example) where tests can or must run on a different device, which may or may not have a shell or permission to use one. Really, those tests are for the build stage, so you ought to define your test target to run on your build machine. If you can set that up, even your positive test cases will integrate better. Remember that what I suggested before were “test” methods that are not actual [Test] cases. So they won’t show up in the list of tests either. If you split each one into its own source file, compile it with a script, and then write [Test] cases that invoke each script, you recover proper IDE integration. This will make hooking them into a CI/CD pipeline simpler as well.

With that, we have covered all the bases. Add this to your tool belt. If you need variation, you probably want polymorphism. But class hierarchies (including abstract interfaces) are only one specific type of polymorphism, which we can call runtime polymorphism. There are other types as well. What we discussed here is a form of static polymorphism. When you realize you need polymorphism, don’t just jump to the type we’re all trained to think of as the one type. Think about when the variation needs to be selected, and choose the polymorphism that makes the decision at that time: no earlier, and no later.

What Type Systems Are and Are Not

Introduction

Programmers are typically taught to think of the type system in languages (particularly object-oriented languages) as being a way to express the ontology of a problem.  This means it gives us a way to model whether one thing “is a” other thing.  The common mantra is that inheritance lets us express an “is a” relationship, while composition lets us express a “has a” relationship.  We see some common examples used to demonstrate this.  One is taxonomy: a dog is a mammal, which is an animal, which is a living creature.  Therefore, we would express this in code with a class LivingCreature, a class Animal that is a subclass of LivingCreature, a class Mammal that is a subclass of Animal, and a class Dog that is a subclass of Mammal.  Another one is shapes: a rectangle is a quadrilateral, which is a polygon, which is a shape.  Therefore, we similarly have a class hierarchy Shape -> Polygon -> Quadrilateral -> Rectangle.

There are problems with thinking of the type system in this way.  The biggest one is that the distinction between “is a” and “has a” is not as clear-cut as it is made out to be.  Any “is-a” relationship we can think of can be converted into a “has-a” expression, and vice versa.  A dog is a mammal, but that is equivalent to saying a dog has a class (a taxonomic class, as in “kingdom phylum class order…”, not a programming language class) that is equal to “mammal”.  A quadrilateral is a polygon, but that is equivalent to saying it has a side count, which equals 4.  We can define a rectangle with “has-a” expressions too: it has four angles, and all are equal to 90 degrees.

We can also express what we might typically model in code as “has-a” relationships with “is-a” statements as well.  A car has a color, which can equal different values like green, red, silver, and so on.  But we could also define a “green car”, which is a car.  We could, though we probably wouldn’t want to, model this with inheritance: define a subclass of Car called GreenCar, another called RedCar, another called SilverCar, and so on.

There are well-known examples of when something we might naturally think of as an “is-a” relationship turns out to be poorly suited for inheritance.  Probably the most common example is the “square-rectangle” relationship.  When asked if a square is a rectangle, almost everyone would answer yes.  And yet, defining Square to be a subclass of Rectangle can cause a lot of problems.  If I can construct a Rectangle with any length and width, I will end up with a Square if I pass in equal values, but the object won’t be an instance of Square.  I can try to mitigate this by making the Rectangle constructor private, and define a factory method that checks if the values are equal and returns a Square instance if they are.  This at least encapsulates the problem, but it remains that the type system isn’t really capturing the “squareness” of rectangles because the instance’s type isn’t at all connected to the values of length and width.

If these objects are mutable, then all bets are off.  You simply can’t model squareness with inheritance for the simple reason that an object’s type is immutable.  But if the length and width of a rectangle are mutable, then its “squareness” is also mutable.  A rectangle that was not a square can become a square, and vice versa.  But the instance’s type is set in stone at construction.  Most of the discussions I’ve seen around this focus on the awkwardness or arbitrariness of how to implement setLength and setWidth on a Square.  Should these methods even exist on Square, and if so, does setting one automatically set the other to keep them equal?  But I believe this awkwardness is a mere side effect of the underlying problem.  No matter what, if I can set the length and width of a Rectangle, I can make it become a Square, and yet it won’t be a Square according to the type system.  I simply can’t encapsulate the disconnect between the Square type and actual squareness (which is defined by equality of the side lengths) for mutable Rectangles.

So what do I do?  I use composition instead of inheritance.  Instead of Square being a subclass of Rectangle, Rectangle has a property isSquare, a bool, and it checks the equality of the two side lengths.  That way, the answer to the “is it a square” question correctly consults the side lengths and nothing else, and it can change during the lifetime of a particular rectangle instance.  This also correctly prevents me from even trying to “require” that a Rectangle is, and will be for the rest of its life, a Square; a promise that simply can’t be made with mutable Rectangles.
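
In code, that looks something like this (a minimal sketch of the composition approach):

class Rectangle
{

public:

  Rectangle(double length, double width) : _length(length), _width(width) { }

  void setLength(double length) { _length = length; }
  void setWidth(double width) { _width = width; }

  // "Squareness" is computed from the current side lengths, so it can change
  // over the rectangle's lifetime, something a type can never do
  bool isSquare() const { return _length == _width; }

private:

  double _length;
  double _width;
};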

The same problem exists elsewhere on a type hierarchy of shapes.  If a Rectangle is a subclass of Quadrilateral, then the quadrilateral’s angles (or more generally, vertices) better not be mutable, or else we’ll end up turning a quadrilateral into a rectangle while the type system has no idea.  We have the same problem with immutable quadrilaterals that we would have to make the constructor private and check the angles/vertices in a factory method to see if we need to return a Rectangle (or even Square) instance.  The problem exists again if Quadrilateral is a subclass of Polygon.  If the number of sides/vertices is mutable, we can turn a Quadrilateral into a Triangle or Pentagon, and we’d similarly have to worry about checking the supplied vertices/sides in a factory method.

Types as Constness Constraints

We can see that, because whether something is an “is-a” or “has-a” relationship is not really a meaningful distinction (we can always express one as the other), this doesn’t tell us whether it is appropriate to express a concept using the type system of a programming language.  The square-rectangle example hints at what the real purpose of the type system is, and what criteria we should be considering when deciding whether to define a concept as a type.  The fundamental consideration is constness.  What do we know, 100%, that will never be different, at the time we write code?  When a certain block of code is executed, what are we sure is going to be the same every time it executes, vs. what could be different each time?  The type system is a technique for expressing these “known to always be the same” constraints.

Such constraints exist in all programming languages, not just compiled languages.  This is why I must resist the temptation to refer to them as “compile-time constants”.  There may be no such thing as “compile time”.  Really, they are author-time constraints (I’ve also seen this referred to as “design time”). They are knowledge possessed at the time the code is written, as opposed to when it is executed.

These constraints come in several forms.  If I call a function named doThisThing, I am expressing the knowledge, at authorship time, that a function called doThisThing exists.  Furthermore, this function has a signature.  It takes ‘n’ parameters, and each of those parameters has a type.  This means every time the doThisThing function is executed, there will be ‘n’ parameters of those types available.  This is a constant across every execution of the function.  What varies is the variables: the actual values of those parameters.  Those can be different every time the function executes, and therefore we cannot express the values as authorship-time constraints.

Where function signatures express what we know at authorship time about the invocation of a code block, the type system is a way of expressing what we know about a data block.  If we have a struct SomeStruct, with three members firstMember, secondMember and thirdMember of types int, float and string respectively, and we have an instance of SomeStruct in our code, we are saying that we know, for every execution of that code, that it is valid to query and/or assign any of those three members, with their respective types.  The other side of this coin is we know it is invalid to query anything else.  If we write someStruct.fourthMember, we know at authorship time that this is a programming error.  Fundamentally, we don’t have to wait to execute the code and have it throw an exception to discover this error.  The error exists and is plainly visible in the written code, simply by reading it.  The type system provides a parsable syntax that allows tools like the compiler to detect and report such an error.
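
As a concrete illustration, here’s that struct and the error the compiler can catch purely by reading the code (a sketch):

#include <string>

struct SomeStruct
{
  int firstMember;
  float secondMember;
  std::string thirdMember;
};

void useStruct(const SomeStruct& someStruct)
{
  std::string value = someStruct.thirdMember; // Valid on every execution: we know this member exists
  // auto oops = someStruct.fourthMember;     // Authorship-time error: the compiler rejects it without running anything
}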

Inheritance vs. Composition

The implication of this is that the fundamental question we should be asking when deciding to model something with a type is: are we modeling an authorship-time known constant?  If the need is to create a constraint where something is known to be true every time code is executed, the type system is the way to do that.  Inheritance and composition represent different levels of constness.  Or, rather, they represent different aspects of constness.  If I define a GreenCar as a subclass of Car, I am creating the ability to say I know this car is green, and will always be green.  If I instead define a Car to have a member called color, then I am saying I know that every car always has a color, but that color can change at any time.

What can I do with the composition approach that I can’t do with the inheritance approach?  I can change the color of a car.  What can I do with the inheritance approach that I can’t do with the composition approach?  Well, the exact opposite: I can guarantee a car’s color will never change, which allows me to do things like define a function that only takes a green car.  I can’t do that with the composition approach because I would be able to store a passed-in reference to a GreenCar, then later someone with the equivalent reference to the Car could change its color.
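
Spelled out in code, the two approaches look something like this (a sketch; GreenCar, RepaintableCar and acceptGreenCar are names invented here, and the two approaches are kept separate so each guarantee stays honest):

#include <string>

// Inheritance: the color is fixed by the subtype, so it can be demanded at compile time
class Car
{

public:

  virtual ~Car() = default;
  virtual std::string color() const = 0;
};

class GreenCar : public Car
{

public:

  std::string color() const override { return "green"; }
};

void acceptGreenCar(const GreenCar& car); // Only provably green cars can ever reach this function

// Composition: every car has a color, but it can change at any time, so no such guarantee is possible
class RepaintableCar
{

public:

  std::string color() const { return _color; }
  void setColor(const std::string& color) { _color = color; }

private:

  std::string _color;
};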

The more “const” things are, the more expressive I can be at compile time.  On the other hand, the less I can mutate things.  Both are capabilities that have uses and tradeoffs.  The more dynamic code is, the more it can do, but the less it can be validated for correctness (another way to think of it is the more code can do, the more wrong stuff it can do).  The more static code is, the less it can do, but the more it can be validated for correctness.  The goal, I believe, is to write code that is just dynamic enough to fulfill all of our needs, but no more, because extra dynamism destroys verifiability with no advantage.  It is also possible to allocate dynamism in different ways.  If we need code to be more flexible, instead of relaxing the constraints on existing code, we can create new code with different constraints, and concentrate the dynamism in a place that decides, at runtime, which of those two codepaths to invoke.

Now, you’re probably thinking: “that’s not why I would choose not to model a car’s color as types”.  The obvious reason why that would be insane is because the number of colors is, at least, going to be about 16.7 million.  So to exhaustively model color in this way, we’d have to write literally tens of millions of subclasses.  Even if we delegated that work to code generation, that’s just an absurd amount of source code to compile (and it would probably take forever).  It simply isn’t practical to do this with the type system.

Problems of practicality can’t be ignored, but it’s important to understand they are different than the problem of whether it would correctly express the constness of a constraint to model it with types.  This is because practicality problems are problems with the expressiveness of a language, including its type system.  These problems are often different across different languages, and can be eliminated in future iterations of languages in which they currently exist.  If it is simply inappropriate to use the type system because something is genuinely dynamic, this isn’t going to change across languages or language versions, nor is it something that could be sensibly “fixed” with a new language version.

To illustrate this, it is possible to practically model a color property with types in C++.  C++ has templates, and unlike generics in other languages, template parameters don’t have to be types.  They can be values, as long as those values are compile-time constants (what C++ calls a constexpr).  We can define a color as:

struct Color
{
  uint8_t r, g, b;
};

Then we can define a colored car as:

template<Color CarColor> class ColoredCar : public Car
{
  …
  Color color() const override { return CarColor; }
  …
};

Then we can instantiate cars of any particular color, as long as the color is a constexpr:

ColoredCar<Color {255, 0, 0}> redCar;

We will presumably have constexpr colors defined somewhere:

namespace Colors
{
  static const constexpr Color red {255, 0, 0};
  static const constexpr Color yellow {255, 255, 0};
  static const constexpr Color green {0, 255, 0};
  static const constexpr Color blue {0, 0, 255};
  …
} 

Then we can write:

ColoredCar<Colors::red> redCar;

We can define a function that only accepts red cars:

void acceptRedCar(const ColoredCar<Colors::red>& redCar);
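
Handing it anything other than a red car is then rejected at compile time:

ColoredCar<Colors::red> redCar;
ColoredCar<Colors::blue> blueCar;

acceptRedCar(redCar);     // Compiles
// acceptRedCar(blueCar); // Compile error: ColoredCar<Colors::blue> is a completely unrelated type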

Of course none of this makes sense unless color is a const member/method on Car.  If a car’s color can change, it’s simply wrong to express it as a type, which indicates an immutable property of an object.

This kind of thing isn’t possible in any other language I know of.  So if you aren’t working in C++, you simply can’t use the type system for this, for the simple reason that the language isn’t expressive enough to allow it.  It may be a different type of problem, but it’s a problem nonetheless.  So the decision to use the type system to express something must take into account both whether the thing being expressed is really an authorship-time constant, and whether the language’s type system is expressive enough to handle it.

Even in C++, there are other problems with modeling color this way.  Just like with the square-rectangle, if we construct a Car (not a ColoredCar), whose color happens to be red, it doesn’t automatically become a ColoredCar<Colors::red>.  A dynamic cast from Car to ColoredCar wouldn’t inspect the color member/method as we might expect it to.  We would have to ensure at construction that the correct ColoredCar type is selected with a factory method.  Now, there are likely other properties of a car we might want to model this way.  I often use cars as a way to demonstrate favoring composition over inheritance.  A car has a year, make, model, body type, drivetrain, engine, and so on.  Notice I said has a.  I could also say a car is a 2010 Honda Civic sedan, automatic transmission, V4.  The “classic” reason to avoid inheritance is the fact that “is-a” relationships are very often not simple trees.  We would need full-blown multiple inheritance, which would cause a factorial explosion of subtypes.  If I wanted to model all these aspects of a car with inheritance, I would really need something like:

template<unsigned int Year, CarMake Make, CarModel<Make> Model, CarBodyType BodyType, TransmissionType TransmissionType, Engine Engine, Color CarColor> class TypedCar : public Car
{
  ...
};

That’s already a pretty big mess, and it’s only that tidy (which isn’t so tidy) thanks to the unique power of C++ templates.  In any other language, you’re now talking about writing the classes 2010RedHondaCivicSedanAutoV4, 2011RedHondaCivicSedanAutoV4, 2010GreenHondaCivicSedanAutoV4, 2010RedHondaAccordSedanAutoV4, and so on more or less ad infinitum.  God forbid you’d actually start going down this path before you realize the lack of multiple inheritance blows the whole thing up even while ignoring the utter impracticality of it.

But this isn’t really enough.  With this two-level inheritance structure, I can either say to the compiler “I know this object is a car, and I know nothing else about it”, or “I know everything about this car, including its year, make, model, etc.”.  To be fully expressive I would want to be able to model between these extremes, like a car where I know its make, model and color, but I don’t know anything else.  I don’t even think C++ templates can generate the full web of partially specialized templates inheriting each other that you’d need for this (though I’m hesitant to assert that, I’ve been consistently shocked at what can be done, albeit in ridiculous and circuitous fashions, with C++ templates.  That doesn’t really change the conclusion here though).

So, long story short, it’s probably never a good idea to model a car’s color, or any of its other attributes, even if they are const, with the type system, because it’s not practical.  However, I want to emphasize that this really is a problem of practicality.  Future languages or language versions may become expressive enough to overcome these limitations. The point is we need to augment our “do I know this now, at authorship time, or only at runtime?” question, particularly when the answer is “I know it at authorship time”, with another question: is it practical, given the language capabilities, to express a constraint with the type system? If the answer is no, then we fall back to composition and must sacrifice the automatic authorship-time validation. We can instead do the validation with an automated test.

Demonstrating the Concepts

Okay, after all that theory, let’s go through an example to illustrate what the real lesson is here.  The type system is a tool available to you to turn runtime errors into compile time errors.  If there’s ever a point in your code where you need to throw an exception (and throwing an exception is definitely better than trying to cover up the problem and continue executing), think carefully about whether the error you’re detecting is detectable at authorship-time.  Are you catching an error in an external system, that you genuinely don’t control, or are you catching a programming error you might have made somewhere else in your code?  If it’s the latter, try to design your type system to make the compiler catch that error where it was written, without having to run the code.

If you ever read the documentation for a library and you see something like this:

You must call thisMethod before you call thisOtherMethod.  If you call thisOtherMethod first, you will get an OtherMethodCalledBeforeThisMethod exception

That’s a perfect example of not using the type system to its full advantage.  What they should have done was define one type that has only thisMethod on it, whose return value is another type that has only thisOtherMethod on it.  Then, the only way to call thisOtherMethod is to first call thisMethod to get an instance of the type that contains thisOtherMethod.
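
The general shape of that fix looks like this; Session and ReadySession are hypothetical names, and the two methods are the ones from the documentation snippet (a sketch):

// Only this type exposes thisOtherMethod
class ReadySession
{

public:

  void thisOtherMethod();
};

class Session
{

public:

  // The only way to obtain a ReadySession is to call thisMethod first,
  // so "thisOtherMethod before thisMethod" can no longer even be written
  ReadySession thisMethod();
};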

Let’s say we have a File class that works this way:

class File
{

public:

  File(const std::string& path); // Throws an exception if no file at that path exists, or if the path is invalid

  void open(bool readOnly); // Must call this before reading or writing

  std::vector<char> read(size_t count) const; // Must call open before calling this method;

  void write(const std::vector<char>& data); // Must call open with readOnly = false before calling this method

  void close(); // Must call when finished to avoid keeping the file locked.  Must not call before calling open
};

Now, let’s list all of the programming errors that could occur with using this class:

  1. Calling the constructor with an invalid path
  2. Calling the constructor with the path for a nonexistent file
  3. Calling read, write or close before calling open
  4. Calling write after calling open(true) instead of open(false)
  5. Calling read, write or open after calling close
  6. Not calling close when you’re done with the file

Think about all of these errors.  How many are genuinely runtime errors that we cannot know exist until the code executes?  There’s only one.  It’s #2.  #1 might be a runtime error if the path is built at runtime.  If the path is a compile-time constant, like a string literal, then we can know at compile-time if it’s an invalid path or not.  We can’t know whether a file at a particular valid path exists at the time of execution until we actually execute the code, and it can be different each time.  We simply must check this at runtime and emit a runtime error (exception) appropriately.  But the rest of those errors?  Those are all authorship-time errors.  It is never correct to do those things, which means we know at the time the code was written we did something wrong.

So, let’s use the type system to turn all of those errors, except #2, and #1 for dynamically built paths, into compile time errors.

First, let’s consider #1.  The underlying issue is that not every string is a valid file path.  Therefore, it’s not appropriate to use std::string as the type for a file path.  We need a FilePath type.  Now, we can build a FilePath from a string, including a dynamically built string, but we might not end up with a valid FilePath.  We can also build a FilePath in a way that’s guaranteed (or at least as close as the language allows) to be valid.  A valid file path is an array of path elements. What makes a valid path element depends on the platform, but for simplicity let’s assume that it’s any string made of one or more alphanumeric-only characters (ignoring drive letters and other valid path elements that can contain special characters). We can therefore define a FilePath as constructible from a std::vector of path elements:

class FilePath
{

public:

  FilePath(const std::vector<FilePathElement>& pathElements) : _pathElements(pathElements) { }

  std::string stringValue() const
  {
    // Code to join the path elements by the separator “/“
  }

private:

  std::vector<FilePathElement> _pathElements;
};

Now, for the path elements, the tricky part is checking at compile-time that a constexpr string (like a string literal) is nonempty and alphanumeric.  I won’t go into the implementation, but the signature would look like this:

template<StringLiteral String>
constexpr bool isNonEmptyAndAlphaNumeric();

The template parameter here is a value that represents a string literal. You would call this code like this: isNonEmptyAndAlphaNumeric<"some string literal">(). Why not make the input string a function argument? Quite simply, C++ doesn’t support constexpr function arguments. It’s a proposed feature we’ll hopefully get soon. To get around this, we have to bake the value into a type as a template parameter.
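
I haven’t defined StringLiteral either.  One common way to build such a wrapper in C++20 (a sketch; there are several variations of this trick) is a tiny struct whose constexpr constructor copies the literal’s characters, so the string’s contents become part of the template argument’s value:

#include <algorithm>
#include <cstddef>

template<std::size_t N>
struct StringLiteral
{

  constexpr StringLiteral(const char (&str)[N])
  {
    std::copy_n(str, N, value); // Copies the characters, including the null terminator
  }

  char value[N];
};

Because StringLiteral is a structural type, its values can be used as non-type template parameters, which is what makes isNonEmptyAndAlphaNumeric<"some string literal">() legal to write.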

Then, we would use this in the constructor for FilePathElement:

template<StringLiteral String>
FilePathElement::FilePathElement()
{
  static_assert(isNonEmptyAndAlphaNumeric<String>(), "FilePathElement must be nonempty and alphanumeric");
}

If you aren’t familiar with some of this C++ stuff, static_assert is evaluated at compile time, and therefore will cause a compiler error if the passed in expression evaluates to false.  This of course means the expression must be evaluatable at compile time, which is what the constexpr keyword indicates.

We sometimes will need to construct a FilePathElement out of a dynamically built string.  But since we can’t confirm at compile time, we instead do a runtime check, and if needed create a runtime error:

FilePathElement::FilePathElement(const std::string& string)
{
  if(!isNonEmptyAndAlphaNumeric(string))
    throw std::invalid_argument("FilePathElement must be nonempty and alphanumeric");
}

Now we can define constructors for a FilePath that take strings.  Since we can’t know statically if it’s valid, it needs to throw a runtime exception:

FilePath::FilePath(const std::string& string)
{
  // Try to split the string into a vector of strings separated by "/"
  std::vector<std::string> strElements = splitString(string, "/");

  // Try to convert each string element to a FilePathElement.  If any of them are invalid, this will cause an exception to be thrown
  std::transform(
    strElements.begin(), 
    strElements.end(), 
    std::back_inserter(_pathElements), 
    [] (const std::string& element) { return FilePathElement(element); } // This constructor might throw
  );
}

If you have no idea what’s going on in that std::transform call, this is just the STL’s low-level way of doing a collection map. It’s equivalent to this in Swift:

_pathElements = try strElements
  .map({ strElement in 
    
    return try FilePathElement(strElement); // This constructor might throw
  });

You might be thinking: couldn’t we make a FilePath constructor that takes a constexpr string (baked into a template parameter) and validates at compile time that it can be split into a vector of FilePathElements?  Maybe with C++17 or C++20, or earlier depending on how you do it.  C++ is rapidly expanding the expressivity of compile-time computations.  Anything involving containers (which traditionally require heap allocation) at compile time is a brand spanking new capability.

Now, we can form what we know at compile-time is a valid FilePath:

FilePath filePath = {FilePathElement<"some">(), FilePathElement<"location">(), FilePathElement<"on">(), FilePathElement<"my">(), FilePathElement<"machine">()};

If we did this:

FilePath filePath = {FilePathElement<"some">(), FilePathElement<"location?">(), FilePathElement<"on">(), FilePathElement<"my">(), FilePathElement<"machine">()};

Then the second path element would hit the static_assert and cause a compiler error.

Okay, so what if we’re not using C++?  Well, then you can’t really prove the path elements are nonempty and alphanumeric at compile time.  You just have to settle for run-time checks for that.  You can at least define the FilePath type that is built from a list of path elements, but you can’t get any more assurance than this. The language just isn’t expressive enough. That’s an example of the practicality problem. Due to limitations of the language, we can’t bake the validity of a path element’s string into the FilePathElement type, and we therefore lose automatic compile errors if we accidentally try to create a file path from an invalid string literal. If we want static validation, we need to write a test for each place we construct path elements from string literals to confirm they don’t throw exceptions.

Okay, the next problem is #2.  That’s an inherently runtime problem, so we’ll deal with it by throwing an exception in the constructor for File.

Moving on, all the methods on File are always supposed to be called after open.  To enforce this statically, we’ll define a separate type, let’s say OpenFile, and move all the methods that must be called after open onto this type.  Then we’ll have open on File return an OpenFile:

class File
{

public:

  File(const std::string& path);

  OpenFile open(bool readOnly) 
  {
    // Code to actually open the file, i.e. acquire the lock 
    return OpenFile(*this, readOnly); 
  } 
};

class OpenFile
{

public:

  std::vector<char> read(size_t count) const;

  void write(const std::vector<char>& data); // Must have been called with readOnly = false

  void close(); // Must call when finished to avoid keeping the file locked.

  friend class File;

private:

  OpenFile(File& file, bool readOnly) : _file(file), _readOnly(readOnly) { }

  File& _file;
  bool _readOnly;
};

Notice that the constructor for OpenFile is private, but it makes File a friend, thus allowing File, and only File, to construct an instance of OpenFile.  This helps us guarantee that the only way to get ahold of an OpenFile is to call open on a File first.  Note that we can do the actual work to “open” a file (i.e. acquire the underlying OS lock on the file) in the constructor for OpenFile, instead of in the open method.  That’s an even better guarantee that this work must happen prior to being able to read, write or close a file.  Then, it won’t really matter if we make the constructor private, and open would just be syntactic sugar.
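
In use, that looks like this:

File file("some/path");
std::vector<char> data = {'h', 'i'};

OpenFile openFile = file.open(false); // The only way to get an OpenFile is through File::open
openFile.write(data);

// OpenFile direct(file, false);      // Compile error: OpenFile's constructor is private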

Now we have a compile-time guarantee that a file gets opened before it gets read/written to/closed.  We still have the problem that even if we pass in the literal true for readOnly and then call write, the failure will happen at runtime.  We need to move this to compile-time failure.  One idea would be to use constness for this purpose.  After all, we have already made read a const method and write a non-const method.  However, since const classes in C++ aren’t quite full-blown types (particularly, we can’t define a “const” constructor), this won’t really work here.  We need to make two separate types ourselves.  Then, we can split the open method into two variants for read-only and read-write:

class File
{

public:

  File(const std::string& path);

  ConstOpenFile openReadOnly() { return ConstOpenFile(*this); }
  OpenFile openReadWrite() { return OpenFile(*this); }
};

class ConstOpenFile
{

public:

  ConstOpenFile(File& file) : ConstOpenFile(file, true) { }

  std::vector<char> read(size_t count) const;

  void close(); // Must call when finished to avoid keeping the file locked.

protected:

  ConstOpenFile(File& file, bool readOnly) : _file(file)
  {
    // Code to acquire the lock on the file.  We pass in readOnly so we can acquire the right kind of lock
  }

  File& _file; // Do we even need to store this anymore?
};

class OpenFile : public ConstOpenFile
{

public:

  OpenFile(File& file) : ConstOpenFile(file, false) { }

  void write(const std::vector<char>& data);
};

This is also how we would do it in languages that don’t support user-defined constness.

In C++, we can use templates to consolidate these two classes. For the methods, namely write, that are specific only to one of them, we can use SFINAE tricks in C++17 or earlier, but we can use constraints in C++20:

class File
{

public:

  File(const std::string& path);

  template<bool Writeable>
  OpenFile<Writeable> open() { return OpenFile<Writeable>(*this); }
};

template<bool Writeable>
class OpenFile
{

public:

  OpenFile(File& file) : _file(file) 
  { 
    // Code to acquire the lock on the file.  The template parameter Writeable is available to the program as an ordinary bool variable.
  }

  std::vector<char> read(size_t count) const;

  void write(const std::vector<char>& data) requires Writeable; // The compiler won't let us call this method on an OpenFile<false> instance 

  void close(); // Must call when finished to avoid keeping the file locked.

private:

  File& _file; // Do we even need to store this anymore?
};
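
Using it, the read-only variant simply has no write available:

File file("some/path");

OpenFile<true> writableFile = file.open<true>();
writableFile.write({'h', 'i'});    // Fine: the constraint is satisfied

OpenFile<false> readOnlyFile = file.open<false>();
std::vector<char> bytes = readOnlyFile.read(16);
// readOnlyFile.write({'h', 'i'}); // Compile error: write requires Writeable to be true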

We’re almost finished.  The remaining problems are preventing read, write and open from being called after calling close, and making sure close gets called when we’re done. Both of these problems boil down to: calling close must be the last thing that happens with an OpenFile. This is a perfect candidate for RAII.  We’re already acquiring a resource in initialization (a.k.a. construction).  This means we should release the resource in the destructor:

class ConstOpenFile
{

public:

  ConstOpenFile(File& file) : ConstOpenFile(file, true) { }
  ~ConstOpenFile()
  {
    // Code to release the lock on the file
  }

  std::vector<char> read(size_t count) const;

protected:

  ConstOpenFile(File& file, bool readOnly) : _file(file)
  {
    // Code to acquire the lock on the file.  We pass in readOnly so we can acquire the right kind of lock
  }

  File& _file;
};

By closing the file in the destructor (and only in the destructor), we’re making the guarantee that this won’t happen while we still have an instance on which to call other stuff like read and write (we could have a dangling reference, but that’s a more general problem that is solved with various tools to define and manage ownership and lifecycles), and the guarantee that it will happen once we discard the instance.
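
In practice, the lock now lives exactly as long as the scope of the variable; readConfiguration here is just a hypothetical caller:

void readConfiguration(File& file)
{
  ConstOpenFile openFile(file); // Lock acquired in the constructor

  std::vector<char> contents = openFile.read(1024);

  // ... use contents ...

} // openFile goes out of scope: the destructor releases the lock, and there is
  // no instance left on which read could even be called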

In reference counting languages like Obj-C and Swift, we can do this in the deinit.  In garbage collected languages like C# and Java, we don’t have as ideal a choice.  We shouldn’t use the finalizer because it won’t get called until much later, if at all, and that will result in files being left open (and therefore locked, blocking anyone else from opening them) long after we’re done using them.  The best we can do is implement IDisposable (C#) or AutoCloseable (Java), and make sure we remember to call Dispose or close, or wrap the usage in a using (C#) or try-with-resources (Java) block.

And now, all those programming errors that can be detected at compile time, are being detected at compile time.

This is how you use the type system.  You use it to take any constraint you have identified is always satisfied every time your code gets executed, and express it in a way that allows it to be statically verified, thereby moving your runtime errors earlier to authorship-time.  The ideal we aim for is to get all program failures into the following two categories:

  • Failures in external systems we don’t and can’t control (network connections, other machines, third party libraries, etc.)
  • Static (compilation/linting) errors

In particular, we aim to convert any exception that indicates an error in our code into an error that our compiler, or some other static analysis tool, catches without having to execute the code. The best kinds of errors are errors that are caught before code gets executed, and one of the main categories of such errors are the ones you were able to tell the type system to find.

Examples Aren’t Specifications

“Specification by Example” is a particular implementation strategy of Behavior-Driven Development. The central justification, as far as I can tell, is expressed in the following snippet from the Wikipedia page:

Human brains are generally not that great at understanding abstractions or novel ideas/concepts when first exposed to them, but they’re really good at deriving abstractions or concepts if given enough concrete examples

Ironically enough, this is immediately followed with “citation needed”.

Anyone with experience teaching math will immediately understand what is off about this statement. The number of people who can see the first few numbers in a sequence, and deduce from that the sequence itself, is much much smaller than the number of people who can understand the sequence when they see it. If I endeavored to teach you what a derivative is by just showing you a few examples of functions and their derivatives, I would be shocked if you were able to “derive” the abstraction that way.

It’s not just a matter of raw intelligence. It is true that only the highly intelligent can engage in this kind of pattern recognition. This is, in fact, exactly what an IQ test is. But the bigger problem is that multiple sequences can have the same values for the first few elements. It is quite simply not enough information to deduce a sequence from a few examples of its elements.

Examples help illustrate an abstraction, and thereby make it easier to understand. First I present an abstraction to you. I explain that a derivative measures the rate of change of a function, in the limit as the change goes to zero. Then I show you examples to help you grasp it. I don’t do it the other way around, and I certainly don’t skip the part where I explain what a derivative is, and hope by simply seeing a few derivatives you’ll realize what I’m showing you.

The “specification by example” practices I’ve seen all recognize that it would be a terrible idea to have developers truly try to derive the product specification from examples. All of them supplement the examples with actual statements of the abstractions. They do what I said: explain a “rule”, then follow it with examples to help illustrate the rule. But then, out of some kind of confusion, the insistence is to enshrine the examples, rather than the rules, as “the specification”.

A good overview of how this practice fits into BDD is given here. The practice of “example mapping” is applied to generate Gherkin scenarios for concrete examples of behavior. The essential practice is that Gherkin is written exclusively for concrete examples, and not for abstract rules.

Let’s go back to the Wikipedia article to see how a few cases of sleight-of-hand are applied in order to justify this. From the article:

With Specification by example, different roles participate in creating a single source of truth that captures everyone’s understanding.

This is, in fact, an argument for something completely different: elimination of the overlapping and largely redundant documents that different roles of a development organization maintain. It has nothing whatsoever to do with expressing specifications through concrete examples. A “single source of truth” is equally possible with specifications expressed directly. In fact, doing so is far better in this sense, because no interpretative burden is left on developers to get from what is documented to what is specified. Specifying directly by abstractions avoids each reader of the concrete examples deriving his own personal “source of truth” about what the examples mean.

We see this kind of thing a lot. The justification for scrum teams and ceremonies, apparently, is that it keeps manual testing load low. No, that’s the justification for test automation. That has nothing to do with scrum teams. It is a very common practice to try to “trojan horse” some novel concept in by attaching it to another, unrelated and generally already widely lauded practice. Avoiding redundant documentation is already a good idea. It is not a reason to adopt an entirely unrelated practice of specification by example.

Continuing:

Examples are used to provide clarity and precision, so that the same information can be used both as a specification and a business-oriented functional test.

Examples don’t provide precision, they provide clarity at the expense of precision. This is the fundamental point of confusion here. Examples are not specifications. I can provide these examples of a business rule:

“If John enters a 2 digit number into the field and tries to submit it, he is unsuccessful”

“If John enters a 5 digit number into the field and tries to submit it, he is successful”

“If John enters an 8 digit number into the field and tries to submit it, he is unsuccessful”

There are so many ways to interpret what I’m really getting at with these examples, I can’t list them all. Is the rule that the number of digits must be between 3 and 7? 4 and 6? Exactly 5? No one would dare hand only this to programmers and expect them to produce the desired software. That’s why every “specification by example” system supplements these examples with an actual rule like, “the number of digits must be between 3 and 6”.

The imprecision of examples is exactly why they can’t be specifications. Examples don’t specify. They exemplify.

As for “business-oriented test”, that’s BDD and TDD. The specification should be the business requirement, not some technical realization of that requirement. The requirement should be tested, preferably with an automated test. None of that requires the specification to be expressed through concrete examples.

Continuing:

Any additional information discovered during development or delivery, such as clarification of functional gaps, missing or incomplete requirements or additional tests, is added to this single source of truth.

It is? Why? What makes that happen? Does specifying by example force developers to go back and amend the requirements when they discover something new? Of course not. Maybe they will, maybe they won’t. Hopefully they will. It’s good practice to document a discovery that led to a code change in the requirements. This has nothing to do with whether requirements are expressed directly (abstractly) or indirectly (through concrete examples).

Continuing:

When applied to required changes, a refined set of examples is effectively a specification and a business-oriented test for acceptance of software functionality. After the change is implemented, specification with examples becomes a document explaining existing functionality. As the validation of such documents is automated, when they are validated frequently, such documents are a reliable source of information on business functionality of underlying software. To distinguish between such documents and typical printed documentation, which quickly gets outdated,[4] a complete set of specifications with examples is called Living Documentation.

This is an argument for tests as specifications, wherein the specifications are directly used by the CI/CD system to enumerate the test suite. The problem is that “a refined set of examples” cannot effectively be a specification. The author of this paragraph actually understands this. That’s why he says “specification with examples” (emphasis mine), instead of “specification by examples”, which is what this article is supposed to be advocating for. That one change in preposition completely alters what is being discussed, and completely undermines their case.

There are multiple (usually infinitely many) specifications that would align with any finite set of examples. Concrete examples simply don’t map 1-1 to abstractions. There’s a reason why the human mind employs abstractions so pervasively. I can’t tell you I’m hungry by pointing to a girl who is eating, and also sitting at a table, and also reading something on her phone (am I telling you I’m hungry, or that I want to sit down, or that I want to play on my phone, or that I think that girl is cute?)

Like I keep saying, everyone actually knows this is nonsense. If anyone really believed “specification by example” were possible, they would deliver only a set of examples to a development team and tell them to get working. They don’t do that. In the Cucumber world of “example mapping”, the actual acceptance criteria are of such critical importance, they are elevated to the same first-class citizen status as examples, and placed directly into the .feature files.

The rules are placed in .feature files as commented out, unexecutable plain English. If those rules change, well, maybe someone will go update that comment. We could completely overhaul the actual scenarios, the Gherkin (which is executable code and entails at least some sort of code change), and not touch the rules, and everything would be fine. These rules don’t explain existing functionality at all. They’re just comments, and you can write anything in them. They’re as bad as code comments for documentation.

By sticking with the practice of writing Gherkin for examples, instead of rules, the Gherkin ceases to be the specification. That’s why the feature files have to be augmented with a bunch of plain English. That English is actually the specification. All that’s happening here is that the benefits of a DSL like Gherkin are not exploited. The specifications are written in English, which is ambiguous, vague and imprecise (particularly in the way most people use it). To whatever extent the examples help resolve these ambiguities (especially when those examples are written in Gherkin), it would be far more effective to write the rules in Gherkin. The whole point of Gherkin is that English is too flexible and imprecise of a language with which to express software specifications. Writing Gherkin in a way that requires it to be supplemented with plain English negates its benefit.

My point is not that examples are unhelpful. Quite to the contrary, examples are extremely helpful, and often crucial in arriving at the desired abstractions. But “specification by example” assigns an entirely inappropriate role to examples. The primary role of examples is to motivate the discovery of appropriate specification. Examples stimulate people, particularly the ones who define the specifications, to think more carefully about what exactly the specifications are. A counterexample can prove that a scenario is too generic, and that a “given” needs to be added to constrain its scope.

Let’s return to the initial quote from the article. In my experience, the inability to understand abstract specifications is a nearly nonexistent problem in software development. I don’t ever remember a case where a requirement was truly specified in unambiguous terms, and someone simply drew a blank while reading it (or even just misinterpreted it, which would require an objectively wrong reading of the words). Instead, requirements are vague, unclear, ambiguous, confusing, and incomplete. Here’s an example:

When the user logs in, he should see his latest activities

What does that mean exactly? What counts as an “activity”? How many of the latest ones should he see? Is there a maximum? How are they displayed to the user? How should they be ordered?

The problem here isn’t that this requirement is so mind-blowing that we need to employ the tactics of a college level math lecture to get anyone to comprehend it. The information simply isn’t there. This is a lousy requirement because it isn’t specific, which means it isn’t a specification.

Really, what the supplemental examples do is fill in information that is missing in the rule. I can supplement this requirement with an example:

Given a user Sally
Given Sally has made 3 purchases
Given Sally has changed her delivery address 2 times
Given Sally has changed her payment info 3 times
When Sally logs in
Then Sally sees, in descending order by date, her 3 purchases, 2 delivery address changes, and 1 payment info change

Okay, this example is hinting at more information. A purchase, a delivery address change, and a payment info change are all examples of “activities”. Great. That was a missing detail in the “rule”. It specifies an ordering. It also seems that the recent activity is limited to 6 items. That was also a missing detail in the rule.

But I can interpret that differently. Maybe the rule is that there is no limit to the total number of activities shown, but there is a limit to only show 1 payment info change. Both of those rules fulfill this example. We need the actual rules.

Relying on examples in this manner is just a way to get by with vague and incomplete “rules”. In fact, if there is ever a perceived need to supplement a rule with examples, that is a very reliable sign that the rule is incomplete and needs to be improved. We can take “the rule is enough on its own, no examples needed” as a litmus test for the completeness and specificity of the rules.

Making the examples the target for Gherkin, which is what turns into your acceptance tests, completely fails as a BDD/TDD mechanism. The fundamental process of development driven by behaviors and tests is that you don’t touch the production code unless and until, and only to the minimal extent that, a failing test requires it. If you’re only writing tests for specific examples, the minimum work you need to do to make those tests pass is to satisfy the examples, not the rules.

I could write code that simply hardcodes 3 purchases, 2 address changes and 1 payment info change into the “activities” view on the home screen. Doing so would almost certainly be easier than fetching the logged in user’s real list of activities, parsing and truncating them. That would make this test pass. Even if there are a couple more examples with different sets of example activities, I could still get away with hardcoding them. And to the extent that the examples are our “documentation”, this is correct. But I know that’s not what I am supposed to be doing, so eventually, even though all the tests are passing, I have to go in and start messing with production code to make it do what I understand is really what we want it to do. In this workflow, acceptance tests simply aren’t the driving force of the production code, in any sense. They revert to the old role of tests as merely being verifications.

(This hints at a bigger discussion about whether tests, even under the hood, should ever use hardcoded stub data. Doing so always risks a false positive when the production code also hardcodes the same data, but it’s a very common and quick-to-stand-up method of testing. If this is an implementation detail of the tests, at least the test self-documents that this hardcoded data isn’t part of the test definition, or the requirement, which is certainly much better than a test in which arbitrary hardcoded stub data is right there in the test and requirement definition. The problem of “I can make this test pass by hardcoding the same data in production code” is still present, but arguably at a much smaller risk of occurring, because it’s clear from reading the test that those stubbed values are fake and private to the test implementation. If you want to fully eliminate this problem, you should randomly generate the fake data as part of executing the test.)
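
As a minimal sketch of that last idea (Activity, makeRandomActivities, loadHomeScreenFor and the six-item limit are all hypothetical, echoing the earlier example): because the fixture is generated fresh on every run, production code that hardcodes any particular activity list can’t reliably pass.

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

void testRecentActivityListShowsAtMostTheSixLatestItems()
{
  std::mt19937 rng(std::random_device{}());
  std::uniform_int_distribution<std::size_t> howMany(0, 20);

  std::vector<Activity> activities = makeRandomActivities(howMany(rng), rng);
  HomeScreen home = loadHomeScreenFor(activities);

  std::size_t expected = std::min<std::size_t>(activities.size(), 6);
  assert(home.recentActivities().size() == expected);
}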

The fact that different people come away with different understandings of what exactly the requirement means does not point to some defect in the human brain’s ability to comprehend abstractions. It points to a defect in the language of the requirement, which genuinely does not specify exactly what the requirement is. The pervasive problem is vague requirements, not developers who can’t understand what the product owner wants. The problem is a language problem, not a comprehension problem. That’s why the solution is a domain-specific language (Gherkin), not a brain transplant.

Examples are fine, and they can help. But they don’t get rid of the problem that the plain English business rule, I can almost guarantee you, is vague and ambiguous. Even if the ritual of communal exemplification causes the participants to all reach a shared understanding, it’s not going to help the next guy who comes along. And in case this isn’t clear, specifications are documentation, and documentation lives longer than any particular team member’s involvement. The whole point of the “document” is to be the thing anyone reads when they need to get an answer to something.

No one really believes examples can be the documentation. So when you insist on your Gherkin being only for examples, you necessitate plain English documentation. Is it better to have plain English documentation plus examples in Gherkin, than to just have plain English documentation? I’m sure it is. But both are far, far inferior to having all Gherkin documentation (the major exception here is visual requirements, in which case a picture is literally worth a thousand words. Visual requirements, aka fonts, colors, spacing, sizes, etc., are best expressed visually). The point of this is to produce true, precise specifications of the product. Plain English specifications aren’t precise enough, and examples (in any language) are even worse in this sense. Keeping examples around only allows you to get away with incomplete specifications. You shouldn’t need examples to supplement rules. The rules should be expressive and clear enough on their own. You can use examples to help arrive at that clear, expressive rule. Once you do, the examples are scaffolding and they can, and probably should, be torn down.

Specifications define. Examples illustrate. It will cause nothing but trouble to confuse the two.

What Is “Agile” Anyway?

Introduction

What is “agile” software development?

Well, more than anything, it’s definitely not waterfall.

At least in this age of big consulting firms teaching software shops how to be “properly agile” (Big Agile, or Agile, Inc., as we call it), that’s pretty much its definition. Waterfall is the ultimate scapegoat. However much we try to pin down its essential features, what matters, and what ultimately verifies whether we have correctly described it, is that it is the cause of everything that ever went wrong in the software world.

Accusing something of being “waterfall” is the ultimate diss in the agile world. The defense will always be to deny the charge as ridiculous.

This really hits home if you, like the proverbial heretic going through a crisis of faith, start doing some research and discover that “waterfall” originated in a 1970 paper by Winston Royce (which never uses the name) as the description of a failure mode in software development where fundamental flaws (meaning in the very design of a feature) are discovered after designing and developing software, during testing at the very end. Royce’s solution to this problem not only has nothing to do with agility, but largely runs directly contrary to the popular mantras of agile shops (he placed a ton of emphasis on meticulous documentation, where in agile circles not documenting things is elevated to a principle).

In the effort to comprehend this, I have often seen people identify the essential feature of “waterfall” as being the existence of a sequence of steps that are performed in a specific order:

feature design -> architecture/implementation -> testing -> delivery

Any time this kind of step-by-step process rears its ugly head, we scream “waterfall!” and change what we’re doing.

The results are… humorous.

After all, these steps are necessary steps to building a piece of software, and no process is ever going to change that. You can’t implement a feature before the feature is specified. You can’t test code that doesn’t exist yet (TDD isn’t saying “test before implementing”, it’s saying “design or specify the test before implementing”), and you at least shouldn’t deliver code before testing it. The only way to not follow these steps, in that order, is to not build software. From the paper:

One cannot, of course, produce software without these steps

(While these are the unavoidable steps of building software, this does not imply that other things, commonplace in the industry, are unavoidable. This includes organizing a business into siloed “departments” around each of these steps, believing that “architecture” and “implementation” are distinct steps that should be done by different people, etc.)

As the Royce paper explains, the existence and order of these steps is not the essential feature of the “waterfall” model he was describing. The essential feature, the “fall” of “waterfall”, is that the process is unidirectional and there is no opportunity to move “up”. More specifically, once the process starts, it either goes to the very end, or must be aborted and restarted all the way from the top. There is no built-in capability to walk back a single step to address and correct problems. This is really because failures that require jumping back to the beginning are discovered too late. Royce’s correction is intended to ensure that flaws in the execution of one step are always discovered at latest in the next step, and thus no need to jump back more than one step arises.

But what does this have to do with agility? Well, nothing. Royce wasn’t talking about agility. In fact, his solution is to do a ton more up-front analysis and design of software before starting any other steps, in order to anticipate and correct fundamental flaws at the beginning. This is basically the opposite philosophy of agile, which is to embrace failure at the end but to speed up the pipeline so that you get to the end quickly (this hinges on the idea that you can compartmentalize success and failure into individual features, and thus if a single feature fails it “wastes” only the effort invested into that one feature rather than an entire release. In my opinion this is an extremely dubious idea).

What even is agility, in the context of building software?

The Definition of Agility

Let’s remind ourselves of what the word “agility” actually means. Anyone who’s played an action RPG like the Elder Scrolls should remember that “Agility” is one of the “attributes” of your character, for which you can train and optimize. In particular, it is not “Speed”, and it is not “Strength”. Agility is the ability to quickly change direction. The easiest way to illustrate agility, and the fact it competes with speed, is with air travel vehicles: airplanes and helicopters. An airplane is optimized for speed. It can go very fast, and is very good at making a beeline from one airport to another. It is not very good at making a quick U-turn mid-flight. A helicopter, on the other hand, is much more maneuverable. It can change direction very quickly. To do so, it sacrifices top speed.

An airplane is optimized for the conditions of being 30,000 ft. in the air. There are essentially no obstacles, anything that needs to be avoided is large and detectable from far away and for a long time (like a storm), and the flight path is something that can be pretty much exactly worked out ahead of time.

A helicopter is optimized for low altitude flight. There are more, smaller obstacles that cannot feasibly be mapped out perfectly. The pilot needs to make visual contact with obstacles and avoid them “quickly” by maneuvering the helicopter. There is, in a sense, more “traffic”: constantly changing, unpredictable obstacles that prevent a flight path from being planned ahead of time. The flight path needs to be discovered and worked out one step at a time, during flight.

(This is similar to the example Eric Ries uses in The Lean Startup, where he compares the pre-programmed burn sequence of a NASA rocket to the almost instantaneous feedback loop between a car driver’s eyes and his hands and feet, operating the steering wheel, gas and brakes)

The airplane is optimized for speed, and the helicopter is optimized for agility. You cannot optimize for both. This is a simple matter of the “engineering triangle”. Airplanes and helicopters are both faster and more agile than, say, a giant steamboat, but the choice to upgrade from a steamboat to air travel is obvious and uninteresting. Once you make this obvious choice, you then need to make the unobvious choice of whether to upgrade to an airplane or to a helicopter.

The reason I say this is because “agile” practices are often confused with what are simply good engineering practices; practices that are more akin to upgrading from steam powered water travel to air travel than choosing an airplane or a helicopter. Those practices are beneficial whether you want to be fast or agile.

So what does “agility” (choosing the helicopter) mean in software?

Agility means rapid iteration.

Okay, what does iteration mean, and how rapid is “rapid”?

Iteration means taking a working software product, making changes to it that keep the product in a working state, and delivering the new version of the product.

What does “working” mean? That’s nontrivial, but the spoiler is, it’s decided entirely by the users of the software. Believe me, they’ll tell you whether it’s working or not.

Rapid means, basically, significantly more frequently than what you’d see in a “classical”, “non-agile” shop… which I’d say typically releases new versions every 6-12 months. So, maybe 2-3 months is the absolute upper limit of release frequency for “agile” shops. The goal is usually in the range from every 2 weeks, down to multiple times per day.

Let’s be absolutely clear that agility has nothing to do with speed. Speed refers to how many features or quality improvements per unit time (on average) your shop can deliver (let’s ignore the problem of how to define a “unit” of feature/quality with which to “count” or “measure” them). If you are a traditional shop who delivers 120 units of feature/quality improvements every 6 months, then your speed is 5 units/week. If you are an agile shop who delivers 5 units of feature/quality improvements every week, your speed is also 5 units/week. One shop delivers a big release every 6 months, the other delivers small releases every week. One has low agility, the other high agility, but both have the same speed.

We would measure agility not as the units of feature/quality per unit time delivered, but the inverse of the average release period (release frequency). The first shop’s release frequency is 1 / (6 months), or 1/24 per week. The second shop’s release frequency is 1 per week, and is thus 24 times more agile than the first shop.

We can see from this that agility and speed are separate variables that, at least in principle, can vary independently. If the practice you’re proposing would make a shop both faster and more agile, it has nothing to do with agility per se, but is a whole-system optimization. Using a high-level language instead of assembly is a whole-system optimization. Writing better architected code is a whole-system optimization. You don’t do those things because you want agility (implying you wouldn’t do them if you didn’t care about agility), you do those things because they are better, more advanced practices that improve software development in general.

But after you’re done adopting all the whole-system improvements you can think of, you then face a choice of adopting practices that further optimize one aspect of the system, but you can’t choose them all simultaneously (to do one is to not do the other), and thus you choose to either further optimize one aspect or to further optimize the other. You can choose to optimize for speed, but at the cost of not further optimizing for agility, resulting in a shop who can work faster (higher average feature/quality per week) but deliver less frequently. Or, you can choose to optimize for agility, but at the cost of not further optimizing for speed, resulting in a shop who can release more frequently but works slower.

Objecting that it’s possible to pick both and optimize everything at once is asserting that mutually exclusive practices that optimize one over the other do not exist. Yes, obviously some practices improve both agility and speed, and perhaps everything else too. The point isn’t that such choices don’t exist. The point is that rivalrous choices (choices that cannot coexist, to do one is to not do the other) exist too, and you eventually have to make those choices.

I’m belaboring this because I encounter what I call “anti-economic” thinking a lot… this is where people declare that opportunity costs as a category don’t exist, and choice-making is a straightforward process of figuring out what choices are better in every aspect. This is not how life works. Choices that have no opportunity costs are so uninteresting and unconscious there’s little point in talking about them. This is why economists say that all choices have opportunity costs… “choices” that don’t have costs basically don’t even count as “choosing”. Even something that at first appears to be a non-choice, like adopting air travel over steamboats, comes with temporary opportunity costs (you have to build the airplanes, helicopters, airports, etc., which could take years or decades until you have a robust infrastructure, whereas your steamboats work today).

The Fundamental Problem

Now, what does it take to become agile? A legitimate answer to this question will make it obvious how optimizing for agility does not optimize for speed. If you want to genuinely release every two weeks, you’ll end up with an overall lower average units of feature/quality per week delivery rate than if you were willing to sacrifice frequent delivery.

What we’re asking is, if you’re able to deliver 120 units of features/quality every 6 months, why is it not trivial to start releasing 5 units every week? You have to run the pipeline (feature design -> architect/implement -> test -> deliver) for each feature. Do we just need to reorder things so that we run the whole pipeline for individual features instead of batching? Is that all it takes?

Well, why would shops ever batch to begin with? Why would they design 120 units of features before starting to implement any of them, and then implement all of them before starting to test any of them? Understanding why any shop would batch in this way (and they do) is key to understanding what it takes to become agile.

Let’s imagine we start a new greenfield project by releasing one feature at a time, beginning to end. Now, the first obvious problem is that different people perform each of the pipeline steps. Your designers spend a day designing, then what? They just sit around waiting until the developers finish implementing, then the testers finish testing, the feature is released and the next one is ready to begin? Same question for all the other people.

No, you do what CPUs do, and stagger the pipeline: once the designer designs Feature 1 on Day 1, then she starts on Feature 2 on Day 2, and so on. By the time Feature 1 is delivered at the end of Day 4, she’s designed Features 1-4. And developers have implemented Features 1-3, testers/bug fixers have tested and stabilized Features 1-2, and DevOps has released Feature 1.

So, we’re already batching by the amount equal to what it takes to run the whole pipeline. But that’s fine. We batch only that amount, so we’re only at most 4 features ahead (and only in the design phase) of what’s released. That’s pretty damn agile.

At the end of the first cycle, we have a software product that performs Feature 1.

Now we start on Feature 2. Is this exactly the same? What’s different now compared to the beginning? The difference is instead of working on totally greenfield software, we’re working on slightly less greenfield software: specifically software that performs Feature 1.

The designer now has to design Feature 2 not in a vacuum, but alongside Feature 1. The developers have to write code not in a brand new repo, but in one with code that implements Feature 1.

And the testers? What do they have to do?

That’s the key question.

Do they need to just test Feature 2 once it’s implemented? No, they have to test both Feature 1 and Feature 2. You can’t release the next increment of the software unless all the features it implements are in acceptable condition.

In short, testers have to regression test.

Being agile now is trivial: just run the whole pipeline each time. This is the first crucial insight we have:

Agility is trivial at the beginning of a project

The important question is then: does it stay trivial? How does this situation evolve as the software grows?

How does a designer’s job change as she designs in the context of a growing set of already designed features? Well, I’m not a designer, so don’t quote me on this, but presumably things should get easier. As long as you’re gradually building up a design language, with reusable components, each feature becomes more of a drag-and-drop assembly process and less of a ground-up component design process. That should accelerate each feature design. If you do poorly at this, maybe you get slower and slower over time, as it becomes harder to cram things in alongside other features, and you haven’t built up reusable components.

How does a developer’s job change as he develops on a growing codebase? I am a developer and I can talk for hours about this problem. If the code is well architected, it becomes easier to add new features for a similar reason with design: there’s a growing set of reusable code that just needs to be assembled into a new feature, utilities have been written to solve common problems, the architecture is flexible and admits modification and extension gracefully, and so on. In this case, feature development is accelerated as the set of existing features grows. If, instead, the code is tightly coupled spaghetti with random breakages popping up on the slightest change (“jenga code”), then feature development is decelerated, and you eventually might find that adding a single new button to a page takes a week.

How does a tester’s job change? Given that testers must test the entire software, there’s simply no opportunity for manual testing to accelerate. The testing burden clearly grows linearly as the software grows, and there’s nothing you can do about this (except skip regression testing, which amounts to skipping the testing pipeline stage). This means once you have 10x the number of features you do today, you can expect testing, which has to occur before each feature is released, will take roughly 10x as long.

Do you see the problem?

Now, I’m talking about manual testing. In manual testing, the bulk (not entirety, just most) of the cost is in running the tests, rather than designing them. This is because design is done once, but running has to be done over and over. But if we substitute automated tests for manual tests, suddenly the cost of running them becomes trivial, leaving only the design cost. The design cost is significantly higher, because telling a computer to do something is harder than telling a human to do it, especially if it involves any actual intelligence. This is huge, because it’s the cost of running tests that grows linearly with the existing feature set. Reduce that so far as to effectively eliminate it, and then you kill the linear growth of testing cost with current features.

  • Done well, design effort per feature goes down as the number of already designed features grows
  • Done well, development effort per feature goes down as the number of already implemented features grows
  • Automated testing is similar to the others: done well, things get faster. But manual testing is fundamentally different: effort grows linearly with the number of existing features.

Now, you have to pay the cost of designing a feature for every feature, and you have to pay the cost of implementing a feature for every feature. But you only have to pay the cost of (manual, full-regression) testing once per release.

Classical Software Development

The “classical” practices of software development emerged to solve for these two facts:

  • Manual testing effort grows linearly with the number of existing features
  • Manual testing must be done once per release

When I talk about the “testing” phase, this doesn’t just involve the QA folks. It also involves some contribution from engineers. Testing is a cycle run between devs and testers, and it is typically run many times during a single testing phase. Testers receive the first build, file a bunch of bug tickets, developers fix those bugs and deliver the next build, testers close some bugs and open some others, rinse and repeat until all showstopper bugs are resolved.

Every new build given to testers has to be fully regression tested. Therefore, it is least wasteful to minimize the number of builds testers need to test. This involves two parts: minimizing the number of times the test-fix-deliver cycle is run for a single release, and minimizing the frequency of releases.

For the first part, minimizing the number of builds testers have to test means two things: first, fixing as much as possible in a single pass before delivering a build, and second, minimizing the chance that new bugs will arise in the next build. For the first, developers fix all filed showstopper bugs before cutting a new build. They don’t fix one bug and make a build, then another and make another build. For the second, developers only fix the filed bugs and make no other changes to the code.

This is what we call a code freeze. A code freeze is a crucial aspect of the classical, non-agile software development process.

As the software grows, running this whole process on a single build, which includes a large number of features for a mature product, can take months. A pipeline can only be run as fast as its slowest stage. Therefore, the software delivery pipeline can only be run once every several months. This means designers and developers will have several months of implementation done before the testing phase can even start on that work. They’ll implement months’ worth of features, and the next testing phase will run once, testing all of those new features together.

It’s not necessary that design deliver in batches that will ultimately be tested in a single release. Designers can deliver features one after another every few days and developers can implement them, and they’ll pile up on the doorstep of the testing phase until the next release goes out. But consider what happens if we decide, during the testing phase for a release, we want to change the features of that release, either by adding, taking out or modifying features. Developers will have to make those changes, which are much more likely than mere bug fixes to generate more bugs, which will likely have a ripple effect of multiple additional test-fix-deliver cycles.

That’s very expensive. It’s a request to unfreeze the code during a code freeze.

For this reason, such requests have to be formally approved and rigorously justified.

Since there’s essentially “no backsies” on what gets admitted into a particular release, and releases are infrequent, the business (especially marketers and designers) typically want to think carefully about what goes into a particular release, and they’ll work that out ahead of time. This leads to batching the feature design for an entire release before handing it to developers.

Now, another important aspect of this is that finding and fixing bugs is taken care of during the testing phase, which means it’s wasteful to also try to do this during the implementation phase. Since you must pay the cost of the test-stabilize cycles during testing, you might as well not pay that cost during development. This means developers implement but don’t stabilize. They will only fix bugs that block development. For anything else, fixing a bug during development only risks the bug regressing later in development, and requiring it to be re-fixed. That’s wasteful, just fix it once. The point of the code freeze is minimizing the chance of regression. That’s why it’s most efficient to do only the bare minimum of bug fixing during the development cycle.

Due to staggering, the development team needs to be busy working on implementing features for the next release simultaneously with the current release being tested and stabilized. This translates into a branching policy called unstable master: the “master” or “trunk” branch of the code repo is where developers implement new features. Since they’re doing only bare minimum stabilization, master is always unstable, and never in a releasable state.

When the features for a release are all implemented, a new branch is created off of master, a release branch, and the code freeze is applied to work in this branch: only bug fixes are allowed to go into this branch (unless a change request is approved). Once testing is completed, a release is cut from the release branch. During this time, development for the next release is occurring in master. Once the release is cut, the release branch is merged back into master, in order to get the bug fixes into master.

The release branch is maintained after releasing, in order to make emergency fixes. For example, if we release version 1.5 from the release-1.5 branch, and customers discover a showstopper, we apply the bug fix to the release-1.5 branch and release it again. This ensures that if we need to make patches to the current live version, we have the exact version of the code currently live, and we can apply only the bug fix to it. Each time this is done, the release branch is merged back to master to get the emergency bug fix in.

Hopefully, after the build is released from the release branch, or at least soon after, feature development for the next release is done, and you can then create the next release branch off of master.

You don’t want multiple simultaneous release branches. Trust me, you don’t. I had to do that once.

You have to merge the bug fixes into master and then merge them into every open release branch. The staggering works by working on the next release in master, and the current release in the one open release branch. Obviously this gets screwed up when you have to make emergency fixes, but that’s just another reason why you want to minimize the chance that ever happens.

And thus we have the classical, non-agile development process:

  • Business/marketing carefully plans a large (6-12 month) batch of features to release all at once, and figures out how they’re going to market them.
  • Design takes the release roadmap and produces a design document with all the requisite features. Marketing starts working on the marketing campaign.
  • Developers receive the design document, and work in master implementing but not stabilizing the features.
  • A release branch is made off of master, the testing phase is run, with test-fix-deliver cycles repeatedly done on release branch builds until all showstopper bugs are fixed.
  • The final build that passed QA is released publicly and the marketing campaign goes live.

This process evolved naturally out of the fact that testing requires full regression but this only needs to be done once per release.

Agile Software Development

The goal of agile software development is to be able to release a small number of features frequently. The logical conclusion of agile development is to release each single feature one after another, and thus do no batching of features in releases at all.

The most obvious practice we have to adopt is test automation.

If you want to release, say, once every two weeks, you simply cannot run this manual test-fix-deliver release build cycle every time. Even for a greenfield project, it will probably become infeasible within a matter of months to do regression this way and still release biweekly.

The goal is not to eliminate the QA department (as it is often misunderstood to be), but rather to focus manual QA entirely on exploratory testing. All known requirements, either discovered during product development or from exploratory testing, must lead to an automated test.

Quantitatively, the amount of behavior that is not covered by automation and that, if broken, is a showstopper must remain roughly constant. This is the fundamental criterion that eliminates the linear growth of testing effort with the number of existing features. That constant amount of uncovered behavior must also remain small enough that test-fix-deliver cycles focusing only on those few uncovered areas can feasibly be done every two weeks, or whatever your release cadence is.

You don’t have to achieve 100% coverage, you just have to keep the amount (not percentage) of uncovered stuff constant. Since the denominator will grow but the numerator remains roughly fixed, that means you’ll asymptotically approach 100% coverage.
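
To put rough numbers on it: if U is the (roughly constant) amount of uncovered showstopper behavior and N is the total amount of showstopper behavior, coverage is (N - U) / N = 1 - U/N. Hold U at, say, 20 scenarios while the product grows from 200 to 2,000 scenarios, and coverage climbs from 90% to 99% without the uncovered, manually tested portion ever getting any bigger.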

The goal, really, is to eliminate the need for a code freeze. We are, in a sense, inverting the process. Instead of implementing while letting the code destabilize and stabilizing it later, we have to prevent the code from ever destabilizing, which moves the stabilization work up to right after the implementation of each thing (really, to every modification of the code).

This leads naturally to the inverse branching policy of stable master. Rather than create release branches and stabilize there, master is kept stable, and development work is done in feature branches that are quickly merged back to master. Master gets new features one at a time, and it does so with assurance that all existing features still work. This means automation is enforced in such a way that master cannot accept a feature branch unless all automated tests pass.

The presence of automation changes the way developers work. Rather than discover much later that something broke, they are given early news of this by the automated tests, and are required to fix it now. This means bugs will get fixed repeatedly, and very frequently, as high as once per feature. That’s a key point, we’ll come back to it.

Automation eliminates the linear growth of the testing phase. It gets us off that green line in the graph above (manual testing effort growing linearly with the feature count) and onto either the blue or orange curve (per-feature effort shrinking when the work is done well, or growing when it’s done poorly). Then, all three of the phases have similar looking graphs of effort per feature as a function of the number of existing features. In all cases, the effort grows or shrinks depending on how well each phase is executed, not on how many features already exist. This is the fundamental challenge of maintaining agility: keeping the effort needed to get a single feature all the way to delivery roughly constant, so that it doesn’t grow steadily as the project goes on.

But while we have now decoupled the pipeline from the number of existing features directly, we can still see that poor practices will eventually lead to the per feature effort growing. This will kill agility over time. This means maintaining agility requires adopting best practices in designing, implementing, and building test automation.

But these are not specific to agility. Remember that manual testing effort grows linearly with the number of existing features, but it only has to be paid per release. All the rest, including the effort of building automated tests, must be paid per feature. If you end up on the orange curves (the “done poorly” ones), the whole process is going to slow down whether you’re releasing frequently or not.

In other words, poor design, implementation and automation practices will slow down any shop, even the classical non-agile ones. This is essentially a tautology: such practices are deemed “poor” precisely because they work to the detriment of any software development process.

Good engineering practices are, therefore, whole-system optimizations (this is, again, a tautology). Every software shop should be doing their best to adopt the best design, implementation and automation practices. They should be working to make reusable components that can be easily composed, building code in a manner that makes modification easier rather than harder over time, and so on. What exactly these practices are is nontrivial to determine. Discovering and executing them is the essence of being a craftsman in this highly technical industry. A good developer is one who knows what those practices are and how to practice them. Same with designers.

That is irrelevant to agility per se, beyond the obvious fact that failing to adopt good practices will also screw up your ability to be agile, along with screwing up everything else.

Thus, at the end of all of this, the practices that are specifically about optimizing for agility come down to one thing:

Test Automation

That’s it. I could have just told you that at the beginning, but I doubt you would have believed me.

To optimize for agility, you dive headfirst into thorough test automation, and you take extremely seriously the requirement that you must keep the amount of uncovered scenarios roughly constant as the software grows. Basically, you’ll achieve high agility when you’re confident enough in your test automation that you’re willing to release without manual testing a build first.

The Competition with Speed

Now that we’ve discovered the key practice for optimizing for agility, let’s explore how optimizing in this way competes with optimizing purely for speed.

In short, how does making yourself able to release frequently necessarily make you slower overall?

Now, remember, the biggest reason why most shops are nowhere close to agile isn’t simply because they don’t have good automation. They’re rife with poor design and engineering practices. Addressing that is a whole-system optimization and will make them both faster and more agile. Remember, we’re talking about the choice we have after we make all these whole-system optimizations. Let’s say we’re already using top-tier practices across the board. How, then, does optimizing for agility make us slower than we could be if we still kept all those practices top-tier, but were willing to sacrifice frequent releasing?

Obfuscating what I’m about to explain is a big part of Big Agile’s consulting practices. I’ll talk another day about what their goals are, and what their executive/middle-management customers want to be told by process consultants.

The most obvious way that we sacrifice potential speed is by spending so much time writing and maintaining automated tests. Now, when I say this, someone is surely going to respond, “but test automation is a whole-system optimization!” Yes, absolutely… to a degree. Having some automation is surely a whole-system optimization over having no automation whatsoever. Developers experienced with it will tell you that it honestly makes their jobs easier in many cases to whip up a few unit tests. This is because even development without stabilizing requires some form of testing during development (to at least confirm the happy path is functional), and developers can easily waste a lot of time running this cycle with manual testing.

If you’re adding a button to a screen, and it takes 30-60 seconds to open up the app and get to that screen, and you’re doing this over and over, dozens of times, in the process of working on the button, you could definitely be slowing yourself down by not spending 5-10 minutes writing a unit test that performs the same verification in 3 seconds.
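
As a minimal sketch of the kind of test meant here (SubmitFormViewModel and its methods are hypothetical stand-ins for whatever logic sits behind the button), the check runs in a blink instead of behind a minute of app navigation:

#include <cassert>

void testSubmitIsOnlyEnabledForValidInput()
{
  SubmitFormViewModel form;         // hypothetical view model behind the screen

  form.setInput("12");
  assert(!form.isSubmitEnabled());  // too short: the button stays disabled

  form.setInput("12345");
  assert(form.isSubmitEnabled());   // valid: the button becomes enabled
}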

I’m not talking about simply having some automation. I’m talking about having so much test automation that it allows you to release a mature software product without manual testing at all.

That’s a s***load of automation, man.

Remember, our metric for success is that the total (not proportional) amount of critical (broken = showstopper, can’t release) functionality that is not covered by automated tests is constant over time. We have to asymptotically approach 100% test coverage… and I mean real, quality tests that will genuinely prove system integrity.

It takes a lot of time to both create and maintain that level, and that quality, of automated tests. You simply don’t have to do all of this if you only want to release infrequently. It’s going to slow you down significantly, relative to classical development, to invest so much time and effort into test automation. What you lose is raw development speed. What you gain is extremely fast, and reliable, assurance that the system still works, and can therefore be released again.

Next, let’s talk about what all those tests actually do for us. Tests don’t fix our code, they just announce to us that it’s wrong. Who fixes code in response to a failing test? Developers!

To release more frequently, you inevitably have to fix bugs more frequently. Assuming a certain quality of code and developers, things will tend to break (and re-break) at a particular frequency. I’m not saying things break more frequently in an agile process… they break exactly as frequently. But, you have to fix things more frequently… at least once per release cycle, and agility means more frequent release cycles.

Remember that I emphasized that in classical software development, developers working on new features only implement but don’t stabilize stuff in master. Then they only stabilize at the end (ideally once, but in reality the test-fix-release cycle gets run maybe two or three times, and some things regress during this phase). They don’t have to keep re-stabilizing every week, or every day, but that’s exactly what all that automation makes you do.

This may seem like it evens out because, being more frequent, releases in agile let less time pass by, and therefore (for a given frequency of breakages) less gets broken. Having to fix dozens and dozens of bugs “all at once” in a classical shop may feel daunting, while in an agile shop you only ever produce a couple of bugs before you fix them, and that’s less intimidating. But this is deceptive (in the same way it “feels” less destructive to your budget to buy tons of cheap stuff compared to buying one big expensive thing). You’re ultimately spending more time fixing bugs in the agile process, because bugs are often repetitive (the whole point of regression testing is to address this). You end up fixing a bug every week instead of every 3 months, which (unless it’s such a severe bug it interferes with development work) is wasted effort if the code isn’t getting shipped out every week.

There’s other stuff you need to build to effectively release frequently, including robust rollback mechanisms, but those are smaller issues. The big one is that you have to write automated tests for every little tiny nook and cranny of the app, and the presence of those tests are literally just going to slow you down by making you fix stuff as soon as you break it, and fix it again as soon as you break it again. That’s not a bad thing… if you want to release frequently. But it’s going to cost you in raw speed.

Conclusion

If you decide that agility really is important, I hate to be the one to tell you, but your goofy team names, weekly “demos” (the quotes there are very intentional) and “backlog groomings”, and story points are completely irrelevant to that goal. You need to instead go all in on test automation, and also make sure you’re not building spaghetti code that’s going to collapse under its own weight in 6-12 months (the latter is always important, but spaghetti code might collapse an agile project faster than a classical one). And you need to not let yourself get tricked by the agility you demonstrated at the beginning (typically way before the software is ready to be delivered to any real customer). The fact you were able to show increments frequently in greenfield says nothing about your continued ability to do so on a maturing product.

No matter what kind of shop you are, stay on top of the crafts of product design and engineering. That will help you in all aspects and make you better overall. There’s no reason not to (the upfront investment will always pay for itself many times over).

With that out of the way (and emphasizing it’s unrelated to agility), go hard on automation, really hard, and you’ll be able to achieve agility. Whether you want to… that’s for your business to decide.

Tests vs. Test Specifications

When you first get introduced to the idea of test driven development, it may seem strange that tests become such a central focus. Sure, testing is important, and without doing so you wouldn’t catch unintended behavior before releasing it to customers. But why should tests be the driving concern of development? Why are they the driver instead of, say, a design document, or user stories?

In order to understand this, we need to draw an important distinction. I’ll start with an example. Consider the following sequence of instructions:

1: Place 4 cups of water in a large pot
2: Bring the water to a boil, then reduce to a simmer
3: Place 2 cups of white rice into the water, and cover
4: Cook for 25 minutes

Are you looking at food? No, you’re looking at a recipe for food. The recipe is a sequence of instructions that, if followed, will result in food. You can’t eat a recipe. You can follow the recipe, and then eat the food that gets created.

When we talk about “tests” in test-driven development (TDD), we’re not actually talking about the act of “testing”. We’re actually talking about the recipes for testing. When a developer who writes “automated” tests hears the word “test”, he most likely thinks of something like this:

@Test
public void testSomeBehavior() {

    prepareFixturesForTest();

    SomeClass objectUnderTest = createOUT();

    Entity expectedResult = createExpectedResult();
    Entity actualResult = objectUnderTest.doSomeBehavior();

    Assert.assertEquals(expectedResult, actualResult);
}

That sequence of instructions is what we mean when we say “test”. But calling this a “test” is potentially confusing, because it would be like calling the recipe I printed above “food”. The “test”, meaning the process that is performed and ends with a “success” or “failure”, is what happens when we follow the instructions in this block of code. The code itself is the instructions for how to run the test. A more accurate term for it is a “test recipe”, or test specification. It is the specification of how to run a test. Testing means actually executing this code, either by compiling it and executing it on a machine, or by having a human perform each step “manually”.

Before developers wrote “automated” tests in the same (or similar) language as their production code, testers wrote documents in English describing what to do when it is time to test a new version. The only difference is the language. Both of these are test specifications, which are the instructions followed when doing the actual testing.

When we say “test-driven development”, we’re not talking about the driving force being the act of running tests on an application. We’re really talking about the creation of test specifications. We really mean “test-specification-driven development”. Once that is clear, it starts to make sense why it is so effective for test specifications to be the driver.

The full, explicit realization of what test specifications actually are is, arguably, the defining characteristic of “behavior-driven development” (BDD). By building on top of TDD, BDD recognizes that tests (really, test specifications) are the most thorough, accurate and meaningful form in which the specification for behavior/requirements can exist. After all, what is the difference between a “story”, or “design spec”, or some other explanation of what a piece of software is supposed to do, and the instructions for how to validate whether it actually does that or not? The answer is… nothing! Well, the real difference is that stories or design specs can be vague, ambiguous or missing details, and it’s not obvious. When you interpret a design spec as the step-by-step instructions for how to validate the behavior, so exact and detailed that a machine can understand them, suddenly those missing details become obvious, and they’ll need to be filled in.
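
To make that concrete, here is a hedged sketch of how a one-line story turns into a test specification that forces the vague parts into the open. The UsernameValidator class, its constructor argument and the ValidationResult type are invented purely for illustration:

// Story: "Usernames that are too long should be rejected."
// Writing the test spec forces us to answer what the story left vague:
// how long is "too long", and what exactly does "rejected" mean?
@Test
public void usernamesLongerThanSixteenCharactersAreRejected() {

    UsernameValidator validator = new UsernameValidator(16);

    ValidationResult result = validator.validate("abcdefghijklmnopq"); // 17 characters

    Assert.assertFalse(result.isValid());
    Assert.assertEquals("Username must be 16 characters or fewer", result.getErrorMessage());
}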

Before the underlying equivalence of design spec and test spec was properly understood, testers often became the ones who filled in the missing details, as they were turning vague spec requirements into fleshed out test scripts (whether they were writing them down, or just storing them in their heads). Ultimately, the testers were the true product owners. They ultimately dictated the minute details of behavior in the app, by defining exactly what behavior is “pass”, and what is “fail”. Of course a necessary step in releasing software is that it “passes” QA. When the software ends up in the hands of product owners and they aren’t happy with what they see despite it “passing” the tests, (or, the opposite, they are happy with what they see but QA insists it “failed” the tests), it creates a lot of confusing noise in the development pipeline, in the form of undocumented change requests (that will typically re-trigger confusion on future releases) or bogus bug reports. Furthermore, developers won’t really know if they coded what they were supposed to until after they send something to testers and get the feedback. In the “classic”, more siloed shops, with less communication between the “dev” org and the “QA” org, devs often wouldn’t see the test scripts QA is using, and would have to gradually discover what they consider “correct” behavior to be through a game of back-and-forth of releasing, failing, re-releasing, etc.

TDD and BDD are the solution to these problems. Even if the developers who will eventually implement the behavior aren’t the same people who write the tests for that behavior (one of the common objections to TDD is that the coders and the testers should be different people; they still are: automated tests are run by machines, not by the coders), they at least have access to that test and are using it as the basis for what code to write and when to decide it is satisfactorily completed. The creation of a test specification is correctly placed at the beginning, rather than the end, of the development cycle, and is actively used by the developers as a guide for implementation. This is exactly what they used to do, except they used the “design spec” or “story acceptance criteria” instead of the exact sequence of steps, plus the exact definition of what is “pass” and “fail”, that the testers will eventually use to validate it.

The alternative to TDD is “X-driven-development”, where X is whatever form the design requirement takes in the hands of developers as they develop it. Whatever that requirement is, the testers also use it to produce the test script. The error in this reasoning is failing to understand that when the testers do this, they are actually completing the “design spec”, which is really an incomplete, intermediate form of a behavioral requirement. TDD, and especially BDD, move this completion step to where it should be (at the beginning), and involve all the parties that should be in attendance (most importantly the product owners and the development team).

Also note that while the creation of the test spec is moved to the beginning of the development, the passing execution of the test is still at the end, where it obviously must be (another major benefit TDD has is adding test executions earlier, when they are supposed to fail, which tests the test to ensure it’s actually a valid test). The last step is still to run the test and see it pass. Understanding this requires explicitly separating what we typically call “tests” (which are actually test specifications) from the act of running tests.

With this clarified, hopefully developers will acquire the appropriate respect for the tests in their codebase. They aren’t just some “extra” thing that gets used at the end as a double-check. They are your specifications. In your tests lies the true definition of what your code is supposed to do. A test that can be read to understand exactly what will make it pass, plus the ability to actually run it and confirm that it does pass, is a much better form of design specification and code documentation than a code comment explaining, in often vague words, what the author intends. That’s why tests are arguably more important than the production code itself, and why a developer who has truly been touched by the TDD Angel and “seen the light” will regard the tests as his true job, and the production code as the secondary step.

This, I believe, is the underlying force that additionally makes TDD a very effective tool at discovering the best design for code, which I think is its most valuable feature. Well-designed code emerges from a thorough understanding of exactly what problem you are trying to solve. The fact that writing unit tests helps you discover this design earlier than you otherwise would (through writing a version of it, then experiencing the pain points of the initial design firsthand and refactoring in response to them) is because tests (test specifications) are specifications placed on every level, in every corner, of the codebase.

Code that is “not testable” is code whose behavior cannot be properly specified. The reason why “badly designed” code is “bad” is because it cannot be made sense of (if it works, it’s a happy, and typically quite temporary, accident). Specifying behavior down to unit levels requires making sense of code, which will quickly reveal the design forces contributing to it being un-sensible. This is really the same thing that happens on the product level. Instead of waiting until a defective product is out and discovering misunderstandings, the misunderstandings get resolved in the communal development of the behaviors. Likewise, developers who use TDD to drive design, which is when development truly becomes test-driven, don’t have to wait until a problem is solved to realize that the solution is problematic. Those design defects get discovered and corrected early on.

What’s driving development in TDD isn’t the act of validating whether the code is correct. It is the act of precisely defining what correctness means that drives development.

Massive View Controller

The Model-View-Controller (MVC) set of design patterns for GUI application development has devolved into what is derisively called “Massive View Controller”.  It is a good lesson in design thinking to follow how this devolution occurred.  The most interesting point, and what is in most need of explanation, is that in the original formulation of MVC, the controller was meant to be the smallest of the three components.  How did it end up engulfing almost all application code?

The answer, I believe, is that two forces have contributed to the controller becoming the dumping ground for almost everything.  One is in how the application frameworks for various platforms are designed.  When we look at mobile platforms like iOS and Android, both instruct developers to create an application by first creating a new subclass of their native “controller” class.  On iOS, this is UIViewController, and on Android, it is Activity (the fact either of these is seen as the C of MVC is a problem already, which we’ll get to).  This is a required step to hook into the framework and get an opportunity for your application code to begin executing.  But there is no similar requirement to create customized components for the M or V of MVC.  With no other guidance, novice developers will take this first required step, and put as much of their application code into this subclass they are required to create as possible.

The other is a widespread misunderstanding among developers of what the “model” and “view” of MVC are supposed to be.  Both “Model” and “View” are somewhat vague terms that mean different things in different contexts.  The word “model” is often used to refer to the data objects that represent different concepts in a code base.  For example, in an application for browsing a company’s employees, there will be a class called Person, with fields like name, title, startDate, supervisor, and so on.  A lot of developers, especially mobile developers, have apparently assumed that the M in MVC refers to these data objects.

But the authors of MVC weren’t instructing people to define data objects.  This is already a given in object-oriented programming.  Obviously you’re going to have data objects.  They didn’t think it was necessary to say this.  The M in MVC refers to the model for an application page, which specifically means the “business logic”.  It is the class representing what a page of an application does.  It handles the data, state and available actions (both user-initiated and event-driven) of a certain screen or visual element of a GUI application.  Most of what developers tend to stuff into the controller actually belongs in the model.  The old joke of MVC is that it’s the opposite of the fashion industry: we want fat models, not thin models.  Models should contain most of the code for any particular page or widget of an application.

Similarly, a lot of developers tended to assume “View” meant widgets: the reusable, generic toolbox of visual components that are packaged with a platform framework.  Buttons, labels, tables, switches, text boxes, and so on.  Unless some kind of custom drawing was needed, any “view” of an application is really just a hierarchical combination of these widgets.  Assuming that a custom “View” is only needed when custom drawing is needed, the work of defining and managing a hierarchy of widgets was put into the controller.

With these two misunderstandings, clearly none of the application-specific code would go into models, which are generic data objects not associated at all with any particular screen/form/activity, or into views, which are generic widgets usable by any graphical application.  Well, there’s only one other place for all the actual application logic to go.  And since developers were being told, “you need three components”, it appears many of them interpreted this as meaning, “all the application code goes into this one component”.  And thus, Massive View Controller was born.

As this antipattern spread throughout the community, the blame was misplaced on MVC itself, and new pattern suites to “fix” the “problems” with MVC emerged.  One of the better known ones in the iOS community is the VIPER Pattern.  This renames “Model” which, remember, devs think means the data objects, to “Entity”, and according to most of what you read about it, “splits” the Controller into a Presenter, which handles presentation logic, and Interactor, which handles the use case or business logic.

Now that we understand the confusion about MVC, we can see that VIPER is just Model-View-Presenter (MVP) reinvented. All that happened here is that the mistaken notions were corrected, but it was framed as the invention of a new pattern, instead of the reassertion of the correct implementation of an old pattern.  The “entities” were never part of the GUI design patterns to begin with.  The “Model” is actually what VIPER calls the “Interactor”, and always has been.  The only really novel part is the concept of a Router, which is supposed to handle higher-level navigation around an application.  But the need for such a component arose from another misunderstanding about MVC that I’ll talk about in a moment.  There are some more specific suggestions in VIPER about encapsulation: specifically, to hide the backend data objects entirely from the presentation layer, and instead define new objects for the Interactor to send to the Presenter.  This wasn’t required in MVC, but it isn’t incompatible with it either.  If anything that’s an additional suggestion for how to do MVC well.

As I mentioned before, the intention of MVC was that the Model would contain most of the code.  In fact, the Controller was supposed to be very thin.  It was intended to do little more than act as a strategy for the View to handle user interaction.  All the controller is supposed to do is intercept user interactions with the views, and decide what, if anything, to do with them, leaving the actual heavy lifting to the Model.  The Model is supposed to present a public interface of available actions that can be taken, and the controller is just supposed to decide which user interaction should invoke which action.  In MVC, the Controller is not supposed to talk back to the View to change state, because the Model would become out of sync with what is being displayed.  The Controller is only supposed to listen, and talk to the Model.  The Controller is not supposed to manage a view hierarchy.  The view hierarchy is a visual concern, to be handled by the visual component: the View.  A page in an application that is made up of a hierarchy of widgets should have its own View class that encapsulates and manages this hierarchy.  Any visual changes to the hierarchy should be handled by this View class, which observes the Model for state changes.  The presentation logic is all in the View, and the business logic is all in the Model.

This leaves very little in the Controller.  The Controller is just there to avoid having to subclass the View to support variations in interaction.  Views can be reused in different scenarios. For example, an “Edit Details” screen can be used to edit the details for a person in an organization, and also to edit the details for a department in an organization, by allowing the displayed fields to vary. But another variation here is what happens when the user presses “Save”. In one situation, that triggers a person object to be saved to a backing store. In the other, it may trigger a prompt to display the list of people that will be impacted by the update. To avoid having to subclass the EditDetailsView component, the decision of which Model action to invoke is delegated out to an EditDetailsController, as sketched below.
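
As a hedged sketch of that delegation (the EditDetailsController interface, the two model classes and their methods are hypothetical names invented for illustration), the same view can simply be handed a different controller:

// The EditDetailsView reports "Save was pressed" to whatever controller it was
// given; the controller decides which Model action that interaction maps to.
interface EditDetailsController {

    void savePressed();
}

class EditPersonDetailsController implements EditDetailsController {

    private final PersonDetailsModel model;

    EditPersonDetailsController(PersonDetailsModel model) {

        this.model = model;
    }

    @Override
    public void savePressed() {

        // Saving a person just writes it to the backing store.
        model.savePerson();
    }
}

class EditDepartmentDetailsController implements EditDetailsController {

    private final DepartmentDetailsModel model;

    EditDepartmentDetailsController(DepartmentDetailsModel model) {

        this.model = model;
    }

    @Override
    public void savePressed() {

        // Saving a department first asks the model to gather the impacted people.
        model.promptForImpactedPeople();
    }
}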

Another major point of confusion is that in the original MVC, every component on the page was an MVC bundle.  For example, if we have a login page, which contains two rows, each of which has a label and a textbox, the first row being for entering the username and the second for the password, a submit button, and a loading indicator that can be shown or hidden, the typical way developers will do this is to build one big MVC bundle for this entire page, which manages everything down to the individual labels, textboxes, button, etc.  But originally, each one of these components was supposed to have a Model, View and Controller.  Each label would have a Model and a View (the Controller wouldn’t be necessary, since the labels are passive visual elements that cannot be interacted with), each textbox would have a Model, View and Controller, same for the button, and so on.

This is another point where the framework designers encouraged misunderstandings.  The individual widgets are designed as single classes that contain everything.  A Label, for example, not only contains the drawing logic for rendering text, it also holds all the data for what needs to be drawn (namely, the text string), and all the presentation data for how to draw it (text attributes, alignment, font, color, etc.).  The same is true of text boxes.  Only the Controller part is delegated out. iOS, as with all Apple platforms, uses targets and selectors for this delegation, but the target may or may not be what Apple frameworks call the “controller” (though it almost always is), and the granularity is on the level of individual interactions. Android uses a more standard OOP pattern of callback interfaces, but they are still one-per-interaction.

Along with this pattern of having the page-level components do all the management for the entire page, the inverse problem emerged of what to do when different pages need to communicate.  Thus the “Router” of VIPER was born, out of a perceived need to stick this orchestration logic somewhere.  But if you understand that MVC is inherently hierarchical, with all three components existing on each level of the view hierarchy, then it becomes clear where this “routing” behavior goes: in the M of whatever container view holds the different pages of an app and decides when and how to display them.  Since the platform frameworks are so inheritance-based, and typically give you subclasses with little to no configurability for these “container” views (examples on iOS would be UINavigationController, UITabBarController, etc.), they really don’t give you a way to follow their intended patterns and also have a sensible place for this “routing” logic to go.  But if the navigation or tab-bar (or other menu-selecting) views were all MVC bundles, then that logic would naturally live in the Models of those views.
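
As a hedged sketch of that idea (the TabContainerModel here is hypothetical, not part of any framework), the “routing” decision is just another piece of state in the container’s Model, which the container’s View observes in order to swap the visible child page:

class TabContainerModel {

    interface Observer {

        void selectedPageUpdated(int index);
    }

    private final List<Observer> observers = new ArrayList<>();

    private int selectedPage;

    void addObserver(Observer observer) {

        observers.add(observer);
    }

    // "Routing" is business logic: deciding which page is active.
    void selectPage(int index) {

        if(index == selectedPage)
            return;

        selectedPage = index;

        for(Observer observer: observers)
            observer.selectedPageUpdated(selectedPage);
    }
}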

Examples are also helpful, so I developed four implementations of a login page in an Android app to illustrate what traditional MVC is intended to look like.  The first one is Massive View Controller, what so many devs think MVC means.  There is a LoginService class that performs the backend work of the login web call, but all the business logic, visual logic, and everything in between is stuffed into a LoginController, which subclasses Activity.

public class LoginController extends AppCompatActivity implements LoginService.OnResultHandler {

private static final int MAX_USERNAME_LENGTH = 16;
private static final int MAX_PASSWORD_LENGTH = 24;

private TextView usernameLabel;
private EditText usernameField;

private TextView passwordLabel;
private EditText passwordField;

private Button submitButton;

private ProgressBar loadingIndicator;

private View errorView;
private TextView errorLabel;
private Button errorDismissButton;

private LoginService loginService;

@Override
protected void onCreate(Bundle savedInstanceState) {

super.onCreate(savedInstanceState);

setContentView(R.layout.activity_main);

// Assign view fields
usernameLabel = findViewById(R.id.username_label);
usernameField = findViewById(R.id.username_field);

passwordLabel = findViewById(R.id.password_label);
passwordField = findViewById(R.id.password_field);

submitButton = findViewById(R.id.submit_button);

loadingIndicator = findViewById(R.id.loading_indicator);

errorView = findViewById(R.id.error_view);
errorLabel = findViewById(R.id.error_label);
errorDismissButton = findViewById(R.id.error_dismiss_button);

// Configure Views
usernameLabel.setText("Username:");
passwordLabel.setText("Password:");

submitButton.setText("Submit");
errorDismissButton.setText("Try Again");

// Assign text update listeners
usernameField.addTextChangedListener(new TextWatcher() {

@Override
public void beforeTextChanged(CharSequence s, int start, int count, int after) {

}

@Override
public void onTextChanged(CharSequence s, int start, int before, int count) {

handleUserNameUpdated(s.toString());
}

@Override
public void afterTextChanged(Editable s) {

}
});

passwordField.addTextChangedListener(new TextWatcher() {

@Override
public void beforeTextChanged(CharSequence s, int start, int count, int after) {

}

@Override
public void onTextChanged(CharSequence s, int start, int before, int count) {

handlePasswordUpdated(s.toString());
}

@Override
public void afterTextChanged(Editable s) {

}
});

// Assign click handlers
usernameField.setOnClickListener(v -> usernameFieldPressed());
passwordField.setOnClickListener(v -> passwordFieldPressed());
submitButton.setOnClickListener(v -> submitPressed());
errorDismissButton.setOnClickListener(v -> errorDismissPressed());

// Create Service
loginService = new LoginService(this);
}

private void usernameFieldPressed() {

usernameField.requestFocus();
}

private void passwordFieldPressed() {

if(usernameField.length() > 0)
{
passwordField.requestFocus();
}
else
{
showErrorView("Please enter a username");
}
}

private void submitPressed() {

loadingIndicator.setVisibility(View.VISIBLE);

loginService.submit(usernameField.getText().toString(), passwordField.getText().toString());
}

private void errorDismissPressed() {

errorView.setVisibility(View.INVISIBLE);
}

// OnResultHandler
@Override
public void onResult(boolean loggedIn, String errorDescription) {

loadingIndicator.setVisibility(View.INVISIBLE);

if(loggedIn)
{
// Start home page activity
}
else
{
showErrorView(errorDescription);
}
}

private void handleUserNameUpdated(String text) {

if(text.length() > MAX_USERNAME_LENGTH)
usernameField.setText(text.substring(0, MAX_USERNAME_LENGTH));

updateSubmitButtonEnabled();
}

private void handlePasswordUpdated(String text) {

if(text.length() > MAX_PASSWORD_LENGTH)
passwordField.setText(text.substring(0, MAX_PASSWORD_LENGTH));

updateSubmitButtonEnabled();
}

private void updateSubmitButtonEnabled() {

boolean enabled = usernameField.length() > 0 && passwordField.length() > 0;

submitButton.setEnabled(enabled);
}

private void showErrorView(String errorDescription) {

errorLabel.setText(errorDescription);
errorView.setVisibility(View.VISIBLE);
}
}

The features implemented here are a basic login screen with two rows of text entry, one for the username, and one for the password. There is a “submit” button that initiates the login request, during which time a loading indicator is shown. If the login fails, an error view is shown with a description of the error, and a “Try Again” button that dismisses the error view and allows the user to make another attempt. There are some additional requirements I added to make the example more illustrative: the username and password fields have maximum length limitations, and attempting to edit the password field while the username is empty causes an error to be shown.

If we want to start refactoring this, the first step is to create proper MVC components for the Login page. This is the correction of the main misunderstanding about MVC. The Model is not a backend object representing a logged-in user or a login request, or a service object for performing the web call. The Model is for the login page. It is where the business logic of this page should live, independently of any logic for actually displaying it to a user. The Model is concerned with data, but it is data for the login page. Hence we call it the LoginModel. Likewise, everything about the view hierarchy, i.e. which widgets are on the screen, should be encapsulated into a LoginView, which does not expose this hierarchy to the outside world. I left it to the Activity to inflate a layout, and then pass the inflated view into the LoginView, but it would also be acceptable to have the View do this privately (the downside of course is that the layout is inflexible in that case).

Also, I started moving away from inheritance. A common way to do MVC is to have the View inherit from the framework View class. But this creates the classic problem of inheritance, which for this Android example would mean hardcoding which type of ViewGroup the Login page should be (ConstraintLayout, LinearLayout, FrameLayout, etc.). Instead I opted for composition: the LoginView doesn’t inherit anything, but contains the View object that holds the framework view hierarchy. The Activity subclass, which is required by Android, was factored out into a separate component that only creates and holds onto the MVC bundle. The Controller is reduced to its intended role of being a strategy for how the View triggers behavior in the Model (which allows a different strategy to be picked without having to change the View code, which deals exclusively with displaying the page). The Activity is now this:

public class LoginActivity extends AppCompatActivity {

    private LoginView view;

    @Override
    protected void onCreate(Bundle savedInstanceState) {

        super.onCreate(savedInstanceState);

        setContentView(R.layout.activity_main);
        this.view = new LoginView(findViewById(R.id.login_view));
    }
}

It creates and holds onto the LoginView, which looks like this:

public class LoginView implements LoginModel.LoginModelObserver {

private final View view;

private TextView usernameLabel;
private EditText usernameField;

private TextView passwordLabel;
private EditText passwordField;

private Button submitButton;

private ProgressBar loadingIndicator;

private View errorView;
private TextView errorLabel;
private Button errorDismissButton;

private LoginController controller;
private LoginModel model;

public LoginView(View view) {

this.view = view;

this.model = new LoginModel();
this.controller = new LoginController(model);

// Assign model observer
model.observer = this;

// Assign view fields
usernameLabel = view.findViewById(R.id.username_label);
usernameField = view.findViewById(R.id.username_field);

passwordLabel = view.findViewById(R.id.password_label);
passwordField = view.findViewById(R.id.password_field);

submitButton = view.findViewById(R.id.submit_button);

loadingIndicator = view.findViewById(R.id.loading_indicator);

errorView = view.findViewById(R.id.error_view);
errorLabel = view.findViewById(R.id.error_label);
errorDismissButton = view.findViewById(R.id.error_dismiss_button);

// Configure Labels
usernameLabel.setText(this.model.getUsernameLabelText());
passwordLabel.setText(this.model.getPasswordLabelText());

submitButton.setText(this.model.getSubmitButtonText());
errorDismissButton.setText(this.model.getErrorDismissButtonText());

// Assign text update listeners
usernameField.addTextChangedListener(new TextWatcher() {

@Override
public void beforeTextChanged(CharSequence s, int start, int count, int after) {

}

@Override
public void onTextChanged(CharSequence s, int start, int before, int count) {

controller.usernameFieldEdited(s.toString());
}

@Override
public void afterTextChanged(Editable s) {

}
});

passwordField.addTextChangedListener(new TextWatcher() {

@Override
public void beforeTextChanged(CharSequence s, int start, int count, int after) {

}

@Override
public void onTextChanged(CharSequence s, int start, int before, int count) {

controller.passwordFieldEdited(s.toString());
}

@Override
public void afterTextChanged(Editable s) {

}
});

// Assign click handlers
usernameField.setOnClickListener(v -> controller.usernameFieldPressed());
passwordField.setOnClickListener(v -> controller.passwordFieldPressed());
submitButton.setOnClickListener(v -> controller.submitPressed());
errorDismissButton.setOnClickListener(v -> controller.errorDismissPressed());
}

public View getView() {

return this.view;
}

@Override
public void beginEditingUsername() {

usernameField.requestFocus();
}

@Override
public void beginEditingPassword() {

passwordField.requestFocus();
}

@Override
public void usernameUpdated(String username) {

usernameField.setText(username);
}

@Override
public void passwordUpdated(String password) {

passwordField.setText(password);
}

@Override
public void enableSubmitUpdated(boolean enabled) {

submitButton.setEnabled(enabled);
}

@Override
public void processingUpdated(boolean processing) {

loadingIndicator.setVisibility(processing ? View.VISIBLE : View.INVISIBLE);
}

@Override
public void errorUpdated(boolean hasError, String description) {

// Show the error description as well as toggling visibility
errorLabel.setText(description);
errorView.setVisibility(hasError ? View.VISIBLE : View.INVISIBLE);
}

@Override
public void finishLogin() {

// Start home page activity
}
}

The Controller now looks like this:

public class LoginController {

    private LoginModel loginModel;

    public LoginController(LoginModel loginModel) {

        this.loginModel = loginModel;
    }

    public void usernameFieldPressed() {

        loginModel.requestEditUsername();
    }

    public void passwordFieldPressed() {

        loginModel.requestEditPassword();
    }

    public void usernameFieldEdited(String text) {

        loginModel.setUsername(text);
    }

    public void passwordFieldEdited(String text) {

        loginModel.setPassword(text);
    }

    public void submitPressed() {

        loginModel.attemptLogin();
    }

    public void errorDismissPressed() {

        loginModel.dismissError();
    }
}

And finally the Model, where the business logic lives:

class LoginModel implements LoginService.OnResultHandler {

private static final int MAX_USERNAME_LENGTH = 16;
private static final int MAX_PASSWORD_LENGTH = 24;

private final String usernameLabelText;
private final String passwordLabelText;
private final String submitButtonText;
private final String errorDismissButtonText;

public static interface LoginModelObserver
{
void beginEditingUsername();
void beginEditingPassword();

void usernameUpdated(String username);
void passwordUpdated(String password);

void enableSubmitUpdated(boolean enabled);

void processingUpdated(boolean processing);
void errorUpdated(boolean hasError, String description);

void finishLogin();
}

LoginModelObserver observer;

private LoginService loginService;

private String username;
private String password;

private boolean processing;
private String errorDescription;

public LoginModel() {

this.loginService = new LoginService(this);

// Start with empty credentials so the length checks never see null
this.username = "";
this.password = "";

this.usernameLabelText = "Username:";
this.passwordLabelText = "Password:";

this.submitButtonText = "Submit";
this.errorDismissButtonText = "Try Again";
}

public String getUsernameLabelText() {

return usernameLabelText;
}

public String getPasswordLabelText() {

return passwordLabelText;
}

public String getSubmitButtonText() {

return submitButtonText;
}

public String getErrorDismissButtonText() {

return errorDismissButtonText;
}

public void requestEditUsername()
{
observer.beginEditingUsername();
}

public void requestEditPassword()
{
if(username.length() > 0)
{
observer.beginEditingPassword();
}
else
{
setError("Please enter a username");
}
}

public void setUsername(String username)
{
if(username.length() > MAX_USERNAME_LENGTH)
username = username.substring(0, MAX_USERNAME_LENGTH);

if(username.equals(this.username))
return;

this.username = username;

observer.usernameUpdated(this.username);
updateSubmitEnabled();
}

public void setPassword(String password)
{
if(password.length() > MAX_PASSWORD_LENGTH)
password = password.substring(0, MAX_PASSWORD_LENGTH);

if(password.equals(this.password))
return;

this.password = password;

observer.passwordUpdated(this.password);
updateSubmitEnabled();
}

public void attemptLogin()
{
setProcessing(true);

loginService.submit(username, password);
}

public void dismissError() {

setError(null);
}

@Override
public void onResult(boolean loggedIn, String errorDescription) {

setProcessing(false);

if(loggedIn)
{
observer.finishLogin();
}
else
{
setError(errorDescription);
}
}

private void updateSubmitEnabled() {

boolean enabled = username.length() > 0 && password.length() > 0;

observer.enableSubmitUpdated(enabled);
}

private void setProcessing(boolean processing) {

this.processing = processing;
observer.processingUpdated(this.processing);
}

private void setError(String errorDescription) {

this.errorDescription = errorDescription;
observer.errorUpdated(this.errorDescription != null, this.errorDescription);
}
}

Now we have components that aren’t much smaller, but are at least more cohesive. The View is only managing the hierarchy of Android View components, and the Model is only managing the business logic. They communicate by the View observing the Model. In this case the observation is one-to-one. Typically the Observer Pattern is one-to-many, but we don’t need multiple observers yet.

Now, the next refactoring step would be to introduce MVC components for the parts of the login screen. The login screen has two text entry rows. We can define an abstraction for a text entry row, which has a label (to describe what the entry is for) and a text field for making the entry. Following the MVC pattern, there will be three components for this abstraction. The first is a TextEntryRowView:

public class TextEntryRowView implements TextEntryRowModel.Observer {

private final View view;

private TextView label;
private EditText field;

private TextEntryRowController controller;
private TextEntryRowModel model;

public TextEntryRowView(View view, TextEntryRowModel model) {

this.view = view;

this.model = model;
this.controller = new TextEntryRowController(model);

// Add model observer
model.addObserver(this);

// Assign view fields
label = view.findViewById(R.id.label);
field = view.findViewById(R.id.field);

// Configure Label
label.setText(model.getLabelText());

// Assign text update listeners
field.addTextChangedListener(this.controller);

// Assign click handlers
field.setOnClickListener(v -> controller.fieldPressed());
}

public View getView() {

return this.view;
}

@Override
public void editRequestDeclined(TextEntryRowModel model) {

}

@Override
public void beginEditing(TextEntryRowModel model) {

field.requestFocus();
}

@Override
public void fieldTextUpdated(TextEntryRowModel model, String text) {

field.setText(text);
}
}

Then we have a TextEntryRowController:

class TextEntryRowController implements TextWatcher {

private TextEntryRowModel model;

public TextEntryRowController(TextEntryRowModel model) {

this.model = model;
}

public void fieldPressed() {

model.requestEdit();
}

@Override
public void beforeTextChanged(CharSequence s, int start, int count, int after) {

}

@Override
public void onTextChanged(CharSequence s, int start, int before, int count) {

model.setFieldText(s.toString());
}

@Override
public void afterTextChanged(Editable s) {

}
}

The intention here is really that the Controller should be called when the user interacts with the keyboard to type into the text field. What is actually happening is that the Controller is implementing the TextWatcher interface provided by Android. This will get called even if the text is changed programmatically. For the sake of this example, I didn’t go through the trouble of filtering out those programmatic changes, but ideally the Controller would intercept only user-initiated text-change events, and those events would not change the text in the field unless the Controller decided to tell the Model to do so. This way, simply omitting the call to the Model would effectively disable editing (by the user) of the text field.
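
For what it’s worth, here is one hedged way that filtering could be done. The FilteringTextEntryRowView below is a hypothetical variation on the TextEntryRowView above, not part of the example project: it sets a flag while it is writing to the EditText itself, and only forwards changes to the Controller when the flag is not set.

public class FilteringTextEntryRowView implements TextEntryRowModel.Observer {

    private final EditText field;
    private final TextEntryRowController controller;

    // Set while this View is writing to the EditText itself, so the TextWatcher
    // below can tell programmatic changes apart from user typing.
    private boolean updatingProgrammatically = false;

    public FilteringTextEntryRowView(EditText field, TextEntryRowController controller, TextEntryRowModel model) {

        this.field = field;
        this.controller = controller;

        model.addObserver(this);

        field.addTextChangedListener(new TextWatcher() {

            @Override
            public void beforeTextChanged(CharSequence s, int start, int count, int after) {

            }

            @Override
            public void onTextChanged(CharSequence s, int start, int before, int count) {

                // Only user-initiated edits reach the Controller.
                if(updatingProgrammatically)
                    return;

                controller.onTextChanged(s, start, before, count);
            }

            @Override
            public void afterTextChanged(Editable s) {

            }
        });
    }

    @Override
    public void editRequestDeclined(TextEntryRowModel model) {

    }

    @Override
    public void beginEditing(TextEntryRowModel model) {

        field.requestFocus();
    }

    @Override
    public void fieldTextUpdated(TextEntryRowModel model, String text) {

        updatingProgrammatically = true;
        field.setText(text);
        updatingProgrammatically = false;
    }
}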

And now the TextEntryRowModel:

public class TextEntryRowModel {

public static interface Observer {

void editRequestDeclined(TextEntryRowModel model);

void beginEditing(TextEntryRowModel model);
void fieldTextUpdated(TextEntryRowModel model, String text);
}

public TextEntryRowModel(String labelText, int maxLength) {

this.labelText = labelText;
this.fieldText = "";
this.maxLength = maxLength;

observers = new ArrayList<>();
}

private List<Observer> observers;

private int maxLength;

private boolean editable;

private String labelText;
private String fieldText;

public void addObserver(Observer observer) {

observers.add(observer);
}

public boolean getEditable() {

return editable;
}

public void setEditable(boolean editable) {

this.editable = editable;
}

public String getLabelText() {

return labelText;
}

public String getFieldText() {

return fieldText;
}

public void setFieldText(String fieldText) {

if (fieldText.length() > maxLength)
fieldText = fieldText.substring(0, maxLength);

if (fieldText.equals(this.fieldText))
return;

this.fieldText = fieldText;

for(Observer observer: observers)
observer.fieldTextUpdated(this, this.fieldText);
}

public void requestEdit() {

if(editable) {

for(Observer observer: observers)
observer.beginEditing(this);
}
else {

for(Observer observer: observers)
observer.editRequestDeclined(this);
}
}
}

Notice that in this case, the observers are one-to-many. This is now necessary. You can see the TextEntryRowView needs to observe its Model, to know when the field text is updated. Also notice that the only publicly visible place to change the field text is in the model, not the view. The Android TextView holds the text being displayed, because that’s how the Android framework is designed. But that TextView is a private member of TextEntryRowView. The intention is that anyone, including the Controller that receives the user’s typing events, must tell the Model to update the text. The Model then broadcasts that change, allowing any number of interested objects to be notified that the text changed.

Also notice that in the setter for the text, we are checking whether the incoming text is the same as what is already stored in the Model. The Model, View and Controller are tied to each other in a loop. A change to the Model will trigger the View to update, and if we really want to ensure the two stay in sync, a change to the View will trigger the Model to update (in this case that happens because we are using the TextWatcher interface, which gets notified by all changes to the field’s text). This can cause an infinite loop, in which the View updates the Model, which updates the View, which updates the Model, and so on. To prevent this, at some point in the chain we need to check to make sure we aren’t making a redundant update. Doing so will terminate the loop after it makes one full cycle. This is a common pattern, especially in reactive programming. I call these loops “reactive loops”.

We do the same thing for the error view, which is another abstraction we can identify. We start with an ErrorView:

public class ErrorView implements ErrorModel.Observer {

private final View view;

private TextView descriptionLabel;
private Button dismissButton;

private ErrorController controller;
private ErrorModel model;

public ErrorView(View view, ErrorModel model) {

this.view = view;

this.model = model;
this.controller = new ErrorController(model);

// Add model observer
model.addObserver(this);

// Assign view fields
descriptionLabel = this.view.findViewById(R.id.label);
dismissButton = this.view.findViewById(R.id.dismiss_button);

dismissButton.setText(this.model.getDismissButtonText());

dismissButton.setOnClickListener(v -> controller.dismissPressed());
}

public View getView() {

return this.view;
}

@Override
public void dismissRequested() {

}

@Override
public void descriptionUpdated(String description) {

descriptionLabel.setText(description);
}
}

Then the ErrorController:

class ErrorController {

    private ErrorModel model;

    public ErrorController(ErrorModel model) {

        this.model = model;
    }

    public void dismissPressed() {

        model.dismiss();
    }
}

And the ErrorModel:

public class ErrorModel {

public static interface Observer {

void dismissRequested();

void descriptionUpdated(String description);
}

public ErrorModel(String dismissButtonText) {

this.dismissButtonText = dismissButtonText;

this.observers = new ArrayList<>();
}

private List<Observer> observers;
private String description;

private String dismissButtonText;

public void setDescription(String description) {

this.description = description;

for(Observer observer: observers)
observer.descriptionUpdated(this.description);
}

public String getDismissButtonText() {

return dismissButtonText;
}

public void dismiss() {

for(Observer observer: observers)
observer.dismissRequested();
}

public void addObserver(Observer observer) {

this.observers.add(observer);
}
}

Now, the Login components will use these new classes. The LoginView now looks like this:

public class LoginView implements LoginModel.Observer {

private final View view;

private TextEntryRowView usernameRow;
private TextEntryRowView passwordRow;

private Button submitButton;

private ProgressBar loadingIndicator;

private ErrorView errorView;

private LoginController controller;
private LoginModel model;

public LoginView(View view) {

this.view = view;

this.model = new LoginModel();
this.controller = new LoginController(model);

// Assign model observer
model.addObserver(this);

// Assign view fields
usernameRow = new TextEntryRowView(this.view, this.model.getUsernameModel());
passwordRow = new TextEntryRowView(this.view, this.model.getPasswordModel());

errorView = new ErrorView(this.view, this.model.getErrorModel());
submitButton = view.findViewById(R.id.submit_button);
loadingIndicator = view.findViewById(R.id.loading_indicator);

// Configure the submit button from the model
submitButton.setText(this.model.getSubmitButtonText());

submitButton.setOnClickListener(v -> controller.submitPressed());
}

public View getView() {

return this.view;
}

@Override
public void enableSubmitUpdated(boolean enabled) {

submitButton.setEnabled(enabled);
}

@Override
public void processingUpdated(boolean processing) {

loadingIndicator.setVisibility(processing ? View.VISIBLE : View.INVISIBLE);
}

@Override
public void hasErrorUpdated(boolean hasError) {

errorView.getView().setVisibility(hasError ? View.VISIBLE : View.INVISIBLE);
}

@Override
public void finishLogin() {

// Start home page activity
}
}

The LoginController looks like this:

public class LoginController {

    private LoginModel model;

    public LoginController(LoginModel model) {

        this.model = model;
    }

    public void submitPressed() {

        model.attemptLogin();
    }
}

And the LoginModel looks like this:

public class LoginModel implements LoginService.OnResultHandler, TextEntryRowModel.Observer, ErrorModel.Observer {

    public interface Observer {

        void enableSubmitUpdated(boolean enabled);

        void processingUpdated(boolean processing);
        void hasErrorUpdated(boolean hasError);

        void finishLogin();
    }

    public static final int MAX_USERNAME_LENGTH = 16;
    public static final int MAX_PASSWORD_LENGTH = 24;

    private List<Observer> observers;

    private TextEntryRowModel usernameModel;
    private TextEntryRowModel passwordModel;
    private ErrorModel errorModel;

    private final String submitButtonText;

    private LoginService loginService;
    private boolean submitEnabled;
    private boolean processing;
    private boolean hasError;

    LoginModel() {

        this.usernameModel = new TextEntryRowModel("Username:", MAX_USERNAME_LENGTH);
        this.passwordModel = new TextEntryRowModel("Password:", MAX_PASSWORD_LENGTH);
        this.errorModel = new ErrorModel("Try Again");

        // The username row is always editable; the password row becomes
        // editable once a username has been entered (see fieldTextUpdated below).
        this.usernameModel.setEditable(true);

        this.observers = new ArrayList<>();

        this.submitButtonText = "Submit";

        this.loginService = new LoginService(this);

        this.usernameModel.addObserver(this);
        this.passwordModel.addObserver(this);
        this.errorModel.addObserver(this);
    }

    public TextEntryRowModel getUsernameModel() {

        return usernameModel;
    }

    public TextEntryRowModel getPasswordModel() {

        return passwordModel;
    }

    public ErrorModel getErrorModel() {

        return errorModel;
    }

    public void addObserver(Observer observer) {

        observers.add(observer);
    }

    public String getSubmitButtonText() {

        return submitButtonText;
    }

    public void attemptLogin() {

        setProcessing(true);

        loginService.submit(usernameModel.getFieldText(), passwordModel.getFieldText());
    }

    private void setSubmitEnabled(boolean submitEnabled) {

        this.submitEnabled = submitEnabled;

        for(Observer observer: observers)
            observer.enableSubmitUpdated(this.submitEnabled);
    }

    private void setProcessing(boolean processing) {

        this.processing = processing;

        for(Observer observer: observers)
            observer.processingUpdated(processing);
    }

    private void setHasError(boolean hasError) {

        this.hasError = hasError;

        for(Observer observer: observers)
            observer.hasErrorUpdated(this.hasError);
    }

    private void setError(String errorDescription) {

        setHasError(errorDescription != null);
        errorModel.setDescription(errorDescription);
    }

    // LoginService.OnResultHandler
    @Override
    public void onResult(boolean loggedIn, String errorDescription) {

        setProcessing(false);

        if(loggedIn)
        {
            for(Observer observer: observers)
                observer.finishLogin();
        }
        else
        {
            setError(errorDescription);
        }
    }

    // TextRowEntryModel.Observer
    @Override
    public void editRequestDeclined(TextEntryRowModel model) {

        if(model == passwordModel && model.getFieldText().length() == 0)
            setError("Please enter a username");
    }

    @Override
    public void beginEditing(TextEntryRowModel model) {

    }

    @Override
    public void fieldTextUpdated(TextEntryRowModel model, String text) {

        if(model == usernameModel)
            passwordModel.setEditable(text.length() > 0);

        boolean submitEnabled = usernameModel.getFieldText().length() > 0 && passwordModel.getFieldText().length() > 0;
        setSubmitEnabled(submitEnabled);
    }

    // ErrorModel.Observer
    @Override
    public void dismissRequested() {

        setHasError(false);
    }

    @Override
    public void descriptionUpdated(String description) {

    }
}

Here is the one-to-many Observer Pattern in action, and this demonstrates fundamentally how various parts of an application, on any level, communicate with each other: by inter-model observation. That is how changes are propagated around a page of the app, keeping various components in sync with each other. This does not disrupt a View staying in sync with its own Model because there can be multiple observers. It is a business logic concern that one part of the use case changing requires another part of the use case to change. This is not a visual logic concern, and should not be done in views.

The code is in a fairly good state now, but for the sake of illustration I will do one more round of refactoring and create MVC bundles on the level of individual widgets. At this point we’re actually hiding and overriding certain aspects of the framework. The Android framework is not designed for individual widgets to be MVC bundles. We can essentially adapt/wrap the framework, which additionally decouples our application code almost entirely from the platform on which it is running.

First we’ll make the MVC components for a text field (which can be either editable or read-only), starting with a TextFieldView:

public class TextFieldView implements TextFieldModel.Observer {

private TextView view;

private TextFieldController controller;
private TextFieldModel model;

public TextFieldView(TextView view, TextFieldController controller, TextFieldModel model) {

this.view = view;
this.controller = controller;
this.model = model;

this.view.setText(this.model.getText());

// Only apply presentation attributes the model actually specifies
if(this.model.getFont() != null)
this.view.setTypeface(this.model.getFont());

if(this.model.getTextColor() != 0)
this.view.setTextColor(this.model.getTextColor());

this.view.setOnClickListener(this.controller);
this.view.addTextChangedListener(this.controller);

this.model.addObserver(this);
}

@Override
public void beginEditing(TextFieldModel model) {

view.requestFocus();
}

@Override
public void textUpdated(TextFieldModel model, String text) {

view.setText(text);
}
}

TextFieldController is just an interface, because what exactly should happen when a user interacts with a text field depends on the context:

public interface TextFieldController extends View.OnClickListener, TextWatcher {

}

The Android framework provides interfaces for handling view clicks and text updates (again, ideally we’d want only user-initiated text updates, but for brevity we’ll just piggyback on what Android gives us). So the Controller just extends these already existing interfaces.

One useful implementation we can provide right off the bat is a Controller that does nothing, which disables user interaction with the text field and makes it read-only. This is the NullTextFieldController:

public class NullTextFieldController implements TextFieldController {

    @Override
    public void onClick(View v) {

    }

    @Override
    public void beforeTextChanged(CharSequence s, int start, int count, int after) {

    }

    @Override
    public void onTextChanged(CharSequence s, int start, int before, int count) {

    }

    @Override
    public void afterTextChanged(Editable s) {

    }
}

(Disabling a TextField also requires setting focusable to false. I’m ignoring this in the example)
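
If we did want to wire that up, it would presumably be a couple of extra calls wherever the null controller is assigned (a sketch, not part of the example code):

// In addition to giving the field a NullTextFieldController:
view.setFocusable(false);
view.setFocusableInTouchMode(false);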

Then we have the TextFieldModel:

public class TextFieldModel {

public static interface Observer {

void beginEditing(TextFieldModel model);
void textUpdated(TextFieldModel model, String text);
}

private List<Observer> observers;

private String text;
private Typeface font;
private int textColor;

public TextFieldModel(String text) {

this.text = text;

this.observers = new ArrayList<>();
}

public void addObserver(Observer observer) {

observers.add(observer);
}

public String getText() {

return text;
}

public void setText(String text) {

if(text.equals(this.text))
return;

this.text = text;

for(Observer observer: observers)
observer.textUpdated(this, this.text);
}

public Typeface getFont() {

return font;
}

public int getTextColor() {

return textColor;
}

public void beginEditing() {

for(Observer observer: observers)
observer.beginEditing(this);
}
}

The key distinction here is that the data being displayed by a text field now lives in a Model, not in the View (as is typically the case in these platform frameworks). The data for a text field includes the text it is displaying, plus any presentation data (font, color, etc.). Because a Model is observable, anyone (and multiple listeners at once) can listen to changes to what this text field is displaying. As with the previous example, the Model becomes the one place where the outside world can and should change what the text field displays.

Now let’s do the same for a button, starting with a ButtonView:

public class ButtonView implements ButtonModel.Observer {

    private Button view;

    private ButtonController controller;
    private ButtonModel model;

    public ButtonView(Button view, ButtonController controller, ButtonModel model) {

        this.view = view;
        this.controller = controller;
        this.model = model;

        this.model.addObserver(this);

        this.view.setText(this.model.getText());
        this.view.setOnClickListener(this.controller);
    }

    @Override
    public void enabledUpdated(boolean enabled) {

        view.setEnabled(enabled);
    }
}

Again, the ButtonController is just an interface, so we can decide in each case what happens when a button is pressed:

public interface ButtonController extends View.OnClickListener {

}

And the ButtonModel:

public class ButtonModel {

    public static interface Observer {

        void enabledUpdated(boolean enabled);
    }

    private List<Observer> observers;

    private String text;
    private boolean enabled;

    public ButtonModel(String text) {

        this.text = text;

        this.observers = new ArrayList<>();
    }

    public void addObserver(Observer observer) {

        observers.add(observer);
    }

    public String getText() {

        return text;
    }

    public void setEnabled(boolean enabled) {

        this.enabled = enabled;

        for(Observer observer: observers)
            observer.enabledUpdated(this.enabled);
    }
}

A fully featured ButtonModel would hold everything about a button’s state, including whether it is selected and/or highlighted, any icons, etc.
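
As a hedged sketch (the FullButtonModel name and its fields are purely illustrative), that fuller Model would just extend the same observable pattern to the rest of the button’s state:

public class FullButtonModel {

    public static interface Observer {

        void enabledUpdated(boolean enabled);
        void selectedUpdated(boolean selected);
        void highlightedUpdated(boolean highlighted);
    }

    private List<Observer> observers = new ArrayList<>();

    private String text;
    private int iconResourceId;

    private boolean enabled;
    private boolean selected;
    private boolean highlighted;

    public FullButtonModel(String text, int iconResourceId) {

        this.text = text;
        this.iconResourceId = iconResourceId;
    }

    public void addObserver(Observer observer) {

        observers.add(observer);
    }

    public void setSelected(boolean selected) {

        this.selected = selected;

        for(Observer observer: observers)
            observer.selectedUpdated(this.selected);
    }

    // setEnabled, setHighlighted, icon updates, etc. would follow the same pattern.
}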

Now we can use these to implement TextEntryRow, starting with the View:

public class TextEntryRowView implements TextEntryRowModel.Observer {

    private final View view;

    private TextFieldView label;
    private TextFieldView field;

    private TextEntryRowController controller;
    private TextEntryRowModel model;

    public TextEntryRowView(View view, TextEntryRowController controller, TextEntryRowModel model) {

        this.view = view;

        this.model = model;
        this.controller = controller;

        // Add model observer
        model.addObserver(this);

        // Assign view fields
        label = new TextFieldView(view.findViewById(R.id.label), this.controller.getLabelController(), this.model.getLabelModel());
        field = new TextFieldView(view.findViewById(R.id.field), this.controller.getFieldController(), this.model.getFieldModel());
    }

    public View getView() {

        return this.view;
    }

    @Override
    public void editRequestDeclined(TextEntryRowModel model) {


    }

    @Override
    public void fieldTextUpdated(TextEntryRowModel model, String text) {
        
    }
}

Then the Controller:

class TextEntryRowController {

private final NullTextFieldController labelController;
private TextFieldController fieldController;

private TextEntryRowModel model;

TextEntryRowController(TextEntryRowModel model) {

this.model = model;

this.labelController = new NullTextFieldController();

this.fieldController = new TextFieldController() {

@Override
public void beforeTextChanged(CharSequence s, int start, int count, int after) {

}

@Override
public void onTextChanged(CharSequence s, int start, int before, int count) {

model.setFieldText(s.toString());
}

@Override
public void afterTextChanged(Editable s) {

}

@Override
public void onClick(View v) {

model.requestEdit();
}
};
}

TextFieldController getLabelController() {

return labelController;
}

TextFieldController getFieldController() {

return fieldController;
}
}

Now it is the TextEntryRowController deciding that the first TextField (the label) is read-only, by assigning it a NullTextFieldController. For the other TextField, the Controller sends a message to its Model, not the TextField’s model. This makes the TextEntryRowModel responsible for how, and if, to update the field.

Here is the Model:

public class TextEntryRowModel implements TextFieldModel.Observer {

    public static interface Observer {

        void editRequestDeclined(TextEntryRowModel model);
        void fieldTextUpdated(TextEntryRowModel model, String text);
    }

    public TextEntryRowModel(String labelText, int maxLength) {

        this.labelModel = new TextFieldModel(labelText);
        this.fieldModel = new TextFieldModel("");

        this.maxLength = maxLength;

        observers = new ArrayList<>();

        this.fieldModel.addObserver(this);
    }

    private List<Observer> observers;

    private int maxLength;

    private boolean editable;

    private TextFieldModel labelModel;
    private TextFieldModel fieldModel;

    public void addObserver(Observer observer) {

        observers.add(observer);
    }

    public TextFieldModel getLabelModel() {

        return labelModel;
    }

    public TextFieldModel getFieldModel() {

        return fieldModel;
    }

    public boolean getEditable() {

        return editable;
    }

    public void setEditable(boolean editable) {

        this.editable = editable;
    }

    public String getFieldText() {

        return fieldModel.getText();
    }

    public void setFieldText(String fieldText) {

        fieldModel.setText(fieldText);
    }

    public void requestEdit() {

        if(editable) {

            fieldModel.beginEditing();
        }
        else {

            for(Observer observer: observers)
                observer.editRequestDeclined(this);
        }
    }

    @Override
    public void beginEditing(TextFieldModel model) {

    }

    @Override
    public void textUpdated(TextFieldModel model, String text) {

        if (text.length() > maxLength)
            text = text.substring(0, maxLength);

        if (text.equals(fieldModel.getText()))
            return;

        fieldModel.setText(text);

        for(Observer observer: observers)
            observer.fieldTextUpdated(this, fieldModel.getText());
    }
}

Here we can see TextEntryRowModel updating the TextField by calling the underlying TextFieldModel, which is a private member of TextEntryRowModel.

Now let’s look at the Error view, implemented with the MVC widgets:

public class ErrorView implements ErrorModel.Observer {

    private final View view;

    private TextFieldView descriptionLabel;
    private ButtonView dismissButton;

    private ErrorController controller;
    private ErrorModel model;

    public ErrorView(View view, ErrorController controller, ErrorModel model) {

        this.view = view;
        this.controller = controller;

        this.model = model;

        // Add model observer
        model.addObserver(this);

        // Assign view fields
        descriptionLabel = new TextFieldView(this.view.findViewById(R.id.label), this.controller.getDescriptionController(), this.model.getDescriptionModel());
        dismissButton = new ButtonView(this.view.findViewById(R.id.dismiss_button), this.controller.getDismissController(), this.model.getDismissModel());
    }

    public View getView() {

        return this.view;
    }

    @Override
    public void dismissRequested() {

    }
}

And the Controller:

class ErrorController {

    private ErrorModel model;
    private TextFieldController descriptionController;
    private ButtonController dismissController;

    public ErrorController(ErrorModel model) {

        this.model = model;

        this.descriptionController = new NullTextFieldController();

        this.dismissController = new ButtonController() {

            @Override
            public void onClick(View v) {

                model.dismiss();
            }
        };
    }

    public TextFieldController getDescriptionController() {

        return descriptionController;
    }

    public ButtonController getDismissController() {

        return dismissController;
    }
}

And the Model:

public class ErrorModel {

    public static interface Observer {

        void dismissRequested();
    }

    public ErrorModel(String dismissButtonText) {

        this.descriptionModel = new TextFieldModel("");
        this.dismissModel = new ButtonModel(dismissButtonText);

        this.observers = new ArrayList<>();
    }

    private List<Observer> observers;

    private TextFieldModel descriptionModel;
    private ButtonModel dismissModel;

    public void setDescription(String description) {

        descriptionModel.setText(description);
    }

    public void addObserver(Observer observer) {

        this.observers.add(observer);
    }

    public TextFieldModel getDescriptionModel() {

        return descriptionModel;
    }

    public ButtonModel getDismissModel() {

        return dismissModel;
    }

    public void dismiss() {

        for(Observer observer: observers)
            observer.dismissRequested();
    }
}

Here we see the ErrorModel updating the text displayed in the ErrorView by calling the TextFieldModel it holds as a member. All the data coordination is done through a hierarchy of Models. This is the business logic, and it is separated and collected into the Models of the application. The Views only decide how to turn this use case data into visuals, making sure to stay up to date when the use case state changes.

Now we can update the Login components to use these MVC widget classes. First the View:

public class LoginView implements LoginModel.Observer {

    private final View view;

    private TextEntryRowView usernameRow;
    private TextEntryRowView passwordRow;

    private ButtonView submitButton;

    private ProgressBar loadingIndicator;

    private ErrorView errorView;

    private LoginController controller;
    private LoginModel model;

    public LoginView(View view) {

        this.view = view;

        this.model = new LoginModel();
        this.controller = new LoginController(model);

        // Add model observer
        model.addObserver(this);

        // Assign view fields, giving each child view its own controller built
        // around the corresponding child model
        usernameRow = new TextEntryRowView(this.view, new TextEntryRowController(this.model.getUsernameModel()), this.model.getUsernameModel());
        passwordRow = new TextEntryRowView(this.view, new TextEntryRowController(this.model.getPasswordModel()), this.model.getPasswordModel());

        errorView = new ErrorView(this.view, new ErrorController(this.model.getErrorModel()), this.model.getErrorModel());
        submitButton = new ButtonView(view.findViewById(R.id.submit_button), this.controller.getSubmitButtonController(), this.model.getSubmitButtonModel());
        loadingIndicator = view.findViewById(R.id.loading_indicator);
    }

    public View getView() {

        return this.view;
    }

    @Override
    public void processingUpdated(boolean processing) {

        loadingIndicator.setVisibility(processing ? View.VISIBLE : View.INVISIBLE);
    }

    @Override
    public void hasErrorUpdated(boolean hasError) {

        errorView.getView().setVisibility(hasError ? View.VISIBLE : View.INVISIBLE);
    }

    @Override
    public void finishLogin() {

        // Start home page activity
    }
}

Then the Controller:

public class LoginController {

    private ButtonController submitButtonController;

    private LoginModel model;

    public LoginController(LoginModel model) {

        this.model = model;

        this.submitButtonController = new ButtonController() {

            @Override
            public void onClick(View v) {

                model.attemptLogin();
            }
        };
    }

    public ButtonController getSubmitButtonController() {

        return submitButtonController;
    }
}

Then the Model:

public class LoginModel implements LoginService.OnResultHandler, TextEntryRowModel.Observer, ErrorModel.Observer {

    public interface Observer {

        void processingUpdated(boolean processing);
        void hasErrorUpdated(boolean hasError);

        void finishLogin();
    }

    public static final int MAX_USERNAME_LENGTH = 16;
    public static final int MAX_PASSWORD_LENGTH = 24;

    private List<Observer> observers;

    private TextEntryRowModel usernameModel;
    private TextEntryRowModel passwordModel;
    private ErrorModel errorModel;

    private ButtonModel submitButtonModel;

    private LoginService loginService;
    private boolean processing;
    private boolean hasError;

    LoginModel() {

        this.usernameModel = new TextEntryRowModel("Username:", MAX_USERNAME_LENGTH);
        this.passwordModel = new TextEntryRowModel("Password:", MAX_PASSWORD_LENGTH);
        this.errorModel = new ErrorModel("Try Again");

        this.submitButtonModel = new ButtonModel("Submit");

        this.observers = new ArrayList<>();

        this.loginService = new LoginService(this);

        this.usernameModel.addObserver(this);
        this.passwordModel.addObserver(this);
        this.errorModel.addObserver(this);
    }

    public TextEntryRowModel getUsernameModel() {

        return usernameModel;
    }

    public TextEntryRowModel getPasswordModel() {

        return passwordModel;
    }

    public ErrorModel getErrorModel() {

        return errorModel;
    }

    public ButtonModel getSubmitButtonModel() {

        return submitButtonModel;
    }

    public void addObserver(Observer observer) {

        observers.add(observer);
    }

    public void attemptLogin() {

        setProcessing(true);

        loginService.submit(usernameModel.getFieldText(), passwordModel.getFieldText());
    }

    private void setProcessing(boolean processing) {

        this.processing = processing;

        for(Observer observer: observers)
            observer.processingUpdated(processing);
    }

    private void setHasError(boolean hasError) {

        this.hasError = hasError;

        for(Observer observer: observers)
            observer.hasErrorUpdated(this.hasError);
    }

    private void setError(String errorDescription) {

        setHasError(errorDescription != null);
        errorModel.setDescription(errorDescription);
    }

    // LoginService.OnResultHandler
    @Override
    public void onResult(boolean loggedIn, String errorDescription) {

        setProcessing(false);

        if(loggedIn) {

            for(Observer observer: observers)
                observer.finishLogin();
        }
        else {

            setError(errorDescription);
        }
    }

    @Override
    public void editRequestDeclined(TextEntryRowModel model) {

        if(model == passwordModel && model.getFieldText().length() == 0)
            setError("Please enter a username");
    }

    @Override
    public void fieldTextUpdated(TextEntryRowModel model, String text) {

        if(model == usernameModel)
            passwordModel.setEditable(text.length() > 0);

        boolean submitEnabled = usernameModel.getFieldText().length() > 0 && passwordModel.getFieldText().length() > 0;
        submitButtonModel.setEnabled(submitEnabled);
    }

    // ErrorModel.Observer
    @Override
    public void dismissRequested() {

        setHasError(false);
    }
}

Now we have a design that properly represents the original intention of MVC. Notice that Controllers are now tiny. They are the smallest components in the code. As intended, the biggest components are the Models. And with Models on every level of the hierarchy, no single Model has too many responsibilities (the LoginModel is about 25% smaller than the MassiveViewController we started with, and almost half of it is boilerplate code like property accessors). In this small example, the total amount of code grew by quite a bit, but as an application grows larger and more complex, and reusability increases, this pattern will start to significantly reduce the total amount of code needed. All the classes except LoginModel are available for reuse in other areas. Clearly, whatever valid criticisms there are of MVC, “MassiveViewController” isn’t one of them.

There are, of course, other GUI application patterns, like MVP and MVVM, but that’s another topic. When properly understood, any of these patterns, including MVC, will help you factor your applications into small, often reusable components, with high cohesion and encapsulation (with the unit-testability that comes along with these), and none of them will grow too large. If you see an application with huge “view controllers”, especially if they are subclassing framework classes, whatever it is, it isn’t MVC.

Abstraction Layers

Over time, the software we write continues to increase in sophistication.  We are solving more and more advanced problems, expressing the solution as a specific sequence of ones and zeroes stored in the memory of a Turing-complete machine.  These programs are steadily growing larger in size (today’s executables are typically on the order of a few megabytes, which is a few million 8-bit numbers), and require faster or more sophisticated (e.g. multicore) hardware to execute in reasonable time.  But the human mind is not getting more advanced over time, at least not nearly at the same rate.  We are as mentally advanced today as we were in the 1950s, but our computer programs are several orders of magnitude more advanced and complex.  How can the human software developers who produce this code possibly maintain their understanding of something that is so rapidly increasing in complexity?

The answer is by abstraction.  Today’s programmers do not hand-craft the millions of bytes of machine code instructions that ultimately form cutting edge software.  Nor could they ever read the code in that format and hope to comprehend what it does, much less how it does it.  Attempting to read and follow a modern software program in its compiled machine-language format is a clear demonstration of how vastly more complex computer software has become since the days when computer programs were hand-written machine code.  Instead, programmers use high-level programming languages to write code.  These languages contain much more abstract concepts than “add”, “move”, “jump” or the other concepts that machine instructions represent.

Properly understood, even machine code itself is an abstraction.  Each machine instruction represents an abstract operation done to the state of the machine.  We can take this abstraction away and watch the operation of a machine while executing a program, from the perspective of its electronic state.  We can record the time at which different gates are switched to produce different sub-circuits.  But even this is an abstraction.  We can go below the level of switching gates and hook voltmeters to the circuits to produce a graph of voltage over time.  While trying to understand what a computer program does by reading its machine code is hopeless, trying to understand what it does by recording the physical state of a machine running it is far more hopeless.  It would be hopeless even to understand hand-written machine code from the 50s in this way (and doing so would proceed by first trying to rediscover the machine code).  Even the very low-level (but very high-level, from the perspective of the electronic hardware that implements our Turing machines) abstraction of a central processor and a sequence of instructions to process aids massively in our ability to comprehend and compose software.  Not even a trivial computer program could be feasibly designed by specifying the voltage on the terminals of circuits as a function of time.

But even when looking at modern programs at the higher level of their source code, they are still, as a whole, intractable to human comprehension.  It is not uncommon for a modern computer program to contain millions of lines of source code.  How could a human possibly understand something with millions of interworking parts?

The answer, again, is by abstraction.  A “line” of source code is the lowest level of abstraction in a program’s source code.  These lines are grouped together into functions.  Functions are grouped together into classes.  Classes are grouped together into modules.  Modules are grouped together into subsystems.  Subsystems are grouped together into libraries.  Libraries are grouped together into applications.  We can understand and follow something that ultimately takes millions of lines of code to express because we do not digest it in the form of raw lines.  We understand the application as a composition of a handful of libraries.  We do not attempt to understand how the libraries do what they do, as we try to understand the applications that use them.  We only understand the libraries by what they do, and from that, we understand the application in terms of how it uses those libraries to implement its own behaviors.  In a well-designed software application, the application-level code is not millions of lines of code.  It is thousands, or even merely hundreds, of lines of code.  This is what makes it tractable to the human mind.

But each of those lines is now far removed from what a computer can understand.  A line of code calling a high-level library ultimately gets compiled down to what could be thousands of machine code instructions.  By identifying a boundary between what and how, and equivalently why and what, we are able to take what is otherwise a massive and impenetrable problem, and factor it into individually digestible pieces.  We do not simply divide the problem into parts by size.  We cannot understand a compiled binary of millions of machine instructions by taking the first thousand and looking at them in isolation, then the next thousand, and so on.  The division is conceptual and builds up a hierarchy of higher and higher-level concepts, linked together by a why-how relationship.  The result is a series of layers, like the floors of a building.  We call these abstraction layers.

An abstraction layer is a “why”, “what” or “how” depending on from what perspective we are looking at it.  When considering a particular abstraction layer, it becomes the subject of our consideration: the “what”.  The layer immediately above it is, from this perspective, the “why”.  This abstraction layer exists because the abstraction layer above it needs it.  The layer immediately below it is, from this perspective, the “how”.  It is what the current abstraction layer uses as its implementation details.  If we then focus on the next layer down, it becomes the “what”, the previous layer becomes “why”, and the next layer down becomes “how”.

Abstraction layers are represented in different ways with different types of programming languages.  In object-oriented languages, an abstraction layer is identified by a class, which has two parts: an interface and an implementation.  When focusing on an interface, an implementation of that interface is the how.  When focusing on the implementation, the interface is the why.  Classes then link to each other through composition.  The implementation of a class contains fields, which are references to other interfaces.  A class’s implementation calls methods on its members, and on the parameters of its own methods.  So then these other interfaces are the how of an implementation.  When interface A is implemented by AImp, and AImp is composed of interfaces B and C, then A, AImp, and the set containing B and C each form abstraction layers, in order of most abstract to least abstract.  AImp is the “how” of “A”, while A is the “why” of AImp.  B and C are the “how” of AImp, while AImp is the (or a) “why” of B and C.  Somewhere there will be a BImp and CImp, which continues the sequence of abstraction layers.
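
To make that shape concrete, here is a minimal sketch in Java (the names A, AImp, B and C follow the paragraph above; the method names are invented purely for illustration):

interface A {

    void perform();
}

interface B {

    void stepOne();
}

interface C {

    void stepTwo();
}

// AImp is the "how" of A; A is the "why" of AImp.
class AImp implements A {

    // B and C are the "how" of AImp; AImp is a "why" of B and C.
    private final B b;
    private final C c;

    AImp(B b, C c) {

        this.b = b;
        this.c = c;
    }

    @Override
    public void perform() {

        b.stepOne();
        c.stepTwo();
    }
}

Somewhere a BImp and a CImp implement B and C, and the sequence of layers continues.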

In functional languages, function declarations and function bodies perform the analogous roles to interfaces and implementations.  A function body is the “how” of a function declaration, and the function declaration is the “why” of a function body.  Meanwhile, a function body contains a sequence of calls to other function declarations (note that a call to a function is a reference to a function declaration, not to a function body).  When function declaration A has a function body ABody, and ABody calls function declarations B and C, then A, ABody, and the set containing B and C form the analogous abstraction layers to A, AImp and {B, C} in the object-oriented example above.
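
The same shape can be sketched in Java, with static methods standing in for free functions (again, the names and bodies are invented for illustration):

class Declarations {

    // "A": the declaration is the signature; the block beneath it is "ABody".
    static int a(int input) {

        // ABody refers to the declarations of b and c, not to their bodies.
        return c(b(input));
    }

    // "B" and "C": their own bodies continue the chain of abstraction layers.
    static int b(int input) {

        return input + 1;
    }

    static int c(int input) {

        return input * 2;
    }
}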

Programmers navigate a computer program by starting at one implementation, and if needed, clicking on a line of code and selecting “go to definition”, which takes them to another implementation, with other lines of code that can also be followed to their definition.  This is a navigation of abstraction layers, and demonstrates how they link together repeatedly.

This structure of different layers being linked together is fractal.  On one level, a block of code is formed as multiple lines that call other blocks of code.  Those blocks are similarly formed as multiple lines to yet other blocks of code.  Thus the structure of code exhibits self-similarity at different scales.

Note that I said a well-designed application will contain a few hundred or a few thousand lines of code, in the form of calls to highly abstract library functions.  But a poorly designed application may not organize itself into libraries at all, or may do so in a poor fashion that prevents one from truly forgetting about what is under the hood of a single line of code.  Lacking or improper abstractions force one to digest a larger amount of information in a single “bite” in order to comprehend what the program does.  This makes the program more difficult to understand, because a larger chunk of its inherent complexity must be considered all at once.  Any small part of that chunk’s complexity requires dealing with all the rest of that chunk’s complexity.  While no human programmer could possibly understand the machine-code version of a modern software program, it is commonplace for the source code of a modern application to stretch the ability of human comprehension to its limits.  This takes the form of poorly designed computer code that is missing proper, well-formed abstractions that truly divide the problem into small, and truly distinct, bite-size pieces.

This leads us to the following principle of good software design, that serves as the foundation for all other software design principles:

The complexity in understanding computer code primarily varies proportionally with the distance between its abstraction layers

Each part of a computer program, however abstract, is implemented with less abstract, more concrete code, until one reaches the level of machine code.  By “distance between abstraction layers”, we mean how much less abstract a certain layer’s implementation is than its interface.  If the gap is very large, a class’s methods will inevitably be very long, difficult to follow and difficult to understand.  The gaps can be closed by introducing dividing abstractions: an abstraction layer placed above the low-level implementation details as they currently are, but below the high-level interface being implemented.  The implementation of those intermediate abstractions is simpler because the abstraction is closer to the implementation details.  Meanwhile, the implementation of the higher abstraction in terms of this intermediate abstraction is simpler than the original implementation, for the same reason.
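
As a rough sketch of closing such a gap (the report and transaction names below are invented for illustration, not taken from anything earlier in this article): a method that implements a high-level operation directly in terms of low-level string handling can instead be written against an intermediate abstraction, whose own implementation is then close to those details:

// Before: the high-level operation is implemented directly in terms of
// low-level string handling, so the gap between abstraction layers is wide.
class WideGapReportGenerator {

    String monthlySummary(java.util.List<String> csvLines) {

        double total = 0;
        for (String line : csvLines)
            total += Double.parseDouble(line.split(",")[2]);

        return "Total: " + total;
    }
}

// After: an intermediate abstraction (a parsed transaction) sits between the
// report and the raw text.
class Transaction {

    final double amount;

    Transaction(double amount) {

        this.amount = amount;
    }

    static Transaction parse(String csvLine) {

        return new Transaction(Double.parseDouble(csvLine.split(",")[2]));
    }
}

class ReportGenerator {

    String monthlySummary(java.util.List<String> csvLines) {

        double total = 0;
        for (String line : csvLines)
            total += Transaction.parse(line).amount;

        return "Total: " + total;
    }
}

The total amount of work is the same, but each implementation now sits just beneath the abstraction it serves, so each one is short and digestible on its own.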

From this it is clear that more and more advanced computer programs, with human designers, are only made possible by building upon the less advanced programs we have already built.  This is quite literally how computers have advanced.  The first computer programs had to be hand-written in machine code.  But then programmers hand-wrote the machine code for an assembler, which enabled them to then write programs in assembly.  With an assembler, they could then write a self-hosting assembler: an assembler whose own source code is not only written in its own assembly language, but that can be successfully assembled by itself.  Then they wrote compilers for higher-level languages like BASIC and C in assembly, and then self-hosting compilers for those languages.  Then they wrote C++ and Smalltalk compilers in C, and then self-hosting C++/Smalltalk compilers.  Today, we have high-level programming languages like Java, which were, and often still are, implemented with lower level languages.  Each step is a tool that becomes the means of constructing a more sophisticated tool, which in turn becomes the means to construct an even more sophisticated tool, and so on.  The tools, which are computer programs themselves, become more and more sophisticated, which enables the creation of not only sophisticated programs in general, but more sophisticated tools in particular.

This process is not peculiar to the development and advancement of computer software.  It is rather the general means by which humans have produced all the extremely advanced and sophisticated things they have produced.  A man uses his bare hands to fashion tools from stone, which he then uses to fashion a forge, which he then uses to fashion metal tools, which he then uses to fashion mechanical devices, which he then uses to fashion engines, which he then uses to fashion factories, which he then uses to fashion electronics, which he then uses to fashion all the advanced technology that surrounds us today.  We start by hand-crafting consumer goods.  Then we hand-craft tools that we use to craft consumer goods.  Then we hand-craft tools to make other tools.  Then we craft tools to make the tools we use to make other tools.  And so on.

Economists call this process the elongation of the production structure: a process by which production goes through increasingly more steps.  Instead of directly building the thing we want, first we build a thing that builds the thing we want.  Even more indirectly, we build the thing that builds the thing that builds the thing we want.  This continues until, in a modern industrial and electronic economy, the actual end-to-end process of manufacturing a good from nature-given resources, when taking into account the production of all the tools used in its production, takes hundreds or thousands of steps, involving resources acquired from and shipped around all parts of the world, and occurring over many years or even decades.

A modern economy is so complex that it can never be understood all at once by any of the humans whose actions constitute that economy.  Nor does it need to be understood all at once by its operators in order to function.  Instead, parts of it are understood in isolation, and in how they fit into the parts immediately adjacent.  No one person organizes or coordinates the economy as a whole.  A person focuses on one small part, which is possible because the incredibly complex process of modern production has been factored out into conceptually well-formed parts (repeatedly, in a fractal way) that remain well-defined and identifiable in isolation.  If anyone attempted to understand a production process by reading a graph of the positions of all the factors involved, as a function of time, they would be hopelessly lost.

The factors of production (tools and machines) of modern industry are implementations of abstractions.  We are able to define the requirement of a tool or machine as a derived requirement of producing something else (another tool/machine or a consumer good/service), because we are able to identify a high-level concept in the process of producing something.  If we defined production of a good in terms of the sequence of movements of physical objects over time that takes place in the process as it is done now, we would have no way of moving to a different sequence of movements of different objects and meaningfully say that the same production process has occurred (or even that the same thing has been produced).  By identifying the “what” as distinct from the “how”, the “how” becomes interchangeable.  This is the only way to correctly express a requirement.  The definition of producing a sandwich does not include details like taking a bladed piece of metal and moving it in a zig-zag pattern through a piece of bread.  Such details do not define a sandwich.  What defines a sandwich is sliced bread.  That definition relies on our ability to identify a high-level abstraction called “sliced”, which can be independently defined and verified.  It is not just a matter of allowing variation in the implementation details of making a sandwich.  It is about correctness.  It is simply wrong to define a sandwich by how the bread was sliced.

This is what we do in computer software when we abstract it.  We correctly define the requirement, which defines the what and not the how.  At the same time, the requirement itself is the “how” of some other, higher-level and more abstract requirement.  For example, the requirement to present an upgrade screen to a user is the “how” of a more abstract requirement to enable users to upgrade their accounts, which itself is the “how” of a still more abstract requirement to maximize profits.  On each level, it is not simply inconvenient or inflexible to put the “how” into the definition of a requirement.  It is simply wrong.  It does not correctly express what the requirement actually is, in the sense of specifying what conditions need to be met in order to say the requirement has been satisfied.

This is so deeply entwined in the structure of human thought that it is not really possible for us to imagine anything without it.  What we call “abstractions” here are what in language are called “words”.  Every word in a language is an abstraction.  A word has a definition, which is another collection of words.  A word is a high-level abstraction, with the words in its definition being lower-level abstractions.  The process of the human mind taking in data and structuring it into a form that is comprehensible to logical thought is a process of abstraction.  To try to think about something without abstractions at all is to try to think without using language (even one you invented yourself), which is a contradiction in terms.

Recognizing the fundamental role of abstracting, and more specifically properly abstracting, while designing computer software, is none other than recognizing that abstracting underlies the very process of logical structuring that the human mind does to make reality understandable.  It has perhaps required more explicit emphasis in software than in other places (like manufacturing), because the virtual worlds we are creating in software are more malleable than the real one.  It is less obvious that the higher-level concepts in our code must follow a logical structure, because we create them from scratch (in a sense), than it is for the higher-level physical entities we construct in the real world.  It is perhaps easier to see why a car needs to be built as a composition of an engine, a transmission, an axle, and so on, than it is to see why an application needs to be built as a composition of a user interface, bindings, models, use cases, services, stores and so on.  After all, aren’t all of these things just “made up”?  It’s all 1s and 0s in the end, right?

But these are all just mental constructs.  That a car is composed of an engine, a transmission, an axle, and so on, is only apparent to the mind of a rational observer.  It is not part of the physics of the car itself, which is, ultimately, just a distribution of mass and energy throughout space over time.  These “parts” of a car are just as “made up” as the parts of a software application.  They are both abstractions above the “raw” physical level of reality.  As they belong to the same category, they are just as important.  Trying to build software without abstractions (specifically proper abstractions) is as hopeless as building a car as a big jumbled pile of moving masses.  Good design of computer software ultimately comes down to whether the problem being solved has been correctly understood and broken down, with all the lines between what/why and how/what drawn in the right place.  Good design derives from identifying the proper abstractions, and expressing them as such in code.

If you find yourself straining to comprehend the codebase you are working on, it could be that the problem you are trying to solve is so irreducibly complex that it is almost impossible to grasp.  But much more likely (especially if you are working on a GUI application), your codebase is poorly abstracted and needs to be conceptually organized.  Good design all flows downhill from having the virtual world of a codebase composed of well-defined abstractions (the “well-defined” part is typically given names like “high cohesion” in design discussions, which really means the “thing” being considered has a concise and straightforward definition).  The benefit you reap from discovering and using such abstractions is as great as the benefit to human society of creating their wealth with a series of tools and machines rather than by hand.  It will be the difference between an impoverished software shop and an affluent one.