What Should Your Entities Be?

Introduction

So, you’re writing an application in an object-oriented programming language, and you need to work with data in a relational database. The problem is, data in a relational database is not in the same form as data in OOP objects. You have to solve this “impedance mismatch” in some way or another.

What exactly is the mismatch? Well, plain old data (POD) objects in OOP are compositions: one object is composed of several other objects, each of which is composed of several more objects, and so on. Meanwhile, relational databases are structured as flat rows in tables with relations to each other.

Let’s say we have the following JSON object:

{
  "aNumber": 5,
  "aString": "Hello",
  "anInnerObject": {
    "anotherNumber": 10,
    "aDate": "12-25-2015",
    "evenMoreInnerObject": {
      "yetAnotherNumber": 30,
      "andOneMoreNumber": 35,
      "oneLastString": "Hello Again"
    }
  }
}

In OOP code, we would represent this with the following composition of classes:

class TopLevelData {

  let aNumber: Int
  let aString: String
  let anInnerObject: InnerData 
}

class InnerData {

  let anotherNumber: Int
  let aDate: Date
  let evenMoreInnerObject: InnerInnerData
}

class InnerInnerData {
  
  let yetAnotherNumber: Int
  let andOneMoreNumber: Int
  let oneLastString: String
}

This is effectively equivalent to the JSON representation, with one important caveat: in most OOP languages, unless you use structs, the objects have reference semantics. The InnerData instance is not literally embedded inside the memory of the TopLevelData instance; it exists somewhere else in memory, and anInnerObject is really, under the hood, a pointer to that other memory. In the JSON, each sub-object is literally embedded. This means we can’t, for example, refer to the same sub-object twice without duplicating it (and, by extension, all of its related objects), and circular references are just plain impossible.

This value vs. reference distinction is another impedance mismatch between OOP objects and JSON, which is what standards like json-api are designed to solve.

In the database, this would be represented with three tables with foreign key relations:

TopLevelData

  id                Int    Primary key
  aNumber           Int
  aString           Text
  anInnerObjectId   Int    Foreign key → InnerData.id

InnerData

  id                      Int    Primary key
  anotherNumber           Int
  aDate                   Date
  evenMoreInnerObjectId   Int    Foreign key → InnerInnerData.id

InnerInnerData

  id                 Int    Primary key
  yetAnotherNumber   Int
  andOneMoreNumber   Int
  oneLastString      Text

This representation is more like how OOP objects are represented in memory, where foreign keys are equivalent to pointers. Despite this “under the hood” similarity, on the visible level they’re completely different. OOP compilers “assemble” the memory into hierarchical structures we work with, but SQL libraries don’t do the same for the result of database queries.

The problem you have to solve, if you want to work with data stored in such tables but represented as such OOP classes, is to convert between foreign key relations in tables to nesting in objects…

…or is it?

It may seem “obvious” that this impedance mismatch needs to be bridged. After all, this is the same data, with different representations. Don’t we need an adapter that converts one to the other?

Well, not necessarily. Why do we need to represent the structure of the data in the database in our code?

ORMs to the Rescue

Assuming that yes, we do need that, the tools that solve this problem are called object-relational mapping, or ORM, libraries. The purpose of an ORM is to automate the conversion between compositional objects and database tables. At minimum, this means we get a method to query the TopLevelData table and get back a collection of TopLevelData instances, where the implementation knows how to do the necessary SELECTs and JOINs to get all the necessary data, build each object out of its relevant parts, then assign them to each other’s reference variables.
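To make that concrete, here’s roughly what the call site looks like. This is a sketch with a hypothetical ORM API: Database, fetchAll, and save are stand-in names, not any particular library.

let db = try Database.connect("app.sqlite")

// One call: the ORM performs the SELECTs and JOINs, builds the InnerData
// and InnerInnerData instances, and wires up the references between them.
let allTopLevel: [TopLevelData] = try db.fetchAll(TopLevelData.self)

// Writing back is the mirror image: hand the object over and let the ORM
// work out the UPDATE (this assumes the fields were declared var, not let).
allTopLevel[0].aString = "Goodbye"
try db.save(allTopLevel[0])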

If we want to modify data, instead of hand-crafting the INSERTs or UPDATEs, we simply hand the database a collection of these data objects, and it figures out what records to insert or update. The more clever ones can track whether an object was originally created from a database query, and if so, which fields have been modified, so that it doesn’t have to write the entire object back to the database, only what needs updating.

We still have to design the database schema, connect to it, and query it in some way, but the queries are abstracted from raw SQL, and we don’t have to bother with forming the data the database returns into the objects we want, or breaking those objects down into update statements.

The fancier ORMs go further than this and allow you to use your class definitions to build your database schema. They can analyze the source code for the three classes, inspect their fields, and work out what tables, and what columns in those tables, are needed. When the ORM sees a reference-type field, where one object contains another, that’s a cue to create a foreign key relationship. With this, we no longer need to design the schema; we get it “for free” by simply coding our POD objects.

This is fancy and clever. It’s also, the way we’ve stated it, unworkably inefficient.

Inefficiency is a problem with any of these ORM solutions because of their tendency to work on the granularity of entire objects, which correspond to entire rows in the database. This is a big problem because of foreign key relations. The examples we’ve seen so far only have one-to-one relations. But we can also have one-to-many, which would look like TopLevelData having a field let someInnerObjects: [InnerData] whose type is a collection of objects, and many-to-many, which would add to this a “backward” field let theTopLevelObjects: [TopLevelData] on InnerData.

The last one is interesting because it is unworkable in languages that use reference counting for memory management. That’s a circular reference, which means you need to weaken one of them, but weakening one (say, the reference from InnerData back to TopLevelData) means you must hold onto the TopLevelData separately. If you, for example, query the database for an InnerData, and want to follow it to its related TopLevelData, that TopLevelData will already be gone by the time you get your InnerData back.

This is, of course, not a problem in garbage collected languages. You just have to deal with all the other problems of garbage collection.

With x-to-many relations, constructing a single object of any of our classes might end up pulling hundreds or thousands of rows out of the database. The promise we’re making in our class, however implicit, is that when we have a TopLevelData instance, we can follow it through references to any of its related objects, and again, and eventually end up on any instance that is, through an arbitrarily long chain of references, related back to that TopLevelData instance. In any nontrivial production database, that’s an immediate showstopper.

A less severe form of this same problem is that when I grab a TopLevelData instance, I might only need to read one field, but I end up getting the entire row back. Even in the absence of relations, this is still wasteful, and can become unworkably so if I’m doing something like a search that returns 10,000 records, where I only need one column from each, but the table has 50 columns in it, so I end up querying 500,000 cells of data when I only need 10,000. That 50x cost, in memory and CPU, is a real big deal.

By avoiding the laborious task of crafting SQL queries, where I worry about SELECTing and JOINing only as is strictly necessary, I lose the ability to optimize. Is that premature optimization? In any nontrivial system, eventually no.

Every ORM has to deal with this problem. You can’t just “query for everything in the object” in the general case, because referenced objects are “in”, and that cascades.

And this is where ORMs start to break down. We’ll soon realize that the very intent of ORMs, to make database records look like OOP objects, is fundamentally flawed.

The Fundamental Flaw of ORMs

We have to expose a way to SELECT only part of an object, and JOIN only on some of its relations. That’s easy enough. Entity Framework has a fancy way of letting you craft SQL queries that looks like you’re doing functional transformations on a collection. But the ability to make the queries isn’t the problem. Okay, so you make a query for only part of an object.

What do you get back?

An instance of TopLevelData? If so, bad. Very bad.

TopLevelData has everything. If I query only the aNumber field, what happens when I access aString? I mean, it’s there! It’s not like this particular TopLevelData instance doesn’t have an aString. If that were the case, it wouldn’t be a TopLevelData. A class in an OOP language is literally a contract guaranteeing the declared fields exist!

So, what do the other fields equal? Come on, you know what the answer’s gonna be, and it’s perfectly understandable that you’re starting to cry slightly (I’d be more concerned if you weren’t):

null

I won’t belabor this here, but the programming industry collectively learned somewhere in the last 10 years or so that the ancient decision of C, from which so many of our languages descend in some way, to make NULL not just a valid but the default value of any pointer type, is one of the most expensive bug-breeding decisions that’s ever been made. I don’t mean to diss Dennis and Ken, it was a perfectly understandable decision for a language like C at the time it was created.

It’s not reasonable to carry that forward into C++, Objective-C, C# and Java, and this is one of the greatest improvements that Swift and Kotlin (and Typescript if you configure it properly) made over these languages. null is not a String, so if I tell my type system this variable is a String, assigning null to it should be a type error! If I want to signal that a variable is either a String or null, I need a different type, like String?, or Optional<String>, or String | null, which is not identical to String and can’t be cast to one.
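To make this concrete in Swift, one of the languages that patched the hole:

var title: String = "Hello"
// title = nil   // compile error: 'nil' cannot be assigned to type 'String'

var subtitle: String? = nil   // absence is opted into with a distinct type
subtitle = "World"

// The compiler forces the absence to be handled before use:
if let s = subtitle {
    print(s.count)
}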

I said I wouldn’t belabor this, so back to the subject: the ability of ORMs to do their necessary optimization in C# and Java is literally built on the biggest gaping hole of their type systems. And of course this doesn’t work with primitives, so you either have to make everything boxed types, or God forbid, decide that false and 0 are what anything you didn’t ask for will equal.

It really strikes to the heart of this issue that in Swift, which patched that gaping hole, an ORM literally can’t do what it wants to do in this situation. You’d have to declare every field in your POD objects to be optional. But then what if a particular field is genuinely nullable in the database, and you want to be able to tell that it’s actually null, and not just something you didn’t query for? Make it an optional optional? For fu…

Either way, making every field optional would throw huge red flags up, as it should.

In Java and C#, there’s no way for me to know, just by looking at the TopLevelData instance I have, if the null or false or 0 I’m staring at came from the database or just wasn’t queried. All the information about what was actually SELECTed is lost in the type system.

We could try to at least restrict this problem to the inefficiency of loading an entire row (without relations) by making relations lazy loaded: the necessary data is only queried from the database when it is accessed in code. This tries to solve the problem of ensuring the field has valid data whenever accessed, while also avoiding the waste of loading a potentially very expensive (such as x-to-many) relation that is never accessed.

This comes with a host of its own problems, and in my experience it’s never actually a workable solution. Database connections are typically managed with some type of “context” object that, among other things, is how you control concurrency, since database access is generally not thread safe. You usually create a context in order to make a query, get all the data you need, then throw the context away once the data is safely stored in POD objects.

If you try to lazy-load relations, you’re trying to hit the database after the first query is over, and you’ve thrown the context away. Either it will fail because the context is gone, or it’s going to fail because the object with lazy-loading capabilities keeps the context alive, and when someone else creates a context it throws a concurrency exception.

You can try to solve this by storing some object capable of creating a new context in order to make the query on accessing a field. But even if you can get this to work, you’ll end up potentially hitting the database while using an object you returned to something like your UI. To avoid UI freezes you’d have to be aware that some data is lazy-loaded, keep track of whether it’s been loaded or not, and if not, make sure to do it asynchronously and call back to update the UI when it’s ready. By that point you’re just reinventing an actual database query in a much more convoluted way.

The Proper Solution

What we’re trying to do is simply not a good idea. Returning a partial object of some class, but having it declared as a full instance of that class, violates the basic rules of object-oriented programming. The whole point of a type system is to signal that a particular variable has particular members with valid values. Returning partials throws that out the window.

We can do much better in Typescript, whose type system is robust enough to let us define Partial<T> for any T, that will map every member of type M in T to a member of type M | undefined. That way, we’re at least signaling in the type system that we don’t have a full TopLevelData. But we still can’t signal which part of TopLevelData we have. The stuff we queried for becomes nullable even when it shouldn’t be, and we have to do null checks on everything.

Updating objects is equally painful with ORMs. We have to supply a TopLevelData instance to the database, which means we need to create a full one somehow. But we only want to update one or a few fields. How does the framework know what parts we’re trying to update? Combine this with the fact that part of the object may be missing because we didn’t query for all of it, and what should the framework do? Does it interpret those empty fields as instructions to clear the data from the database, or just ignore them?

I know Entity Framework tries to handle this by having the generated subclasses of your POD objects try to track what was done to them in code. But it’s way more complicated than just setting fields on instances and expecting it to work. And it’s a disaster with relations, especially x-to-many relations. I’ve never been able to get update statements to work without loading the entire relations, which it needs just so it can tell exactly how what I’m saving back is different from what’s already there. That’s ridiculous. I want to set a single cell on a row, and end up having to load an entire set of records from another table just so the framework can confirm that I didn’t change any of those relations?

Well, of course I do. If I’m adding three new records to a one-to-many relation, and removing one, then how do I tell EF this? For the additions, I can’t just make an entity where the property for this relationship is an array that contains the three added records. That’s telling EF those are now the only three related entities, and it would try to delete the rest. And I couldn’t tell it to remove any this way. The only thing I can do is load the current relations, then apply those changes (delete one, add three) to the loaded array and save it back. There’s no way to do this in an acceptably optimized fashion.

The conclusion is inescapable:

It is not correct to represent rows with relations in a database as objects in OOP

It should be fairly obvious, then, what we should be representing as objects in OOP:

We should represent queries as objects in OOP

Instead of writing a POD object to mirror the table structure, we should write POD objects to mirror query structures: objects that contain exactly and only the fields that a query SELECTed. Whether those fields came from a single table or were JOINed together doesn’t matter. The point is whatever array of data each database result has, we write a class that contains exactly those fields.

For example, if I need to grab the aNumber from a TopLevelData, the aDate from its related InnerData, and both yetAnotherNumber and oneLastString from its related InnerInnerData, I write the following class:

struct ThisOneQuery {

  let aNumber: Int
  let aDate: Date
  let yetAnotherNumber: Int
  let oneLastString: String
}
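And the query populates exactly those fields, no more. As a sketch (Database, rows, and the row subscripting are hypothetical stand-ins for whatever SQL layer you use):

func fetchThisOneQuery(_ db: Database) throws -> [ThisOneQuery] {
    let sql = """
        SELECT t.aNumber, i.aDate, ii.yetAnotherNumber, ii.oneLastString
        FROM TopLevelData t
        JOIN InnerData i ON i.id = t.anInnerObjectId
        JOIN InnerInnerData ii ON ii.id = i.evenMoreInnerObjectId
        """
    return try db.rows(sql).map { row in
        ThisOneQuery(aNumber: row["aNumber"],
                     aDate: row["aDate"],
                     yetAnotherNumber: row["yetAnotherNumber"],
                     oneLastString: row["oneLastString"])
    }
}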

This means we may have a lot more classes than if we just wrote one for each table. We might have dozens of carefully crafted queries, each returning slightly different combinations of data. Each one gets a class. That may sound like extra work, but it’s upfront work that saves work later, as is always the case with properly designing a type system. No more accidentally accessing or misinterpreting nulls because they weren’t part of the query.

We apply the same principle to modifying data. Whatever exact set of values a particular update statement needs, we make a class that contains all and only those fields, regardless of whether they come from a single table or get distributed to multiple tables. Again, we use the type system to signal to users of an update method on our Store class exactly what they need to provide, and what is going to be updated.
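For example, an update that only changes aString on one row gets its own little type (again with a hypothetical db.execute helper):

// All and only the fields this statement touches.
struct SetAString {
    let topLevelDataId: Int
    let newAString: String
}

func apply(_ update: SetAString, _ db: Database) throws {
    try db.execute("UPDATE TopLevelData SET aString = ? WHERE id = ?",
                   update.newAString, update.topLevelDataId)
}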

These query objects don’t need to be flat. They can be hierarchical and use reference semantics wherever it is helpful. We can shape them however we want, in whatever way makes it easiest to work with them. The rule is that every field is assigned a meaningful value, and nulls can only ever mean that something is null in the database.

Entity Framework does something interesting that approximates what I’m talking about here: when you do a Select on specific fields, the result is an anonymous type that contains only the fields you selected. This is exactly what we want. However, since the type is anonymous (it doesn’t have a name), you can’t return them as is. We still need to write those query result classes and give them a name, but this feature of Entity Framework will make it a lot easier to construct those objects out of database queries.

We can get similar functionality in C++ by using variadic templates, to write a database query method that returns a tuple<...> containing exactly the fields we asked for. In that case, it’s a named type and we can return it as is, but the type only indicates which types of fields, in what order, we asked for. The fields aren’t named. So we’d still want to explicitly define a class, presumably one we can reinterpret_cast that tuple<...> to.

The payoff of carefully crafting these query classes is that we get stronger coupling between what our model layers work with to drive the UI and what the UI actually needs, and looser coupling between the model layer and the exact details of how data is stored in a database. It’s always a good idea to let requirements drive design, including the database schema. Why create a schema that doesn’t correspond in a straightforward way to what the business logic in your models actually needs? But even if we do this, some decisions about how to split data across tables, whether to create intermediate objects (as is required for many-to-many relations), and so on may arise purely out of the mechanics of relational databases, and constitute essentially “implementation details” of efficiently and effectively storing the data.

Writing classes that mirror the table structure of a database needlessly forces the rest of the application to work with data in this same shape. You can instead start by writing the exact POD objects you’d most prefer to use to drive your business logic. Once you’ve crafted them, they signal what needs to be queried from the database. You have to write your SELECTs and JOINs so as to populate every field on these objects, and no more.

If, later, the UI or business logic requirements change, and this necessitates adding or removing a field to/from these query classes, your query method will no longer compile, guiding you toward updating the query appropriately. You get a nice, compiler-driven pipeline from business requirements to database queries, optimized out of the box to supply exactly what the business requirements need, wasting no time on unnecessary fetches.

This also guides how queries and statements are batched. Another problem with ORMs is that you can’t bundle fetches of multiple unrelated entities into a single query, because there’s no type representing the result. It would be a tuple, but only C++ has the machinery necessary to write a generic function that returns arbitrary tuples (variadic generics). You’re stuck having to make multiple separate trips to the database. This may be okay, or even preferred, in clients working with embedded databases, but wherever the database lives on another machine, each database statement is a network trip, and you want to batch those where possible.

By writing classes for queries, the query class can be a struct that contains the structs for each part of the data, however unrelated each part is to the others. With this we can hit the database once, retrieving everything we need, and nothing extra, even if it constitutes multiple fetches of completely unrelated data. We can do the same with updates, although we could achieve that in an ORM with transactions.

Queries as classes also integrates very well with web APIs, especially if they follow a standard like json-api that supports partial objects. Anyone who’s tried writing the network glue for updating a few fields in an object whose class represents an entire backend database entity knows the awkwardness of having to decide either to inefficiently send the entire object every time, or come up with some way to represent partial objects. This could be straightforward in Typescript, where a Partial<T> would contain only what needs to be updated, but even there we can improve the situation with transaction objects because they signal what data is going to be updated. With queries, requesting specifically the fields needed translates straightforwardly into parsing the responses to the query objects, which contain the same fields as what was requested.

Conclusion

It turns out that it is not only unnecessary but wholly misguided to try to represent your database tables as classes in OOP code. That set of classes exists conceptually, as that’s exactly what the database is ultimately storing, but just because those classes conceptually exist doesn’t mean you need to code them. You may find it useful to write them purely to take advantage of the schema-specifying features of ORMs, but their usage should not go beyond this.

The actual interactions with the database, with data going in and out, don’t work in terms of entire rows with all their relations, but with carefully selected subsets. The solution we’re yearning for, that made us think an ORM might help, is in fact a rather different solution of representing individual queries as classes. Perhaps eventually a tool can be written that automates this with some type of code generation. Until then, I promise you’ll be much happier handwriting those query classes than you ever were working with entity classes.

On ReactiveX – Part IV

In the previous parts, first we tried to write a simple app with pure Rx concepts and were consumed by demons, then we disentangled the Frankenstein Observable into its genuinely cohesive components, then organized the zoo of operators by associating them with their applicable targets. Now it’s time to put it all together and fix the problems with our Rx app.

Remember, our requirements are simple: we have a screen that needs to download some data from a server, display parts of the downloaded data in two labels, and have those labels display “Loading…” until the data is downloaded. Let’s recall the major headaches that arose when we tried to do this with Rx Observables:

  • We started off with no control over when the download request was triggered, causing it to be triggered slightly too late
  • We had trouble sharing the results of one Observable without inadvertently changing significant behavior we didn’t want to change. Notice that both this and the previous issue arose from the trigger for making the request being implicit
  • We struggled to find a stream pipeline that correctly mixed the concepts of “hot” and “cold” in such a way that our labels displayed “Loading…” only when necessary.

With our new belt of more precise tools, let’s do this right.

The first Observable we created was one to represent the download of data from the server. The root Observable was a general web call Observable that we create with an HTTPRequest, and whose type parameter is an HTTPResponse. So, which of the four abstractions really is this? It’s a Task. Representing this as a “stream” makes no sense because there is no stream… no multiple values. One request, one response. It’s just a piece of work that takes time, that we want to execute asynchronously, and may produce a result or fail.

We then transformed the HTTPResponse using parsers to get an object representing the DataStructure our server responds with. This is a transformation on the Task. This is just some work we need to do to get a Task with the result we actually need. So, we apply transformations to the HTTP Task, until we end up with a Task that gives us the DataStructure.
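In a language with async/await, this whole chain is nothing more than an async function. A sketch in Swift (the DataStructure fields are taken from the example code further down):

import Foundation

struct DataStructure: Decodable {
    let title: String
    let version: Int
    let displayInfo: String
}

// One request, one response: the raw HTTP Task, with parsing as a
// transformation applied to its result.
func fetchDataStructure(_ session: URLSession, _ url: URL) async throws -> DataStructure {
    let (body, _) = try await session.data(from: url)
    return try JSONDecoder().decode(DataStructure.self, from: body)
}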

Then, what do we do with it? Well, multiple things. What matters is at some point we have the DataStructure we need from the server. This DataStructure is a value that, at any time, we either have or don’t have, and we’re interested in when it changes. This is an ObservableValue, specifically of a nullable DataStructure. It starts off null, indicating we haven’t retrieved it yet. Once the Task completes, we assign the retrieved result to this ObservableValue.

That last part… having the result of a Task get saved in an ObservableValue… that’s probably a common need. We can write a convenience function for that.

We then need to pull out two different strings from this data. These are what will be displayed by labels once the data is loaded. We get this by applying a map to the ObservableValue for the DataStructure, resulting in two ObservableValue Strings. But wait… the source ObservableValue DataStructure is nullable. A straight map would produce a nullable String. But we need a non-null String to tell the label what to display. Well, what does a null DataStructure represent? That the data isn’t available yet. What should the labels display in that case? The “Loading…” text! So we null-coalesce the nullable String with the loading text. Since we need to do that multiple times, we can define a reusable operator to do that.

Finally we end up with two public ObservableValue Strings in our ViewModel. The View wires these up to the labels by subscribing and assigning the label text on each update. Remember that ObservableValues give the option to have the subscriber be immediately notified with the current value. That’s exactly what we want! We want the labels to immediately display whatever value is already assigned to those ObservableValues, and then update whenever those values change. This only makes sense for ObservableValues, not for any kind of “stream”, which doesn’t have a “current” value.

This is precisely that “not quite hot, not quite cold” behavior we were looking for. Almost all the pain we experienced with our Rx-based attempt came from taking an Observable, with its endless transformations (many geared specifically toward streams), subscribing to it, and writing the most recently emitted item to a UI widget. What is that effectively doing? It’s caching the latest item, which as we saw is exactly what a conversion from an EventStream to an ObservableValue does. Rx Observables don’t have a “current value”, but the labels on the screen certainly do! The streams we were constructing were sensitive to timing in ways we didn’t want, and remembering whatever the latest emitted item was made that sensitivity obvious. By using the correct abstraction, ObservableValue, we simply don’t have all these non-applicable transformations like merge, prepend or replay.

Gone is the need to carefully balance an Observable that gets made hot so it can begin its work early, then caches its values to make it cold again, but only caches one to avoid repeating stale data (remember that caching the latest value from a stream is really a conversion from an EventStream to a… ObservableValue!). All along, we just needed to express exactly what a reactive user interface needs: a value, whose changes over time can be reacted to.

Let’s see it:

class DataStructure
{
    String title;
    int version;
    String displayInfo;
    ...
}

extension Task<Result>
{
    public void assignTo(ObservableValue<Result> destination)
    {
        Task.start(async () ->
        {
            destination.value = await self;
        });
    }
}

extension ObservableValue<String?>
{
    public ObservableValue<String> loadedValue()
    {
        String loadingText = "Loading...";

        return self
            .map(valueIn -> valueIn ?? loadingText);
    }
}

class AppScreenViewModel
{
    ...

    public final ObservableValue<String> dataLabelText;
    public final ObservableValue<String> versionLabelText;

    private final ObservableValue<DataStructure?> data = new ObservableValue<DataStructure?>(null);
    ...

    public AppScreenViewModel()
    {
        ...

        Task<DataStructure> fetchDataStructure = getDataStructureRequest(_httpClient)
            .map(response -> parseResponse<DataStructure>(response));

        fetchDataStructure.assignTo(data);

        dataLabelText = data
            .map(dataStructure -> dataStructure?.displayInfo)
            .loadedValue();

        versionLabelText = data
            .map(dataStructure -> dataStructure?.version.toString())
            .loadedValue();
    }
}

...

class AppScreen : View
{
    private TextView dataLabel;
    private TextView versionLabel;

    ...

    private void bindToViewModel(AppScreenViewModel viewModel)
    {
        subscriptions.add(viewModel.dataLabelText
            .subscribeWithInitial(text -> dataLabel.setText(text)));

        subscriptions.add(viewModel.versionLabelText
            .subscribeWithInitial(text -> versionLabel.setText(text)));
    }
}

Voila.

(By the way, I’m only calling ObservableValues “ObservableValue”s to avoid confusing them with Rx Observables. I believe they are what should properly be named Observable, and that’s what I would call them in a codebase that doesn’t import Rx.)

This, I believe, achieves the declarative UI style we’re seeking, that avoids the need to manually trigger UI refreshes and ensures rendered data is never stale, and also avoids the pitfalls of Rx that are the result of improperly hiding multiple incompatible abstractions behind a single interface.

Where can you find an implementation of these concepts? Well, I’m working on it (I’m doing the first pass in Swift, and will follow with C#, Kotlin, C++ and maybe Java implementations), and maybe someone reading this will also start working on it. For the time being you can just build pieces you need when you need them. If you’re building UI, you can do what I’ve done several times and write a quick and dirty ObservableValue abstraction with map, flatMap and combine. You can even be lazy and make them all eagerly stored (it probably isn’t that inefficient to just compute them all eagerly, unless your app is really crazy sophisticated). You’ll get a lot of mileage out of that alone.
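For instance, a minimal, eagerly-stored version (no thread safety, no unsubscription; a sketch, not a production implementation) fits in a few dozen lines of Swift:

final class ObservableValue<T> {
    private var subscribers: [(T) -> Void] = []

    var value: T {
        didSet { subscribers.forEach { $0(value) } }
    }

    init(_ initial: T) { value = initial }

    // Subscribe to changes, optionally receiving the current value immediately.
    func subscribe(withInitial: Bool = true, _ callback: @escaping (T) -> Void) {
        subscribers.append(callback)
        if withInitial { callback(value) }
    }

    // A derived value: eagerly recomputed and stored on every source update.
    func map<U>(_ transform: @escaping (T) -> U) -> ObservableValue<U> {
        let derived = ObservableValue<U>(transform(value))
        subscribe(withInitial: false) { derived.value = transform($0) }
        return derived
    }
}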

You can also continue to use Rx, as long as you’re strict about never using Observables to represent DataStreams or Tasks. They can work well enough as EventStreams, and Observables that derive from BehaviorSubjects work reasonably well as ObservableValues (until you need to read the value imperatively). But don’t use Rx as a replacement for asynchrony. Remember that you can always block threads you create, and yes you can create your own threads and block them as you please, and I promise the world isn’t going to end. If you have async/await, remember that it was always the right way to handle DataStreams and Tasks, but don’t try to force it to handle EventStreams or ObservableValues… producer-driven callbacks really are the right tool for that.

Follow these rules and you can even continue to use Rx and never tear your hair out trying to figure out what the “temperature” of your pipelines are.

On ReactiveX – Part III

Introduction

In the last part, we took a knife to Rx’s one-size-fits-all “Observable” abstraction, and carved out four distinct abstractions, each with unique properties that should not be hidden from whoever is using them.

The true value, I believe, of Rx is in its transformation operators… the almost endless zoo of options for how to turn one Observable into another one. The programming style Rx encourages is to build every stream you need through operators, instead of by creating Subjects and publishing to them “imperatively”.

So this begs the question… what happens to this rich (perhaps too rich, as the sheer volume of them is headache-inducing) language of operators when we cleave the Observable into the EventStream, the DataStream, the ObservableValue and the Task?

What happens is that operators also get divided, this time into two distinct categories. The first category is those operators that take one of those four abstractions and produce the same type of abstraction. They are, therefore, still transformations. They produce an EventStream from an EventStream, or a DataStream from a DataStream, etc. The second category is those operators that take one of those four abstractions and produce a different type of abstraction. They are not transformers but converters. We can, for example, convert an EventStream to a DataStream.

Transformations

First let’s talk about transformations. After dividing up the abstractions, we now need to divvy up the transformations that were originally available on Observable, by determining which ones apply to which of these more focused abstractions. We’ll find that some still apply in all cases, while others simply don’t make sense in all contexts.

We should first note that, like Observable itself, all four abstractions I identified retain the property of being generics with a single type parameter. Now, let’s consider the simplest transformation, map: a 1-1 conversion transformation that can change that type parameter. This transformation continues to apply to all four abstractions.

We can map an EventStream to an EventStream: this creates a new EventStream, which subscribes to its source EventStream, and for each received event, applies a transformation to produce a new event of a new type, and then publishes it. We can map a DataStream to a DataStream: this creates a new DataStream, where every time we consume one of its values, it first consumes a value of its source DataStream, then applies a transformation and returns the result to us. We can map an ObservableValue: this creates a new ObservableValue whose value is, at any time, the provided transformation of the source ObservableValue (this means it must be read only. We can’t manually set the derived ObservableValue‘s value without breaking this relationship). It therefore updates every time the source ObservableValue updates. We can map a Task to a Task: this is a Task that performs its source Task, then takes the result, transforms it, and returns it as its own result.

We also have the flatMap operator. The name is confusing, and derives from collections: a flatMap of a Collection maps each element to a Collection, then takes the resulting Collection of Collections and flattens it into a single Collection. Really, this is a compound operation that first does a map, then a flatten. The generalization of this transformation is that it takes an X of X of T, and turns it into an X of T.

The flatten operator, and therefore flatMap, also continues to apply to all of our abstractions. How do we turn an EventStream of EventStreams of T into an EventStream of T? By subscribing to each inner EventStream as it is published by the outer EventStream, and publishing its events to a single EventStream. The resulting EventStream receives all the events from all the inner EventStreams as they become available. How do we turn a DataStream of DataStreams of T into a DataStream of T? When we consume a value, we consume a single DataStream from the outer DataStream, store it, and then supply each value from it on each consumption, until it runs out; then we go consume the next DataStream from the outer DataStream, and repeat. How do we turn an ObservableValue of an ObservableValue of T into an ObservableValue of T? By making the current value the current value of the inner ObservableValue currently held by the outer ObservableValue (which then updates every time either the outer or inner ObservableValue updates). How do we turn a Task that produces a Task that produces a T into a Task that produces a T? We run the outer Task, then take the resulting inner Task, run it, and return its result.

In Rx land, it was realized that flatten actually has another variation that didn’t come up with ordinary Collections. Each time a new inner Observable is published, we could continue to observe the older inner Observables, or we can stop observing the old one and switch to the new one. This is a slightly different operator called switch, and it leads to the combined operator switchMap. For us, this continues to apply only to EventStream, because the concept depends on being producer-driven: streams publish values of their own accord, and we must decide whether to keep listening for them. DataStreams are consumer-driven, so flatMap must get to the end of one inner DataStream before moving to the next. ObservableValue and Task don’t involve multiple values, so the concept doesn’t apply there.

Now let’s look at another basic transformation: filter. Does this apply to all the abstractions? No... because filtering is inherently about multiple values: some get through, some don’t. But only two of our four abstractions involve multiple values: EventStream and DataStream. We can therefore meaningfully filter those. But ObservableValue and Task? Filtering makes no sense there, because there’s only one value or result. Any other transformations that inherently involve multiple values (filter is just one, others include buffer or accumulate) therefore only apply to EventStream and DataStream, but not ObservableValue or Task.

Another basic operator is combining: taking multiple abstractions and combining them into a single one. If we have an X of T1 and an X of T2, we may be able to combine them into a single X of a combined T1 and T2 (i.e. a (T1, T2) tuple). Or, if we have a Collection of Xs of Ts, we can combine them into a single X of a Collection of Ts, or possibly a single X of T. Can this apply to all four abstractions? Yes, but we’ll see that for abstractions that involve multiple values, there are multiple ways to “combine” their values, while for the ones that involve only a single value, there’s just one way to “combine” their values.

That means for ObservableValue and Task, there’s one simple combine transformation. A combined ObservableValue is one whose value is the tuple/collection made up of its sources’ values, and therefore it changes when any one of its source values changes. A combined Task is one that runs each one of its source Tasks in parallel, waits for them all to finish, then returns the combined results of all as its own result (notice that, since Task is fundamentally concerned with execution order, this becomes a key feature of one of its transformations).

With EventStream and DataStream, there are multiple ways in which we can combine their values. With EventStream, we can wait for all sources to publish one value, at which point we publish the first combined value; we then store the latest value from each source, and each time any source publishes, we update only that source’s slot, keep all the rest the same, and publish the new combination. This is the combineLatest operator: each published event represents the most recently published events from each source stream. We can alternatively wait for each source to publish once, at which point we publish the combination, then discard the saved values and wait for all sources to publish again before combining and publishing again. This is the zip operator.
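To see the difference on two sources A and B:

A publishes:     1      2             3
B publishes:         x         y
combineLatest:  (1,x)  (2,x)  (2,y)  (3,y)   -- publishes on every source event
zip:            (1,x)         (2,y)          -- pairs events one-to-one, in order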

But combineLatest doesn’t make sense for DataStream, because it is based on when each source stream publishes. The “latest” of combineLatest refers to the timing of the source stream events. Since DataStreams are consumer-driven, there is no timing. The DataStream is simply asked to produce a value by a consumer. Therefore, there’s only one combining behavior: when a consumer consumes a value, the combined DataStream consumes a value from each of its sources, combines them, and returns the result. This is the zip operator, which continues to apply to DataStream.

Both EventStream and DataStream also have ways to combine multiple streams into a single stream of the same type. With EventStream, this is simply the stream that subscribes to multiple sources and publishes when it receives a value from any of them. This is the merge operator. The order in which a merged EventStream publishes is dictated by the order it receives events from its source streams. We can do something similar with DataStream, but since DataStreams are consumer-driven, the transformation has to decide which source to consume from. A merge would be for the DataStream to first consume all the values of the first source stream, then all the values of the second one, and so on (thus making it equivalent to a merge on Collections… we could also call this concatenate to avoid ambiguity). We can also do a roundRobin: each time we consume a value, the combined stream consumes one from a particular source, then one from the next one, and so on, wrapping back around after it reaches the end. There are all sorts of ways to decide the order of consumption, and a custom algorithm can probably be plugged in as a Strategy to a transformation.

Somewhat surprisingly, I believe that covers it for ObservableValue and Task, with one exception (see below): map, flatten and combine are really the only transformations we can meaningfully do with them, because all other transformations involve either timing or value streams. Most of the remaining transformations from Rx we haven’t talked about will still apply to both EventStream and DataStream, but there are some important ones that only apply to one or the other. Any transformations that involve order apply only to DataStream, for example append or prepend. Any transformations that are driven by timing of the source streams apply only to EventStream, for example debounce or delay. And some transformations are really not transformations but conversions.

The exception I mentioned is for ObservableValue. EventStreams are “hot”, and their transformations are “eager”, and it never makes sense for them to not be (in the realist interpretation of events and “observing”). Derived ObservableValues, however, can be “eager” or “lazy”, and both are perfectly compatible with the abstraction. If we produce one ObservableValue from a sequence of transformations (say, maps and combines) on other ObservableValues, then we can choose to either perform those transformations every time someone reads the value, or we can choose to store the value, and simply serve it up when asked.

I believe the best way to implement this is to have derived ObservableValues be lazy by default: their values get computed from their sources on each read. This also means when there are subscribers to updates, they must subscribe to the source values’ updates, then apply the transformations each time new values are received by the sources. But sometimes this isn’t the performance we want. We might need one of those derived values to be fast and cheap to read. To do that, we can provide the cache operator. This takes an ObservableValue, and creates a new one that stores its value directly. This also requires that it eagerly subscribe to its source value’s updates and use them to update the stored value accordingly. There is also an issue of thread safety: what if we want to read a cached ObservableValue from one thread but it’s being written to on another thread? To handle this we can allow the cache operator to specify how (using a Scheduler) the stored value is updated. These issues of caching and thread safety are unique to observable values.

Converters

Now let’s talk about how we can turn one of these four abstractions into another one of the four. In total that would be 12 possible converters, assuming there’s a useful or sensible way to convert each abstraction into each of the others.

Let’s start with EventStream as the source.

What does it mean to convert an EventStream into a DataStream? It means we’re taking a producer-driven stream and converting it to a consumer-driven stream. Remember, the key distinction is that EventStreams are defined by timing: the events happen when they do, and subscribers are either subscribed at the time or they miss them. DataStreams are defined by order: the data are returned in a specific sequence, and it’s not possible to “miss” one (you can skip one, but that’s a conscious choice of the consumer). Thus, turning an EventStream into a DataStream is fundamentally about storing events as they are received until they are later consumed, ensuring that none can be missed. It is, therefore, buffering. For this reason, this conversion operator is called buffer. It internally builds up a Collection of events received from the EventStream, and when a consumer consumes a value, the first element in the collection is returned immediately. If the Collection is empty, the consumer will be blocked until the next event is received.
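Swift’s AsyncStream is essentially this converter, which makes the sketch tiny. Here, the subscribe parameter stands in for any producer-driven event source:

// Producer-driven callbacks in; consumer-driven, ordered, buffered values out.
func buffer<Event>(subscribe: ((Event) -> Void) -> Void) -> AsyncStream<Event> {
    AsyncStream { continuation in
        subscribe { continuation.yield($0) }   // events queue until consumed
    }
}

// Consumption: for await event in buffer(subscribe: someSource) { ... }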

What does it mean to convert an EventStream to an ObservableValue? It would mean we’re storing the latest event emitted by the stream, so we can query what it is at any time. We call this converter cacheLatest. Note that the latest event must be cached, or else we wouldn’t be able to read it on demand. That’s fundamentally what this converter is doing: taking transient events that are gone right after they occur, and making them persistent values that can be queried as needed. This can be combined with other transformations on EventStream to produce some useful derived converters. For example, if we apply the accumulate operator to the EventStream, then use cacheLatest to produce an ObservableValue, the result is an accumulateTo operator, which stores a running accumulation (perhaps a sum) of incoming values over time.
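Using the quick-and-dirty ObservableValue sketched back in Part IV, cacheLatest is just a few lines (with subscribe again standing in for the event source):

func cacheLatest<Event>(subscribe: ((Event) -> Void) -> Void) -> ObservableValue<Event?> {
    let cached = ObservableValue<Event?>(nil)   // null until the first event arrives
    subscribe { cached.value = $0 }             // each event becomes the current value
    return cached
}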

What does it mean to convert an EventStream to a Task? Well, basically it would mean we create a Task that waits for one or more events to be published, then returns them as the result. But as we will see soon, it makes more sense to create “wait for the next value” Tasks out of DataStreams, and we can already convert EventStreams to DataStreams with buffer. Therefore, this converter would really be a compound conversion of first buffering and then fetching the next value. We can certainly write it as a convenience function, but under the hood it’s just composing other converters.

Now let’s move to DataStream as the source.

What does it mean to convert a DataStream to an EventStream? Well, an EventStream publishes its own events, but a DataStream only returns values when a consumer consumes them. Thus, turning a DataStream into an EventStream involves setting up a consumer to immediately start consuming values, and publishing them as soon as they are available. The result is that multiple observers can now listen for those values as they get published, with the caveat that they’ll miss values if they don’t subscribe at the right time. We can call this conversion broadcast.

What does it mean to convert a DataStream to an ObservableValue? Nothing useful or meaningful, as far as I can tell. Remember, the meaning of converting an EventStream to an ObservableValue was to cache the latest value. That’s a reference to timing. But timing in a DataStream is controlled by the consumer, so all that could mean is a consumer powering through all the values and saving them to an ObservableValue. The result is a rapidly changing value that then gets stuck on the last value in the DataStream. That doesn’t appear to be a valid concept.

What does it mean to convert a DataStream to a Task? Simple: read the next value! In fact, in .NET, where Task is the return value of all async functions, the return value of the async read method would then have to be a Task. There can, of course, be other related functions to read more than one value. We can also connect an input DataStream to an output DataStream (which, remember, is a valid abstraction but not one carved out from Observable, which only represents sources and not sinks), which results in a Task whose work is to consume values from the input and send them to the output.
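In Swift terms, with the DataStream played by AsyncStream again, the read converter falls out directly:

// Consuming the next value of a DataStream is a Task.
func read<T: Sendable>(_ stream: AsyncStream<T>) -> Task<T?, Never> {
    Task {
        var iterator = stream.makeAsyncIterator()
        return await iterator.next()   // nil means the stream has ended
    }
}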

Now let’s move to ObservableValue as the source.

What does it mean to convert an ObservableValue to an EventStream? Simple: publish the updates! Now, part of an ObservableValue‘s features is being able to subscribe to updates. How is this related to publishing updates? When we subscribe to an ObservableValue, the subscription is lazy (for derived ObservableValues, the transformations are applied to source values as they arrive), and we have the option of notifying our subscriber immediately with the current value. But when we produce an EventStream from an ObservableValue, remember EventStreams are always hot! It must eagerly subscribe to the ObservableValue and then publish each update it receives. This is significant for derived lazy ObservableValues, because as long as someone holds onto an EventStream produced from it, it has an active subscription and therefore its value is being calculated, which it wouldn’t be if no one was subscribed to it. We can call this converter publishUpdates: it is basically ensuring that updates are eagerly computed and broadcasted so that anyone can observe them as any other EventStream.

What does it mean to convert an ObservableValue to a DataStream? Nothing useful or meaningful that I can think of. At best it would be a stream that lets us read the updates, but that’s just publishUpdates followed by buffer.

What does it mean to convert an ObservableValue to a Task? Again, I can’t think of anything useful or meaningful, that wouldn’t be a composition of other converters.

Now let’s move to Task as the source.

Tasks don’t convert to any of the other abstractions in any meaningful way, because they have only a single return value. Even ObservableValues, which fundamentally represent single values, still have an associated stream of updates. Tasks don’t even have this. For this reason, we can’t derive any kind of meaningful stream from a Task, which means there’s also nothing to observe.

The converters are summarized in the following table (the row is the input, the column is the output):

                 EventStream     DataStream   ObservableValue   Task
EventStream      N/A             buffer       cacheLatest       none
DataStream       broadcast       N/A          none              read
ObservableValue  publishUpdates  none         N/A               none
Task             none            none         none              N/A

There are, in total, only five valid converters.

Conclusion

After separating out the abstractions, we find that the humongous zoo of operators attached to Observable is tidied up into more focused groups of transformations on each of the four abstractions, plus (where applicable) ways to convert from one to another. What this reveals is that the places where an Rx-driven app creates deep confusion over “hotness/coldness” and side effects are areas where an Observable really represents one of these four abstractions but is combined with a transformation operation that does not apply to that abstraction. For example, one true event stream (say, of mouse clicks or other user gestures) appended to another one makes no sense. Nor does trying to merge two observable values into a stream based on which one changes first.

In the final part, we’ll revisit the example app from the first part, rewrite it with our new abstractions and escape the clutches of Rx Hell.

On ReactiveX – Part II

In the last part, we explored the bizarre world of extreme observer-dependence that gets created in a ReactiveX (Rx)-driven app, and how that world rapidly descends into hell, especially when Rx is applied as a blanket solution to every problem.

Is the correct reaction to all of this to say “Screw Rx” and be done with it? Well, not entirely. The part where we try to cram every shape of peg into a square hole, we should absolutely say “to hell” with that. Whenever you see library tutorials say any variant of “everything is an X”, you should back away slowly, clutching whatever instrument of self-defense you carry. The only time that statement is true is if X = thing. Yes, everything is a thing… and that’s not very insightful, is it? The reason “everything is an X” with some more specific X seems profound is because it’s plainly false, and you have to imagine some massive change in perception for it to possibly be true.

Rx’s cult of personality cut its way through the Android world a few years ago, and now most of its victims have sobered up and moved on. In what is a quintessentially Apple move, Apple invented their own, completely proprietary and incompatible version of Rx, called Combine, a couple of years ago, and correspondingly the “everything is a stream” drug is making its rounds through the iOS world. It, too, will come to pass. A large part of what caused RxJava to wane is Kotlin coroutines, and with Swift finally gaining async/await, Combine will subside as well. Why do these “async” language features replace Rx? Because Rx was touted as the blanket solution to concurrency.

Everything is not an event stream, or an observable, period. Some things are. Additionally, the Rx Observable is a concept with far too much attached to it. It is trying to be so many things at once, owing to the fact it’s trying to live up to the “everything is a me” expectation, which will only result in Observable becoming a synonym for Any, except instead of it doing what the most general, highest category should do (namely, nothing), it endeavors instead to do the opposite: everything. It’s a God object in the making. That’s why it ends up everywhere in your code, and gradually erodes all the information a robust type system is supposed to communicate.

But is an event stream, endeavoring only to be an event stream, with higher-order quasi-functional transformations, a useful abstraction? I believe it’s a massively useful one. I still use it for user interfaces, but I reluctantly do so with Rx’s version of one, mostly because it’s the best one available.

The biggest problem with Rx is that its central abstraction is really several different abstractions, all crammed together. After thinking about this for a while, I have identified four distinct concepts that have been merged under the umbrella of the Observable interface. By disentangling these from each other, we can start to rebuild more focused, well-formed libraries that aren’t infected with scope creep.

These are the four abstractions I have identified:

  • Event streams
  • Data streams
  • Observable values
  • Tasks

Let’s talk about each one, what they are (and just as importantly, what they aren’t), and how they are similar to and different from Rx’s Observable.
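Before going through them one by one, here is a signature-level sketch of the separation. The shapes are my own assumption of what a focused library might expose, not a finished design:

// Producer-driven: subscribing is passive and never changes what is published.
protocol EventStream {
    associatedtype Event
    func subscribe(_ handler: @escaping (Event) -> Void)
}

// Consumer-driven: values are produced in order, when asked for.
protocol DataStream {
    associatedtype Element
    mutating func next() async -> Element?
}

// A current value, readable at any time, whose changes can be observed.
protocol ObservableValueType {
    associatedtype Value
    var value: Value { get }
    func subscribe(_ handler: @escaping (Value) -> Void)
}

// A Task is a single asynchronous unit of work producing one result or
// failure; Swift's built-in Task<Success, Failure> already models this.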

Event Streams

Let us return from the mind-boggling solipsism of extreme Copenhagen interpretation, where the world around us is brought into being by observing it, and return to classical realism, where objective reality exists independent of observation. Observation simply tells us what is out there. An event stream is literally a stream of events: things that occur in time. Observing is utterly passive. It does not, in any way, change what events occur. It merely signs one up to be notified when they do.

The Rx Observable practically commands the Copenhagen outlook by making subscribe the abstract method, the one to be overridden by the various subclasses returned by operators. It is what, exactly, subscribing (a synonym for observing) means that varies with different types of streams. This is where the trouble starts. It sets us up to have subscribe be what controls publish.

A sane approach to an event stream is for the subscribe method to be final. Subscribing is what it is: it just adds a callback to the list of callbacks to be triggered when an event is published. It should not alter what is published. The interesting behavior should occur exclusively in the constructor of a stream.

Let us recall the original purpose of the Observer Pattern. The primary purpose is not really to allow one-to-many communication. That’s a corollary of its main purpose. The main purpose is to decouple the endpoints of communication, specifically to allow one object to send messages to another object without ever knowing about that other object, not even the interfaces it implements.

Well, this is no different than any delegation pattern. I can define a delegate in class A, then have class B implement that delegate, allowing A to communicate with B without knowing about B. So what is it, specifically, about the Observer pattern that loosens the coupling even more than this?

The answer is that the communication is strictly one way. If an A posts an event, and B happens to be listening, B will receive it, but cannot (without going through some other interface that A exposes) send anything back to A, not even a return value. Essentially, all the methods in an observer interface must have void returns. This is what makes one-to-many broadcasting a trivial upgrade to the pattern, and why you typically get it for free. Broadcasting with return values wouldn’t make sense.

The one-way nature of the message flow creates an objective distinction between the publisher (or sender) and the subscriber (or receiver). The intermediary that moves the messages around is the channel, or broker. This is distinct from, say, the Mediator Pattern, where the two ends of communication are symmetric. An important consequence of the asymmetry of observers is that the presence of subscribers cannot directly influence the publisher. In fact, the publisher in your typical Observer pattern implementation can’t even query who is a subscriber, or even how many subscribers there are.

A mediator is like your lawyer talking to the police. An observer is like someone attending a public speech you give, where the “channel” is the air carrying the sound of your voice. What you say through your lawyer depends on what questions the police ask you. But the speech you give doesn’t depend on who’s in the audience. The speaker is therefore decoupled from his audience to a greater degree than you are decoupled from the police questioning you.

By moving the publishing behavior into subscribe, Rx is majorly messing with this concept. It muddles the distinction between publisher/sender and subscriber/receiver, by allowing the subscribe/receive end of the chain to significantly alter what the publisher/sender side does. It’s this stretching of the word “observe” to mean something closer to “discuss” that can cause confusion like “why did that web request get sent five times?”. It’s because what we’re calling “observing a response event” is more like “requesting the response and waiting for it to arrive”, which is a two-way communication.

We should view event streams as a higher abstraction level for the Observer Pattern. An EventStream is just a wrapper around a channel, that encapsulates publishing and defines transformation operators that produce new EventStreams. The publishing behavior of a derived stream is set up at construction of the stream. The subscribe method is final. Its meaning never changes. It simply forwards a subscribe call to the underlying channel.

Event streams are always “hot”. If the events occur, they are published, if not, they aren’t. The transformation operations are eager, not lazy. The transform in map is evaluated on each event as soon as the event is published, independent of subscribers. This expresses the realism of this paradigm: those mapped events happen, period. Subscribing doesn’t make them happen, it just tells us about them. The way we handle whether derived streams continue to publish their derived events is by holding onto the stream. If a derived stream exists, it is creating and publishing derived events. If we want the derived events to stop firing, we don’t throw away subscriptions, we throw away the stream itself.
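To make this concrete, here is a minimal sketch of what such an event stream could look like, in plain Java. All of these names (Channel, EventStream) are hypothetical, not taken from any existing library:

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// A channel is nothing more than a list of callbacks.
class Channel<Event>
{
    private final List<Consumer<Event>> subscribers = new ArrayList<>();

    void publish(Event event)
    {
        for (Consumer<Event> subscriber : subscribers)
            subscriber.accept(event);
    }

    void subscribe(Consumer<Event> callback)
    {
        subscribers.add(callback);
    }
}

class EventStream<Event>
{
    private final Channel<Event> channel = new Channel<>();

    // Subscribing is final: it only registers a callback. It cannot alter
    // what gets published.
    public final void subscribe(Consumer<Event> callback)
    {
        channel.subscribe(callback);
    }

    protected final void publish(Event event)
    {
        channel.publish(event);
    }

    // The interesting behavior happens at construction: the derived stream
    // subscribes to its source once, eagerly, and publishes transformed events
    // whether or not anyone has subscribed to it.
    public <Derived> EventStream<Derived> map(Function<Event, Derived> transform)
    {
        EventStream<Derived> derived = new EventStream<>();
        subscribe(event -> derived.publish(transform.apply(event)));
        return derived;
    }
}

(In this naive sketch the source holds a strong reference to the derived stream’s callback; a real implementation would hold it weakly, so that throwing away the derived stream actually stops the derived events.)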

There’s no problem of duplication here. The subscribing is one-to-many, but the construction of the events, the only place where any kind of side effects can occur, is tied to the construction of derived streams, which only happens once. One stream = one instance of each event. The other side of that coin is that missed events are missed, period. If you want any kind of caching behavior, that’s not an event stream. It’s something else.

I think we’ll also find that by separating out the other concepts we’ll get to next, the need to ever create event streams that have any side effects is reduced to essentially zero.

Rx streams have behavior for handling the stream “completing”, and handling exceptions that get thrown during construction of an item to be emitted. I have gone back and forth over whether it makes sense for a strict event stream to have a notion of “completing”. I lean more toward thinking it doesn’t, and that “completion” applies strictly to the next concept we’ll talk about.

What definitely does not make sense for event streams is failures. Event streams themselves can’t “fail”. Events happen or they don’t. If some exception gets thrown by a publisher, it’s a problem for the publisher: it will either be trapped by the publisher, kill the publisher, or kill the process. Having it propagate to subscribers, and especially having it (by design) terminate the whole stream, doesn’t make sense.

Data Streams

The next concept is a data stream. How are “data” streams different from “event” streams? Isn’t an event just some data? Well, an event holds data, but the event is the occurrence itself. With data streams, the items are not things that occur at a specific time. They may become available at a specific time, but that time is otherwise meaningless. The only significance of the arrival time of a datum is that we have to wait for it.

More importantly, in a stream of data, every datum matters. It’s really the order, not the timing, of the items that’s important. It’s critical that someone reading the data stream receive every element in the correct order. If a reader wants to skip some elements, that’s his business. But it wouldn’t make sense for a reader to miss elements and not know it.

We subscribe to an event stream, but we consume a data stream. Subscribing is passive. It has no impact on the events in the stream. Consuming is active. It is what drives the stream forward. The “next” event in a stream is emitted whenever it occurs, independent of who is subscribed. The “next” event of a data stream is emitted when the consumer decides to consume it. In both cases, once an element is emitted, it is never re-emitted.

Put succinctly, an event stream is producer-driven, and a data stream is consumer-driven. An event stream is a push stream, and a data stream is a pull stream.

This means a data stream cannot be one-to-many. An event stream can have arbitrarily many subscribers, only because subscribing is passive; entirely invisible to the publisher. But a data stream cannot have multiple simultaneous consumers. If we passed a data stream to multiple consumers who tried to read at the same time, they would step on each other’s toes. One would consume a datum and cause the other one to miss it.

To clarify, we’re talking about a specific data stream we call an input stream. It produces values that a consumer consumes. The other type of data stream is an output stream, which is a consumer itself, rather than a producer. Output streams are a separate concept not related to Rx Observables, because Observables are suppliers, not consumers (consumers in Rx are called Subscribers).

Most languages already have input and output stream classes, but they aren’t generic. Their element type is always bytes. We can define a generic one like this:

interface InputStream<Element>
{
    boolean hasMore();

    Element read();

    long skip(long count);
}

This time it’s a pure interface. There’s no default behavior. Different types of streams have to define what read means.

Data streams can be transformed in ways similar to event streams. But since the “active” part of a data stream is the reading, it is here that a derived stream will interact with its source stream. This will look more like how Rx Observable implements operators. The read method will be abstract, and each operator, like map and filter, will implement read by calling read on the source stream and applying the transform. In this case, the operators are lazy. The transform is not applied to a datum until a consumer consumes the mapped stream.
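For illustration, a lazily mapped stream might look like this (a sketch assuming the generic InputStream interface above; MappedInputStream is a hypothetical name):

import java.util.function.Function;

class MappedInputStream<Source, Element> implements InputStream<Element>
{
    private final InputStream<Source> source;
    private final Function<Source, Element> transform;

    MappedInputStream(InputStream<Source> source, Function<Source, Element> transform)
    {
        this.source = source;
        this.transform = transform;
    }

    public boolean hasMore()
    {
        return source.hasMore();
    }

    // Lazy: the transform runs only when a consumer pulls the next datum.
    public Element read()
    {
        return transform.apply(source.read());
    }

    public long skip(long count)
    {
        // Skipped data is never transformed at all.
        return source.skip(count);
    }
}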

The obvious difference between this and Rx Observables is that this is a pull, rather than push, interface. The read method doesn’t take a callback, it returns a result. This is exactly what we want for a stream where the next value is produced by the consumer requesting it. A data stream is inherently a pull paradigm. A push-style interface just obscures this. Typical needs with data streams, for example reading “n” values, then switching to do other stuff and then returning to read some more, become incredibly convoluted with an interface designed for a stream where the producer drives the flow.

A pull interface requires that if the next datum isn’t available yet, the thread must block. This is the horror that causes people to turn everything into callbacks: so they never block threads. The phobia of blocking threads (which is really a phobia of creating your own threads that can be freely blocked without freezing the UI or starving a thread pool) is a topic for another day. For the sake of argument I’ll accept that it’s horrible and we must do everything to avoid it.

The proper solution to the problem of long-running methods with return values that don’t block threads is not callbacks. Callback hell is the price we pay for ever thinking it was, and Rx hell is really a slight variation of callback hell with even worse problems layered on top. The proper solution is coroutines, specifically async/await.

This is, of course, exactly how we’d do it today in .NET, or any other language that has coroutines. If you’re stuck with Java, frankly I think you should just let the thread block, and make sure you do the processing on a thread you created (not the UI thread). That is, after all, exactly how Java’s InputStream works. If you are really insistent on not blocking, use a Future. That allows consuming with a callback, but it at least communicates in some way that you only expect the callback to be called once. That means you get a Future each time you read a chunk of the stream. If that seems ugly/ridiculous to you, then just block the damn thread!
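For what it’s worth, the Future-per-read idea needs nothing beyond java.util.concurrent (stream here is assumed to be an InputStream<String> from the interface above):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

ExecutorService ioThread = Executors.newSingleThreadExecutor();

// Each read becomes a Future. The single-threaded executor preserves the order
// of reads, and a Future at least communicates that the result arrives exactly once.
Future<String> nextDatum = ioThread.submit(() -> stream.read());

// ... do other work ...

String datum = nextDatum.get();  // blocks only here; get() throws checked exceptions to handle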

Data streams definitely have a notion of “completing”. Their interface needs to be able to tell a consumer that there’s nothing left to consume. How does it handle errors? Well, since the interface is synchronous, an exception thrown by a transformation will propagate to the consumer. It’s his business to trap it and decide how to proceed. It should only affect that one datum. It should be possible to continue reading after that. If an intermediate derived stream doesn’t deal with an exception thrown by a source stream, it will propagate through until it gets to an outer stream that handles it, or all the way out to the consumer. This is another reason why a synchronous interface is appropriate. It is exactly what try-catch blocks do. Callback interfaces require you to essentially try-catch on every step, even if a step actually doesn’t care about (and cannot handle) an error and simply forwards it. You know you hate all that boilerplate. Is it really worth all of that just to not block a thread?
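A sketch of what that looks like from the consumer’s side (handle and logError are hypothetical):

while (stream.hasMore())
{
    try
    {
        handle(stream.read());  // a bad datum throws here
    }
    catch (RuntimeException e)
    {
        // The consumer's business: trap it, log it, keep reading.
        logError(e);
    }
}

Intermediate derived streams that don’t care about the error contain no error-handling code at all; the exception simply passes through their read calls.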

(If I was told I simply cannot block threads I’d port the project to Kotlin before trying to process data streams with callbacks)

Observable Values

Rx named its central abstraction Observable. This made me think that if I create an Observable<String>, it’s just like a regular String, except I can also subscribe to be notified when it changes. But that’s not at all what it is. It’s a stream, and streams aren’t values. They emit values, but they aren’t values themselves. What’s the difference, exactly? Well, if I had what was literally an observable String, I could read it, and get a String. But you can’t “read” an event stream. An event stream doesn’t have a “current value”. It might have a most recently emitted item, but those are, conceptually, completely different.

Unfortunately, in its endeavor toward “everything is me”, Rx provides an implementation of Observable whose exact purpose is to try to cram these two orthogonal concepts together: the BehaviorSubject. It is a literal observable value. It can be read to get its current value. It can be subscribed to, to get notified whenever the value changes. It can be written to, which triggers the subscribers.
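In RxJava terms (render is a hypothetical UI function):

BehaviorSubject<Integer> value = BehaviorSubject.createDefault(5);

int current = value.getValue();     // read it like a value
value.subscribe(v -> render(v));    // the callback fires immediately with 5
value.onNext(6);                    // write: subscribers now see 6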

But since it implements Observable, I can pass it along to anything that expects an Observable, thereby forgetting that it’s really a BehaviorSubject. This is where it advertises itself as a stream. You might think: well it is a stream, or rather changes to the value are a stream. And that is true. But that’s not what you’re subscribing to when you subscribe to a BehaviorSubject. Subscribing to changes would mean you don’t get notified until the next time the value gets updated. If it never changes, the subscriber would never get called. But subscribers to a BehaviorSubject always get called immediately with the current value. If all you know is that you’ve got an Observable, you’ll have no idea whether this will happen or not.

Once you’ve upcast to an Observable, you lose the ability to read the current value. To preserve this, you’ll have to expose it as a BehaviorSubject. The problem then becomes that this exposes both reading and writing. What if you want to only expose reading the current value, but not writing? There’s no way to do this.

The biggest problem is that operators on a BehaviorSubject produce the same Observable types that those operators always do, which again loses the ability to read the current value. You end up with a derived Observable where the subscriber always gets called immediately (unless you drop or filter or do something else to prevent this), so it certainly always has a current value, you just can’t read it. This has forced me to do very stupid stuff like this:

BehaviorSubject<Integer> someInt = BehaviorSubject.createDefault(5);

Observable<String> stringifiedInt = someInt
    .map(value -> value.toString());

...

// Java lambdas can't assign to local variables, so we need a one-element
// array just to smuggle the value out of the callback.
String[] currentStringifiedInt = { null };

Disposable subscription = stringifiedInt
    .subscribe(value ->
    {
        currentStringifiedInt[0] = value;
    });

subscription.dispose();

System.out.print("Current value: " + currentStringifiedInt[0]);
...

This is ugly, verbose, obtuse and unsafe. I have to subscribe just to trigger the callback to produce the current value for me, then immediately close the subscription because I don’t want that callback getting called again. I have to rely on the fact that a BehaviorSubject-derived observable will emit items immediately (synchronously), to ensure currentStringifiedInt gets populated before I use it. If I turn the derived observable back into a BehaviorSubject (which basically subscribes internally and sticks each updated value into the new BehaviorSubject), I can read the current value, but I can write to it myself, thereby breaking the relationship between the derived observable and the source BehaviorSubject.

The fundamental problem is that observable values and event streams aren’t the same thing. We need a separate type for this. Specifically, we need two interfaces: one for read-only observable values, and one for read-write observable values. This is where we’re going to see the type of subscribe-driven lazy evaluation that we see inside of Rx Observables. Derived observables are read-only. Reading them triggers whatever cascade of processing and upstream reading is necessary to produce the value. When we subscribe, that is where it will subscribe to its source observables, inducing them to compute their values when necessary (when those values update) to send them downstream.

Furthermore, the subscribe method on our Observable should explicitly ask whether the subscriber wants to be immediately notified with the current value (by requiring a boolean parameter). Since we have a separate abstraction for observable values, we know there is always a current value, so this question always makes sense.
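Here is a sketch of what that pair of interfaces could look like, in Java. Every name here is hypothetical; this is the shape of the thing, not a finished design:

import java.util.function.Consumer;
import java.util.function.Function;

interface Subscription
{
    void cancel();
}

// Read-only observable values: readable synchronously, subscribable, and
// transformable into derived (still read-only) values.
interface ObservableValue<Value>
{
    Value get();

    // The caller must say whether they want an immediate call with the
    // current value. Since a current value always exists, the question
    // always makes sense.
    Subscription subscribe(Consumer<Value> observer, boolean notifyImmediately);

    <Derived> ObservableValue<Derived> map(Function<Value, Derived> transform);

    // Store the derived value in memory, trading space for repeated computation.
    ObservableValue<Value> cached();
}

// Read-write observable values: only this interface exposes mutation, so we
// can hand out read access without handing out write access.
interface MutableObservableValue<Value> extends ObservableValue<Value>
{
    void set(Value newValue);
}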

Since the default is lazy evaluation, which is expensive and repetitious, we’ll need an operator specifically to store a derived observable in memory for quick evaluation. Is this comparable to turning a cold (lazy) Rx Observable into a hot (eager) one? No, because the thing you subscribe to with observable values, the changes, are always hot. They happen, and you miss them if you aren’t subscribed. Caching is purely a matter of efficiency, trading computing time for computing space (storage). It has no impact whatsoever on when updates get published.
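Continuing the sketch (SimpleObservableValue is a hypothetical root implementation):

MutableObservableValue<Integer> count = new SimpleObservableValue<>(0);

// Lazy by default: the transform reruns on every synchronous get().
ObservableValue<String> label = count.map(n -> "Count: " + n);

// Cached: the transform runs once per change to count, and get() just returns
// the stored result. Subscribers see exactly the same updates either way.
ObservableValue<String> cachedLabel = label.cached();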

Caching will affect whether transformations to produce a value are run repeatedly, but only for synchronous reads (multiple subscribers won’t cause repeated calculations). The major difference is that we can eliminate repeated side-effects from double-calculating a value without changing how or when its updates are published. What subscribers see is totally separate from whether an observable value is cached, unlike in Rx where “sharing” an Observable changes what subscribers see (it causes them to miss what they otherwise would have received).

A single Observable represents a single value. Multiple subscribers means multiple people are interested in one value. There’s no issue of “making sure all observers see the same sequence”. If a late subscriber comes in, he’ll either request the current value, whatever it is, or just request to be notified of later changes. The changes are true events (they happen, or they don’t, and if they do they happen at a specific time). We’d never need to duplicate calculations to make multiple subscribers see stale updates.

Furthermore, we communicate more clearly what, if any, “side effects” should be happening inside a transformation. They should be limited to whatever is necessary to calculate the value. If we have a derived value that requires an HTTP request to calculate it, this request will go out either when the source value changes, requiring a re-evaluation, or it will happen when someone tries to read the value… unless we cache it, which ensures the request always goes out as soon as it can. It is multiple synchronous reads that would, for non-cached values, trigger multiple requests, not multiple subscribers. This makes sense. If we’ve specified we don’t want to store the value, we’re saying each time we want to query the value we need to do the work of computing it.

Derived (and therefore read-only) observable values, which can both be subscribed to and read synchronously, are the most important missing piece in Rx. This piece is so important that I’ve gone through the trouble multiple times to build rudimentary versions of it in some of my apps.

“Completion” obviously makes no sense for observable values. They never stop existing. Errors should probably never happen in transformations. If a runtime exception sneaks through, it’s going to break the observable. It will need to be rethrown every time anyone tries to read the value (and what about subscribing to updates?). The possibility of failure stretches the concept of a value whose changes can be observed past, in my opinion, its range of valid interpretation. You can, of course, define a value that has two variations of success and failure (aka a Result), but then the possibility of failure is baked into the value itself, not its observability.

Tasks

The final abstraction is tasks. Tasks are just asynchronous function invocations. They are started, and they do or do not produce a result. This is fundamentally different from any kind of “stream” because tasks only produce one result. They may also fail, in which case they produce one exception. The central focus of tasks is not so much on the value they produce but on the process of producing it. The fact the process is nontrivial and long-running is the only reason you’d pick a task over a regular function to begin with. As such, tasks expose an interface to start, pause/resume and cancel. Tasks are, in this way, state machines.

Unlike any of the other abstractions, tasks really do have distinct steps for starting and finishing. This is what ConnectableObservable is trying to capture with its addition (or rather, separation from subscribe) of connect. The request and the response are always distinct. Furthermore, once a task is started, it can’t be “restarted”. Multiple people waiting on its response doesn’t trigger the work to happen multiple times. The task produces its result once, and stores it as long as it hangs around in case anyone else asks for it.

Since the focus here is on the process, not the result, task composition looks fundamentally different from stream composition. Stream composition, including pipelines, focuses on the events or values flowing through the network. Task composition deals with results too, but its primary concern is dependency: exactly when the various subtasks can be started, relative to when other tasks start or finish, and whether tasks can run in parallel or must run serially. This is even a concern for tasks that don’t produce results.

Since tasks can fail, they also need to deal with error propagation. An error in a task means an error occurring somewhere in the process of running the task: moving it from start to finish. It’s the finishing that is sabotaged by an error, not the starting. We expect starting a task to always succeed. It’s the finishing that might never happen due to an error. This is represented by an additional state for failed. This is why it is not starting a task that would throw an exception, but waiting on its result. It makes sense that in a composed task, if a subtask fails, the outer task may fail. The outer task either expects and handles the error by trapping it, or it doesn’t, in which case it propagates out and becomes a failure of the outer task.

This propagation outward of errors, through steps that simply ignore those errors (and therefore, ideally, should contain absolutely no boilerplate code for simply passing an error through), is similar to data streams, and it therefore demands a synchronous interface. This is a little more tricky though because tasks are literally concerned with composing asynchrony. Even if we’re totally okay with blocking threads, what if we want subtasks to start simultaneously? Well, that’s what separating starting from waiting on the result lets us do. We only need to block when we need the result. That can be where exceptions are thrown, and they’ll automatically propagate through steps that don’t deal with them, which is exactly what we want. This separates when an exception is thrown from when an exception is (potentially) caught, and therefore requires tasks to cache exceptions just like they do their result.
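In Java terms, a hedged sketch of that separation using CompletableFuture (fetchData and loadConfig are hypothetical):

import java.util.concurrent.CompletableFuture;

// Starting is separate from waiting: both subtasks begin running here, in parallel.
CompletableFuture<String> data = CompletableFuture.supplyAsync(() -> fetchData());
CompletableFuture<String> config = CompletableFuture.supplyAsync(() -> loadConfig());

// We block (or have a cached exception rethrown, wrapped in CompletionException)
// only where we actually need each result, not where the task was started.
String dataResult = data.join();
String configResult = config.join();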

We can, of course, avoid blocking any threads by using coroutines. That’s exactly what the .NET Tasks do. If you’re in a language that doesn’t have coroutines, I have the same advice I have for data streams: just block the damn threads. You’ll tear your hair out with the handleResult/handleError pyramids of callback doom, where most of your handleError callbacks are just calling the outer handleError to pass errors through.

What’s missing in the Task APIs I’ve seen is functional transformations like what we have on the other abstractions. This is probably because the need is much less. It’s not hard at all to do what is essentially a map on a Task:

async Task<MappedResult> mapATask()
{
    Task<Result> sourceTask = getSourceTask();
    Func<Result, MappedResult> transform = getTransform();

    return transform(await sourceTask);
}

But still, we can eliminate some of that boilerplate with some nice extension methods:

// (C# extension methods must live in a static class)
static async Task<MappedResult> Map<Result, MappedResult>(this Task<Result> thisTask, Func<Result, MappedResult> transform)
{
    return transform(await thisTask);
}

...

Task<Result> someTask = getTask();

await someTask
  .Map(someTransform)
  .Map(someOtherTransform);

Conclusion

By separating out these four somewhat similar but ultimately distinct concepts, we’ll find that the “hot” vs. “cold” distinction is expressed by choosing the right abstraction, and this is exposed to the clients, not hidden in the implementation details. Furthermore, the implication of side effects is easier to understand and address. We make a distinction of how “active” or “passive” different actions are. Observing an event is totally passive, and cannot itself incur side effects. Constructing a derived event stream is not passive, it entails the creation of new events. Consuming a value in a data stream is also not passive. Notice that broadcasting requires passivity. The only one-to-many operations available, once we distinguish the various abstractions, are observing an event stream and observing changes to an observable value. The former alone cannot incur side effects itself, and the latter can only incur side effects when going from no observers to more than none, and thus is independent of the multiplicity of observers. We have, in this way, eliminated the possibility of accidentally duplicating effort in the almost trivial manner that is possible in Rx.

In the next part, we’ll talk about those transformation operators, and what they look like after separating the abstractions.

On ReactiveX – Part I

If a tree falls in the forest and no one hears it, does it make a sound?

The ReactiveX libraries have finally answered this age-old philosophical dilemma. If no one is listening for the tree falling, not only does it not make a sound, the tree didn’t even fall. In fact, the wind that knocked the tree down didn’t even blow. If no one’s in the forest, then the forest doesn’t exist at all.

Furthermore, if there are three people in the forest listening, there are three separate sounds that get made. Not only that, there are three trees, each one making a sound. And there are three gusts of wind to knock each one down. There are, in fact, three forests.

ReactiveX is the Copenhagen Interpretation on steroids (or, maybe, just taken to its logical conclusion). We don’t just discard counterfactual definiteness, we take it out back and shoot it. What better way to implement Schrodinger’s Cat in your codebase than this:

final class SchrodingersCat extends Observable<Boolean>
{
    public SchrodingersCat()
    {
        cat = new Cat("Mittens");
    }

    @Override
    protected void subscribeActual(@NonNull Observer<? super Boolean> observer)
    {
        if(!observed)
        {
            observed = true;

            boolean geigerCounterTripped = new Random().nextInt(2) == 0;
            if(geigerCounterTripped)
                new BluntInstrument().murder(cat);
        }

        observer.onNext(cat.alive());
    }

    private final Cat cat;

    boolean observed = false;
}

In this example, I have to go out of my way to prevent multiple observers from creating multiple cats, each with its own fate. Most Observables aren’t like that.

When you first learn about ReactiveX (Rx, as I will refer to it from now on), it’s pretty cool. The concept of transforming event streams, whose values occur over time, as opposed to collections (Arrays, Dictionarys, etc.), whose values occur over space (memory, or some other storage location), the same way that you transform collections (map, filter, zip, reduce, etc.) immediately struck me as extremely powerful. And, to be sure, it is. This began the Rx Honeymoon. The first things I knew would benefit massively from these abstractions were the things I had already learned to write reactively, but without the help of explicit abstractions for that purpose: graphical user interfaces.

But, encouraged by the “guides”, I didn’t stop there. “Everything is an event stream”, they said. They showed me the classic example of executing a web request, parsing its result, and attaching it to some view on the UI. It seems like magic. Just define your API service’s call as an Observable, which is just a map of the Observable for a general HTTP request (if your platform doesn’t provide one for you, you can easily write one by bridging a callback interface to an event stream). Then just do some more mapping and you have a text label that displays “loading…” until the data is downloaded, then it automatically switches to display the loaded data:

class DataStructure
{
    String title;
    int version;
    String displayInfo;
    ...
}

class AppScreenViewModel
{
    ...

    public final Observable<String> dataLabelText;

    ...

    public AppScreenViewModel()
    {
        ...

        Observable<HTTPResponse> response = _httpClient.request(
             "https://myapi.com/getdatastructure",
             HTTPMethod::get
         );

        Observable<DataStructure> parsedResponse = response
            .map(response -> new JSONParser().parse<DataStructure>(response.body, new DataStructure()));

        Observable<String> loadedText = parsedResponse
             .map(dataStructure -> dataStructure.displayInfo);

        Observable<String> loadingText = Observable.just("Loading...");

        dataLabelText = loadingText
             .merge(loadedText);
    }
}

...

class AppScreen : View
{
    private TextView dataLabel;

    ...

    private void bindToViewModel(AppScreenViewModel viewModel)
    {
        subscription = viewModel.dataLabelText
            .subscribe(text -> dataLabel.setText(text));
    }
}

That’s pretty neat. And you wouldn’t actually write it like this. I just did it like this to illustrate what’s going on. It would more likely look something like this:

    public AppScreenViewModel()
    {
        ...

        dataLabelText = getDataStructureRequest(_httpClient)
            .map(response -> parseResponse<DataStructure>(response, new DataStructure()))
            .map(dataStructure -> dataStructure.displayInfo)
            .merge(Observable.just("Loading..."));

        ...
    }

And, of course, you’d want to move the low-level HTTP client stuff out of the ViewModel. What you get is an elegant expression of a pipeline of retrieval and processing steps, with end of the pipe plugged into your UI. Pretty neat!

But… hold on. I’m confused. I have my UI subscribe (that is, listen) to a piece of data that, through a chain of processing steps, depends on the response to an HTTP request. I can understand why, once the response comes in, the data makes its way to the text label. But where did I request the data? Where did I tell the system to go ahead and issue the HTTP request, so that eventually all of this will get triggered?

The answer is that it happens automatically by subscribing to this pipeline of events. That is also when it happens. The subscription happens in bindToViewModel. The request will be triggered by that method calling subscribe on the observable string, which triggers subscribes to all the other observables, because that’s how the Observables returned by operators like map and merge work.

Okay… that makes sense, I guess. But it’s kind of a waste of time to wait until then to send the request out. We’re ready to start downloading the data as soon as the view-model is constructed. Minor issue, I guess, since in this case these two times are probably a fraction of a second apart.

But now let’s say I also want to send that version number to another text label:

class DataStructure
{
    String title;
    int version;
    String displayInfo;
    ...
}

class AppScreenViewModel
{
    ...

    public final Observable<String> dataLabelText;
    public final Observable<String> versionLabelText;

    ...

    public AppScreenViewModel()
    {
        ...

        Observable<DataStructure> dataStructure = getDataStructureRequest(_httpClient)
            .map(response -> parseResponse<DataStructure>(response, new DataStructure()));

        String loadingText = "Loading...";

        dataLabelText = dataStructure
            .map(data -> data.displayInfo)
            .merge(Observable.just(loadingText));

        versionLabelText = dataStructure
            .map(data -> Integer.toString(data.version))
            .merge(Observable.just(loadingText));
    }
}

...

class AppScreen : View
{
    private TextView dataLabel;
    private TextView versionLabel;

    ...

    private void bindToViewModel(AppScreenViewModel viewModel)
    {
        subscriptions.add(viewModel.dataLabelText
            .subscribe(text -> dataLabel.setText(text)));

        subscriptions.add(viewModel.versionLabelText
            .subscribe(text -> versionLabel.setText(text)));
    }
}

I fire up my app, and then notice in my web proxy that the call to my API went out twice. Why did that happen? I didn’t create two of the HTTP request observables. But remember I said that the request gets triggered in subscribe? Well, we can clearly see two subscribes here. They are each to different observables, but both of them are the result of operators that begin with the HTTP request observable. Their subscribe methods call subscribe on the “upstream” observable. Thus, each of the two chains eventually calls subscribe, once, on the HTTP request observable.

The honeymoon is wearing off.

Obviously this isn’t acceptable. I need to fix it so that only one request gets made. The ReactiveX docs refer to these kinds of observables as cold. They don’t do anything until you subscribe to them, and when you do, they emit the same items for each subscriber. Normally, we might think of “items” as just values. So at worst this just means we’re making copies of our structures. But really, an “item” in this world is any arbitrary code that runs when the value is produced. This is what makes it possible to stuff very nontrivial behavior, like executing an HTTP request, inside an observable. By “producing” the value of the HTTP response, we execute the code that calls the HTTP client. If we produce that value for “n” listeners, we literally have to produce it “n” times, which means we call the service “n” times.

The nontrivial code that happens as part of producing the next value in a stream is what we can call side effects. This is where the hyper-Copenhagen view of reality starts getting complicated (if it wasn’t already). That tree falling sound causes stuff on its own. It chases birds off, and shakes leaves off of branches. Maybe it spooks a deer, causing it to run into a street, which causes a car driving by to swerve into a service pole, knocking it down and cutting off the power to a neighborhood miles away. So now, “listening” to the tree falling sound means being aware of anything that was caused by that sound. Sitting in my living room and having the lights go out now makes me an observer of that sound.

There’s a reason Schrodinger put the cat in a box: to try as best he could to isolate events inside the box from events outside. Real life isn’t so simple. “Optimizing” out the unobserved part of existence requires you to draw a line (or box?) around all the effects of a cause. The Butterfly Effect laughs derisively at the very suggestion.

Not all Observables are like this. Some of them are hot. They emit items completely on their own terms, even if no subscribers are present. By subscribing, you’ll receive the same values at the same time as any other subscribers. If one subscriber subscribes late, they’ll miss any previously emitted items. An example would be an Observable for mouse clicks. Obviously a new subscriber can’t make you click the mouse again, and you can click the mouse before any subscribers show up.

To fix our problem, we need to convert the cold HTTP response observable to a hot one. We want it to emit its value (the HTTP response, which as a side effect will trigger the HTTP request) on its own accord, independent of who subscribes. This will solve both the problem of waiting too long to start the request, and having the request go out twice. To do this, Rx gives us a subclass of Observable called ConnectableObservable. In addition to subscribe, these also have a method connect, which triggers them to start emitting items. I can use the publish operator to turn a cold observable into a connectable hot one. This way, I can start the request immediately, without duplicating it:

        ConnectableObservable<DataStructure> dataStructure = getDataStructureRequest(_httpClient)
            .map(response -> parseResponse<DataStructure>(response, new DataStructure()))
            .publish();

        dataStructure.connect();

        String loadingText = "Loading...";

        dataLabelText = dataStructure
            .map(data -> data.displayInfo)
            .merge(Observable.just(loadingText));

        versionLabelText = dataStructure
            .map(data -> Integer.toString(data.version))
            .merge(Observable.just(loadingText));

Now I fire it up again. Only one request goes out! Yay!! But wait… both of my labels still say “Loading…”. What happened? They never updated.

The response observable is now hot: it emits items on its own. Whatever subscribers are there when that item gets emitted are triggered. Any subscribers that show up later miss earlier items. Well, my dev server running in a VM on my laptop here served up that API response in milliseconds, faster than the time between this code running and the View code subscribing to these observables. By the time they subscribed, the response had already been emitted, and the subscribers miss it.

Okay, back to the Rx books. There’s an operator called replay, which will give us a connectable observable that begins emitting as soon as we call connect, but also caches the items that come in. When anyone subscribes, it first powers through any of those cached items, sending them to the new subscriber in rapid succession, to ensure that every subscriber sees the same sequence of items:

        ConnectableObservable<DataStructure> dataStructure = getDataStructureRequest(_httpClient)
            .map(response -> parseResponse<DataStructure>(response, new DataStructure()))
            .replay();

        dataStructure.connect();

        String loadingText = "Loading...";

        dataLabelText = dataStructure
            .map(data -> data.displayInfo)
            .merge(Observable.just(loadingText));

        versionLabelText = dataStructure
            .map(data -> Integer.toString(data.version))
            .merge(Observable.just(loadingText));

I fire it up, still see one request go out, but then… I see my labels briefly flash with the loaded text, then go back to “Loading…”. What the fu…

If you think carefully about the last operator, the merge, well if the response comes in before we get there, we’re actually constructing a stream that consists first of the response-derived string, and then the text “Loading…”. So it’s doing what we told it to do. It’s just confusing. The replay operator, as I said, fires off the exact sequence of emitted items, in the order they were originally emitted. That’s what I’m seeing.

But wait… I’m not replaying the merged stream. I’m replaying the upstream event of the HTTP response. Now it’s not even clear to me what that means. I need to think about this… the dataStructure stream is a replay of the underlying stream that makes the request, emits the response, then maps it to the parsed object. That all happens almost instantaneously after I call connect. That one item gets cached, and when anyone subscribes, it loops through and emits the cached items, which is just that one. Then I merge this with a Just stream. What does Just mean, again? Well, that’s a stream that emits just the item given to it whenever you subscribe to it. Each subscriber gets that one item. Okay, and what does merge do? Well, the subscribe method of a merged stream subscribes to both the upstream observables used to build it, so that the subscriber gets triggered by either one’s emitted items. It has to subscribe to both in some order, and I guess it makes sense that it first subscribes to the stream on which merge was called, and then subscribes to the other stream passed in as a parameter.

So what’s happening is by the time I call subscribe on what happens to be a merged stream, it first subscribes to the replay stream, which already has a cached item and therefore immediately emits it to the subscriber. Then it subscribes to the Just stream, which immediately emits the loading text. Hence, I see the loaded text, then the loading text.

If I swapped the operands so that the Just is what I call merge on, and the mapped data structure stream is the parameter, then the order reverses. That’s scary. I didn’t even think to consider that the placement of those two in the call would matter.

Sigh… okay, I need to express that the loading text needs to always come before the loaded text. Instead of using merge, I need to use prepend. That makes sure all the events of the stream I pass in will get emitted before any events from the other stream:

    ConnectableObservable<DataStructure> dataStructure = getDataStructureRequest(_httpClient)
        .map(response -> parseResponse<DataStructure>(response, new DataStructure()))
        .replay();

    dataStructure.connect();

    String loadingText = "Loading...";

    dataLabelText = dataStructure
        .map(data -> data.displayInfo)
        .prepend(Observable.just(loadingText));

    versionLabelText = dataStructure
        .map(data -> Integer.toString(data.version))
        .prepend(Observable.just(loadingText));

Great, now the labels look right! But wait… I always see “Loading…” briefly flash on the screen. All the trouble I just dealt with derived from my dev server responding before my view gets created. I shouldn’t ever see “Loading…”, because by the time the labels are being drawn, the loaded text is available.

But the above explanation covers this as well. We’ve constructed a stream where every subscriber will get the “Loading…” item first, even if the loaded text comes immediately after. The prepend operator produces a cold stream. It always emits the items in the provided stream before switching to the one we’re prepending to.

The stream is still too cold. I don’t want the subscribers to always see the full sequence of items. If they come in late, I want them to only see the latest ones. But I don’t want the stream to be entirely hot either. That would mean if the subscribers comes in after the loaded text is emitted, they’ll never receive any events. I need to Goldilocks this stream. I want subscribers to only receive the last item emitted, and none before that. I need to move the replay up to the concatenated stream, and I need to specify that the cached items should never exceed a single one:

    Observable<DataStructure> dataStructure = getDataStructureRequest(_httpClient)
        .map(response -> parseResponse<DataStructure>(response, new DataStructure()));

    String loadingText = "Loading...";

    dataLabelText = dataStructure
        .map(data -> data.displayInfo)
        .prepend(Observable.just(loadingText))
        .replay(1);

    dataLabelText.connect();

    versionLabelText = dataStructure
        .map(data -> Integer.toString(data.version))
        .prepend(Observable.just(loadingText))
        .replay(1);

    versionLabelText.connect();

Okay, there we go, the flashing is gone. Oh shit! Now two requests are going out again. By moving the replay up to after the stream bifurcated, each derived stream is subscribing and caching its own item, so each one is triggering the HTTP response to get “produced”. Ugh… I have to keep that first replay to “share” the response with each derived stream and ensure each one gets it even if it came in before their own connect calls.

This is all the complexity we have to deal with to handle a simple Y-shaped network of streams to drive two labels on a user interface. Can you imagine building an entire, even moderately complex, app as an intricate network of streams, and having to worry about how “hot” or “cold” each edge in the stream graph is?

Is the honeymoon over yet?

All this highly divergent behavior is hidden behind a single interface called Observable, which intentionally obscures it from the users of the interface. When an object hands you an Observable to use in some way, you have no idea what kind of observable it is. That makes it difficult or impossible to track down or even understand why a system built out of reactive event streams is behaving the way it is.

This is the point where I throw up my hands and say wait wait wait… why am I trying to say an HTTP request is a stream of events? It’s not a stream of any sort. There’s only one response. How is that a “stream”? What could possibly make me think that’s the appropriate abstraction to use here?

Ohhh, I see… it’s asynchronous. Rx isn’t just a library with a powerful transformable event stream abstraction. It’s the “all purpose spray” of concurrency! Any time I ever need to do anything with a callback, which I put in because I don’t want to block a thread for a long-running process, apparently that’s a cue to turn it into an Observable and then have it spread to everything that gets triggered by that callback, and so on and so on. Great. I graduated from callback hell to Rx hell. I’ll have to consult Dante’s map to see if that moved me up a level or down.

In the next part, I’ll talk about whether any of the Rx stuff is worth salvaging, and why things went so off the rails.

REST SchmEST

On almost every project I’ve worked on, everyone has been telling themselves the web services are RESTful services. Most of them don’t really follow RESTful principles, at least not strictly. But the basic ideas of REST are the driving force of how the APIs are written. After working with them for years, and reading into what a truly RESTful interface is, and what the justification is, I am ready to say:

REST has no reason to exist.

It’s perfectly situated in a middle ground no one asked for. It’s too high-level to give raw access to a database to make arbitrary queries using an actual query language like SQL, and it’s too low-level to directly drive application activities without substantial amounts of request forming and response stitching and processing.

It features the worst of both worlds, and I don’t know what the benefit is supposed to be.

Well, let’s look at what the justification for REST is. Before RESTful services, the API world was dominated by “remote procedure call” (RPC) protocols, like XML-RPC, and later Simple Object Access Protocol (SOAP). The older devs I’ve worked with told horror stories about how painful it was to write requests to those APIs, and they welcomed the “simplicity” of REST.

The decision to use REST is basically the decision to not use an RPC protocol. However, if we look at the original paper for REST, it doesn’t mention RPC protocols a single time. It focuses on things like statelessness, uniformity, and cacheability. But since choosing this architectural style for an API is just as much about not choosing the alternative, the discussion began to focus on a SOAP vs. REST comparison.

On the wiki page for SOAP, it briefly mentions one justification in the comparison:

SOAP is less “simple” than the name would suggest. The verbosity of the protocol, slow parsing speed of XML, and lack of a standardized interaction model led to the dominance of services using the HTTP protocol more directly. See, for example, REST.

This point tends to come up a lot. For example, this page repeats the idea that by using the capabilities of HTTP “directly”, REST can get away with defining less itself, which makes it “simpler”. So it seems the idea is that protocols like SOAP add unnecessary complexity to APIs, by reinventing capabilities already contained in the underlying communication protocols. The RPC protocols were written to be application-layer agnostic. As such, they couldn’t take advantage of the concepts already present in HTTP. Once it became clear that all these RPC calls were being delivered over HTTP anyway, they became redundant. We can instead simply use what HTTP gives us out of the box to design APIs.

See, for example, the answers on this StackOverflow question:

SOAP is a protocol on top of HTTP, so it bypasses a lot of HTTP conventions to build new conventions in SOAP, and is in a number of ways redundant with HTTP. HTTP, however, is more than sufficient for retreiving, searching, writing, and deleting information via HTTP, and that’s a lot of what REST is. Because REST is built with HTTP instead of on top of it, it also means that software that wants to integrate with it (such as a web browser) does not need to understand SOAP to do so, just HTTP, which has to be the most widely understood and integrated-with protocol in use at this point.

Bottom line, REST removes many of the most time-consuming and contentious design and implementation decisions from your team’s workflow. It shifts your attention from implementing your service to designing it. And it does so without piling gobbledygook onto the HTTP protocol.

It’s curious that this “SOAP reinvents what HTTP already gives us” argument did not appear in the original REST paper. It’s a bad argument, which leads directly to the no-man’s land between raw database access and high-level application interfaces.

HTTP already gives us what we need to make resource-based requests to a server; in short, CRUD. They claim that a combination of the HTTP request method (GET, POST, DELETE, etc.), path elements, and query parameters already does for us what the RPC protocols stuff into overcomplicated request bodies.

The problem with this argument is that what HTTP provides and what RPC provides are not the same, either in implementation or in purpose. Those features of HTTP (method, path, and query parameters) expose a much different surface than what RPC calls expose. RPC is designed to invoke code (procedure) on another machine, while HTTP is designed to retrieve or manipulate resources on another machine.

Somewhere along the line, the idea arose that REST tells you how to map database queries and transactions to HTTP calls. For example, a SELECT of a single row by ID from a table maps to a GET request with the table name being a path element, and the ID being the next (and last) path element. An INSERT with column-cell value pairs maps to a POST request, again with the table in the path, and this time no further path elements, and the value pairs as body form data.
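To make the folk mapping concrete, it amounts to a dictionary roughly like this (table and column names are hypothetical):

GET    /customers/42    ->  SELECT * FROM customers WHERE id = 42
POST   /customers       ->  INSERT INTO customers (name, email) VALUES (?, ?)
PUT    /customers/42    ->  UPDATE customers SET name = ?, email = ? WHERE id = 42
DELETE /customers/42    ->  DELETE FROM customers WHERE id = 42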

This certainly didn’t come from the original paper, which doesn’t mention “database” a single time. It mentions resources. The notion most likely arose because REST was shoehorned into a replacement for RPC, used to build APIs that almost always sit on top of a database. If you’re supposed to “build your API server RESTfully”, and that server is primarily acting as a shell with a database at its kernel, then a reading of “REST principles”, which drives your API toward dumb data access (rather than execution of arbitrary code), will inevitably produce an “HTTP -> SQL” dictionary.

The mapping covers basic CRUD operations on a database, but it’s not a full query language. Once you start adding WHERE clauses, simple ones may map to query parameters, but there’s no canonical way to do the mapping, and there’s no way to do more sophisticated stuff like subqueries. There’s not even a canonical way to select specific columns. Then there’s joining. Since neither selecting specific columns nor joining map to any HTTP concept, you’re stuck with an ORM style interface into the database, where you basically fetch entire rows and all their related rows all at once, no matter how much of that data you actually need.

The original paper specifically called out this limitation:

By applying the software engineering principle of generality to the component interface, the overall system architecture is simplified and the visibility of interactions is improved. Implementations are decoupled from the services they provide, which encourages independent evolvability. The trade-off, though, is that a uniform interface degrades efficiency, since information is transferred in a standardized form rather than one which is specific to an application’s needs. The REST interface is designed to be efficient for large-grain hypermedia data transfer, optimizing for the common case of the Web, but resulting in an interface that is not optimal for other forms of architectural interaction.

So, basically, this manifestation of “REST as querying through HTTP” gives us a lot less than what query languages already gave us, and nothing more. I’ll bet if you ask people why they picked REST over just opening SQL connections from the clients, they’ll say something about security, i.e. by supplying an indirect and limited interface to the database, REST allows you to build a sort of firewall in the server code on the way to actually turning the requests into queries. Well, you’re supposed to be able to solve that with user permissions. Either way, it’s nothing about the interface actually being nicer to work with or more powerful.

It’s only barely more abstract. You could probably make a fair argument it’s really not more abstract. REST is just a stunted query language. We can say this is a straw man, since the paper never said to do this. But unless the very suggestion to make application servers RESTful in general is a straw man (making the vast majority of “RESTful” services a misapplication of REST), I don’t see how it could have ended up any differently.

Claiming that this serves as a replacement for RPC fundamentally misunderstands what RPC protocols do. It’s in the name: remote procedure call. It’s a protocol to invoke a function on another machine. The protocol is there to provide a calling convention that works over network wires, just like a C compiler defines a calling convention for functions that works within a single process on a single machine. It defines how to select the function (by name), how to send the parameters, and how to receive the return value.
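For a concrete picture, this is roughly what a call looks like in JSON-RPC, one of the simpler RPC protocols (the method name and parameters here are hypothetical):

{
  "jsonrpc": "2.0",
  "method": "invoiceService.createInvoice",
  "params": { "customerId": 42, "amountCents": 1999 },
  "id": 1
}

The response comes back tied to the same id:

{ "jsonrpc": "2.0", "result": { "invoiceId": 1337 }, "id": 1 }

Notice that nothing here leans on HTTP: the function selection, the parameters, and the return value all live in the body, which is why such calls can travel over any transport.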

How much does HTTP help with this? Well, I guess you can put the class name, or function name (or both) into the URL path. But there’s no one way to do this that jumps out to me as obviously correct. The HTTP methods aren’t of much use (functions don’t, generally, correspond to the short list of HTTP methods), and query parameters are quite inappropriate for function parameters, which can be arbitrarily complex objects. Any attempt to take what SOAP does and move “redundant” pieces into HTTP mechanisms isn’t going to accomplish much.

We can, of course, send RPC calls over HTTP. The bulk, if not entirety, of the calling convention goes into the request body. By limiting HTTP to RESTful calls, we’re foregoing the advantage of the very difference between an API and direct querying: that it triggers arbitrary code on the server, not just the very narrowly defined type of work that a database can do. We can raise the abstraction layer and simply invoke methods on models that closely represent application concerns, and execute substantial amounts of business logic on top of the necessary database queries. To demand that APIs be “RESTful” is to demand that they remain mere mappings to dumb data accesses, the only real difference being that we’re robbed of a rich query language.

What you get is clients forced to do all the business logic themselves, and make inefficient coarse fetches of entire rows or hierarchies of data, or even worse a series of fetches, each incurring an expensive round trip to the server, to stitch together a JOIN on their side. When people start noticing this, they’ll start breaking the REST rules. They’ll design an API a client really wants, which is to handle all that business logic on the server, and you end up with a /api/fetchStuffForThisUseCase endpoint that has no correspondence whatsoever to a database query (the path isn’t the name of any table, and the response might be a unique conglomerate and/or mapping of database entities into a different structure).

That’s way better… and it’s exactly what this “REST as HTTP -> SQL” notion tries to forbid you from doing.

The middle of the road is, as usual, the worst place to be. If the client needs to do its own queries, just give it database access, and don’t treat REST as a firewall. It’s very much like the fallacy of treating network address translation as being a firewall. Even if it can serve that role, it’s not designed for it, and there are much better methods designed specifically for security. Handle security with user permissions. If you’re really concerned with people on mobile devices MITM attacking your traffic and seeing your SQL queries, have your clients send those queries over an end-to-end encrypted request.

If you’re a server/database admin and you’re nervous about your client developers writing insane queries, set up a code review process to approve any queries that go into client code.

If the client doesn’t need to do the business logic itself, and especially if you have multiple clients that all need the same business logic, then implement it all on the server and use an RPC protocol to let your clients invoke virtual remote objects. You’re usually not supposed to hand-write RPC calls anyways. That’s almost as a bad as handwriting assembly with a C compiler spec’s calling convention chapter open in your lap. That can all be automated, because the point of RPC protocols is that they map directly to function or method calls in a programming language. You shouldn’t have to write the low-level stuff yourself.

What, then, is the purpose of the various features of HTTP? Well, again it’s in the name: hypertext transfer protocol. HTTP was designed to enable the very first extremely rudimentary websites on the very first extremely rudimentary internet to be built and delivered. It’s designed to let you stick HTML files somewhere on a server, in such a way they can be read back from the same location where you stuck them, to update or delete them later, and to embed links among them in the HTML.

The only reason we’re still using HTTP for APIs is the same reason we’re still using HTML for websites: because that’s the legacy of the internet, and wholesale swapping to new technology is hard. Both of them completely outdated and mostly just get in our way now. Most internet traffic isn’t HTML pages anymore, it’s APIs, and it’s sitting on top of a communication protocol built for the narrow purpose of sticking HTML (to be later downloaded directly, like a file) onto servers. Even TCP is mostly just getting in our way, which is why it’s being replaced by things like QUIC. There’s really no reason to not run APIs directly on the transport protocol.

Rather than RPC, it’s really HTTP that’s redundant in this world.

Even for the traffic that is still HTML, it's mostly there to bootstrap Javascript, and act as the final delivery format for views. Having HTML blend the concepts of composing view components and decorating text with styling attributes is, I believe, outdated and redundant in the same way.

To argue that APIs need to use such a protocol, to the point of being restricted to that protocol’s capabilities, makes no sense.

The original REST paper never talked about trying to map HTTP calls to database queries or transactions. Instead it focused on resources, which more closely correspond to (but don’t necessarily have to be) the filesystem on the server (hence why they are identified with paths in the URL). It doesn’t even really talk about using query parameters. The word “query” appears a single time, and not in the context of using them to do searches.

The driving idea of REST is more about not needing to do searches at all. Searches are ways to discover resources. But the “RESTful” way to discover resources is to first retrieve a resource you already know about, and that tells you, with a list of hyperlinks, what other resources exist and where to find them. If we were strict, a client using a RESTful API would have to crawl it, following hyperlinks to build up the data needed to drive a certain activity.

The central justifications of REST (statelessness, caching and uniformity) aren’t really ever brought up much in API design discussions… well, caching is, and I’ll get to that. RPC protocols can be as stateless (or stateful) as you want, so REST certainly isn’t required to achieve statelessness. Nor does sticking to a REST-style interface guarantee statelessness. Uniformity isn’t usually a design requirement, and since it comes at the cost of inefficient whole-row (or more) responses, it usually just causes problems.

That leaves caching, the only really valid reason I can see to make an API RESTful. However, the way REST achieves caching is basically an example of its “uniformity”: it sticks to standard HTTP mechanisms, which you get “for free” on almost any platform that implements HTTP, but it comes at the cost of being very restricted. For it to work, you have to express your query as an HTTP GET, with the specific query details encoded as query parameters. As I’ve mentioned, there’s not really a good way to handle complex queries like this.

Besides, what does HTTP caching do? It tells the client to reuse the previous response for a certain time range, and then send a request with an ETag to let the server respond with a 304 and save bandwidth by not resending an identical body. The first part, the cache age, can easily be done on any system. All you need is a client-side cache with a configurable age. The second part… well, what does the server do when it gets a request with an ETag? Unless the query is a trivial file or row lookup, it has to generate the response and then compute the ETag and compare it. For example, any kind of search or JOIN is going to require the server to really hit the database to prove whether the response will change.

So, what are you really saving by doing this? Some bandwidth in the response. In most cases, that’s not the bottleneck. If we’re talking about a huge search that returns thousands (or more) of entities, and your customers are mostly using your app on slow cell networks in remote locations, then… sure, saving that response bandwidth is a big deal. But the more typical use case is that response bodies are small, bandwidth is plentiful, but the cloud resources needed to compute the response and prove it’s unchanged are what’s scarce.

You’ll probably do a much better job optimizing the system in this way by making sure you’re only requesting exactly what you need… the very capability that you lose with REST. This even helps with the response body size, which is going to be way bigger if you’re returning entire rows when all you need is a couple column values. Either that, or the opposite, where you basically dump entire blobs of the database onto the client so that it can do its querying locally, and just periodically asks for diffs (this also enables offline mode).

Again, the caching that REST gives us is a kind of no-man’s land middle ground that is suboptimal in all respects. It is, again, appropriate only for the narrow use case of hypertext driven links to resource files in a folder structure on a server that are downloaded “straight” (they aren’t generated or modified as they’re being returned).

The next time I have the authority to design an API, there’s either not going to be one (I’ll grant direct database access to the clients), or it will be a direct implementation of high-level abstract methods that can be mapped directly into an application’s views, and I’ll pick a web framework that automates the RPC aspect to let me build a class on one machine and call its methods from another machine. Either way, I’ll essentially avoid “designing” an API. I’ll either bypass the need and just use a query language, or I’ll write classes in an OOP language, decide where to slice them to run on separate machines, and let a framework write the requisite glue.

If I’m really bold I might try to sell running it all directly on top of TCP or QUIC.

The C++ Resource Management Model (a.k.a. Why I Don’t Want Your Garbage Collector)

My Language of Choice

“What’s your favorite programming language, Dan?”

“Oh, definitely C++”

Am I a masochist? Well, if I am, it’s irrelevant here. Am I just unfamiliar with all those fancy newer “high-level” languages? Nope, I don’t use C++ professionally. On jobs I’m writing Swift, Java, Kotlin, C# or even Ruby and Javascript. C++ is what I write my own apps in.

Am I just insane? Again, if I am, it’s not the reason for my opinion on this matter (at least from my possibly insane perspective).

C++ is an incredibly powerful language. To be fair, it has problems (what Bjarne Stroustrup calls “barnacles”). I consider 3 of them to be major. C++20 fixed 2 of them (the headers problem that makes gratuitous use of templates murder your compile time and forces you to distribute source code, fixed with modules, and the duck typing of templates that makes template error messages unintelligible, fixed with concepts). The remaining one is reflection, which we were supposed to get in C++20, but now it’s been punted to at least C++26.

But overall, I prefer C++ because it is so powerful. Of all the languages I’ve used, I find myself saying the least often in C++ “hmm, I just can’t do what I want to do in this language”. It’s not that I’ve never said that. I just say it less often than I do in other languages.

When this conversation comes up, someone almost always asks me about memory management. It’s not uncommon for people, especially Java/C# guys, to say, “when is C++ going to get a garbage collector?”

C++ had a garbage collector… or, rather, an interface for adding one. It was removed in C++23. Not deprecated, removed. Ripped out in one clean yank.

In my list of problems/limitations of C++, resource management (not memory management, I’ll explain that shortly) is nowhere on the list. C++ absolutely kicks every other language’s ass in this area. There’s another language, D, that follows the design of C++ but frees itself from the shackles of backward compatibility, and is in almost every way far more powerful. Why do I have absolutely no interest in it? Because it has garbage collection. With that one single decision, they ruined what could easily be the best programming language in existence.

I think the problem is a lot of developers who aren’t familiar with C++ assume it’s C with the added ability to stick methods on structs and encapsulate their members. Hence, they think memory management in C++ is the same as in C, and you get stuff like this:

Programmers working in languages without garbage collection (like C and C++) must implement manual memory management in their code.

Even the Wikipedia article for garbage collectors says:

Other languages were designed for use with manual memory management… for example, C and C++

I have a huge C++ codebase, including several generic frameworks, for my own projects. I can count the number of deletes I’ve written on two hands, maybe one.

The Dark Side of Garbage Collection

Before I explain the C++ resource management system, I’m going to explain what’s wrong with garbage collection. Now, “garbage collection” has a few definitions, but I’m talking about the most narrow definition: the “tracer”. It’s the thing Java, C# and D have. Objective-C and Swift don’t have this kind of “garbage collector”, they do reference counting.

I can sum up the problem with garbage collector languages by mentioning a single interface in each of the languages: IDisposable for C#, and Closeable (or AutoCloseable) for Java.

The promise garbage collectors give me is that I don’t have to worry about cleaning stuff up anymore. The fact these interfaces exist, and work the way they do, reveals that garbage collectors are dirty liars. We might as well have named the interfaces Deletable, and the method delete.

Then, remember that I told you I can count the number of deletes I’ve written in tens of thousands of lines of C++ on one or two hands. How many of these effective deletes are in a C#/Java codebase?

Even if you don’t use these interfaces, any semantically equivalent “cleanup” call, whether you call it finish, discard, terminate, release, or whatever, counts as a delete. Now tell me, who has fewer of these calls? Java/C# or C++?

C++ wins massively, unless you’re writing C++ code that belongs in the late 90s.

Interestingly, I’ve found most developers assume when I say I don’t like garbage collectors that I’m going to start talking about performance (i.e. tracing is too slow/resource intensive), and it surprises them I say nothing about that and jump straight to these psuedo-delete interfaces. They don’t even know how much better things are in my world.

If you doubt that dispose/close patterns are the worst possible way to deal with resources, allow me to explain how they suffer from all the problems that manual pointers in C suffer from, plus more:

  • You have to clean them up, and it’s invisible and innocuous if you don’t
  • If you forget to clean them up, the explosion is far away (in space and time) from where the mistake was made
  • You have no idea if it’s your responsibility. In C, if you get a pointer from a function, what do you do? Call free when you’re done, or not? Naming conventions? Read docs? What if the answer is different each time you call a function!?
  • Static analysis is impossible, because a pointer that needs to be freed is syntactically indistinguishable from one that shouldn’t be freed
  • You can’t share pointers. Someone has to free it, and therefore be designated the sole true owner.
  • Already freed pointers are still around, land mines ready to be stepped on and blow your leg off.

Replace “pointer” and free with IDisposable/Closeable and Dispose/close respectively, and everything carries over.

The inability to share these types is a real pain. When the need arises, you have to reinvent a special solution. ADO.NET does this with database connections. When you obtain a connection, which is an IDisposable, internally the framework maintains a count of how many connections are open. Since you can't properly share an IDisposable, you instead “open” a new connection every time, but behind the scenes it keeps track of the fact an identical connection is already open, and it just hands you a handle to this open connection.

Connection pooling is purported to solve a different problem of opening and closing identical connections in rapid succession, but the need to do this to begin with is born out of the inability to create a single connection and share it. The cost of this is that the system has to guess when you’re really done with the connection:

If MinPoolSize is either not specified in the connection string or is specified as zero, the connections in the pool will be closed after a period of inactivity. However, if the specified MinPoolSize is greater than zero, the connection pool is not destroyed until the AppDomain is unloaded and the process ends.

This is ironic, because the whole point of IDisposable is to recover the deterministic release of scarce resources that is lost by using a GC. By this point, you might as well just hand the database connection to GC, and do the closing in the finalizer… except that’s dangerous (more on this later), and it also loses you any control over release (i.e. you can’t define a “period of inactivity” to be the criterion).

This is just reinvented reference counting, but worse: instead of expressing directly what you’re doing (sharing an expensive object, so that the last user releasing it causes it to be destroyed), you have to hack around the limitation of no sharing and write code that looks like it’s needlessly recreating expensive objects. Each time you need something like this, you have to rebuild it. You can’t write a generic shared resource that implements the reference counting once. People have tried, and it never works (we’ll see why later).

Okay, well hopefully we can restrict our use of these interfaces to just where they’re absolutely needed, right?

IDisposable/Closeable are zombie viruses. When you add one as a member to a class A, it’s not uncommon that the (or at least a) “proper” time to clean up that member is when the A instance is no longer used. So you need to make A an IDisposable/Closeable too. Anything holding an A as a member then likely needs to become an IDisposable/Closeable itself, and on and on. Then you have to write boilerplate, which can usually be generated by your IDE (that’s always a sign of a language defect, that a tool can autogenerate code you need but the compiler can’t), to have your Dispose/close just call Dispose/close on all IDisposable/Closeable members. Except that’s not always correct. Maybe some of those members are just being borrowed. Back to the docs!

Now you’re doing what C++ devs had to do in the 90s: write destructors that do nothing but call delete on all pointer members… except when they shouldn’t.

In fact, IDisposable/Closeable aren’t enough for the common case of hierarchies and member cleanup. A class might also hold handles to “native” objects that need to be cleaned up whenever the instance is destroyed. As I’ll explain in a moment, you can’t safely Dispose/close your member objects in a finalizer, but you can safely clean up native resources (sort of…). So you need two cleanup paths: one that cleans up everything, which is what a call to Dispose/close will do, and one that only does native cleanup, which is what the finalizer will trigger. But then, since the finalizer could get called after someone calls Dispose, you need to make sure you don’t do any of this twice, so you also need to keep track of whether you’ve already done the cleanup.

The result is this monstrosity:

protected virtual void Dispose(bool disposing)
{
    if (_disposed)
    {
        return;
    }

    if (disposing)
    {
        // TODO: dispose managed state (managed objects).
    }

    // TODO: free unmanaged resources (unmanaged objects) and override a finalizer below.
    // TODO: set large fields to null.

    _disposed = true;
}

I mean, come on! The word “Dispose” shows up as an imperative verb, a present participle, and a past participle. It’s a method whose parameter basically means “but sort of not really” (I call these “LOLJK” parameters). Where did I find this demonry? On Microsoft’s docs, as an example of a pattern you should follow, which means you won’t just see this once, but over and over.

Raw C pointers never necessitated anything that ridiculous.

For the love of God keep this out of C++. Keep it as far away as possible.

Now, the real question here isn’t why do we have to go through all this trouble when using IDisposable/Closeable. Those are just interfaces marking a uniform API for utterly manual resource management. We already know manual resource management sucks. The real question is: why can’t the garbage collector handle this? Why don’t we just do our cleanup in finalizers?

Because finalizers are horrible.

They’ve been completely deprecated in Java, and Microsoft is warning people to never write them. The consensus is now that you can’t even safely release native resources there . It’s so easy to get it wrong. Storing multiple native resources in a managed collection? The collection is managed, so you can’t touch it. Did you know finalizers can get called on objects while the scope they’re declared in is still running, which means they can get called mid-execution of one of their methods? And there’s more. Allowing arbitrary code to run during garbage collection can cause all sorts of performance problems or even deadlocks. Take a look at this and this thread.

Is this really “easier” than finding and removing reference cycles?

The prospect of doing cascading cleanup in finalizers fails because of how garbage collectors work. When I’m in a finalizer, I can’t safely assume anything in my instance is still valid except for native/external objects that I know aren’t being touched by the garbage collector. In particular, the basic assumption about a valid object is violated: that its members are valid. They might not be. Finalizers are intrinsically messages sent to half-dead objects.

Why can’t the garbage collector guarantee order? This is, I think, the biggest irony in all of this. The answer is reference cycles. It turns out neglecting to define an ordered topology of your objects causes some real headaches. Garbage collectors just hide this, encourage you to neglect the work of repairing cyclical references, and force you to always deal with the possibility of cycles even when you can prove they don’t exist. If those cyclical references are nothing but bags of bits taken from the memory pool, maybe it will work out okay. Maybe. As soon as you want any kind of well-ordered cleanup logic, you’re hosed.

It doesn’t even make sense to try to apply garbage collectors to non-memory resources like files, sockets, database connections, and so on, especially when you remember some of those resources are owned by entire machines, or even networks, rather than single processes. It turns out that “trigger a sequence to build up a structure, then trigger the exact opposite sequence in reverse order to tear it town” is a highly generic, widely useful paradigm, which we C++ guys call Resource Allocation Is Initialization.

Anything from opening and closing a file, to acquiring and releasing a mutex, to describing and committing an animation, can fall under this paradigm. It covers any situation where there is balanced “start” and “finish” logic, and it is inherently hierarchical: if “starting” X really means starting A, B then C in that order, then “finishing” X will at least include “finishing” C, B then A in that order.
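For a taste of what this looks like, here’s a minimal sketch wrapping the C standard library’s fopen/fclose (the mechanics that make this work are explained later):

class File
{

public:

  File(const char* path) : _handle(std::fopen(path, "r"))
  {

  }

  File(const File& other) = delete;

  ~File()
  {
    if (_handle)
      std::fclose(_handle);  // the "finish" that balances the "start" above
  }

private:

  std::FILE* _handle;
};

The “start” is fopen, the “finish” is fclose, and the pair is guaranteed to stay balanced no matter how the scope holding a File exits.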

By giving up deterministic “cleanup” of your objects in a language, you’re depriving yourself of this powerful strategy, which goes way beyond the simple case of “deleting X, who was constructed out of A, B and C, means deleting C, B and A”. Deterministically run pairs of hierarchical setup/teardown logic are ubiquitous in software. Memory allocation and freeing is just a narrow example of it.

For this reason, garbage collection definitely is not what you want to have baked into a language, attempting to be the one-size-fits-all resource management strategy. It simply can’t be that, and since the language-baked resource management is limited in what it can handle, you’re left out to dry, reverting to totally manual management of all other resources. At best, garbage collection is something you can opt into for specific resources. That requires a language capability to tag variables with a specific resource management strategy. Ideally that strategy can be written in the language itself, using its available features, and shipped as a library.

I don’t know any language that could do this, but I know one that comes really close, and does allow “resource management as libraries” for every other management technique beside tracing.

What was my favorite language, again?

The Underlying Problem

I place garbage collection into the same category as ORMs: tools that attempt to hide a problem instead of abstract the problem.

We all agree manual resource management is bad. Why? Because managing resources manually forces us to tell a system how to solve a problem instead of telling it what problem to solve. There are generally two ways to deal with the tedium of spelling out how. The first is to abstract: understand what exactly you’re telling the system to do, and write a higher level interface to directly express this information that encapsulates the implementation details. The other is to hide: try to completely take over the problem and “automagically” solve it without any guidance at all.

ORMs, especially of the Active Record variety, are an example of the second approach applied to interacting with a database. Instead of relieving you from wrestling with the how of mapping database queries to objects, it promises you can forget that you’re even working with a database. It hides database stuff entirely within classes that “look” and “act” like regular objects. The database interaction is under-the-hood automagic you can’t see, and therefore can’t control.

Garbage collection is the same idea applied to memory management: the memory releases are done totally automagically, and you are promised you can forget a memory management problem even exists.

Of course, not really. Besides the fact that, as I’ve explained, it totally can’t manage non-memory resources, it also really doesn’t let you forget memory management exists. In my experience with reference counted languages like Swift, the most common source of “leaks” isn’t reference cycles, but simply holding onto references to unneeded stuff for too long. This is especially easy to do if you’re sticking references in a collection, and nothing is ever pruning the collection. That’s not a leak in the strict sense (an object unreachable to the program that can’t be deleted), but it’s a semantic leak with identical consequences. Tracers won’t help you with that.

All of these approaches suffer from the same problem: some percentage of problems, usually fairly high (let’s say 85-90%), are perfectly solved by these automagic engines. The remaining 10-15% are not, and the very nature of automagic systems that hide the problem is that they can’t be controlled or extended (doing so re-exposes the problem they’re attempting to hide). Therefore, nothing can be done to cover that 10-15%, and those problems become exponentially worse than they would have been without a fancy generic engine. You have to hack around the engine to deal with that 10-15%, and the result is more headaches than contending directly with the 85% ever would have caused.

Automagic solutions that hide the problem intrinsically run afoul of the open-closed principle. Any library or tool that violates the open-closed principle will make 85-90% of your problems super easy, and the remaining 10-15% total nightmares.

The absolute worst thing to do with automagic engines is bake them into a language. In one sense, doing so is consistent with the underlying idea: that the automagic solution really is so generic and such a panacea that it really deserves to be an ever-present, and unavoidable all-purpose approach. It also significantly exacerbates the underlying problem: that such “silver bullets” are never actually silver bullets.

I’ve been dunking on garbage collectors, but baking reference counting into a language is the same sort of faulty reasoning: that reference counting is the true silver bullet of resource management. Reference counting at least gives us deterministic release. We don’t have to wrestle with the abominations of IDisposable/Closeable. But the fact you literally can’t create a variable without having to manage an atomic integer is a real problem inside tight loops. As I’ll get into shortly, reference counting is the way to handle shared ownership, but the vast majority of variables in a program arent shared (and the ones that are usually don’t need to be). This causes a proliferation of unnecessary cyclic references and memory leaks.

What is the what, and the how, of resource management? Figuring out exactly what needs to get released, and where, is the how. The what is object lifetimes. In most cases, objects need to stay alive exactly as long as they are accessible to the program. The case of daemons that keep themselves alive can be treated as a separate exception (speaking of which, those are obnoxious in garbage collected languages, you have to stick them into a global variable). For something to be accessible to the program, it needs to be stored in a variable. In object-oriented languages, variables live inside other objects, or inside blocks of code, which are all called recursively by the entry function of a thread.

We can see that the lifetime problem is precisely the problem of defining a directed, non-cyclical graph of ownership. Why can there not be cycles? Not for the narrow reason garbage collectors are designed to address, which is that determining in a very “dumb” manner what is reachable and what is not fails on cycles. Cycles make the order of release undefined. Since teardown logic must occur in the reverse order of setup (at least in general), this makes it impossible to determine what the correct teardown logic is.

The missing abstraction in a language like C is the one that lets us express in our software what this ownership graph is, instead of just imagining it and writing out the implications of it (that this pointer gets freed here, and that one gets freed there).

The Typology of Ownership

We can easily list out the types of ownership relationships that will occur in a program. The simplest one is scope ownership: an object lives, and will only ever live, in a single variable, and therefore its lifetime is equal to the scope of that one variable. The scope of a variable is either the block of code it’s declared in (for “local” variables), or the object it’s an instance member of. The ownership is unique and static: there is one owner, and it doesn’t change.

Both code blocks and objects have cascading ownership, and therefore trigger cascading release. When a block of code ends, that block dies, which causes all objects owned by it to die, which causes all objects owned by those objects to die, and so on. The cascading nature is a consequence of the unique and static nature of the ownership, with the parent-child relationship (i.e. the direction of the graph) clearly defined at the outset.
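In C++ terms (jumping ahead a little), scope ownership is nothing more than value members and local variables. A sketch, with hypothetical Engine/Car classes:

class Engine
{
  // ...
};

class Car
{

private:

  Engine _engine;  // value member: lives exactly as long as the Car that owns it
};

void drive()
{
  Car car;  // local value: owned by this block
  // ...
}           // car dies here, and _engine dies with it, cascading down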

Slightly more complex than this is when an object’s ownership remains unique at all times (we can guarantee there is only ever one variable that holds an object), but the owner can change, and thereby transfer from one scope to another. Function return values are a basic example. We call this unique ownership. The basic requirement of unique ownership is that only transfers can occur, wherein the original variable must release and no longer be a reference to the object when the transfer occurs.

The next level of complexity is to relax the requirement of uniqueness, by allowing multiple variables to be assigned to the same object. This gives us shared ownership. The basic requirement of shared ownership is that the object becomes unowned, and therefore cleaned up, when the last owner releases it.

That’s it! There’s no more to ownership. The owner either changes or it doesn’t. If it does change, either the number of owners can change or it can’t (all objects start with a single owner, so if it doesn’t change, it stays at 1). There’s no more to say.

However, we have to contend with the requirement that the graph be directed and non-cyclical. The graph of all variable references in a program generally satisfies no such requirement. This is why we can’t just make everything shared, and every variable an owner of its assigned object. We get a proliferation of cycles, and that destroys the well-ordered cleanup logic, whether we can trace the graph to find disconnected islands or not.

We need to be able to send messages in both directions. Parents will send messages to their children, but children need to send messages to parents. To do this, a child simply needs a non-owning reference back to its parent. Now, the introduction of non-owning references is what creates this risk of dangling references… a problem guaranteed to not exist if every reference is owning. How can we be sure non-owning references are still valid?

Well, the reason we have to introduce non-owning references is to send messages up the ownership hierarchy, in the reverse direction of the graph. When does a child have to worry if its parent is still alive? Well, definitely not in the case of unique ownership. In that case, the fact the child is still alive and able to send messages is already proof the (one, unique) parent is still around. The same applies for more distant ancestors. If an ownership graph is all unique, then a child can safely send a message to a great-grandparent, knowing that there’s no way it could still exist to send messages if any of its unique ancestors were gone.

This is no longer true when objects are shared. A shared object only knows that one of its owners is still alive, so it cannot safely send a message to any particular parent. And thus we have the partner to shared ownership, which is the weak reference: a reference that is non-owning and also can be safely checked before access to see if the object still exists.

This is an important point that does not appear to be well-appreciated: weak references are only necessary in the context of shared ownership. Weak references force the user to contend with the possibility of the object being gone. What should happen then? The most common tactic may be to do nothing, but that’s likely just a case of stuffing problems under the rug (i.e. avoiding crashing when crashing is better than undefined behavior). You have to understand what the correct behavior is, in both variations (object still present, and object absent), when you use weak references.
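To preview how this looks in C++ (std::weak_ptr, the partner of std::shared_ptr, both of which come up again later), here’s a sketch with a hypothetical parent/child pair:

void Child::notifyParent()
{
  // _parent is a std::weak_ptr<Parent>; lock() returns a std::shared_ptr,
  // which is empty if the Parent has already been destroyed
  if (auto parent = _parent.lock())
  {
    parent->childDidSomething();  // object still present
  }
  else
  {
    // object absent: this branch needs a deliberate answer, not a shrug
  }
}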

In summary, we have for ownership:

  1. Scope Ownership
  2. Unique Ownership
  3. Shared Ownership

And for non-owning references:

  1. “Unsafe” references
  2. Weak references

What we want is a language where we can tell it what it needs to know about ownership, and let it figure out from that when to release stuff.

Additionally, we want to be able to control both what “creating” and “releasing” a certain object entails. The cascading of scope-owned members is given, and we shouldn’t have to, nor should we be able to, modify this (to do so breaks the definition of scope ownership). We should also be able to add additional custom logic.

Once our language lets us express who’s an owner of what, everything else should be taken care of. We should not have to tell the program when to clean stuff up. That should happen purely as a consequence of an object becoming unowned.

The Proper Solution

Let’s think through how we might try to solve this problem in C. A raw C pointer does not provide any information on ownership. An owned C pointer and a borrowed C pointer are exactly the same. There are two possibilities about ownership: the owner is either known at compile-time (really authorship time, which applies to interpreted languages too), or it’s known only at run-time. A basic example is a function that mallocs a pointer and returns it. The returned pointer is clearly an owning pointer. The caller is responsible for freeing it.

Whenever something is known at authorship time, we express it with the type system. If a function returns an int*, it should instead return a type that indicates it’s an owning pointer. Let’s call it owned_int_ptr:

struct owned_int_ptr
{
    int* ptr;
};

When a function returns an owned_int_ptr, that adds the information that the caller must free it. We can also define an unowned_int_ptr:

struct unowned_int_ptr
{
    int* ptr;
};

This indicates a pointer should not be freed.

For the case where it’s only known at runtime if a pointer is owned, we can define a dynamic_int_ptr:

struct dynamic_int_ptr
{
    int* ptr;
    char owning;
};

(The owning member is really a bool, but C, prior to C99’s <stdbool.h>, has no bool type, so we use a char where 0 means false and everything else means true.)

If we have one of these, we need to check owning to determine if we need to call free or not.
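Every consumer of a dynamic_int_ptr then ends up writing something like this (a sketch; free comes from <stdlib.h>):

void destroy_dynamic_int_ptr(struct dynamic_int_ptr p)
{
    if (p.owning)
        free(p.ptr);
}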

Now, let’s think about the problems with this approach:

  • We’d have to declare these pointer types for every variable type.
  • We have to tediously add a .ptr to every access to the underlying pointer
  • While this tells us whether we need to call free before tossing a variable, we still have to actually do it, and we can easily forget

For the first problem, a C developer would use macros. Macros are black magic, so we’d really like to find a better solution. Ignoring macros, none of these three problems can really be solved in C. We need to add some stuff to the language to make them properly solvable:

  • Templates
  • User-overridable * and -> operators
  • User-defined cleanup code that gets automatically inserted by the compiler whenever a variable goes out of scope

You see where I’m going, don’t you? (Unless you’re that unfamiliar with C++)

With these additions, the C++ solution is:

template<typename T> class auto_ptr
{

public:

  auto_ptr(T* ptr) : _ptr(ptr) 
  {
 
  }

  ~auto_ptr()
  {
      delete _ptr;
  }

  T* operator->() const
  {
      return _ptr;
  }

  T& operator*() const
  {
      return *_ptr;
  }

private:

  T* _ptr;
};

Welcome to C++03 (that’s the C++ released in 2003)!

By returning an auto_ptr, which is an owning pointer, you’ll get a variable that behaves identically to a raw pointer when you dereference or access members via the arrow operator, and that automatically deletes the pointer when the auto_ptr is discarded by the program (when it goes out of scope).
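A minimal usage sketch:

void useThePointer()
{
  auto_ptr<int> value(new int(42));
  *value += 1;  // dereferences like a raw pointer, via our operator*
}               // value goes out of scope here; ~auto_ptr deletes the int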

The last part is very significant. There’s something unique to C++ that makes this possible:

C++ has auto memory management!

This is what all those articles that say “C++ only has manual memory management” fail to recognize. C++ does have manual memory management (new and delete), but it also has a type of variable with automatic storage. These are local variables, declared as values (not as pointers), and instance members, also declared as values. This is usually considered equivalent to being stack-allocated, but that’s neither important nor always correct (a heap-allocated object’s members are on the heap, but are automatically deleted when the object is deleted).

The important part is auto variables in C++ are automatically destroyed at the end of the scope in which they are declared. This behavior is inherited from C, which “automatically” cleans up variables at the end of their scope.

But C++ makes a crucial enhancement to this: destructors.

Destructors are user defined code, whatever your heart desires, added to a class A that gets called any time an instance of A is deleted. That includes when an A instance with automatic storage goes out of scope. This means the compiler automatically inserts code when variables go out of scope, and we can control what that code is, as long as we control what types the variables are.
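A trivial sketch of the mechanism (std::puts is from <cstdio>):

class Noisy
{

public:

  ~Noisy()
  {
    std::puts("cleaned up");  // any code we want, run automatically
  }
};

void demo()
{
  Noisy noisy;
  std::puts("doing work");
}  // prints "doing work", then "cleaned up": the compiler inserted the destructor call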

That’s the real garbage collection, and it’s the only garbage collection we actually need. It’s completely deterministic and doesn’t cost a single CPU cycle more than what it takes do the actual releasing, because the instructions are inserted (and can be inlined) at compile-time.

You can’t have destructors in a garbage collected language. Finalizers aren’t destructors, and the pervasive myth that they are (encouraged at least in C# by notating them identically to C++ destructors) has caused endless pain. You can have them in reference counted languages. So far, reference counted languages are on par with C++ (except for performance, nudging those atomic reference counts are expensive). But let’s keep going.

Custom Value Semantics

Why can’t we build our own “shared disposable” as a userland class in C#? Something like this:

class SharedDisposable<T> : IDisposable where T : IDisposable
{
  private class ControlBlock
  {
    public T Source;
    public int Count;
  }

  private readonly ControlBlock _controlBlock;

  public SharedDisposable(T source)
  {
    _controlBlock = new ControlBlock { Source = source, Count = 1 };
  }

  public SharedDisposable(SharedDisposable<T> other)
  {
    _controlBlock = other._controlBlock;
    Interlocked.Increment(ref _controlBlock.Count); // Interlocked is from System.Threading
  }

  public T Get()
  {
    return _controlBlock.Source;
  }

  public void Dispose()
  {
    if (Interlocked.Decrement(ref _controlBlock.Count) == 0)
    {
      _controlBlock.Source.Dispose();
    }
  }
}

One problem, of course, is that if the source IDisposable is accessible directly to anyone, they can Dispose it themselves. Sure, but really that problem exists for any “resource manager” class, including smart pointers in C++. The bigger problem is that if I do this:

void StoreSharedDisposable(SharedDisposable<MyClass> incoming)
{
  this._theSharedThing = incoming;
}

The thing that’s supposed to happen, namely incrementing the reference count, doesn’t happen. This is just a reference assignment. None of my code gets executed at the =. What I have to do is write this:

void StoreSharedDisposable(SharedDisposable<MyClass> incoming)
{
  this._theSharedThing = new SharedDisposable<MyClass>(incoming);
}

Like calling Dispose, this is another thing you’ll easily forget to do. We need to be able to require that assigning one SharedDisposable to another invokes the second constructor.

This is where C++ pulls ahead even of reference counted languages, and where it becomes, AFAIK, truly unique (except direct derivatives like D). A C++ dev will look at that second constructor for SharedDisposable and recognize it as a copy constructor. But it doesn’t have the same effect. Like most “modern” languages, C# has reference semantics, so assigning a variable involves no copying whatsoever. C++ has primarily value semantics, unless you specifically opt out with * or &, and unlike the limited value semantics (structs) in C# and Swift, you have total control over what happens on copy.

(If C#/Swift allowed custom copy constructors for structs, it would render the copy-on-write optimization impossible, and since you can only have value semantics, unless you tediously wrap a struct in a class, losing this optimization would mean a whole lot of unnecessary copying).

Speaking of this, there’s a big, big problem with auto_ptr. You can easily copy it. Then what? You have two auto_ptrs to the same pointer. Well, auto_ptrs are owning. You have two owners, but no sharing logic. The result is double delete. This is so severe a problem it screws up simply forwarding an auto_ptr through a function:

auto_ptr<int> forwardPointerThrough()
{
    auto_ptr<int> result = getThePointer();
    ...
    return result; // Here, the auto_ptr gets copied.  Then result goes out of scope, and its destructor is called, which deletes the underlying pointer.  You've now returned an auto_ptr to an already deleted pointer!
}

Luckily, C++ lets us take full control over what happens when a value is copied. We can even forbid copying:

template<typename T> class auto_ptr
{

public:

  ...

  auto_ptr(const auto_ptr& other) = delete;
  auto_ptr& operator=(const auto_ptr& other) = delete;

  ...
};

We also suppressed copy-assignment, which would be just as bad.

C++ again let’s us define ourselves exactly what happens when we do this:

SomeType t;
SomeType t2 = t; // We can make the compiler insert any code we want here, or forbid us from writing it.

This is the interplay of value semantics and user-defined types that let us take total control of how those semantics are implemented.

That helps us avoid the landmine of creating an auto_ptr from another auto_ptr, which means the underlying ptr now has two conflicting owners. Our attempt to pass an auto_ptr up a level in a return value will now cause a compile error. Okay, that’s good, but… I still want to pass the return value through. How can I do this?

I need some way for an auto_ptr to release its control of its _ptr. Well, let’s back up a bit. There’s a problem with auto_ptr already. What if I create an auto_ptr by assigning it to nullptr?

auto_ptr<int> ohCrap = nullptr;

When this goes out of scope, it calls delete on a nullptr. (Strictly speaking, deleting a null pointer is defined to be a harmless no-op in C++, but an explicit check documents the intent.) So let’s have auto_ptr check for that case:

~auto_ptr()
{
    if(_ptr)
        delete _ptr;
}

With that fixed, it’s fairly obvious what I need to do to get an auto_ptr to not delete its _ptr when it goes out of scope: set _ptr to nullptr:

T* release()
{
    T* ptr = _ptr;
    _ptr = nullptr;
    return ptr;
}

Then, to transfer ownership from one auto_ptr to another, I can do this:

auto_ptr<int> forwardPointerThrough()
{
    auto_ptr<int> result = getThePointer();
    ...
    return result.release();
}

Okay, I’m able to forward auto_ptrs, because I’m able to transfer ownership from one auto_ptr to another. But it sucks I have to add .release(). Why can’t this be done automatically? If I’m at the end of a function, and I assign one variable to another, why do I need to copy the variable? I don’t want to copy it, I want to move it.

The same problem exists if I call a function to get a return value, then immediately pass it by value to another function, like this:

doSomethingWithAutoPtr(getTheAutoPtr())

What the compiler does (or did) here is assign the result of getTheAutoPtr() to a temporary unnamed variable, then copy it into the incoming parameter of doSomethingWithAutoPtr. Since a copy happens, and we have forbidden copying an auto_ptr, this will not compile. We have to do this:

doSomethingWithAutoPtr(getTheAutoPtr().release())

But why is this necessary? The reason to call release is to make sure that we don’t end up with two usable auto_ptrs to the same object, both claiming to be owners. But the second auto_ptr here is a temporary variable, which is never assigned to a named variable, and is therefore unusable to the program except to be passed into doSomethingWithAutoPtr. Shouldn’t the compiler be able to tell that there’s never really two accessible variables? There’s only one, it’s just being transferred around.

This is really a specific example of a much bigger problem. Imagine instead of an auto_ptr, we’re doing this (passing the result of one function to another function) with some gigantic std::vector, which could be megabytes of memory. We’ll end up creating the std::vector in the function, copying it when we return it (maybe the compiler optimizes this with copy elision), and then copying it again into the other function. If the function it was passed to wants to store it, it needs to copy it again. That’s as many as three copies of this giant object when really there only needs to be one. Just as with the auto_ptr, the std::vector shouldn’t be copied, it should be moved.

This was solved with C++11 (released in 2011) with the introduction of move semantics. With the language now able to distinguish copying from moving, the unique_ptr was born:

template<typename T> class unique_ptr
{

public:

  unique_ptr(T* ptr) : _ptr(ptr) 
  {
 
  }

  unique_ptr(const unique_ptr& other) = delete; // Forbid copy construction
  
  unique_ptr& operator=(const unique_ptr& other) = delete; // Forbid copy assignment

  unique_ptr(unique_ptr&& other) : _ptr(other._ptr) // Move construction
  {
    other._ptr = nullptr;
  }

  unique_ptr& operator=(unique_ptr&& other) // Move assignment
  {
    if (this != &other)
    {
      delete _ptr;  // release anything we currently own
      _ptr = other._ptr;
      other._ptr = nullptr;
    }
    return *this;
  }

  ~unique_ptr()
  {
    if(_ptr)
      delete _ptr;
  }

  T* operator->() const
  {
    return _ptr;
  }

  T& operator*() const
  {
    return *_ptr;
  }

  T* release()
  {
    T* ptr = _ptr;
    _ptr = nullptr;
    return ptr;
  }

private:

  T* _ptr;
};

Using unique_ptr, we no longer need to call release when simply passing it around. We can forward a return value, or pass a returned unique_ptr by value (or rvalue reference) from one function to another, and ownership is transferred automatically via our move constructor.

(We still define release in case we need to manually take over the underlying pointer).
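Transfers now happen through moves: implicitly for return values and temporaries, and explicitly with std::move (from <utility>) for named variables. A sketch:

unique_ptr<int> forwardPointerThrough()
{
    unique_ptr<int> result = getThePointer();
    return result;  // implicitly moved out; no .release() needed
}

void storeIt(unique_ptr<int> incoming);

void demo()
{
    unique_ptr<int> p = getThePointer();
    storeIt(std::move(p));  // explicit move: p is left holding nullptr
}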

We had to exercise all the capabilities of C++ related to value types, including ones that even reference counted languages don’t have, to build unique_ptr. There’s no way I could build a UniqueReference in Swift, because I can’t control, much less suppress, what happens when one variable is assigned to another. Since I can’t define unique ownership, everything is shared in a reference counted language, and I have to be way more careful about using unsafe references. What most devs do, of course, is make every unsafe reference a weak reference, which forces it to be optional, and makes you contend with situations that may never arise and for which no proper action is defined.

C++ comes with scope ownership and unsafe references out of the box, and with unique_ptr we’ve added unique ownership as a library class. To complete the typology, we just add a shared_ptr and the corresponding weak_ptr, and we’re done. Building a correct shared_ptr similarly exercises the capability of custom copy constructors: we don’t suppress copying like we do on a unique_ptr, we define it to increment the reference count. Unlike the C# example, that changes the meaning of thisSharedPtr = thatSharedPtr, instead of requiring us to call something extra.
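To see why the copy constructor is the crux, here’s a sketch of the heart of such a shared_ptr (ignoring copy-assignment, weak counts, and the atomicity a real implementation needs):

template<typename T> class shared_ptr
{

public:

  shared_ptr(T* ptr) : _ptr(ptr), _count(new int(1))
  {

  }

  shared_ptr(const shared_ptr& other) : _ptr(other._ptr), _count(other._count)
  {
    ++(*_count);  // copying means sharing: just bump the count
  }

  ~shared_ptr()
  {
    if (--(*_count) == 0)  // the last owner releasing triggers the cleanup
    {
      delete _ptr;
      delete _count;
    }
  }

private:

  T* _ptr;
  int* _count;
};

Assigning or passing one of these by value invokes the copy constructor, so the sharing is counted correctly without the user having to call anything.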

And with that, the typology is complete. We are able to express every type of ownership by selecting the right type for variables. With that, we have told the system what it needs to know to trigger teardown logic properly.

The vast majority of cleanup logic is cascading. For this reason, not only do we essentially never have to write delete (the only deletes are inside the smart pointer destructors), we also very rarely have to write destructors. We don’t, for example, have to write a destructor that simply deletes the members of a class. We just make those members values, or smart pointers, and the compiler ensures (and won’t let us stop) this cascading cleanup happens.

The only time we need to write a destructor is to tell the compiler how to do the cleanup of some non-memory resource. For example, we can define a database connection class that adapts a C database library to C++:

class DatabaseConnection
{

public:

  DatabaseConnection(std::string connectionString) :
    _handle(createDbConnection(connectionString.c_str()))
  {

  }

  DatabaseConnection(const DatabaseConnection& other) = delete;

  ~DatabaseConnection()
  {
    closeDbConnection(_handle);
  }

private:

  DbConnectionHandle* _handle;
};

Then, in any class A that holds a database connection, we simply make the DatabaseConnection a member variable. Its destructor will get called automatically when the A gets destroyed.
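For example (a hypothetical UserStore class):

class UserStore
{

public:

  UserStore(std::string connectionString) :
    _connection(connectionString)
  {

  }

private:

  DatabaseConnection _connection;  // closed automatically when the UserStore dies
};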

We can use RAII to do things like declare a critical section locked by a mutex. First we write a class that represents a mutex acquisition as a C++ class:

class AcquireMutex
{

public:

  AcquireMutex(Mutex& mutex) :
    _mutex(mutex)
  {
    _mutex.lock();
  }

  ~AcquireMutex()
  {
    _mutex.unlock();
  }

private:
  
  Mutex& _mutex;
};

Then to use it:

void doTheStuff()
{
  doSomeStuffThatIsntCritical();

  // Critical section
  {
    AcquireMutex acquire(_theMutex);

    doTheCriticalStuff();
  }

  doSomeMoreStuffThatIsntCritical();
}

The mutex is locked at the beginning of the scope by the constructor of AcquireMutex, and automatically unlocked at the end by the destructor of AcquireMutex. This is really useful, because it’s exception safe. If doTheCriticalStuff() throws an exception, the mutex still needs to be unlocked. Manually writing unlock after doTheCriticalStuff() will result in it never getting unlocked if doTheCriticalStuff() throws. But since C++ guarantees that when an exception is thrown and caught, all scopes between the throw and catch are properly unwound, with all local variables being properly cleaned up (including their destructors getting called… this is why throwing exceptions in destructors is a big no-no), doing the unlock in a destructor behaves correctly even in this case.

This whole paradigm is totally unavailable in garbage collected languages, because they don’t have destructors. You can do this in reference counted languages, but at the cost of making everything shared, which is much harder to reason correctly about than unique ownership, and the vast majority of objects are really uniquely owned. In C# this code would have to be written like this:

void DoTheStuff()
{
  DoSomeStuffThatIsntCritical();

  _theMutex.Lock();

  try
  {
    doTheCriticalStuff();
  }
  catch
  {
    throw;
  }
  finally
  {
    _theMutex.Unlock();
  }

  doSomeMoreStuffThatIsntCritical();
}

Microsoft’s docs on try-catch-finally show a finally block being used for precisely the purpose of ensuring a resource is properly cleaned up.

In fact, this isn’t fully safe, because finally might not get called. To be absolutely sure the mutex is unlocked we’d have to do this:

void DoTheStuff()
{
  DoSomeStuffThatIsntCritical();

  _theMutex.Lock();

  Exception? error = null;

  try
  {
    doTheCriticalStuff();
  }
  catch(Exception e)
  {
    error = e;
  }
  finally
  {
    _theMutex.Unlock();
  }

  if (error != null)
    throw error;

  doSomeMoreStuffThatIsntCritical();
}

Gross.

C# and Java created using/try-with-resources to mitigate this problem:

void DoTheStuff()
{
  DoSomeStuffThatIsntCritical();

  using(var acquire = new AcquireMutex(_theMutex))
  {
    doTheCriticalStuff();
  }

  doSomeMoreStuffThatIsntCritical();
}

That solves the problem for relatively simple cases like this where a resource doesn’t cross scopes. But if you want to do something like open a file, call some methods that might throw, then pass the file stream to a method that will hold onto it for some amount of time (maybe kicking off an async task), using won’t help you because that assumes the file needs to get closed locally.

Adding using/try-with-resources was a great decision, but it isn’t the garbage collector at work, and it receives no assistance from the garbage collector at all. They are special language features with new keywords. They never could have been added as library features. And they only simulate scope ownership, not unique or shared ownership. Adding them is an admission that the garbage collector isn’t the panacea it promised to be.

Tracing?

The basic idea here is not to bake a specific resource management strategy into the language, but to allow the coder to opt each variable into a specific resource management strategy. We’ve seen that C++ gives us the tools necessary to add reference counting, a strategy sometimes baked directly into languages, as an opt-in library feature. That raises the question: could we do this for tracing as well? Could we build some sort of traced_ptr<T> that will be deleted by a tracer running in a separate thread, one that works by tracing the graph of references and determining what is still accessible?

C++ is still missing the crucial capability we need to implement this. The tracer needs to be able to tell what things inside an object are pointers that need to be traced. In order to do that, it needs to be able to collect information about a type, namely what its members are, and their layout, so it can figure out which members are pointers. Well, that’s reflection. Once we get it, it will be part of the template system, and we could actually write a tracer where much of the logic that normally would happen at runtime would be worked out at compile time. The trace_references<T>(traced_ptr<T>& ptr) function would be largely generated at compile time for any T for which a traced_ptr<T> is used somewhere in our program. The runtime logic would not have to work out where the pointers to trace are, it would just have to actually trace them.

Once we get reflection, we can write a traced_ptr<T> class that knows whether or not T has any traced_ptr type members. The destructor of traced_ptr itself will do nothing. The tracer will periodically follow any such members, repeat the step for each of those, and voila: you get opt-in tracing. This is interesting because it greatly mitigates the problem that baked in tracing has, which is the total uncertainty about the state of an object during destruction. What can you do in the destructor for your class if you have traced_ptrs to it? Well, you can be sure everything except the traced_ptr members are still valid. You just can’t touch the traced_ptr members.

Since it is now your responsibility, and decision, to work out which members of a class will be owned by the tracer, you can draw whatever dividing line you want between deterministic and nondeterministic release. A class that holds both a file handle and other complex objects might decide that the complex objects will be traced_ptrs, but the file handle will be a unique_ptr. That way we don’t have to write a destructor at all, and the destructor the compiler writes for us will delete the file handle, and not touch the complex objects.
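Sketching that hypothetical class (remember, traced_ptr doesn’t exist today; this is only what opting in might look like):

class Document
{

private:

  unique_ptr<FileHandle> _file;  // deterministic: closed exactly when the Document dies

  traced_ptr<Node> _contents;    // nondeterministic: swept by the tracer eventually
  traced_ptr<Node> _clipboard;
};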

There may be problems with delegating only part of your allocations to a tracer. The other key part of a tracer is it keeps track of available memory. To make this work you’d probably also need to provide overrides of operator new and operator delete. But you may also be okay with relaxing the promises of traced references: instead of the tracer doing nothing until it “needs to” (when memory reaches a critical usage threshold), it just runs continuously in the background, giving you assurance you can build some temporary webs of objects that you know aren’t anywhere close to your full memory allotment, and be sure they’ll all be swept away soon after you’re done with them.

While this is a neat idea, I would consider it an even lazier approach to the ownership problem than a proliferation of shared and weak references. This is also more or less avoiding the ownership problem altogether. It may be a neat tool to have in our toolbelts, but I’d probably want a linter to warn on every usage just like with weak_ptrs, to make us think carefully about whether we can be bothered to work out actual ownership.

Conclusion

I have gone over all the options in C++ for deciding how a variable is owned. They are:

  • Scope ownership: auto variables with value semantics
  • Unique ownership: unique_ptr
  • Shared ownership: shared_ptr
  • Manual ownership: new and delete

Then there are two options for how to borrow a variable:

  • Raw (unowned) pointers/references
  • Weak references: weak_ptr

This list is in deliberate order. You should prefer the first option, and only go to the next option if that can’t work, and so on. These options really cover all the possibilities of ownership. Neither garbage collected nor reference counted languages give you all the options. Really, the first two are the most important. Resource management is far simpler when you can make the vast majority of cases scope or unique ownership. Unique ownership (that can cross scope boundaries) is, no pun intended, unique to C++.
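To make the menu concrete, here is every option in one function (Thing is an illustrative name):

#include <memory>

struct Thing { };

void ownershipTour()
{
	Thing scopeOwned;                               // 1. Scope ownership: dies at end of scope
	auto uniquelyOwned = std::make_unique<Thing>(); // 2. Unique ownership: dies with its owner
	auto shared = std::make_shared<Thing>();        // 3. Shared ownership: dies with its last owner
	Thing* manual = new Thing();                    // 4. Manual ownership: dies when we say so

	Thing* borrowed = uniquelyOwned.get();          // Borrowing: raw (unowned) pointer
	std::weak_ptr<Thing> weakBorrow = shared;       // Borrowing: weak reference

	delete manual;                                  // The price of manual ownership
}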

For this reason, I have far fewer resource leaks in C++ than in other languages, and far less boilerplate code to write. The vast majority of dangling references I’ve encountered were caused by inappropriate use of shared ownership, due to my coming from reference counted languages and thinking in terms of sharing everything. Almost all my desires to use weak references were me being lazy and not fixing cyclical references (it’s not some subtle realization after profiling that they exist; it’s obvious when I write the code that they are cyclical).

I wouldn’t add a garbage collector to that code if you paid me.

Do You Really Need Dynamic Dispatch?

Introduction

The “usual” polymorphism (a base class, plus multiple subclasses, with the base class’s methods being dynamically dispatched) is the standard way to handle variation in object-oriented code.  But it’s not the only choice, and in many cases it offers too much dynamism.  The point of dynamic dispatch is to have the code select a branch based on runtime conditions.  It is designed to allow the same code to trigger a different pathway each time it is run.  If you don’t actually need this type of late (runtime) binding, then you have other options for dispatching earlier than runtime.  The advantage of using early binding when it is appropriate is that the program correctly expresses constness: if something is the same on every execution, this should be reflected in the code.  Doing a runtime check in such a situation is needlessly pushing a potential bug out from compile/static analysis time to runtime.

A good example is situations where you have multiple implementations of some library available.  Maybe one implementation is available on one platform and another is available on another platform.  Or, one is a different manufacturer’s implementation and you’re experimenting with adopting it but not yet sure it’s ready for adoption.  In these situations, you have variation: your code should either call one implementation or another.  But is this variation really happening at runtime?  Or, is it happening at build time?  Are you choosing, when you build your program, which implementation is going to be used, for every execution?

The Dynamic Dispatch Solution

The obvious, or I might say naive, way to handle this is with typical polymorphism.  First I’ll create a package LibraryInterface that just contains the interface:

public interface LibraryFacade
{
	void doSomething();
	void doSomethingElse(); 
}

Then I’ll make two packages.  OldLibrary implements the interface with the implementation provided by the manufacturer CoolLibrariesInc.  NewLibrary implements it with one provided by the manufacturer SweetLibrariesInc.  Since these both reference the interface, both packages need to import LibraryInterface: 

import LibraryInterface;

class OldLibraryImplementation : LibraryFacade
{
	override void doSomething()
	{
		// Call CoolLibrariesInc stuff
	}

	override void doSomethingElse()
	{
		// Call CoolLibrariesInc stuff
	}

	private CoolLibrariesClass UnderlyingObject;
}

import LibraryInterface;

class NewLibraryImplementation : LibraryFacade
{
	override void doSomething()
	{
		// Call SweetLibrariesInc library stuff
	}

	override void doSomethingElse()
	{
		// Call SweetLibrariesInc library stuff
	}

	private SweetLibrariesClass UnderlyingObject;
}

Then, somewhere, we have a factory method that gives us the right implementation class.  Naively we would put that in LibraryInterface:

public static class LibraryFactory
{
	public static LibraryFacade library();
}

But we would have to make it abstract.  We can’t do that because it’s static (it wouldn’t make sense anyways).  If we try to compile LibraryInterface with this, it will rightly complain that the factory method is missing its body.  The most common way to deal with this I’ve seen (and done myself many times) is to move the factory method into the application that uses this library.  Then I implement it according to which library I want to use:

import OldLibrary;
import NewLibrary;

public static class LibraryFactory
{
	// We’re using the new library right now
	public static LibraryFacade library()
	{
		return new NewLibraryImplementation();
		// return new OldLibraryImplementation();
	}
}

Then in my application I use the library by calling the factory to get the implementation and call the methods via the interface:

LibraryFacade library = LibraryFactory.library();
library.doSomething();

If I want to switch back to the old library, I don’t need to change any of the application code except for the factory method.  Cool!

This is already a bit awkward.  First of all, in order for the factory file to compile, we have to link in both the old and new libraries because of the import statements.  Importing both doesn’t break anything.  In fact, we have all the mechanisms needed to use both implementations in different parts of the app, or switch from one to the other mid-execution.  Do we really need this capability?  If the intention is that we pick one or the other at build time and stick with it, then wouldn’t it be ideal for the choice to be made simply by linking one library in, and better yet signal a build error if we try to link in both?

When It Starts to Smell

Let’s say our library interface is a little more sophisticated.  Instead of just one class, we have two, and they interact with each other:

public interface LibraryStore
{
	LibraryEntity FetchEntity(string Id);
	…
	void ProcessEntity(LibraryEntity Entity);
}

public interface LibraryEntity
{
	string Id { get; }
}

Now, we have the old library’s implementation of these entities, and presumably each one is wrapping some CoolLibrariesInc class:

internal class OldLibraryStore: LibraryStore
{
	LibraryEntity FetchEntity(string Id)
	{
		CoolLibraryEntity UnderlyingEntity = UnderlyingStore.RetrieveEntityForId(Id);
		return new OldLibraryEntity(UnderlyingEntity);
	}
	
	void ProcessEntity(LibraryEntity Entity)
	{
		if(Entity is not OldLibraryEntity OldEntity)
			throw new Exception("What are you doing!?  You can't mix library implementations!");

		CoolLibraryEntity UnderlyingEntity = OldEntity.UnderlyingEntity;

		UnderlyingStore.DoWorkOnEntity(UnderlyingEntity);
	}

	…

	private CoolLibraryStore UnderlyingStore;
}

internal class OldLibraryEntity: LibraryEntity
{
	string Id => UnderlyingEntity.GetId();

	…

	internal CoolLibraryEntity UnderlyingEntity;
}

Notice what we have to do in ProcessEntity.  We have to take the abstract LibraryEntity passed in and downcast it to the one with the “matching” implementation.  What do we do if the downcast fails?  Well, the only sensible thing is to throw an exception.  We certainly can’t proceed with the method otherwise, and it certainly indicates a programming error.

Now, if you know me, you know the first thing I think when I see an exception (other than “thank God it’s actually failing instead of sweeping a problem under the rug”) is “does this need to be a runtime failure, or can I make it a compile time failure?”  The error is mixing implementations.  Is this ever okay?  No, it isn’t.  That’s a programming error every time it happens, period.  That means it’s an error as soon as I write the code that takes one implementation’s entity and passes it to another implementation’s store.  I need to strengthen my type system to prevent me from doing this.  Let’s make the Store interface parameterized by the Entity type it can work with:

public interface LibraryStore<Entity: LibraryEntity>
{
	Entity FetchEntity(string Id);
	…
	void ProcessEntity(Entity Entity);
}

Then the old implementation is:

public class OldLibraryStore: LibraryStore<OldLibraryEntity>
{
	OldLibraryEntity FetchEntity(string Id)
	{
		CoolLibraryEntity UnderlyingEntity = UnderlyingStore.RetrieveEntityForId(Id);
		return new OldLibraryEntity(UnderlyingEntity);
	}
	
	void ProcessEntity(OldLibraryEntity Entity)
	{
		CoolLibraryEntity UnderlyingEntity = Entity.UnderlyingEntity;

		UnderlyingStore.DoWorkOnEntity(UnderlyingEntity);
	}

	…

	private CoolLibraryStore UnderlyingStore;
}

I need to make the implementing classes public now as well.

Awesome, we beefed that exception up into a type error.  If I made that mistake somewhere, I’ll now get yelled at for trying to send a mere LibraryEntity (or even worse an OldLibraryEntity) to the NewLibraryStore.  I’ll probably need to adjust my client code to keep track of the fact it’s now getting a NewLibraryEntity back from FetchEntity instead of just a LibraryEntity.  Well… actually, I’ll first get yelled at for declaring a LibraryStore without specifying the entity type.  First I have to change LibraryStore to LibraryStore<NewLibraryEntity>.  And if I do that, I might as well just change them to NewLibraryStore.

Okay, but my original intention was to decouple my client code from which library implementation I have chosen.  I just destroyed that.  I have now forced the users of LibraryStore to care about which LibraryEntity they’re talking to, and by extension which LibraryStore.

Remember, we have designed a system that allows me to choose different implementations throughout my application code.  I can’t mix them, but I can switch between them.  If that’s what I need, then it is normal and expected that I force the application code to explicitly decide, at each use of the library, which one it’s going to use.  Well, that sucks, at least if my goal was to make selecting one (for the entire application) a one-line change.  This added “complexity” of the design correctly expresses the complexity of the problem.

The problem with this problem is it isn’t actually my problem.

(I’m really proud of that sentence)

I don’t need to be able to pick one implementation sometimes and another other times.  I don’t need all of this complexity!  What I need is a build-time selection of an implementation for my entire application.  That precludes the possibility of accidentally mixing implementations.  That simply can’t happen if the decision is made client-wide at build time.  My design needs to reflect that.

An Alternative Approach

The fundamental source of frustration is that I chose the wrong tool for this job.  Dynamic dispatch?  I don’t need or want the dispatch to be dynamic.  That means doing the implementation “selection” as implementations of an interface is the wrong choice. I don’t want the decision of where to bind a library call to be made at the moment that call is executed. I want it to be bound at the moment my application is built.

Let’s try something else.  I’m going to get rid of the LibraryInterface package altogether.  Then OldLibrary will contain these classes:

public class LibraryStore
{
	LibraryEntity FetchEntity(string Id)
	{
		CoolLibraryEntity UnderlyingEntity = UnderlyingStore.RetrieveEntityForId(Id);
		return new LibraryEntity(UnderlyingEntity);
	}
	
	void ProcessEntity(LibraryEntity Entity)
	{
		CoolLibraryEntity UnderlyingEntity = Entity.UnderlyingEntity;

		UnderlyingStore.DoWorkOnEntity(UnderlyingEntity);
	}

	…

	private CoolLibraryStore UnderlyingStore;
}

public class LibraryEntity
{
	string Id => UnderlyingEntity.GetId();

	…

	internal CoolLibraryEntity UnderlyingEntity;
}

NewLibrary will contain these classes:

public class LibraryStore
{
	LibraryEntity FetchEntity(string Id)
	{
		SweetLibraryEntity UnderlyingEntity = UnderlyingStore.ReadEntity(Id);
		return new LibraryEntity(UnderlyingEntity);
	}
	
	void ProcessEntity(LibraryEntity Entity)
	{
		SweetLibraryEntity UnderlyingEntity = Entity.UnderlyingEntity;

		UnderlyingStore.ApplyProcessingToEntity(UnderlyingEntity);
	}

	…

	private SweetLibraryStore UnderlyingStore;
}

public class LibraryEntity
{
	string Id => UnderlyingEntity.UniqueId;

	…

	internal SweetLibraryEntity UnderlyingEntity;
}

I’ve totally gotten rid of the class hierarchies.  There are simply two versions of each class, with each equivalent one named exactly the same.  That means they’ll clash if they are both included in a program.  That’s perfect, because if we accidentally link both libraries, we’ll get duplicate definition errors.

It took me a long time to realize this is even an option.  This is a “weird” way to solve the problem.  For one, the actual definition of the “interface” doesn’t exist anywhere.  It is, in a way, duplicated and stuffed into each package.  And, to be sure, there is an interface.  An “interface” just means any standard for how code is called.  In OO code, it means the public methods and members of classes.  If we want to be able to switch implementations of a library without changing client code, a well-defined “interface” for the libraries is precisely what we need.  So it’s natural we might think, “well if I need an interface, then I’m going to create what the language calls an interface!”  But “interface” in languages like C# and Java (and the equivalent “Protocol” in ObjC/Swift) doesn’t just mean “a definition for how to talk to something”.  It also signs you up for a specific way of binding calls.  It signs you up for late binding, or dynamic dispatch.

Defining a Build-Time Interface

The fact that there is no explicit definition of the library interface isn’t just inconvenient.  We lose any kind of validation that we’ve correctly implemented the interface.  We could make a typo in one of the methods, and it will compile just fine; we won’t find out until we switch to that implementation and try to compile our app.  And the error we get won’t really tell us what happened.  It will tell us we’re trying to call a method that doesn’t exist, when it should really tell us the library class is supposed to have this method but doesn’t.

This is the general problem with “duck typing”. If I pass a wrench to a method expecting a duck, or duck-like entity, instead of getting a sensible error like “you can’t pass this object here because it isn’t duck-like”, you get a wrench complaining that it doesn’t know how to quack. You have to reason backward to determine why a wrench was asked to quack in the first place. Also, this error is fundamentally undetectable until the quack message is sent, which means runtime, even though the error exists from the moment I passed a wrench to a method that declares it needs something duck-like. The point of duck typing is flexibility. But flexibility is equivalent to late error detection. That’s why too much flexibility isn’t helpful.

One way to solve this is to “define” the interface with tests.  What we’re testing is the type system, which means we don’t need to actually run anything.  We just need to compile something.  So we can write a test that simply tries to call each expected method with the expected parameters:

// There’s no [Test] annotation here!
void testLibraryEntity()
{
    var Entity = (LibraryEntity)null!;

    string TestId = Entity.Id;
}

// There’s no [Test] annotation here!
void testLibraryStore()
{
    var Store = (LibraryStore)null!;

    LibraryEntity TestFetchEntity = Store.FetchEntity((string)null!);
    Store.ProcessEntity((LibraryEntity)null!);
}

You might think I’m crazy for typing that out, but really it isn’t that ridiculous, and it does exactly what we want (it also indicates I’m truly a C++ dev at heart).  I’ve highlighted that there are no [Test] annotations, which means this code won’t run when we hit the Test button.  That’s good, because it will crash.  We don’t want it to run.  We just want it to compile.  If that compiles, it proves our classes fulfill the intended interfaces.  If it doesn’t compile, then we’re missing something.

(If you aren’t already interpreting compilation errors as failed tests, it’s never too late to start)

What would be nice is if languages could give us a way to define an interface that isn’t dynamically dispatched.  What if I could do this:

interface ILibraryStore<Entity: ILibraryEntity>
{
	Entity FetchEntity(string Id);
	void ProcessEntity(Entity Entity);
}

interface ILibraryEntity
{
	string Id { get; }
}

I could put these in a separate LibraryInterface package.  Then in one library package:

public class LibraryStore: static ILibraryStore<LibraryEntity>
{
	LibraryEntity FetchEntity(string Id)
	{
		SweetLibraryEntity UnderlyingEntity = UnderlyingStore.ReadEntity(Id);
		return new LibraryEntity(UnderlyingEntity);
	}
	
	void ProcessEntity(LibraryEntity Entity)
	{
		SweetLibraryEntity UnderlyingEntity = Entity.UnderlyingEntity;

		UnderlyingStore.ApplyProcessingToEntity(UnderlyingEntity);
	}

	…

	private SweetLibraryStore UnderlyingStore;
}

public class LibraryEntity: static ILibraryEntity
{
	string Id => UnderlyingEntity.UniqueId;

	…

	internal SweetLibraryEntity UnderlyingEntity;
}

By statically implementing the interface, I’m only creating a compile-time “contract” that I need to fulfill.  If I forget one of the methods, or misname one, I’ll get a compiler error saying I didn’t implement the interface.  That’s it.  But then if I have a method like this:

void DoStuff(ILibraryEntity Entity)
{
	string Id = Entity.Id; // What does the compiler do here?
}

I can’t pass a LibraryEntity in.  I can only bind a variable of type ILibraryEntity to an instance of a class if that class non-statically implements ILibraryEntity, because such binding (up-casting) is a form of type erasure: I’m telling the compiler to forget exactly which subtype the variable is, to allow code to work with any subtype.  For that to work, the methods on that type have to be dispatched dynamically.  The decision by language designers to equate “interfaces” with dynamic dispatch was quite reasonable!

That means in my client code I still have to declare things as LibraryStore and LibraryEntity.  In order to get the “build-time selection” I want, I still have to name the classes identically.  That is a signal to the compiler both that they cannot coexist in a linked product, and that they get automatically selected by choosing one to link in.  Then, there’s the problem with importing.  Since the packages are named differently, I’d have to change the import on every file that uses the library (until C# 10!).  Same with Java.  In fact, it’s a bit worse in Java.  Java conflates namespaces with packages, so if the packages are named differently the classes must also be named differently, and they’ll coexist just fine in the application.  In either case, you can name the packages identically.  Then your build system will really throw a fit if you try to bring both of them in.

Is the notion of a “static” interface a pipe dream?  Not at all.  C++20 introduced essentially this very thing and called them concepts.  Creating these compile-time-only contracts is a much bigger deal for C++ than for other languages, because of templates.  In a language like C#, if I want to define a generic function that prints an array, I need an interface for stringifying something:

interface StringDescribable
{
	string Description { get; }
}

string DescribeArray<T: StringDescribable>(T[] Array)
{
	var Descriptions = Array
		.Select(Element => Element.Description);

	return $"[{string.Join(", ", Descriptions)}]";
}

This requires the Description method to be dynamically dispatched because of how generics work.  This code is compiled once, so the same code is executed each time I call this function, even with different generic types.  It therefore needs to dynamically dispatch the Description call to ensure it calls T’s implementation of it.  Fundamentally, a different Description implementation can get called each time this function is executed.  It’s a runtime variation, so it has to be late-bound.

The equivalent code with templates in C++ looks like this:

template<typename T> std::string describeArray(const T* array, size_t count)
{
	std::vector<std::string> descriptions(count);

	std::transform(
		array,
		array + count,
		descriptions.begin(),
		[] (const T& element) { return element.description(); }
	);

	return StringUtils::joined(", ", descriptions); // Some helper method we wrote to join strings
}

Notice the absence of any “constraint” on the template parameter T.  No such notion existed in C++ until C++20.  You don’t need to require that T implement some interface, because for each type T with which we instantiate this function, we get a totally separate compiled function.  If our code does this:

SomeType someTypeArray[10];
SomeOtherType someOtherTypeArray[20];
…
describeArray(someTypeArray, 10);
describeArray(someOtherTypeArray, 20);

Then the compiler creates and compiles the following two functions:

std::string describeArray(const SomeType* array, size_t count)
{
	std::vector<std::string> descriptions(count);

	std::transform(
		array,
		array + count,
		descriptions.begin(),
		[] (const SomeType& element) { return element.description(); }
	);

	return StringUtils::joined(", ", descriptions);
}

std::string describeArray(const SomeOtherType* array, size_t count)
{
	std::vector<std::string> descriptions(count);

	std::transform(
		array,
		array + count,
		descriptions.begin(),
		[] (const SomeOtherType& element) { return element.description(); }
	);

	return StringUtils::joined(", ", descriptions);
}

These are totally different.  They are completely unrelated code paths as far as the compiler/linker is concerned.  We could even specialize the template and write completely different code for one of the cases.

Both of these will compile fine as long as both SomeType and SomeOtherType have a const method called description that returns a std::string (it doesn’t even need to return std::string, it can return anything that is implicitly convertible to a std::string).  This is literal duck typing in action. Furthermore, declaring an “interface”, which in C++ is a class with only pure virtual methods, forces description to be dynamically dispatched, which forces every class implementing it to add a vtable entry for it.  If any such class has no virtual methods except this one, we suddenly have to create a vtable for those classes and add the vpointer to the instances, which changes their memory layout.  We probably don’t want that.
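To see that cost side by side, compare a duck-typed class with one forced through a pure virtual interface (PlainThing, IDescribable and VirtualThing are illustrative names; the size difference is a common-ABI artifact, not a standard guarantee):

#include <string>

// Duck-typed: satisfies describeArray as-is, and carries no vpointer
struct PlainThing
{
	std::string description() const { return "plain"; }
};

// The "interface" approach: a class with only pure virtual methods...
struct IDescribable
{
	virtual std::string description() const = 0;
	virtual ~IDescribable() = default;
};

// ...which forces every implementer to carry a vpointer, changing its memory layout
struct VirtualThing : IDescribable
{
	std::string description() const override { return "virtual"; }
};

// On typical implementations, sizeof(VirtualThing) > sizeof(PlainThing),
// purely because of the vpointer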

If I accidentally invoke describeArray with some class that doesn’t provide such a function, I get a very similar problem as our example with libraries.  It (correctly) fails to compile, but the error message we get is, uh… less than ideal.  It just tells us that it tried to call a method that doesn’t exist.  Any seasoned C++ dev knows that when you have template code several layers deep (and most STL stuff is many, many layers deep), mistakenly instantiating a template with a type that doesn’t fulfill whatever “contracts” the template needs results in some crazy error deep in the implementation guts.  You literally have to debug your code’s compilation (walk up a veritable stack trace of template instantiations) to find out what went wrong and why.  It sucks.  It’s the metaprogramming version of having terrible exception diagnostics and being reduced to looking at stack traces.

This even suffers a form of late error detection. Let’s say I write one template method where the template parameter T has no requirement to be “describable”, but in it I call describeArray with an array of T. This is an error that should be detected as soon as I compile after writing this method. But it won’t be detected until later, when I actually instantiate this template method with a non-describable parameter. It’s a compile-time error, but it’s still too late, and still in a sense detected when something is executed instead of when it is written (C++ templates are a metaprogramming stage: code that writes code, so templates are executed at build time to produce the non-templated C++ code that then gets compiled as usual).
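Here is a sketch of that latency, reusing describeArray from above (describeOne and NotDescribable are illustrative names):

// Wrong the moment it is written: no "describable" requirement on T, yet
// it hands T to describeArray, which needs one. On its own, it compiles.
template<typename T> std::string describeOne(const T& value)
{
	return describeArray(&value, 1);
}

struct NotDescribable { }; // No description() method

// Only this instantiation, perhaps written much later, surfaces the error,
// deep inside describeArray's guts:
// describeOne(NotDescribable {});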

And just as the sort of “static interface” I proposed would help the compiler tell us the real error, a compile-time contract fixes this problem here too.  We need to introduce “constraints” like C#’s, but since templates are all compile-time, the constraint is not an ordinary interface/class.  It’s something totally different: a concept:

template<typename T> concept Describable = requires(const T& value)
{
	{ value.description() } -> std::convertible_to<std::string>;
};

Then we can use a concept as a constraint:

template<typename T> requires Describable<T> std::string describeArray(const T* array, size_t count)
…

But as long as we don’t have such, uh, concepts (heh) in other languages, we can use other techniques to emulate them, like “compilation tests”.  The solutions I gave are all a little awkward.  Naming two packages identically just to make them clash? Tests that just compile but don’t run?  Well, it’s either that, or all the force downcasting (because you know the downcast won’t fail, right?  Right?) you’ve had to do whenever you wrote an adapter layer.  I know which one I’d rather deal with.

Testing

Now, let’s talk about testing our application.  In our E2E tests, we’ll want to link in whatever Library the app is actually using.  But in more isolated unit tests, we’ll likely want to mock the Library.  If you use some mocking framework like Moq or Mockito, you know that the prerequisite to something being “mockable” is that it’s dynamically dispatched (an interface, abstract or at least virtual method).  You can try PowerMock or something similar, but that kind of runtime hacking isn’t always available, depending on the language/environment.  The problem is essentially the same.  When running for real, we want to call the real library.  When running in tests, we want to call a mock library.  That’s a variation.  Variations are typically solved with polymorphism, and that’s exactly what mocking frameworks do.  They take advantage of a polymorphic boundary to redirect your application’s calls to the mocking framework.

How do we handle mocking (or any kind of test double, for that matter) if we’re doing this kind of static compile/link time binding trick?

Interestingly enough, if you think about it, runtime binding is unnecessarily late in most cases here as well.  How often do you find yourself sometimes mocking a class and sometimes not?  If that class is part of your application, you’ll do that.  But an external class?  We’ll always mock that.  Every time our unit test rig runs, we’ll be mocking Library.  So when is the variation selected?  At build time!  We select it when we decide to build the test rig instead of building the application.

The solution looks the same.  We just create another version of the package that has mock classes instead of real ones:

public class LibraryStore
{
	LibraryEntity FetchEntity(string Id)
	{
		// Mocking stuff
	}
	
	void ProcessEntity(LibraryEntity Entity)
	{
		// Mocking stuff
	}
}

public class LibraryEntity
{
	string Id 
	{ 
		get
		{
			// Mocking stuff	
		}
	}
}

Unfortunately this would mean we have to reinvent mocking inside our classes.  A mocking framework exercises late binding in a more powerful way to allow it to record invocations and inject behavior.  We don’t need dynamic dispatch to do that, but we might still want it if we’re not going to rewrite or code-generate the mocking logic.  What if we made the “test” version of the library flexible enough to be molded by whatever test framework we want to use?

public class LibraryStore
{
	virtual LibraryEntity FetchEntity(string Id)
	{
		throw new Exception("Empty stub");
	}

	virtual void ProcessEntity(LibraryEntity Entity)
	{
		throw new Exception("Empty stub");
	}
}

public class LibraryEntity
{
	virtual string Id
	{
		get { throw new Exception("Empty stub"); }
	}
}

The “cleaner” choice would be to make the classes abstract, or even just make them interfaces:

public interface LibraryStore
{
	LibraryEntity FetchEntity(string Id);
	void ProcessEntity(LibraryEntity Entity);
}

public interface LibraryEntity
{
	string Id { get; }
}

This will work fine unless we’re calling constructors somewhere, most likely for LibraryStore. In that case, the app won’t compile with this library linked in because it will be trying to construct interface instances. But what if we make it part of the contract that these classes can’t be directly constructed? Instead, they provide factory methods, and their constructors are private? That will grant us the flexibility to swap in abstract versions when we need them.
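Here is the shape of that contract, sketched in C++ for brevity (the idea translates directly; create and the private constructor are the assumptions):

#include <memory>

class LibraryStore
{
public:
	// Client code only ever writes LibraryStore::create(), so a test
	// version of this package is free to make LibraryStore abstract, or
	// return a mock subclass, without breaking a single call site
	static std::unique_ptr<LibraryStore> create()
	{
		return std::unique_ptr<LibraryStore>(new LibraryStore());
	}

	virtual ~LibraryStore() = default;

private:
	LibraryStore() = default; // Direct construction is not part of the contract
};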

To add this to our interface definition “tests”, we would need to somehow test that the classes are not directly constructible. Testing negatives for compilation is tricky. You’d have to create a separate compilation unit for each negative case, and have your build system try to compile them and fail if compilation succeeds. Since you have to script that part, you might as well script the whole thing. You can write a “testFailsCompilation” script that takes a source file as an input, runs the compiler from the command line and checks whether it failed. In your project, the test suite would be composed of scripts that call the “testFailsCompilation” script with different source files.
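A bare-bones sketch of that script, written here in C++ so it can double as a test helper (the compiler command and the source file name are assumptions; adapt them to your toolchain):

#include <cstdlib>
#include <iostream>
#include <string>

// Returns true if compiling the given source file fails, which counts as a
// PASS for a negative compilation test
bool failsCompilation(const std::string& sourceFile)
{
	// -fsyntax-only: we only care whether it compiles, not about any output
	std::string command = "c++ -std=c++20 -fsyntax-only " + sourceFile + " 2> /dev/null";
	return std::system(command.c_str()) != 0;
}

int main()
{
	// Hypothetical negative case: directly constructing a LibraryStore is
	// supposed to be impossible
	bool passed = failsCompilation("ConstructsLibraryStoreDirectly.cpp");
	std::cout << (passed ? "PASS" : "FAIL") << std::endl;
	return passed ? 0 : 1;
}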

That’s fine, but it probably doesn’t integrate with your IDE as well. There won’t be a convenient list of test cases with play buttons next to each one and a check mark or X indicating which ones passed/failed in the last test run. Some boilerplate can probably fix that. If you can invoke the scripts from code, then you can write test cases that invoke the script and assert on its output. Where that might cause trouble is embedded programming (mobile development is an example), where tests can or must run on a different device that may or may not have a shell, or permission to use it. Really, those tests are for the build stage, so you ought to define your test target to run on your build machine. If you can set that up, even your positive test cases will integrate better. Remember that what I suggested before were “test” methods that aren’t actual [Test] cases, so they won’t show up in the list of tests either. If you split each one into its own source file, compile it with a script, then write [Test] cases that invoke each script, you recover proper IDE integration. This will make hooking them into a CI/CD pipeline simpler as well.

With that, we have covered all the bases. Add this to your tool belt. If you need variation, you probably want polymorphism. But class hierarchies (including abstract interfaces) are only one specific type of polymorphism, which we can call runtime polymorphism. There are other types as well. What we discussed here is a form of static polymorphism. When you realize you need polymorphism, don’t just jump to the type we’re all trained to think of as the one type. Think about when the variation needs to be selected, and choose the polymorphism that makes the decision at that time: no earlier, and no later.

What Type Systems Are and Are Not – Part 1

Introduction

Programmers are typically taught to think of the type system in languages (particularly object-oriented languages) as being a way to express the ontology of a problem.  This means it gives us a way to model whether one thing “is a” other thing.  The common mantra is that inheritance lets us express an “is a” relationship, while composition lets us express a “has a” relationship.  We see some common examples used to demonstrate this.  One is taxonomy: a dog is a mammal, which is an animal, which is a living creature.  Therefore, we would express this in code with a class LivingCreature, a class Animal that is a subclass of LivingCreature, a class Mammal that is a subclass of Animal, and a class Dog that is a subclass of Mammal.  Another one is shapes: a rectangle is a quadrilateral, which is a polygon, which is a shape.  Therefore, we similarly have a class hierarchy Shape -> Polygon -> Quadrilateral -> Rectangle.

There are problems with thinking of the type system in this way.  The biggest one is that the distinction between “is a” and “has a” is not as clear-cut as it is made out to be.  Any “is-a” relationship we can think of can be converted into a “has-a” expression, and vice versa.  A dog is a mammal, but that is equivalent to saying a dog has a class (a taxonomic class, as in “kingdom phylum class order…”, not a programming language class) that is equal to “mammal”.  A quadrilateral is a polygon, but that is equivalent to saying it has a side count, which equals 4.  We can define a rectangle with “has-a” expressions too: it has four angles, and all are equal to 90 degrees.

We can also express what we might typically model in code as “has-a” relationships with “is-a” statements as well.  A car has a color, which can equal different values like green, red, silver, and so on.  But we could also define a “green car”, which is a car.  We could, though we probably wouldn’t want to, model this with inheritance: define a subclass of Car called GreenCar, another called RedCar, another called SilverCar, and so on.

There are well-known examples of when something we might naturally think of as an “is-a” relationship turns out to be poorly suited for inheritance.  Probably the most common example is the “square-rectangle” relationship.  When asked if a square is a rectangle, almost everyone would answer yes.  And yet, defining Square to be a subclass of Rectangle can cause a lot of problems.  If I can construct a Rectangle with any length and width, I will end up with a Square if I pass in equal values, but the object won’t be an instance of Square.  I can try to mitigate this by making the Rectangle constructor private, and define a factory method that checks if the values are equal and returns a Square instance if they are.  This at least encapsulates the problem, but it remains that the type system isn’t really capturing the “squareness” of rectangles because the instance’s type isn’t at all connected to the values of length and width.

If these objects are mutable, then all bets are off.  You simply can’t model squareness with inheritance for the simple reason that an object’s type is immutable.  But if the length and width of a rectangle are mutable, then its “squareness” is also mutable.  A rectangle that was not a square can become a square, and vice versa.  But the instance’s type is set in stone at construction.  Most of the discussions I’ve seen around this focus on the awkwardness or arbitrariness of how to implement setLength and setWidth on a Square.  Should these methods even exist on Square, and if so, does setting one automatically set the other to keep them equal?  But I believe this awkwardness is a mere side effect of the underlying problem.  No matter what, if I can set the length and width of a Rectangle, I can make it become a Square, and yet it won’t be a Square according to the type system.  I simply can’t encapsulate the disconnect between the Square type and actual squareness (which is defined by equality of the side lengths) for mutable Rectangles.

So what do I do?  I use composition instead of inheritance.  Instead of Square being a subclass of Rectangle, Rectangle has a property isSquare, a bool, and it checks the equality of the two side lengths.  That way, the answer to the “is it a square” question correctly consults the side lengths and nothing else, and it can change during the lifetime of a particular rectangle instance.  This also correctly prevents me from even trying to “require” that a Rectangle is, and will be for the rest of its life, a Square; a promise that simply can’t be made with mutable Rectangles.
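A minimal sketch of that composition approach (double side lengths are an assumption):

class Rectangle
{

public:

  Rectangle(double length, double width) : _length(length), _width(width) { }

  void setLength(double length) { _length = length; }
  void setWidth(double width) { _width = width; }

  // "Squareness" is derived from the side lengths, so it can change over
  // the rectangle's lifetime, which its type never could
  bool isSquare() const { return _length == _width; }

private:

  double _length;
  double _width;
};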

The same problem exists elsewhere on a type hierarchy of shapes.  If a Rectangle is a subclass of Quadrilateral, then the quadrilateral’s angles (or more generally, vertices) better not be mutable, or else we’ll end up turning a quadrilateral into a rectangle while the type system has no idea.  We have the same problem with immutable quadrilaterals that we would have to make the constructor private and check the angles/vertices in a factory method to see if we need to return a Rectangle (or even Square) instance.  The problem exists again if Quadrilateral is a subclass of Polygon.  If the number of sides/vertices is mutable, we can turn a Quadrilateral into a Triangle or Pentagon, and we’d similarly have to worry about checking the supplied vertices/sides in a factory method.

Types as Constness Constraints

We can see that, because whether something is an “is-a” or “has-a” relationship is not really a meaningful distinction (we can always express one as the other), this doesn’t tell us whether it is appropriate to express a concept using the type system of a programming language.  The square-rectangle example hints at what the real purpose of the type system is, and what criteria we should be considering when deciding whether to define a concept as a type.  The fundamental consideration is constness.  What do we know, 100%, that will never be different, at the time we write code?  When a certain block of code is executed, what are we sure is going to be the same every time it executes, vs. what could be different each time?  The type system is a technique for expressing these “known to always be the same” constraints.

Such constraints exist in all programming languages, not just compiled languages.  This is why I must resist the temptation to refer to them as “compile-time constants”.  There may be no such thing as “compile time”.  Really, they are author-time constraints.  They are knowledge possessed at the time the code is written, as opposed to when it is executed.

These constraints come in several forms.  If I call a function named doThisThing, I am expressing the knowledge, at authorship time, that a function called doThisThing exists.  Furthermore, this function has a signature.  It takes n parameters, and each of those parameters has a type.  This means every time the doThisThing function is executed, there will be n parameters of those types available.  This is a constant across every execution of the function.  What varies is the variables: the actual values of those parameters.  Those can be different every time the function executes, and therefore we cannot express the values as authorship-time constraints.

Where function signatures express what we know at authorship time about the invocation of a code block, the type system is a way of expressing what we know about a data block.  If we have a struct SomeStruct, with three members firstMember, secondMember and thirdMember of types int, float and string respectively, and we have an instance of SomeStruct in our code, we are saying that we know, for every execution of that code, that it is valid to query and/or assign any of those three members, with their respective types.  The other side of this coin is we know it is invalid to query anything else.  If we write someStruct.fourthMember, we know at authorship time that this is a programming error.  Fundamentally, we don’t have to wait to execute the code and have it throw an exception to discover this error.  The error exists and is plainly visible in the written code, simply by reading it.  The type system provides a parsable syntax that allows tools like the compiler to detect and report such an error.
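Written out, the error is visible right in the source (the fourth-member line is the authorship-time mistake):

#include <string>

struct SomeStruct
{
  int firstMember;
  float secondMember;
  std::string thirdMember;
};

void useStruct(SomeStruct someStruct)
{
  int first = someStruct.firstMember;     // Known valid on every execution
  // auto oops = someStruct.fourthMember; // A programming error, visible by reading the code
}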

Inheritance vs. Composition

The implication of this is that the fundamental question we should be asking when deciding to model something with a type is: are we modeling an authorship-time known constant?  If the need is to create a constraint where something is known to be true every time code is executed, the type system is the way to do that.  Inheritance and composition represent different levels of constness.  Or, rather, they represent different aspects of constness.  If I define a GreenCar as a subclass of Car, I am creating the ability to say I know this car is green, and will always be green.  If I instead define a Car to have a member called color, then I am saying I know that every car always has a color, but that color can change at any time.

What can I do with the composition approach that I can’t do with the inheritance approach?  I can change the color of a car.  What can I do with the inheritance approach that I can’t do with the composition approach?  Well, the exact opposite: I can guarantee a car’s color will never change, which allows me to do things like define a function that only takes a green car.  I can’t do that with the composition approach because I would be able to store a passed-in reference to a GreenCar, then later someone with the equivalent reference to the Car could change its color.

The more “const” things are, the more expressive I can be at compile time.  On the other hand, the less I can mutate things.  Both are capabilities that have uses and tradeoffs.  The more dynamic code is, the more it can do, but the less it can be validated for correctness (another way to think of it is the more code can do, the more wrong stuff it can do).  The more static code is, the less it can do, but the more it can be validated for correctness.  The goal, I believe, is to write code that is just dynamic enough to fulfill all of our needs, but no more, because extra dynamism destroys verifiability with no advantage.  It is also possible to allocate dynamism in different ways.  If we need code to be more flexible, instead of relaxing the constraints on existing code, we can create new code with different constraints, and concentrate the dynamism in a place that decides, at runtime, which of those two codepaths to invoke.

Now, you’re probably thinking: “that’s not why I would choose not to model a car’s color as types”.  The obvious reason why that would be insane is because the number of colors is, at least, going to be about 16.7 million.  So to exhaustively model color in this way, we’d have to write literally tens of millions of subclasses.  Even if we delegated that work to code generation, that’s just an absurd amount of source code to compile (and it would probably take forever).  It simply isn’t practical to do this with the type system.

Problems of practicality can’t be ignored, but it’s important to understand they are different from the question of whether modeling a constraint with types would correctly express its constness.  This is because practicality problems are problems with the expressiveness of a language, including its type system.  These problems are often different across different languages, and can be eliminated in future iterations of languages in which they currently exist.  If it is simply inappropriate to use the type system because something is genuinely dynamic, this isn’t going to change across languages or language versions, nor is it something that could be sensibly “fixed” with a new language version.

To illustrate this, it is possible to practically model a color property with types in C++.  C++ has templates, and unlike generics in other languages, template parameters don’t have to be types.  They can be values, as long as those values are compile-time constants (what C++ calls a constexpr).  We can define a color as:

struct Color
{
  uint8_t r, g, b;
};

Then we can define a colored car as:

template<Color CarColor> class ColoredCar : public Car
{
  …
  Color color() const override { return CarColor; }
  …
};

Then we can instantiate cars of any particular color, as long as the color is a constexpr:

ColoredCar<Color {255, 0, 0}> redCar;

We will presumably have constexpr colors defined somewhere:

namespace Colors
{
  static constexpr Color red {255, 0, 0};
  static constexpr Color yellow {255, 255, 0};
  static constexpr Color green {0, 255, 0};
  static constexpr Color blue {0, 0, 255};
  …
} 

Then we can write:

ColoredCar<Colors::red> redCar;

We can define a function that only accepts red cars:

void acceptRedCar(const ColoredCar<Colors::red>& redCar);

Of course none of this makes sense unless color is a const member/method on Car.  If a car’s color can change, it’s simply wrong to express it as a type, which indicates an immutable property of an object.

This kind of thing isn’t possible in any other language I know of.  So if you aren’t working in C++, you simply can’t use the type system for this, for the simple reason that the language isn’t expressive enough to allow it.  It may be a different type of problem, but it’s a problem nonetheless.  So the decision to use the type system to express something must take into account both whether the thing being expressed is really an authorship-time constant, and whether the language’s type system is expressive enough to handle it.

Even in C++, there are other problems with modeling color this way.  Just like with the square-rectangle, if we construct a Car (not a ColoredCar), whose color happens to be red, it doesn’t automatically become a ColoredCar<Colors::red>.  A dynamic cast from Car to ColoredCar wouldn’t inspect the color member/method as we might expect it to.  We would have to ensure at construction that the correct ColoredCar type is selected with a factory method.  Now, there are likely other properties of a car we might want to model this way.  I often use cars as a way to demonstrate favoring composition over inheritance.  A car has a year, make, model, body type, drivetrain, engine, and so on.  Notice I said has a.  I could also say a car is a 2010 Honda Civic sedan, automatic transmission, V4.  The “classic” reason to avoid inheritance is the fact that “is-a” relationships are very often not simple trees.  We would need full-blown multiple inheritance, which would cause a factorial explosion of subtypes.  If I wanted to model all these aspects of a car with inheritance, I would really need something like:

template<unsigned int Year, CarMake Make, CarModel<Make> Model, CarBodyType BodyType, TransmissionType TransmissionType, Engine Engine, Color CarColor> class TypedCar : public Car
{
  ...
};

That’s already a pretty big mess, and it’s only that tidy (which isn’t so tidy) thanks to the unique power of C++ templates.  In any other language, you’re now talking about writing the classes 2010RedHondaCivicSedanAutoV4, 2011RedHondaCivicSedanAutoV4, 2010GreenHondaCivicSedanAutoV4, 2010RedHondaAccordSedanAutoV4, and so on more or less ad infinitum.  God forbid you’d actually start going down this path before you realize the lack of multiple inheritance blows the whole thing up even while ignoring the utter impracticality of it.

But this isn’t really enough.  With this two-level inheritance structure, I can either say to the compiler “I know this object is a car, and I know nothing else about it”, or “I know everything about this car, including its year, make, model, etc.”.  To be fully expressive I would want to be able to model between these extremes, like a car where I know its make, model and color, but I don’t know anything else.  I don’t even think C++ templates can generate the full web of partially specialized templates inheriting each other that you’d need for this (though I’m hesitant to assert that, I’ve been consistently shocked at what can be done, albeit in ridiculous and circuitous fashions, with C++ templates.  That doesn’t really change the conclusion here though).

So, long story short, it’s probably never a good idea to model a car’s color, or any of its other attributes, even if they are const, with the type system, because it’s not practical.  However, I want to emphasize that this really is a problem of practicality.  Future languages or language versions may become expressive enough to overcome these limitations.  The point is we need to augment our “do I know this now, at authorship time, or only at runtime?” question, particularly when the answer is “I know it at authorship time”, with another question: is it practical, given the language’s capabilities, to express this constraint with the type system?  If the answer is no, then we fall back to composition and must sacrifice the automatic authorship-time validation.  We can instead do the validation with an automated test.

Demonstrating the Concepts

Okay, after all that theory, let’s go through an example to illustrate what the real lesson is here.  The type system is a tool available to you to turn runtime errors into compile time errors.  If there’s ever a point in your code where you need to throw an exception (and throwing an exception is definitely better than trying to cover up the problem and continue executing), think carefully about whether the error you’re detecting is detectable at authorship-time.  Are you catching an error in an external system, that you genuinely don’t control, or are you catching a programming error you might have made somewhere else in your code?  If it’s the latter, try to design your type system to make the compiler catch that error where it was written, without having to run the code.

If you ever read the documentation for a library and you see something like this:

You must call thisMethod before you call thisOtherMethod.  If you call thisOtherMethod first, you will get an OtherMethodCalledBeforeThisMethod exception

That’s a perfect example of not using the type system to its full advantage.  What they should have done was define one type that has only thisMethod on it, whose return value is another type that has only thisOtherMethod on it.  Then, the only way to call thisOtherMethod is to first call thisMethod to get an instance of the type that contains thisOtherMethod.

Let’s say we have a File class that works this way:

class File
{

public:

  File(const std::string& path); // Throws an exception if no file at that path exists, or if the path is invalid

  void open(bool readOnly); // Must call this before reading or writing

  std::vector<char> read(size_t count) const; // Must call open before calling this method

  void write(const std::vector<char>& data); // Must call open with readOnly = false before calling this method

  void close(); // Must call when finished to avoid keeping the file locked.  Must not call before calling open
};

Now, let’s list all of the programming errors that could occur with using this class:

  1. Calling the constructor with an invalid path
  2. Calling the constructor with the path for a nonexistent file
  3. Calling read, write or close before calling open
  4. Calling write after calling open(true) instead of open(false)
  5. Calling read, write or open after calling close
  6. Not calling close when you’re done with the file

Think about all of these errors.  How many are genuinely runtime errors that we cannot know exist until the code executes?  There’s only one: #2.  #1 might be a runtime error if the path is built at runtime.  If the path is a compile-time constant, like a string literal, then we can know at compile time whether it’s a valid path or not.  But we can’t know whether a file exists at a particular valid path until we actually execute the code, and the answer can be different each time.  We simply must check this at runtime and emit a runtime error (exception) appropriately.  But the rest of those errors?  Those are all authorship-time errors.  It is never correct to do those things, which means we knew at the time the code was written that we did something wrong.

So, let’s use the type system to turn all of those errors, except #2, and #1 for dynamically built paths, into compile time errors.

First, let’s consider #1.  The underlying issue is that not every string is a valid file path.  Therefore, it’s not appropriate to use std::string as the type for a file path.  We need a FilePath type.  Now, we can build a FilePath from a string, including a dynamically built string, but we might not end up with a valid FilePath.  We can also build a FilePath in a way that’s guaranteed (or at least as close as the language allows) to be valid.  A valid file path is an array of path elements. What makes a valid path element depends on the platform, but for simplicity let’s assume that it’s any string made of one or more alphanumeric-only characters (ignoring drive letters and other valid path elements that can contain special characters). We can therefore define a FilePath as constructible from a std::vector of path elements:

class FilePath
{

public:

  FilePath(const std::vector<FilePathElement>& pathElements) : _pathElements(pathElements) { }

  std::string stringValue() const
  {
    // Code to join the path elements by the separator "/"
  }

private:

  std::vector<FilePathElement> _pathElements;
};

Now, for the path elements, the tricky part is checking at compile-time that a constexpr string (like a string literal) is nonempty and alphanumeric.  I won’t go into the implementation, but the signature would look like this:

constexpr bool isNonEmptyAndAlphaNumeric(constexpr std::string& string);

Then, we would use this in the constructor for FilePathElement:

FilePathElement::FilePathElement(constexpr std::string& string)
{
  static_assert(isNonEmptyAndAlphaNumeric(string), "FilePathElement must be nonempty and alphanumeric");
}

If you aren’t familiar with some of this C++ stuff, static_assert is evaluated at compile time, and therefore will cause a compiler error if the passed-in expression evaluates to false.  This of course means the expression must be evaluatable at compile time, which is what the constexpr keyword indicates.  The constexpr on the string parameter indicates the passed-in string must be a compile-time constant: that’s the only way we could possibly check that it’s alphanumeric at compile time.  (Strictly speaking, no current C++ standard allows constexpr function parameters, so read this signature as the constraint we want to express rather than code that compiles today.)

We sometimes will need to construct a FilePathElement out of a dynamically built string.  But since we can’t confirm at compile time, we instead do a runtime check, and if needed create a runtime error:

FilePathElement::FilePathElement(const std::string& string)
{
  if(!isNonEmptyAndAlphaNumeric(string))
    throw std::invalid_argument("FilePathElement must be nonempty and alphanumeric");
}

Now we can define constructors for a FilePath that take strings.  Since we can’t know statically if it’s valid, it needs to throw a runtime exception:

FilePath::FilePath(const std::string& string)
{
  // Try to split the string into a vector of strings separated by "/"
  std::vector<std::string> strElements = splitString(string, "/");

  // Try to convert each string element to a FilePathElement.  If any of them are invalid, this will cause an exception to be thrown
  std::transform(
    strElements.begin(), 
    strElements.end(), 
    std::back_inserter(_pathElements), 
    [] (const std::string& element) { return FilePathElement(element); } // This constructor might throw
  );
}

If you have no idea what’s going on in that std::transform call, this is just the STL’s low-level way of doing a collection map. It’s equivalent to this in Swift:

_pathElements = strElements
  .map({ strElement in 
    
    return FilePathElement(strElement); // This constructor might throw
  });

You might be thinking: couldn’t we make a FilePath constructor that takes a constexpr string and validates at compile time that it can be split into a vector of FilePathElements?  Maybe, with C++17 or C++20, or earlier depending on how you do it.  C++ is rapidly expanding the expressivity of compile-time computations.  Anything involving containers (which traditionally require heap allocation) at compile time is a brand spanking new capability.

Now, we can form what we know at compile-time is a valid FilePath:

FilePath filePath = {"some", "location", "on", "my", "machine"};

If we did this:

FilePath filePath = {"some", "location?", "on", "my", "machine"};

Then the second path element would hit the static_assert and cause a compiler error.

Okay, so what if we’re not using C++?  Well, then you can’t really prove the path elements are nonempty and alphanumeric at compile time.  You just have to settle for run-time checks for that.  You can at least define the FilePath type that is built from a list of path elements, but you can’t get any more assurance than this. The language just isn’t expressive enough. That’s an example of the practicality problem. Due to limitations of the language, we can’t bake the validity of a path element’s string into the FilePathElement type, and we therefore lose automatic compile errors if we accidentally try to create a file path from an invalid string literal. If we want static validation, we need to write a test for each place we construct path elements from string literals to confirm they don’t throw exceptions.
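Such a test might look like this sketch, kept framework-free so it stands alone (testFilePathElementLiterals is an illustrative name):

#include <stdexcept>

// Each literal we rely on gets pinned down here; a throw fails the test run
void testFilePathElementLiterals()
{
  // Must not throw: nonempty and alphanumeric
  FilePathElement("some");
  FilePathElement("machine");

  // Must throw: documents that this literal would be a bug
  try
  {
    FilePathElement("location?");
    throw std::logic_error("Expected invalid_argument for 'location?'");
  }
  catch (const std::invalid_argument&) { /* expected */ }
}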

Okay, the next problem is #2.  That’s an inherently runtime problem, so we’ll deal with it by throwing an exception in the constructor for File.

Moving on, all the methods on File are always supposed to be called after open.  To enforce this statically, we’ll define a separate type, let’s say OpenFile, and move all the methods that must be called after open onto this type.  Then we’ll have open on File return an OpenFile:

class OpenFile; // Forward declaration, since File mentions OpenFile before it's defined

class File
{

public:

  File(const std::string& path);

  OpenFile open(bool readOnly); // Defined below, once OpenFile is complete
};

class OpenFile
{

public:

  std::vector<char> read(size_t count) const;

  void write(const std::vector<char>& data); // Must have been called with readOnly = false

  void close(); // Must call when finished to avoid keeping the file locked.

  friend class File;

private:

  OpenFile(File& file, bool readOnly) : _file(file), _readOnly(readOnly) { }

  File& _file;
  bool _readOnly;
};

inline OpenFile File::open(bool readOnly)
{
  // Code to actually open the file, i.e. acquire the lock
  return OpenFile(*this, readOnly);
}

Notice that the constructor for OpenFile is private, but it makes File a friend, thus allowing File, and only File, to construct an instance of OpenFile.  This helps us guarantee that the only way to get ahold of an OpenFile is to call open on a File first.  Note that we can do the actual work to “open” a file (i.e. acquire the underlying OS lock on the file) in the constructor for OpenFile, instead of in the open method.  That’s an even better guarantee that this work must happen prior to being able to read, write or close a file.  Then, it won’t really matter if we make the constructor private, and open would just be syntactic sugar.
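
Here’s a hypothetical sketch of what this buys us at a call site:

File file("some/location/on/my/machine");

OpenFile openFile = file.open(true); // readOnly = true

std::vector<char> contents = openFile.read(100); // Fine: read is only available after open
// file.read(100);                               // Compile error: File has no read method

openFile.write({'h', 'i'});                      // Compiles, but fails at runtime, because the file is read-only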

Now we have a compile-time guarantee that a file gets opened before it gets read/written to/closed.  We still have the problem that even if we pass in the literal true for readOnly and then call write, the failure will happen at runtime.  We need to move this to a compile-time failure.  One idea would be to use constness for this purpose.  After all, we have already made read a const method and write a non-const method.  However, since const-qualified classes in C++ aren’t quite full-blown types (in particular, we can’t define a “const” constructor), this won’t really work here.  We need to make two separate types ourselves.  Then, we can split the open method into two variants for read-only and read-write:

class ConstOpenFile; // Forward declarations, since File mentions these before they're defined
class OpenFile;

class File
{

public:

  File(const std::string& path);

  ConstOpenFile openReadOnly();  // Defined below, once the open-file types are complete
  OpenFile openReadWrite();
};

class ConstOpenFile
{

public:

  ConstOpenFile(File& file) : ConstOpenFile(file, true) { }

  std::vector<char> read(size_t count) const;

  void close(); // Must call when finished to avoid keeping the file locked.

protected:

  ConstOpenFile(File& file, bool readOnly) : _file(file)
  {
    // Code to acquire the lock on the file.  We pass in readOnly so we can acquire the right kind of lock
  }

  File& _file; // Do we even need to store this anymore?
};

class OpenFile : public ConstOpenFile
{

public:

  OpenFile(File& file) : ConstOpenFile(file, false) { }

  void write(const std::vector<char>& data);
};

inline ConstOpenFile File::openReadOnly() { return ConstOpenFile(*this); }
inline OpenFile File::openReadWrite() { return OpenFile(*this); }
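
Now the read-only/read-write distinction is visible to the compiler.  A hypothetical usage sketch:

File file("some/location/on/my/machine");

ConstOpenFile readOnlyFile = file.openReadOnly();
readOnlyFile.read(100);            // Fine
// readOnlyFile.write({'h', 'i'}); // Compile error: ConstOpenFile has no write method

OpenFile writableFile = file.openReadWrite();
writableFile.write({'h', 'i'});    // Fine: write only exists on OpenFile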

We’re almost finished.  The remaining problems are preventing read, write and open from being called after calling close, and making sure close gets called when we’re done. Both of these problems boil down to: calling close must be the last thing that happens with an OpenFile. This is a perfect candidate for RAII.  We’re already acquiring a resource in initialization (a.k.a. construction).  This means we should release the resource in the destructor:

class ConstOpenFile
{

public:

  ConstOpenFile(File& file) : ConstOpenFile(file, true) { }
  ~ConstOpenFile()
  {
    // Code to release the lock on the file
  }

  std::vector<char> read(size_t count) const;

protected:

  ConstOpenFile(File& file, bool readOnly) : _file(file)
  {
    // Code to acquire the lock on the file.  We pass in readOnly so we can acquire the right kind of lock
  }

  File& _file;
};
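
With RAII in place, a call site might look like the following sketch.  The lock lives exactly as long as the ConstOpenFile object does:

File file("some/location/on/my/machine");

{
  ConstOpenFile openFile(file); // The lock is acquired here, in the constructor

  std::vector<char> contents = openFile.read(100);
}                               // openFile goes out of scope here, and the destructor releases the lock

// openFile.read(100);          // Compile error: openFile no longer exists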

By closing the file in the destructor (and only in the destructor), we guarantee that it can’t happen while we still have an instance on which to call other stuff like read and write (we could have a dangling reference, but that’s a more general problem, solved with various tools for defining and managing ownership and lifecycles), and that it will happen once we discard the instance.

In reference-counted languages like Obj-C and Swift, we can do this in the deinit.  In garbage-collected languages like C# and Java, we don’t have as ideal a choice.  We shouldn’t use the finalizer, because it won’t get called until much later, if at all, and that will result in files being left open (and therefore locked, blocking anyone else from opening them) long after we’re done using them.  The best we can do is implement IDisposable (C#) or AutoCloseable (Java), and make sure we remember to call Dispose or close, or wrap the usage in a using (C#) or try-with-resources (Java) block.

And now, all those programming errors that can be detected at compile time are being detected at compile time.

This is how you use the type system.  You use it to take any constraint that, as you have identified, must be satisfied every time your code gets executed, and express it in a way that allows it to be statically verified, thereby moving your runtime errors earlier, to authorship time.  The ideal we aim for is to get all program failures into the following two categories:

  • Failures in external systems we don’t and can’t control (network connections, other machines, third party libraries, etc.)
  • Static (compilation/linting) errors

In particular, we aim to convert any exception that indicates an error in our code into an error that our compiler, or some other static analysis tool, catches without having to execute the code.  The best kinds of errors are errors that are caught before code gets executed, and a major category of such errors is the ones you were able to tell the type system to find.

Examples Aren’t Specifications

“Specification by Example” is a particular implementation strategy of Behavior-Driven Development. The central justification, as far as I can tell, is expressed in the following snippet from the Wikipedia page:

Human brains are generally not that great at understanding abstractions or novel ideas/concepts when first exposed to them, but they’re really good at deriving abstractions or concepts if given enough concrete examples

Ironically enough, this is immediately followed with “citation needed”.

Anyone with experience teaching math will immediately understand what is off about this statement.  The number of people who can see the first few numbers in a sequence, and deduce from that the sequence itself, is much, much smaller than the number of people who can understand the sequence when they see it.  If I endeavored to teach you what a derivative is by just showing you a few examples of functions and their derivatives, I would be shocked if you were able to “derive” the abstraction that way.

It’s not just a matter of raw intelligence.  It is true that only the highly intelligent can engage in this kind of pattern recognition; that is, in fact, exactly what an IQ test is.  But the bigger problem is that multiple sequences can have the same values for their first few elements.  A few example elements quite simply don’t contain enough information to deduce the sequence.

Examples help illustrate an abstraction, and thereby make it easier to understand. First I present an abstraction to you. I explain that a derivative measures the rate of change of a function, in the limit as the change goes to zero. Then I show you examples to help you grasp it. I don’t do it the other way around, and I certainly don’t skip the part where I explain what a derivative is, and hope by simply seeing a few derivatives you’ll realize what I’m showing you.

The “specification by example” practices I’ve seen all recognize that it would be a terrible idea to have developers truly try to derive the product specification from examples.  All of them supplement the examples with actual statements of the abstractions.  They do what I said: explain a “rule”, then follow it with examples to help illustrate the rule.  But then, out of some kind of confusion, the insistence is to enshrine the examples, instead of the rules, as “the specification”.

A good overview of how this practice fits into BDD is given here.  The practice of “example mapping” is applied to generate Gherkin scenarios for concrete examples of behavior.  The essential practice is that Gherkin is written exclusively for concrete examples, and not for abstract rules.

Let’s go back to the Wikipedia article to see how a few cases of sleight-of-hand are applied in order to justify this. From the article:

With Specification by example, different roles participate in creating a single source of truth that captures everyone’s understanding.

This is, in fact, an argument for something completely different: elimination of the overlapping and largely redundant documents that different roles of a development organization maintain. It has nothing whatsoever to do with expressing specifications through concrete examples. A “single source of truth” is equally possible with specifications expressed directly. In fact, doing so is far better in this sense, because no interpretative burden is left on developers to get from what is documented to what is specified. Specifying directly by abstractions avoids each reader of the concrete examples deriving his own personal “source of truth” about what the examples mean.

We see this kind of thing a lot.  By the same logic, the justification for scrum teams and ceremonies, apparently, is that they keep the manual testing load low.  No, that’s the justification for test automation.  That has nothing to do with scrum teams.  It is a very common practice to try to “trojan horse” some novel concept in by attaching it to another, unrelated and generally already widely lauded practice.  Avoiding redundant documentation is already a good idea.  It is not a reason to adopt the entirely unrelated practice of specification by example.

Continuing:

Examples are used to provide clarity and precision, so that the same information can be used both as a specification and a business-oriented functional test.

Examples don’t provide precision; they provide clarity at the expense of precision.  This is the fundamental point of confusion here.  Examples are not specifications.  I can provide these examples of a business rule:

“If John enters a 2 digit number into the field and tries to submit it, he is unsuccessful”

“If John enters a 5 digit number into the field and tries to submit it, he is successful”

“If John enters an 8 digit number into the field and tries to submit it, he is unsuccessful”

There are so many ways to interpret what I’m really getting at with these examples, I can’t list them all. Is the rule that the number of digits must be between 3 and 7? 4 and 6? Exactly 5? No one would dare hand only this to programmers and expect them to produce the desired software. That’s why every “specification by example” system supplements these examples with an actual rule like, “the number of digits must be between 3 and 6”.

The imprecision of examples is exactly why they can’t be specifications. Examples don’t specify. They exemplify.

As for “business-oriented test”, that’s BDD and TDD. The specification should be the business requirement, not some technical realization of that requirement. The requirement should be tested, preferably with an automated test. None of that requires the specification to be expressed through concrete examples.

Continuing:

Any additional information discovered during development or delivery, such as clarification of functional gaps, missing or incomplete requirements or additional tests, is added to this single source of truth.

It is? Why? What makes that happen? Does specifying by example force developers to go back and amend the requirements when they discover something new? Of course not. Maybe they will, maybe they won’t. Hopefully they will. It’s good practice to document a discovery that led to a code change in the requirements. This has nothing to do with whether requirements are expressed directly (abstractly) or indirectly (through concrete examples).

Continuing:

When applied to required changes, a refined set of examples is effectively a specification and a business-oriented test for acceptance of software functionality. After the change is implemented, specification with examples becomes a document explaining existing functionality. As the validation of such documents is automated, when they are validated frequently, such documents are a reliable source of information on business functionality of underlying software. To distinguish between such documents and typical printed documentation, which quickly gets outdated,[4] a complete set of specifications with examples is called Living Documentation.

This is an argument for tests as specifications, wherein the specifications are directly used by the CI/CD system to enumerate the test suite. The problem is that “a refined set of examples” cannot effectively be a specification. The author of this paragraph actually understands this. That’s why he says “specification with examples” (emphasis mine), instead of “specification by examples”, which is what this article is supposed to be advocating for. That one change in preposition completely alters what is being discussed, and completely undermines their case.

There are multiple (usually infinitely many) specifications that would align with any finite set of examples.  Concrete examples simply don’t map 1-1 to abstractions.  There’s a reason why the human mind employs abstractions so pervasively.  I can’t tell you I’m hungry by pointing to a girl who is eating, and also sitting at a table, and also reading something on her phone (am I telling you I’m hungry, or that I want to sit down, or that I want to play on my phone, or that I think that girl is cute?).

Like I keep saying, everyone actually knows this is nonsense. If anyone really believed “specification by example” were possible, they would deliver only a set of examples to a development team and tell them to get working. They don’t do that. In the Cucumber world of “example mapping”, the actual acceptance criteria are of such critical importance, they are elevated to the same first-class citizen status as examples, and placed directly into the .feature files.

The rules are placed in .feature files as commented out, unexecutable plain English. If those rules change, well, maybe someone will go update that comment. We could completely overhaul the actual scenarios, the Gherkin (which is executable code and entails at least some sort of code change), and not touch the rules, and everything would be fine. These rules don’t explain existing functionality at all. They’re just comments, and you can write anything in them. They’re as bad as code comments for documentation.

By sticking with the practice of writing Gherkin for examples, instead of rules, the Gherkin ceases to be the specification. That’s why the feature files have to be augmented with a bunch of plain English. That English is actually the specification. All that’s happening here is that the benefits of a DSL like Gherkin are not exploited. The specifications are written in English, which is ambiguous, vague and imprecise (particularly in the way most people use it). To whatever extent the examples help resolve these ambiguities (especially when those examples are written in Gherkin), it would be far more effective to write the rules in Gherkin. The whole point of Gherkin is that English is too flexible and imprecise of a language with which to express software specifications. Writing Gherkin in a way that requires it to be supplemented with plain English negates its benefit.

My point is not that examples are unhelpful. Quite to the contrary, examples are extremely helpful, and often crucial in arriving at the desired abstractions. But “specification by example” assigns an entirely inappropriate role to examples. The primary role of examples is to motivate the discovery of appropriate specification. Examples stimulate people, particularly the ones who define the specifications, to think more carefully about what exactly the specifications are. A counterexample can prove that a scenario is too generic, and that a “given” needs to be added to constrain its scope.

Let’s return to the initial quote from the article. In my experience, the inability to understand abstract specifications is a nearly nonexistent problem in software development. I don’t ever remember a case where a requirement was truly specified in unambiguous terms, and someone simply drew a blank while reading it (or even just misinterpreted it, which would require an objectively wrong reading of the words). Instead, requirements are vague, unclear, ambiguous, confusing, and incomplete. Here’s an example:

When the user logs in, he should see his latest activities

What does that mean exactly? What counts as an “activity”? How many of the latest ones should he see? Is there a maximum? How are they displayed to the user? How should they be ordered?

The problem here isn’t that this requirement is so mind-blowing that we need to employ the tactics of a college level math lecture to get anyone to comprehend it. The information simply isn’t there. This is a lousy requirement because it isn’t specific, which means it isn’t a specification.

Really, what the supplemental examples do is fill in information that is missing in the rule. I can supplement this requirement with an example:

Given a user Sally
Given Sally has made 3 purchases
Given Sally has changed her delivery address 2 times
Given Sally has changed her payment info 3 times
When Sally logs in
Then Sally sees, in descending order by date, her 3 purchases, 2 delivery address changes, and 1 payment info change

Okay, this example is hinting at more information.  A purchase, a delivery address change, and a payment info change are all examples of “activities”.  Great.  That was a missing detail in the “rule”.  The example also specifies an ordering.  And it seems that the recent activity is limited to 6 items.  That was also a missing detail in the rule.

But I can interpret it differently.  Maybe the rule is that there is no limit to the total number of activities shown, but there is a limit of only 1 payment info change.  Both of those rules are consistent with this example.  We need the actual rules.

Relying on examples in this manner is just a way to get by with vague and incomplete “rules”.  In fact, if there is ever a perceived need to supplement a rule with examples, that is a very reliable proof that the rule is incomplete and needs to be improved.  We can treat a rule being enough on its own, no examples needed, as a bellwether for the completeness and specificity of our rules.

Making the examples the target for Gherkin, which is what turns into your acceptance tests, completely fails as a BDD/TDD mechanism. The fundamental process of development driven by behaviors and tests is that you don’t touch the production code unless and until, and only to the minimal extent that, a failing test requires it. If you’re only writing tests for specific examples, the minimum work you need to do to make those tests pass is to satisfy the examples, not the rules.

I could write code that simply hardcodes 3 purchases, 2 address changes and 1 payment info change into the “activities” view on the home screen. Doing so would almost certainly be easier than fetching the logged in user’s real list of activities, parsing and truncating them. That would make this test pass. Even if there are a couple more examples with different sets of example activities, I could still get away with hardcoding them. And to the extent that the examples are our “documentation”, this is correct. But I know that’s not what I am supposed to be doing, so eventually, even though all the tests are passing, I have to go in and start messing with production code to make it do what I understand is really what we want it to do. In this workflow, acceptance tests simply aren’t the driving force of the production code, in any sense. They revert to the old role of tests as merely being verifications.

(This hints at a bigger discussion about whether tests, even under the hood, should ever use hardcoded stub data. Doing so always risks a false positive when the production code also hardcodes the same data, but it’s a very common and quick-to-stand-up method of testing. If this is an implementation detail of the tests, at least the test self-documents that this hardcoded data isn’t part of the test definition, or the requirement, which is certainly much better than a test in which arbitrary hardcoded stub data is right there in the test and requirement definition. The problem of “I can make this test pass by hardcoding the same data in production code” is still present, but arguably at a much smaller risk of occurring, because it’s clear from reading the test that those stubbed values are fake and private to the test implementation. If you want to fully eliminate this problem, you should randomly generate the fake data as part of executing the test.)

The fact that different people come away with different understandings of what exactly the requirement means does not point to some defect in the human brain’s ability to comprehend abstractions. It points to a defect in the language of the requirement, which genuinely does not specify exactly what the requirement is. The pervasive problem is vague requirements, not developers who can’t understand what the product owner wants. The problem is a language problem, not a comprehension problem. That’s why the solution is a domain-specific language (Gherkin), not a brain transplant.

Examples are fine, and they can help. But they don’t get rid of the problem that the plain English business rule, I can almost guarantee you, is vague and ambiguous. Even if the ritual of communal exemplification causes the participants to all reach a shared understanding, it’s not going to help the next guy who comes along. And in case this isn’t clear, specifications are documentation, and documentation lives longer than any particular team member’s involvement. The whole point of the “document” is to be the thing anyone reads when they need to get an answer to something.

No one really believes examples can be the documentation.  So when you insist on your Gherkin being only for examples, you necessitate plain English documentation.  Is it better to have plain English documentation plus examples in Gherkin, than to just have plain English documentation?  I’m sure it is.  But both are far, far inferior to having all-Gherkin documentation (the major exception is visual requirements, i.e. fonts, colors, spacing, sizes, etc., which are best expressed visually; a picture is literally worth a thousand words).  The point of this is to produce true, precise specifications of the product.  Plain English specifications aren’t precise enough, and examples (in any language) are even worse in this sense.  Keeping examples around only allows you to get away with incomplete specifications.  You shouldn’t need examples to supplement rules.  The rules should be expressive and clear enough on their own.  You can use examples to help arrive at that clear, expressive rule.  Once you do, the examples are scaffolding, and they can, and probably should, be torn down.

Specifications define. Examples illustrate. It will cause nothing but trouble to confuse the two.